Ubuntu Tutorial: Ollama With AMD GPUs

0 - Introduction

Ollama is a powerful tool for running and managing AI models locally, but setting it up to leverage an AMD GPU on Ubuntu requires a few extra steps. Unlike NVIDIA GPUs, which have well-established CUDA support, AMD relies on ROCm (Radeon Open Compute) to enable GPU acceleration. In this guide, we’ll walk you through configuring Ollama to take advantage of your AMD GPU so that your AI models run quickly and efficiently.

To follow this article, you will need Docker installed and ready to use on Ubuntu 24.04. To learn how to do that, check out this article. If you have an NVIDIA GPU, you can learn how to set up Ollama here.

1 - Setup AMD Drivers

Let’s start by running ‘apt update’ and then installing the Linux headers, the extra kernel modules and the Python 3 packages ‘python3-setuptools’ and ‘python3-wheel’. We will need these to run the ‘amdgpu-install’ script.

sudo apt update
sudo apt install "linux-headers-$(uname -r)" "linux-modules-extra-$(uname -r)"
sudo apt install python3-setuptools python3-wheel

Now we will use the ‘amdgpu-install’ package to help us set up the AMD drivers. You can get the commands to download and install the package from AMD’s ROCm documentation. They should look something like the ones below:

sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.3.2/ubuntu/noble/amdgpu-install_6.3.60302-1_all.deb
sudo apt install ./amdgpu-install_6.3.60302-1_all.deb

Now we need to add our user to the render and video groups to give us permission to use the GPU. After that we also need to install the ‘amdgpu-dkms’ and ‘rocm’ packages.

sudo apt update
sudo usermod -a -G render,video $LOGNAME # Add the current user to the render and video groups
sudo apt install amdgpu-dkms rocm

You should restart your computer now. After that, we will install ‘radeontop’. Like ‘nvtop’ and ‘htop’, it is a command-line tool that acts somewhat like the Task Manager in Windows, but only for your AMD GPU: it shows the GPU usage and the amount of VRAM in use.

sudo apt install radeontop
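After the reboot, it can be worth confirming that the group change from the previous step actually took effect, since new group memberships only apply to sessions started after the change. The snippet below is a minimal sketch: ‘has_group’ is a hypothetical helper (not part of ROCm or Ubuntu) that checks a group list such as the one printed by ‘id -nG’.

```shell
# Hypothetical helper: succeed if group $1 appears in the space-separated
# group list $2 (the format printed by `id -nG`).
has_group() {
  for g in $2; do
    if [ "$g" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

# Check the current session for the two groups added earlier.
for grp in render video; do
  if has_group "$grp" "$(id -nG)"; then
    echo "$grp: ok"
  else
    echo "$grp: missing (log out and back in, or reboot)"
  fi
done
```

If either group is reported as missing after a reboot, re-running the ‘usermod’ command from the previous step is a reasonable first thing to try.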

In one terminal, run ‘radeontop’ to watch the GPU usage and, in another, run the following container with Docker. If this test succeeds, you will know that Docker containers can access and use your GPU.

docker run --rm --privileged --device=/dev/kfd --device=/dev/dri rocm/tensorflow:latest rocminfo

Example output

server@ryzen3600x2:~/docker$ docker run --rm --privileged --device=/dev/kfd --device=/dev/dri rocm/tensorflow:latest rocminfo
ROCk module version 6.10.5 is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 5 3600 6-Core Processor
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 5 3600 6-Core Processor
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  ASIC Revision:           0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   3600
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            12
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Memory Properties:
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    41009704(0x271c228) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    41009704(0x271c228) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    41009704(0x271c228) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 4
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    41009704(0x271c228) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx1030
  Uuid:                    GPU-d92fc7f915dcc92a
  Marketing Name:          AMD Radeon RX 6800
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
  Chip ID:                 29631(0x73bf)
  ASIC Revision:           1(0x1)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2475
  BDFID:                   2304
  Internal Node ID:        1
  Compute Unit:            60
  SIMDs per CU:            2
  Shader Engines:          4
  Shader Arrs. per Eng.:   2
  WatchPts on Addr. Ranges:4
  Coherent Host Access:    FALSE
  Memory Properties:
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          32(0x20)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        32(0x20)
  Max Work-item Per CU:    1024(0x400)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Packet Processor uCode:: 120
  SDMA engine uCode::      83
  IOMMU Support::          None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Recommended Granule:2048KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 3
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Recommended Granule:0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx1030
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***
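The full ‘rocminfo’ output is long, and the part that matters here is whether an agent with ‘Device Type: GPU’ is listed. As a rough sketch, the output can be filtered down to the marketing names of GPU agents. ‘gpu_names’ is a hypothetical helper written for this article; it only assumes the ‘Marketing Name:’ and ‘Device Type:’ lines shown in the example output above.

```shell
# Hypothetical helper: read rocminfo output on stdin and print the
# marketing name of every agent whose device type is GPU.
gpu_names() {
  awk -F': *' '
    /^ *Marketing Name:/ { name = $2 }             # remember the latest name
    /^ *Device Type:/ && $2 ~ /GPU/ { print name } # print it for GPU agents
  '
}

# Usage (same container as above):
#   docker run --rm --privileged --device=/dev/kfd --device=/dev/dri \
#     rocm/tensorflow:latest rocminfo | gpu_names
```

If nothing is printed, the container did not see a GPU agent, which usually points back to the driver installation or the device flags.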

2 - Setup Ollama and Open-WebUI

To create the Docker Compose file for our apps, we need the ID of the ‘render’ group. You can use the command below to get it.

getent group render | cut -d: -f3

Then create a folder for our app’s data and, in it, a file named ‘docker-compose.yml’.

mkdir ollama
cd ollama
nano docker-compose.yml

Paste the following configuration into the ‘docker-compose.yml’ file, but in the ‘group_add’ section, replace 992 with your ‘render’ group’s ID:

services:
  open-webui:
    image: 'ghcr.io/open-webui/open-webui:main'
    restart: always
    container_name: open-webui
    volumes:
      - './data/ui:/app/backend/data'
    ports:
      - '3000:8080'
  ollama:
    image: 'ollama/ollama:rocm'
    restart: always
    container_name: ollama
    privileged: true
    volumes:
      - './data/ollama:/root/.ollama'
    ports:
      - '11434:11434'
    group_add:
      - "${RENDER_GROUP_ID:-992}"
      - "video"
    devices:
      - "/dev/kfd"
      - "/dev/dri"
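Because the file refers to the group as ‘${RENDER_GROUP_ID:-992}’, an alternative to editing the number by hand is to set ‘RENDER_GROUP_ID’ in a ‘.env’ file next to ‘docker-compose.yml’, which Docker Compose reads for variable substitution. The following is a minimal sketch; ‘group_line_to_id’ and ‘write_render_env’ are hypothetical helper names, and the function overwrites any existing ‘.env’ file.

```shell
# Hypothetical helper: extract the numeric ID (third field) from a group
# database line such as "render:x:992:server".
group_line_to_id() {
  cut -d: -f3
}

# Hypothetical helper: write RENDER_GROUP_ID=<id> to the given file
# (default: .env). Note: this overwrites the file.
write_render_env() {
  env_file="${1:-.env}"
  gid="$(getent group render | group_line_to_id)"
  if [ -z "$gid" ]; then
    echo "render group not found" >&2
    return 1
  fi
  printf 'RENDER_GROUP_ID=%s\n' "$gid" > "$env_file"
}

# Usage, from the folder that contains docker-compose.yml:
#   write_render_env .env
#   docker compose config   # prints the file with variables resolved
```

Keeping the ID in ‘.env’ means the compose file itself can stay unchanged if it is copied to a machine where the ‘render’ group has a different ID.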

After saving the file, we can now start our containers. Run the following command and wait until the containers start:

docker compose up -d
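To check that Ollama is actually answering before opening the WebUI, its HTTP API can be polled until it responds. This is a sketch under a few assumptions: ‘curl’ is installed on the host, port 11434 is published as in the compose file above, and ‘wait_for_ollama’ is a hypothetical helper name. It queries Ollama’s ‘/api/version’ endpoint.

```shell
# Hypothetical helper: poll the Ollama API until it answers.
# $1: base URL (default http://localhost:11434)
# $2: maximum number of attempts, one per second (default 30)
wait_for_ollama() {
  url="${1:-http://localhost:11434}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl fail on HTTP errors, -sS keeps it quiet otherwise
    if curl -fsS "$url/api/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage:
#   docker compose ps
#   wait_for_ollama && echo "Ollama is up"
```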

3 - First use of Open-WebUI

Once your containers are up and running, you can access your WebUI at:

http://yourip:3000

In your browser you should now see the following page. Click on ‘Get started’:

Type your name, email and password and click on ‘Create Admin Account’:

Now you are ready to use Open-WebUI!

To test our GPU, let’s download the DeepSeek R1 model. Click on your icon and go to ‘Admin Panel’.

Go to ‘Settings’ and then ‘Models’. In this menu, click on the download button at the top right:

In the ‘Manage Models’ menu, enter the model name from Ollama and hit the download button:

Because of the size of some models, you might need to wait a while for the download to finish.

Back on the home page, you can now start a chat with the DeepSeek R1 model!

With a tool like ‘radeontop’, check if your GPUs are being utilized correctly:

If your GPUs are not being used, there are a few possible causes. The most likely one is not having enough VRAM; to fix this, try a smaller model. The next most likely causes are drivers that were not installed correctly or an error in the ‘docker-compose.yml’ file.
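Besides ‘radeontop’, Ollama itself can report where a loaded model is running: ‘ollama ps’ lists loaded models with a PROCESSOR column. The sketch below assumes that column contains text such as ‘100% GPU’ or ‘100% CPU’; ‘model_on_gpu’ is a hypothetical helper, and the container name ‘ollama’ comes from the compose file above.

```shell
# Hypothetical helper: read `ollama ps` output on stdin and succeed if any
# line after the header mentions GPU.
model_on_gpu() {
  awk 'NR > 1 && /GPU/ { found = 1 } END { exit !found }'
}

# Usage, while a chat is active or shortly after one:
#   docker exec ollama ollama ps | model_on_gpu && echo "GPU in use"
#
# If the model is on the CPU instead, the container log may explain why:
#   docker logs ollama 2>&1 | grep -i -E 'amdgpu|rocm'
```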

4 - Tests w/DeepSeek-R1:14b

I will use the system below to test the 14b version of ‘DeepSeek-R1’. The tests will be very simple and just an example for you to check.

Greeting

Took approximately 8 seconds to get a reply at 18.88 tokens per second.

Prompt

Hello, how are you?

Reply

Hello! I’m just a virtual assistant, so I don’t have feelings, but I’m here and ready to help you with whatever you need. How are you doing today? 😊
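A tokens-per-second figure like the one quoted for this test can also be measured from the command line. With streaming disabled, Ollama’s ‘/api/generate’ endpoint returns ‘eval_count’ (generated tokens) and ‘eval_duration’ (in nanoseconds), and the rate is their ratio. The sketch below assumes ‘curl’ and ‘jq’ are installed; ‘tokens_per_second’ is a hypothetical helper, and the result will not necessarily match the numbers in this article exactly.

```shell
# Hypothetical helper: compute tokens per second.
# $1: eval_count (tokens), $2: eval_duration (nanoseconds)
tokens_per_second() {
  awk -v n="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", n / ns * 1e9 }'
}

# Usage against the Ollama container from the compose file:
#   resp="$(curl -s http://localhost:11434/api/generate \
#     -d '{"model":"deepseek-r1:14b","prompt":"Hello, how are you?","stream":false}')"
#   tokens_per_second "$(echo "$resp" | jq .eval_count)" \
#                     "$(echo "$resp" | jq .eval_duration)"
```

For example, 100 tokens generated in 5 seconds (5000000000 ns) works out to 20.00 tokens per second.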

Tell me a story

Took approximately 41 seconds to get a reply at 22.22 tokens per second.

Prompt

Tell me a small story.

Reply

Thoughts

Okay, the user asked for a small story. They didn’t specify any particular genre or theme, so I can go with something general and heartwarming.

I want to create relatable characters and a simple plot that conveys a positive message. Maybe a little girl who loves drawing and finds courage through her art.

Setting it in a village could add a nice touch, making the story feel cozy and grounded. Including elements like encouragement from family will make it uplifting.

I should introduce conflict—maybe the girl is shy but wants to join the school play. Through her drawings and support from her grandmother, she gains confidence.

The resolution can be heartwarming, showing her success in the play, which ties back to the theme of overcoming fears with love and support.

Keeping the language simple and the story concise will make it suitable for a wide audience.

Once upon a time, in a cozy village surrounded by rolling hills, there lived a little girl named Lily. She had big eyes that sparkled like stars and hair as golden as wheat. But despite her beauty, Lily was terribly shy. She would hide behind her apron whenever someone tried to talk to her.

One day, while walking home from school, Lily found a small notebook lying on the path. Inside, she discovered it was filled with sketches of flowers, trees, and animals—drawings so detailed and lifelike they seemed almost magical. Intrigued, she took it home and showed it to her grandmother.

“Who do you think drew these?” Lily asked, her eyes wide with admiration.

Her grandmother smiled. “It must have been someone with a special love for nature,” she said. “Perhaps the owner left it behind by mistake.”

Lily felt a strange connection to the notebook. She began carrying it everywhere, sketching everything she saw—a bird perched on a branch, a frog hopping across the path, even the kind-faced old woman selling bread at the market.

Days turned into weeks, and Lily’s sketches became more confident. But one afternoon, as she was drawing a butterfly in her notebook, she noticed someone watching her. It was Mr. Thompson, the schoolmaster. He squinted at her drawings, his expression softening with each glance.

“Miss Lily,” he said finally, “those are quite remarkable. You have a gift.”

Lily blushed and tucked the notebook away, but his words stayed with her. That night, she showed her sketches to her grandmother again, who beamed with pride.

“Why don’t you share your talent with others?” her grandmother suggested. “Perhaps you could draw for the school play next week.”

The thought made Lily’s heart race. The play was about a brave little girl who saves her village. It felt like more than just a story—it was calling to her, urging her to step out of the shadows.

With her grandmother’s encouragement, Lily began practicing her lines and preparing her drawings for the play. Each day, she drew the characters, imagining their expressions and movements. The closer she got to the performance, the more confident she felt.

On opening night, the village hall was filled with people. Lily stood backstage, her heart pounding, but when she looked at Mr. Thompson, who gave her a reassuring nod, she took a deep breath and stepped onto the stage.

As she acted and drew live on a large canvas, the audience watched in awe. Her shyness melted away, replaced by a sense of joy and purpose. When the curtain fell, there was a standing ovation.

Lily felt tears well up in her eyes. She had found her voice—not just through words, but through her art. And though she would always cherish the quiet moments with her notebook, she knew she didn’t have to hide anymore.

Solve a spatial problem

Took 19 seconds to get a reply at 21.36 tokens per second.

Prompt

Every day from 2PM to 4PM the household cat, Tobias, is in the window. From 2 until 3, Tobias is looking at birds. For the next half hour, Tobias is sleeping. On the final half hour, Tobias is cleaning himself. The time is 3:14PM, where and what is Tobias doing.

Reply

Thoughts

Okay, so I need to figure out where Tobias the cat is and what he’s doing at 3:14 PM. Let me start by breaking down the information given.

First, Tobias is in the window every day from 2 PM to 4 PM. That means between those times, he’s there. Now, within that window, his activities change every half hour. From 2 until 3 PM, which is the first hour, he’s looking at birds. Then, for the next half hour (which would be from 3 to 3:30 PM), he’s sleeping. After that, for another half hour (from 3:30 to 4 PM), he’s cleaning himself.

Wait, let me make sure I’m understanding this correctly. The time is divided into three segments:

From 2 PM to 3 PM: Looking at birds.
From 3 PM to 3:30 PM: Sleeping.
From 3:30 PM to 4 PM: Cleaning.

So, if it’s currently 3:14 PM, that falls into the second half-hour segment, which is from 3 PM to 3:30 PM. Therefore, at 3:14 PM, Tobias should be sleeping.

But wait a minute, let me double-check the timeline:

From 2:00 to 3:00 PM: Birds.
Then, 3:00 to 3:30 PM: Sleeping.
Next, 3:30 to 4:00 PM: Cleaning.

Yes, that’s correct. So at 3:14 PM, which is between 3 and 3:30, he’s sleeping in the window.

I think I got it right the first time. He’s in the window during all those times, just doing different activities each half-hour.

At 3:14 PM, Tobias is in the window and currently sleeping.

Solve a math equation

Because the reply contains formulas, it is shown as an image. It took DeepSeek about 23 seconds at 22.98 tokens per second to come up with the solution.

Prompt

Solve 3x^2-5x+2=0

Reply

Thoughts

Photomath gave me the same solution:
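For reference, the equation can also be checked by hand with the quadratic formula. This is an independent check, not a transcription of the model’s reply:

```latex
\begin{aligned}
3x^2 - 5x + 2 &= 0 \\
\Delta &= (-5)^2 - 4 \cdot 3 \cdot 2 = 25 - 24 = 1 \\
x &= \frac{5 \pm \sqrt{1}}{2 \cdot 3} = \frac{5 \pm 1}{6} \\
x_1 &= 1, \qquad x_2 = \frac{2}{3}
\end{aligned}
```

Substituting back confirms both roots: 3(1)^2 - 5(1) + 2 = 0 and 3(2/3)^2 - 5(2/3) + 2 = 4/3 - 10/3 + 6/3 = 0.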

And that’s all for this article. Thanks for reading and stay tuned for more tech insights and tutorials. Until next time, and keep exploring the world of tech!