The [unofficial] guide for RTX 5090 (Blackwell) on Ubuntu with Ollama and Docker

If you landed here, chances are your RTX 5090 was working fine for text inference, then the moment you threw an image at Ollama, the GPU disappeared. nvidia-smi stopped finding the card, Ollama started retrying runners in a loop, and the only fix was a hard reboot. I went through that exact experience on my HP OMEN 45L (Intel Core Ultra 9 285K, 64 GB DDR5, RTX 5090 32 GB) and it took a while to fully resolve. This post documents everything – root cause, symptoms, and step-by-step resolution.

*** UPDATED March 8, 2026: Added the GSP firmware disable step, which turned out to be the final piece of the fix. Driver upgrade alone (570 → 580) was necessary but not sufficient. Both steps are required.

Background

NVIDIA’s Blackwell architecture (GB202 chip – RTX 5090, RTX 5080, RTX Pro 6000 Blackwell) leans heavily on the GSP (GPU System Processor), a dedicated processor on the GPU itself that handles certain driver functions off the CPU. (GSP itself predates Blackwell – NVIDIA has shipped it since Turing – but Blackwell is where it bites.) On Linux, the GSP firmware can hang under sustained inference load – and when it does, the driver declares the GPU lost from the bus. There is no graceful recovery without a reboot.

Additionally, Blackwell GPUs on Linux require the open kernel module variant of the NVIDIA driver. The proprietary (closed) driver simply does not support GB202. If you installed the driver without the -open suffix, nvidia-smi will say “No devices were found” and the kernel log will say:

NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
NVRM: requires use of the NVIDIA open kernel modules

These two issues together – wrong driver variant and GSP firmware instability – are what cause the RTX 5090 to vanish mid-inference on Ubuntu.

How to confirm you hit this problem

Run this command to inspect the previous boot’s kernel messages:

sudo journalctl -k -b -1 | grep -E "nvidia|GSP|FSP|kfsp|GPU"

You are looking for this sequence of events:

kfspWaitForResponse: FSP command timed out
GPU lost from the bus [NV_ERR_GPU_IS_LOST] (0x0000000F)
kfspCleanupBootState_IMPL: Clock boost disablement via FSP failed
…
RmInitAdapter failed! (0x26:0x56:1472)

The first line is the GSP firmware hanging. The second is the driver giving up on the card. RmInitAdapter failed appears about 12 minutes later when the driver tries and fails to recover – this is the point where only a reboot helps.
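To automate that check, a small helper can scan log text for the failure signature. This is just a grep wrapper of my own (the function name is made up), not anything the driver ships:

```shell
# Reads kernel log text on stdin and reports whether the GSP failure
# signature described above is present.
check_gsp_crash() {
  if grep -qE "kfspWaitForResponse|NV_ERR_GPU_IS_LOST|RmInitAdapter failed"; then
    echo "GSP crash signature found"
  else
    echo "no GSP crash signature"
  fi
}

# On a live system:
#   sudo journalctl -k -b -1 | check_gsp_crash
```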

On the Ollama side, look for these patterns in the container logs (docker logs ollama):

ggml_cuda_init: failed to initialize CUDA: unknown error
offloaded 0/33 layers to GPU        <-- fell back to CPU silently
llama runner terminated error=exit status 2
starting runner…                    (repeated on many ports)

The repeated starting runner on sequential ports is Ollama trying to restart the process after each crash, making it look like a software bug when the real cause is the GPU being gone.
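The retry loop is easy to quantify: count how often the restart message appears in whatever log text you pipe in. The helper name is my own invention:

```shell
# Counts "starting runner" lines on stdin; a number that keeps growing
# between checks means Ollama is stuck in the crash loop described above.
count_runner_restarts() {
  grep -c "starting runner"
}

# Usage: docker logs ollama 2>&1 | count_runner_restarts
```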

Steps to resolve

Important: Make a note of your current working state before proceeding. The driver cleanup step removes all NVIDIA packages and will leave you with no GPU until the reinstall is complete. Do not do this remotely unless you have console access.

– Step 1: Remove the existing NVIDIA driver completely

If you installed the driver via a .run file or via apt, you need to do a full cleanup first. Mixing install methods leaves the system in a broken, partially-installed state.

sudo apt-get remove --purge '^nvidia-.*' '^libnvidia-.*' '^cuda-drivers.*'
sudo apt-get autoremove --purge
sudo apt-get autoclean

# If you also installed via the .run installer at some point:
sudo nvidia-uninstall

sudo reboot
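Before moving on, confirm the purge actually removed everything. The awk filter below is just a convenience sketch; a plain dpkg -l | grep nvidia works too:

```shell
# Reads `dpkg -l` output on stdin and prints any still-installed package
# whose name contains "nvidia" - empty output means a clean slate.
list_installed_nvidia() {
  awk '$1 == "ii" && $2 ~ /nvidia/ {print $2}'
}

# Usage after the reboot:
#   dpkg -l | list_installed_nvidia
#   lsmod | grep -i nvidia   # should also print nothing
```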

– Step 2: Install driver 580-open (the correct variant for Blackwell)

Driver 580 maps to CUDA runtime 13.0 and is the first driver series with solid Blackwell support. The -open suffix is mandatory for RTX 50-series GPUs.

sudo ubuntu-drivers install nvidia-driver-580-open

If the command reports “already installed” but nvidia-smi still shows driver 570 (or fails entirely), force a reinstall:

sudo apt-get install --reinstall nvidia-driver-580-open
sudo reboot

After reboot, verify:

nvidia-smi

Expected output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:02:00.0 Off |                  N/A |
|  0%   69C    P1            434W /  575W |       0MiB /  32607MiB |      0%      Default |

Note on CUDA toolkit version: Your existing CUDA toolkit (e.g. 12.8) does not need to match the driver’s CUDA version (13.0). The driver is forward-compatible. Upgrade the toolkit only if you need APIs specific to CUDA 13.x.
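The compatibility rule is simply that the toolkit version must not exceed the CUDA version the driver reports. A sketch of that comparison (sort -V does the version-aware ordering; the function name is mine):

```shell
# Succeeds when the installed toolkit version (first arg) is <= the CUDA
# version the driver supports (second arg).
cuda_compatible() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

cuda_compatible 12.8 13.0 && echo "toolkit 12.8 is fine with a 13.0 driver"
```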

– Step 3: Install nvidia-container-toolkit for Docker

After changing the driver, the container toolkit needs to be reinstalled and Docker reconfigured to use it. Skip this and your containers will not see the GPU.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
  | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
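To confirm the configuration took, check that the nvidia runtime landed in Docker's daemon config, then smoke-test a container. The helper is a grep sketch of my own; docker info | grep -i nvidia shows the same thing:

```shell
# Succeeds if the given daemon.json (default path shown) mentions the
# nvidia runtime that nvidia-ctk should have registered.
has_nvidia_runtime() {
  grep -q '"nvidia"' "${1:-/etc/docker/daemon.json}"
}

# Usage:
#   has_nvidia_runtime && echo "nvidia runtime configured"
#   docker run --rm --gpus all ubuntu nvidia-smi   # end-to-end smoke test
```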

– Step 4: Disable GSP firmware (the critical fix)

This is the step most guides miss. Even with driver 580-open, the GSP firmware can still hang under sustained Blackwell inference load. Disabling it forces those responsibilities back onto the CPU-side driver, which is stable. The inference performance impact is negligible.

sudo bash -c 'echo "options nvidia NVreg_EnableGpuFirmware=0" \
  > /etc/modprobe.d/nvidia-gsp.conf'

sudo update-initramfs -u
sudo reboot

To verify the option is loaded after reboot:

cat /sys/module/nvidia/parameters/NVreg_EnableGpuFirmware

It should return 0.

If a future driver release fully resolves GSP stability on Blackwell and you want to re-enable it:

sudo rm /etc/modprobe.d/nvidia-gsp.conf
sudo update-initramfs -u
sudo reboot

– Step 5: Enable GPU persistence mode

Persistence mode keeps the GPU driver loaded even when no processes are using the GPU, eliminating re-initialization delays and reducing the chance of state issues between inference runs.

sudo nvidia-smi -pm 1
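Note that nvidia-smi -pm 1 does not persist across reboots. The driver ships the nvidia-persistenced daemon for exactly this purpose; if it is not enabled on your system, a minimal oneshot unit is one alternative (a sketch – the unit name is my own choice):

```shell
sudo tee /etc/systemd/system/nvidia-persistence.service <<'EOF' >/dev/null
[Unit]
Description=Enable NVIDIA GPU persistence mode
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm 1

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-persistence.service
```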

– Step 6: Run the Ollama container with flash attention disabled

Looking at the Ollama logs before the crash, flash attention was enabled right before the GPU was lost. Disabling it via environment variable is a safe workaround and has no meaningful impact on output quality.

docker run -d \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  --restart unless-stopped \
  -e OLLAMA_FLASH_ATTENTION=0 \
  ollama/ollama
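Once the container is up, double-check that the variable actually made it into the environment. The check below is a sketch (docker inspect shows the same information):

```shell
# Reads `env` output on stdin; succeeds only if flash attention is
# explicitly disabled.
flash_attention_disabled() {
  grep -qx 'OLLAMA_FLASH_ATTENTION=0'
}

# Usage:
#   docker exec ollama env | flash_attention_disabled && echo "disabled"
```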

Verifying the fix end to end

Run a quick sanity check from the host to confirm CUDA is working with PyTorch (if you have it installed):

python3 -c "
import torch
print('PyTorch:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('GPU:', torch.cuda.get_device_name(0))
print('VRAM (GB):', round(torch.cuda.get_device_properties(0).total_memory / 1e9, 1))
"

Expected output on a healthy system:

PyTorch: 2.8.0+cu128
CUDA available: True
GPU: NVIDIA GeForce RTX 5090
VRAM (GB): 32.0

Then load an image model in Ollama and check that layers are offloaded to the GPU (not CPU):

llama_kv_cache_init: CUDA0 KV buffer size = 768.00 MiB
offloaded 33/33 layers to GPU       <-- this is what you want to see
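Checking the offload line by eye works, but a small parser makes it scriptable. This sketch (my own helper, not part of Ollama) succeeds only when every layer went to the GPU:

```shell
# Looks for an "offloaded N/N layers to GPU" line where both numbers match
# and are non-zero; exits 0 on full offload, 1 otherwise.
all_layers_offloaded() {
  awk '/layers to GPU/ {
         for (i = 1; i < NF; i++)
           if ($i == "offloaded") {
             split($(i + 1), a, "/")
             if (a[1] == a[2] && a[1] + 0 > 0) ok = 1
           }
       }
       END { exit ok ? 0 : 1 }'
}

# Usage: docker logs ollama 2>&1 | all_layers_offloaded && echo "full offload"
```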

Common issues and quick fixes

Symptom | Likely cause | Fix
nvidia-smi: No devices found after install | Proprietary driver installed instead of -open | Remove and reinstall with nvidia-driver-580-open
Docker container sees no GPU (--gpus all fails) | nvidia-container-toolkit not configured after driver change | Re-run Step 3
GPU drops mid-inference, comes back after reboot | GSP firmware hang | Step 4 – disable GSP firmware
Ollama shows offloaded 0/N layers to GPU | GPU already lost at time of model load | Reboot, apply all steps, re-run
NVRM: requires use of the NVIDIA open kernel modules | Wrong driver variant for Blackwell | Full driver removal + reinstall with -open
RmInitAdapter failed in kernel log | Driver failed to recover after GSP hang | Reboot required; apply GSP fix to prevent recurrence

Driver and CUDA version reference

All RTX 50-series (Blackwell) GPUs require the -open kernel module variant. The proprietary driver is not supported for this architecture.

Driver | CUDA Runtime | Notes
570.x | 12.8 | First Blackwell driver – GSP issues, avoid
575.x | 12.9 | Intermediate release
580.x | 13.0 | Recommended – best current Blackwell support

Kernel log noise you can ignore

After fixing the driver you will still see some messages in dmesg that look alarming but are harmless:

  • CIFS: VFS: Send error in SessSetup = -13 – This is an Azure Files / SMB mount authentication issue, completely unrelated to the GPU. If your Ollama models are on local storage (they should be), ignore it.
  • nouveau: DRM messages – The open-source Nouveau driver being disabled at boot. Normal.
  • PCIe bandwidth warnings on first load – Normal for Blackwell on certain motherboards during initialization.

My setup for reference

Component | Details
Machine | HP OMEN 45L GT22
CPU | Intel Core Ultra 9 285K
RAM | 64 GB DDR5-5600
GPU | NVIDIA GeForce RTX 5090 32 GB GDDR7
PSU | 1200 W 80 Plus Gold
OS | Ubuntu 24.04
Driver | 580.126.09 (open kernel module)
CUDA toolkit | 12.8
Ollama | v0.17.7 (Docker container)
PyTorch | 2.8.0+cu128

I have received feedback from a few proof-readers who went through these steps and confirmed the GPU stays stable across extended image inference sessions. If you have a different Blackwell card (RTX 5080, RTX 5070 Ti) and these steps work (or don’t work) for you, please leave a comment. The more data points we have, the better.

The GSP firmware instability on Blackwell is a known open issue on Linux as of driver 580. I expect a proper fix to arrive in a future driver release. When that happens, Step 4 can be reversed by removing /etc/modprobe.d/nvidia-gsp.conf and rebuilding the initramfs.

*** UPDATE March 9, 2026: After running stable for several hours, the GPU crashed again with a different error pattern. This led to additional investigation and two more potential root causes documented below. Some of these may be specific to my HP OMEN 45L hardware, but I’m documenting them here as they may help others with similar setups.

If it crashes again after the GSP fix: Xid 79

After applying all five steps above, my system ran for several hours before crashing again – this time with a different kernel error. The previous crash started with a GSP firmware timeout. This one started with:

NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.
NVRM: _threadNodeCheckTimeout: API_GPU_ATTACHED_SANITY_CHECK failed!
NVRM: Xid (PCI:0000:02:00): 154, GPU recovery action changed from 0x0 (None) to 0x2 (Node Reboot Required)

Xid 79 is defined in NVIDIA’s official Xid error catalog as a PCIe bus-level disconnect – the driver tried to communicate with the GPU over PCIe and got no response. This is a different failure mode from the GSP hang. The GSP fix resolves one problem; Xid 79 is another.

There are two likely causes for Xid 79 on the RTX 5090, both of which have documented community reports.

Possible cause A: 12VHPWR / 12V-2×6 connector power instability

The RTX 5090 draws up to 575W through a single 16-pin connector. Multiple independent investigations – including hardware teardowns by Geeks3D and ElcomSoft, and coverage by Tom’s Hardware – have documented that the 12VHPWR/12V-2×6 connector operates with very little safety margin at these power levels. The connector is rated for 600W; the 5090 draws 575W under sustained load with spikes beyond that.

The practical issue is contact resistance. If the connector is not fully seated, if the cable is bent sharply near the connector, or if there is any oxidation on the pins, uneven current distribution across the 12 power pins can cause local overheating. This can manifest as a momentary voltage drop that the PCIe bus interprets as a bus disconnect – producing Xid 79 – before any physical damage is visible.

What to check:

  • Reseat the 16-pin connector firmly until it clicks. On large cards in tight cases the latch is easy to miss.
  • Make sure the cable is not bent at a sharp angle within 5 cm of the GPU connector. Stress on the cable affects pin contact.
  • If you are using an adapter (e.g. 3× 8-pin to 16-pin), replace it with a native ATX 3.0/3.1 PSU cable. Each adapter junction adds resistance.
  • The HP OMEN 45L uses a proprietary PSU with a stock cable – verify the cable is the correct one supplied with the system and has not been substituted.

Possible cause B: PCIe Gen 5 signal integrity

The RTX 5090 is one of the first consumer GPUs to use PCIe Gen 5 (32 GT/s). Several reviewers and users have documented that signal integrity at Gen 5 speeds is sensitive to motherboard implementation quality. Igor’s Lab noted this in their RTX 5090 review; the issue is discussed extensively in Hardforum and ASUS ROG forums. Symptoms include instability, black screens, and GPU disconnects – all matching Xid 79.

The widely reported community workaround is to force the PCIe slot to Gen 4 in the BIOS. This has been confirmed to resolve crashes across multiple motherboard brands. Notably, there is an HP support thread documenting the exact same combination (HP workstation + RTX 5090 + Ubuntu 24) where forcing Gen 4 was the resolution – directly relevant to my HP OMEN 45L setup.

To apply this workaround, enter your BIOS and find the PCIe slot configuration for the primary x16 slot. Change it from Auto or Gen 5 to Gen 4. The performance impact for AI inference workloads is negligible – bandwidth between the CPU and GPU is not a bottleneck for local LLM inference; VRAM bandwidth is. Independent testing summarized in the Hardforum thread above puts the performance penalty at under 2% for GPU-compute workloads.

Note: Also consider disabling ASPM (Active State Power Management) for the PCIe slot in BIOS while you are there. Multiple users in the ROG forum thread above found that disabling ASPM in addition to forcing Gen 4 was required for full stability. On the HP OMEN 45L, look for this under Advanced → PCIe Configuration.

How to tell which cause you are hitting

Run this monitoring loop before your next inference session and log it to disk:

while true; do
  echo "$(date '+%Y-%m-%d %H:%M:%S') $(nvidia-smi \
    --query-gpu=temperature.gpu,power.draw,power.limit,pcie.link.gen.current,pcie.link.width.current \
    --format=csv,noheader,nounits 2>&1)" >> ~/gpu_monitor.log
  sleep 5
done

After a crash, check the last 50 lines:

tail -50 ~/gpu_monitor.log

What to look for:

  • pcie.link.gen.current drops from 5 → 4 → 3 before crash – PCIe signal integrity issue. Apply the Gen 4 BIOS fix.
  • power.draw repeatedly hitting power.limit – sustained power ceiling, connector stress likely contributing.
  • All values look normal right up to the last entry – sudden hard disconnect, more likely a physical connector issue.
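Eyeballing the tail works for short runs; for a long session, an awk pass over the whole file is quicker. This sketch (my own helper; the field layout matches the monitoring loop above – timestamp plus five comma-separated values) flags both patterns:

```shell
# Reads gpu_monitor.log lines on stdin, e.g.:
#   2026-03-09 12:00:05 70, 572.00, 575.00, 4, 16
# Flags PCIe generation drops and samples within 5 W of the power limit.
analyze_gpu_log() {
  awk -F', ' '{
    # $1 = "date time temp", $2 = power draw, $3 = power limit,
    # $4 = current PCIe gen, $5 = link width
    if (prev_gen != "" && $4 + 0 < prev_gen + 0)
      printf "PCIe gen dropped %s -> %s (%s)\n", prev_gen, $4, $1
    if ($2 + 0 >= $3 + 0 - 5)
      printf "power at ceiling: %sW of %sW (%s)\n", $2, $3, $1
    prev_gen = $4
  }'
}

# Usage: analyze_gpu_log < ~/gpu_monitor.log
```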

I am still in the process of determining which cause applies to my specific HP OMEN 45L. I will update this post once I have more data. In the meantime, both the connector check and the PCIe Gen 4 BIOS change are low-risk steps worth taking regardless.

