A small, defensive workaround for a vendor-side bug: Unsloth Studio's bundled
llama.cpp prebuilt ships a libhsa-runtime64.so.1 (version 1.21.x) that
segfaults on AMD RDNA 3 (gfx110X) GPUs, on systems whose ROCm install
provides HSA runtime 1.18.x, when
llama-server initializes the HSA agent. The fix parks the bundled library
and routes the runtime through the system one.
rocr::AMD::GpuAgent::InitDma() -> SIGSEGV
This package installs three systemd user units (a path watcher, the
oneshot service it triggers, and a daily backup timer) that detect when
the Unsloth installer re-extracts the broken library and re-apply the
fix automatically.
Status: actively maintained. Verified on CachyOS with an AMD Radeon RX
7900 XT and unslothai/llama.cpp prebuilt b9692 (June 2026).
When you load a GGUF model through Unsloth Studio on an AMD RDNA 3 GPU,
llama-server exits with SIGSEGV in rocr::AMD::GpuAgent::InitDma()
because the prebuilt ships a libhsa-runtime64.so.1.21.0 that is
incompatible with the system's ROCm runtime. The first core dump lands
~500 ms after launch. The error message Unsloth Studio surfaces to the
user is misleading:
Failed to load model: llama-server failed to start. Check that the GGUF file is valid and you have enough memory.
It is not the GGUF file. It is not memory. It is the bundled HSA runtime.
See docs/PROBLEM.md for the full post-mortem with
stack traces, coredumpctl output, and timing analysis.
Park the bundled libhsa-runtime64.so.1.21.0 to hsa-bundled.bak/, then
create a symlink at the same path that points to
/opt/rocm/lib/libhsa-runtime64.so.1 (the system library, 1.18.x). The
system library initializes cleanly on gfx110X. The model loads.
See docs/SOLUTION.md for why this approach was
chosen over alternatives.
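
The park-and-symlink step described above can be sketched as a small shell function. This is a minimal illustration, not the code in fix-unsloth-libhsa.sh: the function name `park_and_link` is hypothetical, and placing hsa-bundled.bak/ inside the bin directory by default is an assumption.

```shell
# Sketch only: park a bundled library and point its path at the system copy.
# park_and_link is a hypothetical helper, not the shipped fix script.
park_and_link() {
    local bin_dir=$1                               # e.g. ~/.unsloth/llama.cpp/build/bin
    local system_lib=$2                            # e.g. /opt/rocm/lib/libhsa-runtime64.so.1
    local park_dir=${3:-$bin_dir/hsa-bundled.bak}  # assumed default location
    local target="$bin_dir/libhsa-runtime64.so.1"

    # Refuse to touch anything if the system library is missing.
    if [ ! -e "$system_lib" ]; then
        echo "system library not found: $system_lib" >&2
        return 1
    fi

    # Already fixed: the path is a symlink to the system library.
    if [ -L "$target" ] && [ "$(readlink "$target")" = "$system_lib" ]; then
        return 0
    fi

    # A regular file at this path is the bundled library; move it aside.
    if [ -e "$target" ] && [ ! -L "$target" ]; then
        mkdir -p "$park_dir"
        mv -f "$target" "$park_dir/"
    fi

    # Create (or replace) the symlink to the system library.
    ln -sfn "$system_lib" "$target"
}
```

Calling it twice is harmless: the second call sees the symlink already in place and returns without moving anything. Checking for the system library first means a machine without ROCm installed keeps its bundled file rather than ending up with a dangling link.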
git clone https://github.com/boundring/unsloth-rocm-libhsa-fix.git
cd unsloth-rocm-libhsa-fix
./install.sh

The installer:
- Copies the fix script to ~/.local/bin/fix-unsloth-libhsa.sh.
- Installs three systemd user units in ~/.config/systemd/user/.
- Enables and starts the path watcher and the daily backup timer.
- Runs the fix once so the system is in a clean state immediately.
- Prints the symlink chain for confirmation.
To uninstall, run ./uninstall.sh from the cloned repo.
| Component | Tested with |
|---|---|
| Distro | CachyOS (Arch-based), kernel 7.0.12 |
| GPU | AMD Radeon RX 7900 XT (gfx1100) + Raphael iGPU (gfx1036, hidden) |
| System ROCm | hsa-rocr from pacman (provides libhsa-runtime64.so.1.18.0) |
| Unsloth prebuilt | app-b9692-mix-2d6bd50-linux-x64-rocm-gfx110X.tar.gz |
| Unsloth Studio | As of June 2026 |
| Model (worked example) | unsloth/gemma-4-12b-it-qat-GGUF (UD-Q4_K_XL) |
The fix is not expected to break any other GPU generation. The bundled
library is replaced only on RDNA 3; on older or newer generations, the
system library should also work. If it does not, the daily backup timer
will log the failure and notify-send will alert you. See
docs/COMPATIBILITY.md for details.
┌──────────────────────────┐
│ ~/.unsloth/llama.cpp/ │
│ build/bin/ │
│ │
│ libhsa-runtime64.so │ ← symlink (this package)
│ libhsa-runtime64.so.1 ───┼──→ /opt/rocm/lib/libhsa-runtime64.so.1
│ │ ↓
│ (other bundled .so's │ libhsa-runtime64.so.1.18.0 ← pacman
│ untouched) │
└──────────────────────────┘
▲
│ inotify watches this directory
│
┌────────┴─────────┐
│ fix-unsloth- │ daily backup timer
│ libhsa.path │ (if inotify missed an event)
│ (systemd path) │
└────────┬─────────┘
│ fires
▼
┌────────────────────┐
│ fix-unsloth- │
│ libhsa.service │ oneshot; runs
│ (systemd service) │ fix-unsloth-libhsa.sh
└────────┬───────────┘
│ exec
▼
┌────────────────────┐
│ fix-unsloth- │
│ libhsa.sh │ parks bundled 1.21.x,
│ │ symlinks to system 1.18.x,
│ │ re-adds ROCR_VISIBLE_DEVICES=0,
│ │ prunes stale rollback venvs
└────────────────────┘
When the Unsloth installer replaces libhsa-runtime64.so.1 (e.g. as part
of unsloth studio update), PathChanged on the directory fires, the
service runs the script, the bundled file is moved aside, the symlink is
re-created, and the next llama-server launch uses the working system
library.
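
For readers unfamiliar with systemd path units, the watcher described above might look roughly like the unit written by this sketch. The unit names and the watched directory come from this README; the helper name `write_path_unit`, the Description text, and the `WantedBy=default.target` install target are assumptions, and the unit shipped in this repo may differ.

```shell
# Sketch only: write a systemd path unit that triggers the fix service.
# write_path_unit is a hypothetical helper; the shipped unit may differ.
write_path_unit() {
    local unit_dir=$1   # e.g. ~/.config/systemd/user
    mkdir -p "$unit_dir"
    cat > "$unit_dir/fix-unsloth-libhsa.path" <<'EOF'
[Unit]
Description=Watch the Unsloth llama.cpp bin directory for a re-extracted libhsa

[Path]
# %h expands to the user's home directory in a user unit.
PathChanged=%h/.unsloth/llama.cpp/build/bin
# Service to start when the watched directory changes.
Unit=fix-unsloth-libhsa.service

[Install]
WantedBy=default.target
EOF
}
```

After writing a unit like this, `systemctl --user daemon-reload` followed by `systemctl --user enable --now fix-unsloth-libhsa.path` would activate it. The included ./install.sh already does the equivalent, so this is only useful for understanding or debugging the mechanism.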
The included verification is a 30-second manual invocation of
llama-server with the model's exact arguments. A passing run prints,
among other things:
ROCm0 : AMD Radeon RX 7900 XT (20464 MiB, 20194 MiB free)
load_model: loading model '/…/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf'
load_model: loaded multimodal model
load_model: speculative decoding context initialized
slot load_model: id 0 | task -1 | new slot, n_ctx = 8192
A failing run prints Segmentation fault (core dumped) and writes a
core file in /var/lib/systemd/coredump/. See
docs/VERIFICATION.md for the full check list,
including ldd output and coredumpctl commands.
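
Before launching llama-server at all, a quicker sanity check is to confirm that the bundled path resolves to the system library. The following is a minimal sketch, assuming a `readlink` that supports `-f`; the helper name `resolves_to` is hypothetical and is not part of this package.

```shell
# Sketch only: succeed when two paths resolve to the same existing file.
# resolves_to is a hypothetical helper, not part of the shipped scripts.
resolves_to() {
    local link=$1 expected=$2
    local actual want

    # Follow every symlink in both chains to their final targets.
    actual=$(readlink -f -- "$link") || return 1
    want=$(readlink -f -- "$expected") || return 1

    # Both must end at the same file, and that file must exist.
    [ -e "$actual" ] && [ "$actual" = "$want" ]
}
```

On a fixed system, `resolves_to ~/.unsloth/llama.cpp/build/bin/libhsa-runtime64.so.1 /opt/rocm/lib/libhsa-runtime64.so.1 && echo "fix in place"` should print the message. If the Unsloth installer has re-extracted the bundled file and the watcher has not yet run, the function returns non-zero.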
| File | Audience |
|---|---|
| README.md | This file. Start here. |
| DOCUMENTATION.md | The same problem explained in four literary styles: Elmore Leonard, George Orwell, Terry Pratchett, and Philip José Farmer. Read for context and character. |
| docs/PROBLEM.md | Technical post-mortem. Stack traces, timing analysis, coredumpctl output. |
| docs/SOLUTION.md | Why we chose symlinking over alternatives (LD_PRELOAD, build-from-source, fork, etc). |
| docs/VERIFICATION.md | How to confirm the fix worked. |
| docs/COMPATIBILITY.md | What hardware, distro, ROCm, and Unsloth versions are covered. |
| docs/FUTURE.md | Known limitations, considered alternatives, and ideas for the next iteration. |
| CREDITS.md | Who built this, who helped, and which upstream projects are involved. |
MIT. See LICENSE.