Verification: how to confirm the fix worked

After running ./install.sh, the system should be in a known-good state. The following checks confirm it.

1. The symlink chain

ls -la ~/.unsloth/llama.cpp/build/bin/libhsa-runtime64*

Expected output:

lrwxrwxrwx … libhsa-runtime64.so   -> libhsa-runtime64.so.1
lrwxrwxrwx … libhsa-runtime64.so.1 -> /opt/rocm/lib/libhsa-runtime64.so.1

The third name (libhsa-runtime64.so.1.21.0) should not be present in the build/bin directory; it should be in hsa-bundled.bak/.
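
The same check can be scripted rather than read by eye. The following is a minimal sketch, not part of install.sh: the function names check_link and check_chain are hypothetical, and the file names and link targets are the ones shown in the expected output above.

```shell
# check_link LINK EXPECTED_TARGET
# Succeeds only if LINK is a symlink whose immediate target is EXPECTED_TARGET.
check_link() {
    link=$1
    expected=$2
    if [ ! -L "$link" ]; then
        echo "FAIL: $link is not a symlink" >&2
        return 1
    fi
    actual=$(readlink "$link")
    if [ "$actual" != "$expected" ]; then
        echo "FAIL: $link -> $actual (expected $expected)" >&2
        return 1
    fi
    echo "ok: $link -> $actual"
}

# check_chain BIN_DIR
# Verifies both hops of the chain and that the bundled copy is gone.
check_chain() {
    bin=$1
    check_link "$bin/libhsa-runtime64.so" libhsa-runtime64.so.1 || return 1
    check_link "$bin/libhsa-runtime64.so.1" /opt/rocm/lib/libhsa-runtime64.so.1 || return 1
    if [ -e "$bin/libhsa-runtime64.so.1.21.0" ] || [ -L "$bin/libhsa-runtime64.so.1.21.0" ]; then
        echo "FAIL: bundled libhsa-runtime64.so.1.21.0 is still in $bin" >&2
        return 1
    fi
}

# Example (not run here): check_chain ~/.unsloth/llama.cpp/build/bin
```

check_chain returns non-zero on the first broken hop, so it can gate a larger health-check script.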

2. The chain resolves to the system library

readlink -f ~/.unsloth/llama.cpp/build/bin/libhsa-runtime64.so.1

Expected output:

/opt/rocm/lib/libhsa-runtime64.so.1.18.0

The trailing .18.0 is the version of the system library. If the trailing version is different, the system ROCm package has been updated and the fix may need to be re-applied (the daily timer should do this automatically, but check the journal if you suspect otherwise).
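
To make this check independent of the exact version number, one option is to assert only that the chain ends somewhere under /opt/rocm/lib and to print the version separately. This is a sketch under that assumption; resolves_under and lib_version are hypothetical helper names, and readlink -f is the GNU coreutils behaviour used in the command above.

```shell
# resolves_under LINK PREFIX
# Succeeds if the fully resolved path of LINK lies under directory PREFIX.
resolves_under() {
    resolved=$(readlink -f "$1") || return 1
    case $resolved in
        "$2"/*) echo "ok: $1 resolves to $resolved" ;;
        *) echo "FAIL: $1 resolves to $resolved, outside $2" >&2
           return 1 ;;
    esac
}

# lib_version PATH
# Prints everything after ".so." in PATH, i.e. the library version suffix.
lib_version() {
    echo "${1##*.so.}"
}

# Example (not run here):
#   resolves_under ~/.unsloth/llama.cpp/build/bin/libhsa-runtime64.so.1 /opt/rocm/lib
#   lib_version "$(readlink -f ~/.unsloth/llama.cpp/build/bin/libhsa-runtime64.so.1)"
```

Comparing the printed version against a previously recorded value is one way to notice that the system ROCm package has changed.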

3. The library actually loads

ldd ~/.unsloth/llama.cpp/build/bin/llama-server | grep hsa

Expected output:

libhsa-runtime64.so.1 => /home/bricker/.unsloth/llama.cpp/build/bin/libhsa-runtime64.so.1 (...)

The path that ldd reports is the symlink in build/bin, not its target; the dynamic loader follows the symlink to the system library when the binary starts.
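
For scripted use, the ldd line can be parsed instead of inspected by eye. The sketch below assumes the usual "name => path (address)" layout of ldd output; hsa_path_from_ldd is a hypothetical helper name.

```shell
# hsa_path_from_ldd
# Reads ldd output on stdin and prints the path that libhsa-runtime64.so.1
# resolved to. Fails if the library is missing or reported as "not found".
hsa_path_from_ldd() {
    awk '
        $1 == "libhsa-runtime64.so.1" && $2 == "=>" && $3 != "not" {
            print $3
            found = 1
        }
        END { if (!found) exit 1 }
    '
}

# Example (not run here):
#   ldd ~/.unsloth/llama.cpp/build/bin/llama-server | hsa_path_from_ldd
```

Feeding the printed path to readlink -f then shows which real file the loader ends up mapping.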

4. The path unit is active

systemctl --user status fix-unsloth-libhsa.path

Expected: Active: active (waiting). The "waiting" state means it has a live inotify watch and is ready to fire.

5. The backup timer is armed

systemctl --user status fix-unsloth-libhsa-backup.timer

Expected: Active: active (waiting) with a Trigger: line showing the next fire time (some time in the next 24 hours).
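
Checks 4 and 5 look for the same state string, so they can share one helper. This is a rough sketch that matches the text of systemctl status rather than querying unit properties; unit_is_waiting is a hypothetical name.

```shell
# unit_is_waiting
# Reads `systemctl status` output on stdin and succeeds only if the unit
# is reported as "active (waiting)".
unit_is_waiting() {
    grep -q 'Active: active (waiting)'
}

# Example (not run here):
#   for unit in fix-unsloth-libhsa.path fix-unsloth-libhsa-backup.timer; do
#       if systemctl --user status "$unit" | unit_is_waiting; then
#           echo "ok: $unit"
#       else
#           echo "FAIL: $unit is not active (waiting)" >&2
#       fi
#   done
```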

6. The fix script runs cleanly

UNSLOTH_FIX_VERIFY=1 ~/.local/bin/fix-unsloth-libhsa.sh
echo "exit: $?"

Expected: exit: 0. When UNSLOTH_FIX_VERIFY=1 is set, the script also runs llama-server --version as a smoke test and logs the result to ~/.unsloth/libhsa-fix.log.

7. The model loads end-to-end

The definitive test. Substitute the model path and context size for your setup:

MODEL=~/.cache/huggingface/hub/models--unsloth--gemma-4-12b-it-qat-GGUF/snapshots/7102bdea62863acff919c945405ef29973113d66/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf
MMPROJ=~/.cache/huggingface/hub/models--unsloth--gemma-4-12b-it-qat-GGUF/snapshots/7102bdea62863acff919c945405ef29973113d66/mmproj-F16.gguf
DRAFTER=~/.cache/huggingface/hub/models--unsloth--gemma-4-12b-it-qat-GGUF/snapshots/7102bdea62863acff919c945405ef29973113d66/mtp-gemma-4-12B-it.gguf

ROCR_VISIBLE_DEVICES=0 \
  timeout 60s \
  ~/.unsloth/llama.cpp/llama-server \
    -m "$MODEL" \
    --port 59999 \
    -c 8192 \
    --parallel 1 \
    --flash-attn on \
    -ngl -1 \
    --jinja \
    --mmproj "$MMPROJ" \
    --model-draft "$DRAFTER" \
    --spec-type draft-mtp \
    --spec-draft-n-max 2 \
    --fit off

Expected output (key lines):

ROCm0 : AMD Radeon RX 7900 XT (20464 MiB, 20194 MiB free)
load_model: loading model '…/gemma-4-12B-it-qat-UD-Q4_K_XL.gguf'
load_model: loaded multimodal model
load_model: speculative decoding context initialized
slot  load_model: id  0 | task -1 | new slot, n_ctx = 8192

A failing run prints Segmentation fault (core dumped) within 1 second and exits with status 139 (128 + signal 11). A healthy run keeps serving until timeout stops it after 60 seconds, which yields exit status 124.
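
The exit status of the command above can be decoded mechanically. The sketch below relies on two conventions: a status above 128 means the process died from signal (status - 128), and GNU timeout returns 124 when it had to stop the command itself. The function name classify_exit is hypothetical.

```shell
# classify_exit STATUS
# Prints a one-line interpretation of an exit status from `timeout ... llama-server`.
classify_exit() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "clean exit"
    elif [ "$status" -eq 124 ]; then
        echo "timed out"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with error $status"
    fi
}

# Example (not run here): run the llama-server command, then
#   classify_exit "$?"
```

For this test, "timed out" is the good outcome because the server stayed up for the full window, while "killed by signal 11" is the segmentation fault described above.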

8. The fix log

tail ~/.unsloth/libhsa-fix.log

Expected: a chronological record of every re-application, every parked event, every ROCR_VISIBLE_DEVICES re-add, and every rollback-dir prune. If a line says smoke test FAILED, run step 7 manually and inspect coredumpctl info $(coredumpctl list -n1 llama-server | awk 'NR==2{print $5}') for the stack trace.
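
A quick way to scan the whole log instead of only its tail is to count the failure lines. This sketch assumes nothing about the log format beyond the phrase smoke test FAILED quoted above; count_failed_smoke_tests is a hypothetical helper name.

```shell
# count_failed_smoke_tests LOGFILE
# Prints the number of lines in LOGFILE that contain "smoke test FAILED".
count_failed_smoke_tests() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c 'smoke test FAILED' "$1" || true
}

# Example (not run here):
#   count_failed_smoke_tests ~/.unsloth/libhsa-fix.log
```

A non-zero count is the cue to run step 7 manually and look at the core dump.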

9. The systemd journal

journalctl --user -u fix-unsloth-libhsa.path -u fix-unsloth-libhsa.service --since today

Expected: one or more entries for each time the path unit fired today. An empty result is also fine: it means the symlink was already in place and the directory did not change.