From the workspace root:
bash build_and_load_cpu.sh [model_name]

model_name filters which GGUF is uploaded (default: the most recently converted *.gguf).
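The selection rule can be sketched as a small shell function. This is an illustration and not the script's actual code: select_gguf is a hypothetical helper, "most recently converted" is approximated by file modification time, and files are assumed to be named <model_name>-<QUANT>.gguf inside one directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from build_and_load_cpu.sh).
# Prints the newest matching GGUF in a directory, or returns 1 if none exists.
#   $1: models directory
#   $2: optional model name; when given, only <model_name>-*.gguf is considered
select_gguf() {
    local models_dir="$1" model_name="${2:-}"
    local newest="" f
    for f in "$models_dir"/"${model_name:+$model_name-}"*.gguf; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # Assumption: modification time stands in for "most recently converted".
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Under these assumptions, select_gguf llama.cpp/models would print the newest GGUF of any model, and select_gguf llama.cpp/models Qwen3-0.6B would restrict the search to that model's files.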
That single command:
- Cross-compiles llama-cli for aarch64 using aarch64-linux-gnu-gcc
- Installs the binary and shared libs to llama.cpp/build-aarch64/install/
- Uploads to the board (khadas@192.168.1.58) under ~/programs/llama_cpu/:
  - llama-cli
  - libllama.so / libggml*.so
  - <model_name>-<QUANT>.gguf (from llama.cpp/models/, if present)
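For orientation, the three stages above might look roughly like the following. This is a sketch under stated assumptions, not the contents of build_and_load_cpu.sh: the CMake flags, the aarch64-linux-gnu-g++ C++ compiler, the bin/ and lib/ layout under the install prefix, and the run helper are all assumptions. The GGUF upload is left out. By default the script only prints the commands (DRY_RUN=1), so it is safe to run without a toolchain or a board.

```bash
#!/usr/bin/env bash
# Sketch of the cross-compile, install, and upload stages (assumed, not the real script).
set -eu

DRY_RUN="${DRY_RUN:-1}"            # 1 = print commands only, 0 = execute them
SRC_DIR="llama.cpp"
BUILD_DIR="$SRC_DIR/build-aarch64"
PREFIX="$BUILD_DIR/install"
BOARD="khadas@192.168.1.58"
REMOTE_DIR="programs/llama_cpu"    # relative to the remote home directory

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

cross_compile() {
    # Standard CMake cross-compilation variables; the exact set is an assumption.
    run cmake -S "$SRC_DIR" -B "$BUILD_DIR" \
        -DCMAKE_SYSTEM_NAME=Linux \
        -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
        -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
        -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ \
        -DCMAKE_BUILD_TYPE=Release
    run cmake --build "$BUILD_DIR"
}

install_artifacts() {
    run cmake --install "$BUILD_DIR" --prefix "$PREFIX"
}

upload() {
    run ssh "$BOARD" mkdir -p "$REMOTE_DIR"
    # Assumed install layout: binary in bin/, shared libraries in lib/.
    run scp "$PREFIX/bin/llama-cli" "$PREFIX"/lib/*.so "$BOARD:$REMOTE_DIR/"
}

cross_compile
install_artifacts
upload
```

Running it with DRY_RUN=0 would execute the commands instead of printing them, which requires the cross toolchain locally and SSH access to the board.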
# Upload the most recently converted model
bash build_and_load_cpu.sh
# Upload a specific model
bash build_and_load_cpu.sh Qwen3-0.6B
bash build_and_load_cpu.sh Qwen2.5-0.5B-Instruct

Note: run convert_to_gguf.sh first to produce the GGUF model before deploying.