Reproducible vLLM Docker image for the
NVIDIA DGX Spark (GB10 / sm_121a). Every input - CUDA base image, PyTorch
stack, NCCL, FlashInfer, vLLM - is pinned by commit SHA or digest. The same
versions.env always produces the same image.
Hardware: DGX Spark (GB10 SoC) only. The image targets
linux/arm64 with TORCH_CUDA_ARCH_LIST=12.1a. It will not run on x86 or other GPU architectures.
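
A quick preflight check can catch a wrong host before pulling the image. The sketch below only inspects the CPU architecture reported by `uname -m`. `is_supported_arch` is a hypothetical helper, not part of this repository, and passing it does not prove that the GPU is a GB10.

```shell
#!/usr/bin/env bash
# Preflight sketch: does this host report a 64-bit Arm CPU?
# is_supported_arch is a hypothetical helper, not shipped with this image.
is_supported_arch() {
  # Accept the two common spellings of 64-bit Arm (uname vs. Docker naming).
  case "$1" in
    aarch64|arm64) return 0 ;;
    *) return 1 ;;
  esac
}

if is_supported_arch "$(uname -m)"; then
  echo "arm64 host: architecture matches the image platform"
else
  echo "non-arm64 host ($(uname -m)): this image will not run here"
fi
```

The check is deliberately narrow: it rules out x86 hosts but says nothing about other Arm machines with different GPUs.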
Pull the latest release and serve a model:

    docker pull ghcr.io/timothystewart6/vllm-gb10:latest
    docker run --rm -it \
      --gpus all \
      --ipc=host \
      --network host \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      ghcr.io/timothystewart6/vllm-gb10:latest \
      vllm serve <model> --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.7

For a pinned version, see the releases page, which lists the full component table and the immutable tag for each build.
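
Once the container is running, one way to confirm the server is ready is to poll vLLM's OpenAI-compatible model listing endpoint (`/v1/models`) on the port passed to `vllm serve`. The sketch below assumes `curl` and `seq` are available on the host. `models_url` and `wait_for_server` are hypothetical helper names, and the retry count and 5-second interval are arbitrary choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the model-listing URL for a server on host:port.
models_url() {
  local host="${1:-localhost}" port="${2:-8000}"
  printf 'http://%s:%s/v1/models\n' "$host" "$port"
}

# Hypothetical helper: poll until the endpoint answers or the attempts run out.
# Arguments: host, port, maximum number of attempts (default 60).
wait_for_server() {
  local url attempts="${3:-60}"
  url="$(models_url "${1:-localhost}" "${2:-8000}")"
  for _ in $(seq 1 "$attempts"); do
    # --fail makes curl return non-zero on HTTP errors as well as refused connections.
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    sleep 5
  done
  return 1
}

# Usage, after starting the container:
#   wait_for_server localhost 8000 && curl --silent "$(models_url)"
```

Model loading can take a while, so a polling loop is usually more reliable than a single request right after `docker run`.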
Each release page lists the exact versions of every component. Key stack:
| Component | Pinned by |
|---|---|
| CUDA base image | digest (sha256:...) |
| vLLM | git commit SHA |
| PyTorch / TorchVision / TorchAudio / Triton | exact version |
| NCCL | git commit SHA (built from source) |
| FlashInfer | git commit SHA (built from source) |
| vllm-rs Rust frontend | built from source (axum HTTP server + PyO3 tool-parser module) |
| bitsandbytes, accelerate | exact version (4-bit/8-bit quantization and HuggingFace model loading) |
| Ray, uv, and other runtime deps | lockfile hash |
All pins live in versions.env. All lockfiles live in locks/.
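
To inspect a single pin, a small reader like the one below may be enough. It is a sketch that assumes versions.env holds plain `KEY=VALUE` lines with no quoting or `export` prefix, which this README does not confirm. `get_pin` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from an env-style file.
# Assumes plain KEY=VALUE lines; the real versions.env layout may differ.
get_pin() {
  local file="$1" key="$2"
  # Match on the exact key, then strip everything up to the first "=" so
  # values that themselves contain "=" survive intact.
  awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}

# Usage, from a checkout of the repository:
#   get_pin versions.env VLLM_REF
#   get_pin versions.env GB10_BUILD
```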
See the issues tab for tracked upstream compatibility gaps.
Each build publishes four tags:
| Tag | Notes |
|---|---|
| v0.24.0-gb10.0 | Canonical, immutable. vLLM version + stack revision. |
| v0.24.0-cu13.2-torch2.11-gb10.0 | Same image - adds CUDA and PyTorch versions for quick scanning. |
| latest | Mutable - always points at the most recent green build of main. |
| sha-<short_sha> | Immutable, tied to the exact Git commit that produced it. |
gb10.<N> increments when any non-vLLM input changes (CUDA, PyTorch, NCCL,
FlashInfer, etc.) on the same vLLM version. It resets to 0 when VLLM_REF
is bumped. There is intentionally no bare v0.24.0 tag: it would have to move with every stack revision, so it would be mutable.
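
The tag scheme and the revision rule can be sketched as two small functions. This illustrates the naming described above and is not the CI's actual implementation. `image_tags` and `next_build` are hypothetical names, and `abc1234` is a placeholder short SHA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the four tags for one build, one per line.
# Arguments: vLLM version, CUDA version, PyTorch version, stack revision N, short Git SHA.
image_tags() {
  local vllm="$1" cuda="$2" torch="$3" build="$4" sha="$5"
  printf '%s-gb10.%s\n' "$vllm" "$build"                             # canonical, immutable
  printf '%s-cu%s-torch%s-gb10.%s\n' "$vllm" "$cuda" "$torch" "$build" # same image, verbose
  printf 'latest\n'                                                  # mutable
  printf 'sha-%s\n' "$sha"                                           # immutable, per commit
}

# Hypothetical helper: compute the next stack revision N.
# Arguments: previous VLLM_REF, new VLLM_REF, previous revision N.
# Assumes it is only called when at least one input actually changed.
next_build() {
  local old_ref="$1" new_ref="$2" old_build="$3"
  if [ "$old_ref" != "$new_ref" ]; then
    echo 0                    # vLLM itself changed: reset
  else
    echo $((old_build + 1))   # same vLLM, some other input changed: increment
  fi
}

image_tags v0.24.0 13.2 2.11 0 abc1234
# prints:
#   v0.24.0-gb10.0
#   v0.24.0-cu13.2-torch2.11-gb10.0
#   latest
#   sha-abc1234
```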
- Edit one or more _REF lines in versions.env on a branch
- Open a pull request - the run-bump.yaml workflow picks it up, runs scripts/bump.sh on the DGX Spark runner, and commits the resolved _COMMIT SHAs, updated GB10_BUILD, and regenerated lockfiles back to your branch
- Review the diff that CI committed, then merge
- A green build on main publishes updated image tags to GHCR and creates a GitHub Release automatically
You do not need to SSH into the Spark or run anything locally.
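
The only manual part of the bump flow is editing versions.env and opening the pull request. A minimal sketch of that part follows, assuming versions.env uses plain `KEY=VALUE` lines and that the GitHub CLI (`gh`) is installed. `set_ref` is a hypothetical helper and the branch name is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the value of KEY in an env-style file.
# Assumes plain KEY=VALUE lines; all other lines are copied unchanged.
set_ref() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -F= -v k="$key" -v v="$value" \
    '$1 == k { print k "=" v; next } { print }' "$file" > "$tmp" \
    || { rm -f "$tmp"; return 1; }
  # Write back through the original path so file permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage, from a checkout of the repository:
#   git switch -c bump-vllm
#   set_ref versions.env VLLM_REF <new-ref>
#   git commit -am "Bump VLLM_REF"
#   git push -u origin bump-vllm
#   gh pr create --fill
# CI then resolves the SHAs and lockfiles and commits them back to the branch.
```

Opening the pull request in the GitHub web UI works just as well; `gh` is only a convenience here.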
CI also triggers on changes to Dockerfile, locks/, scripts/, and
checksums/.
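
To predict whether a change will start CI, the trigger paths can be expressed as a shell pattern match. This is a sketch of the rule as described here, not a copy of the workflow's path filter. `triggers_ci` is a hypothetical helper, and treating versions.env as a trigger is inferred from the bump flow above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a changed path falls under a CI trigger.
triggers_ci() {
  case "$1" in
    versions.env|Dockerfile) return 0 ;;           # single files at the repo root
    locks/*|scripts/*|checksums/*) return 0 ;;     # anything under these directories
    *) return 1 ;;
  esac
}

# Usage: list the changed files on a branch that would trigger CI.
#   git diff --name-only origin/main...HEAD | while read -r path; do
#     if triggers_ci "$path"; then echo "triggers CI: $path"; fi
#   done
```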
See CONTRIBUTING.md. Security issues: SECURITY.md.
MIT - see LICENSE.