vllm-gb10

Reproducible vLLM Docker image for the NVIDIA DGX Spark (GB10 / sm_121a). Every input - CUDA base image, PyTorch stack, NCCL, FlashInfer, vLLM - is pinned by commit SHA or digest. The same versions.env always produces the same image.

Hardware: DGX Spark (GB10 SoC) only. The image targets linux/arm64 with TORCH_CUDA_ARCH_LIST=12.1a. It will not run on x86 or other GPU architectures.

Quick start

Pull the latest release and serve a model:

docker pull ghcr.io/timothystewart6/vllm-gb10:latest

docker run --rm -it \
  --gpus all \
  --ipc=host \
  --network host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  ghcr.io/timothystewart6/vllm-gb10:latest \
  vllm serve <model> --host 0.0.0.0 --port 8000 --gpu-memory-utilization 0.7

For a pinned version see the releases page for the full component table and immutable tag for each build.

What's in the image

Each release page lists the exact versions of every component. Key stack:

Component	Pinned by
CUDA base image	digest (`sha256:...`)
vLLM	git commit SHA
PyTorch / TorchVision / TorchAudio / Triton	exact version
NCCL	git commit SHA (built from source)
FlashInfer	git commit SHA (built from source)
vllm-rs Rust frontend	built from source (axum HTTP server + PyO3 tool-parser module)
bitsandbytes, accelerate	exact version (4-bit/8-bit quantization and HuggingFace model loading)
Ray, uv, and other runtime deps	lockfile hash

All pins live in versions.env. All lockfiles live in locks/.

Known limitations

See the issues tab for tracked upstream compatibility gaps.

Image tags

Each build publishes four tags:

Tag	Notes
`v0.24.0-gb10.0`	Canonical, immutable. vLLM version + stack revision.
`v0.24.0-cu13.2-torch2.11-gb10.0`	Same image - adds CUDA and PyTorch versions for quick scanning.
`latest`	Mutable - always points at the most recent green build of `main`.
`sha-<short_sha>`	Immutable, tied to the exact Git commit that produced it.

gb10.<N> increments when any non-vLLM input changes (CUDA, PyTorch, NCCL, FlashInfer, etc.) on the same vLLM version. It resets to 0 when VLLM_REF bumps. There is intentionally no bare v0.24.0 tag - it would be mutable.

Bumping versions

Edit one or more _REF lines in versions.env on a branch
Open a pull request - the run-bump.yaml workflow picks it up, runs scripts/bump.sh on the DGX Spark runner, and commits the resolved _COMMIT SHAs, updated GB10_BUILD, and regenerated lockfiles back to your branch
Review the diff that CI committed, then merge
A green build on main publishes updated image tags to GHCR and creates a GitHub Release automatically

You do not need to SSH into the Spark or run anything locally.

CI also triggers on changes to Dockerfile, locks/, scripts/, and checksums/.

Contributing

See CONTRIBUTING.md. Security issues: SECURITY.md.

License

MIT - see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 119 Commits
.github		.github
checksums		checksums
examples/two-node-cluster		examples/two-node-cluster
locks		locks
scripts		scripts
tests		tests
zIgnore		zIgnore
.dockerignore		.dockerignore
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
versions.env		versions.env

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

vllm-gb10

Quick start

What's in the image

Known limitations

Image tags

Bumping versions

Contributing

License

About

Uh oh!

Releases 10

Sponsor this project

Uh oh!

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

vllm-gb10

Quick start

What's in the image

Known limitations

Image tags

Bumping versions

Contributing

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 10

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages