A curated list of awesome tools, guides, playbooks, and resources for the NVIDIA DGX Spark, the GB10 Grace Blackwell personal AI supercomputer.
DGX Spark is a desktop machine built on the GB10 Grace Blackwell Superchip (SM 12.1 / sm_121), with 128 GB of unified CPU+GPU memory. You can link two units over 200 Gb/s networking to run larger models. This list collects community projects for setting it up, serving models, fine-tuning, benchmarking, and day-to-day operation.
Platform essentials: aarch64 · CUDA 13.x · sm_121 · 128 GB unified memory · 200 Gb/s ConnectX-7 networking
- Official
- Setup & Configuration
- Inference & Serving
- Fine-tuning
- Quantization & NVFP4
- Models & Benchmarks
- Multi-node
- Image & Media Generation
- Audio & Speech
- Science & HPC
- Remote Access & Desktop
- Tools & Monitoring
- Operating Systems & Containers
- Community & Resource Collections
- NVIDIA/dgx-spark-playbooks - Official step-by-step playbooks for AI/ML workloads on DGX Spark: vLLM, SGLang, Ollama, unsloth, ComfyUI, FLUX, multi-node, and more.
- a1exus/sparky - Self-hosted DGX Spark LLM stack with vLLM, llama.cpp, and Ollama behind Traefik and three-path ingress.
- botAGI/AGmind - One-command private RAG stack for DGX Spark (aarch64/GB10), with dual-Spark cluster support and 30+ containers.
- Chrizz-lab/GB10-Agentig-Coding-Framework - Agentic coding stack for DGX Spark with dual-vLLM Qwen3 and CrewAI orchestration.
- csabakecskemeti/dgx-spark-community-playbooks - Community playbook collection for DGX Spark, covering dual-Spark RDMA inference, heterogeneous RoCE clustering, and local Claude Code.
- getainode/ainode - Browser-UI AI appliance for GB10 (DGX Spark, ASUS GX10) with UDP-discovered multi-Spark tensor-parallel clustering, verified on a 4-node 487 GB cluster.
- GuigsEvt/dgx_spark_config - End-to-end setup for AI workloads on DGX Spark.
- HeKun-NVIDIA/dgx-spark-openclaw - One-command deploy of a local LLM plus OpenClaw agent frontend on DGX Spark, using a GB10 NVFP4-kernel vLLM image.
- HendrikSchoettle/ragflow-dgx-spark - Build and deploy pipeline for RAGFlow v0.24.0 on DGX Spark aarch64, with a source-built onnxruntime-gpu wheel for sm_121 and multilingual OCR.
- install-safe-press/gb10-playbooks - Chinese-language hands-on playbooks for GB10 DGX Spark, spanning hardware, networking, inference engines, fine-tuning, and clustering.
- IryNeko/patched-flash_attn-2.8.3-for-dgx-spark - Prebuilt flash-attn 2.8.3 wheel for DGX Spark aarch64, patched to add sm_121 onto the sm_80 kernel path.
- JetBrains-Hardware/spark-setup - DGX Spark setup and vLLM deployment scripts for Qwen, GPT-OSS, and Nemotron 3.
- jl-codes/dgx-spark-ai - Curriculum for running GPT-OSS 120B on DGX Spark with unified-memory architecture lessons.
- mARTin-B78/dgx-spark_lite-llm_llama-swap_vllm_llama-cpp_ollama - Multi-engine LLM stack for DGX Spark with llama-swap VRAM eviction and a LiteLLM gateway, tiered for GB10's 128 GB unified memory.
- natolambert/dgx-spark-setup - Setup guide focused on ML training (GB10 Blackwell, CUDA 13, aarch64).
- raphaelamorim/spark-playbooks - Community playbooks and recipes for deploying AI models and workloads on DGX Spark.
- Sggin1/DGX-SPARK - Research and tests with containers and benchmarks for GB10 (SM 12.1).
- sjug/dgx-spark-ethernet-patch - Binary patch for the DGX Spark OOBE ethernet-detection bug, an 8-byte ARM64 HasInternet edit for FastOS 1.120.38.
- theshiphq/claw-spark - One-command OpenClaw and Ollama agent for DGX Spark at ~59 tok/s on Qwen3.5-35B-A3B.
- timothystewart6/ubuntu-gb10 - Ubuntu 24.04 setup guide for GB10 running standard Ubuntu Server instead of DGX OS, covering DGX Spark and ASUS/Lenovo/Dell partner units.
- AEON-7/vllm-dflash - DGX Spark vLLM image wiring DFlash speculative decoding and NVFP4, 64 tok/s single-stream on GB10.
- AEON-7/vllm-ultimate-dgx-spark - DGX Spark vLLM 0.22.1 image with Triton NVFP4 KV cache (3x capacity), DFlash, and four sm_121a patches.
- airawatraj/dgx-spark-nemotron-super-agent - Nemotron-3-Super-120B agentic stack on DGX Spark with tool-calling, reporting 23.7 tok/s on spark-arena.
- atcuality2021/vllm-gb10-gemma4 - Gemma 4 backport for DGX Spark with GB10 fixes: sm_121 NCCL build, CUTLASS FP8 disable, Ray unified-memory.
- Avarok-Cybersecurity/dgx-vllm - vLLM Docker image for DGX Spark.
- bjk110/spark_vllm_docker - vLLM serving for DGX Spark spanning single-box TP=1 and dual-Spark TP=2 over 200 Gb/s RoCE, with sm_121 FP8 and NVFP4 patches.
- eelbaz/dgx-spark-vllm-setup - One-command vLLM installation for DGX Spark with Blackwell GB10 GPUs (sm_121 architecture).
- EmilHaase/DGX-Spark-VLLM-Hydra-Manager - vLLM manager for DGX Spark with sm_121a source builds and UMA KV-cache limits for multi-model launch.
- Entrpi/ds4-spark-vllm - 2-bit DeepSeek-V4-Flash vLLM serving on a single DGX Spark, with KV-layout fork patches raising concurrency to 229x.
- eugr/spark-vllm-docker - Docker configuration for running vLLM on dual DGX Sparks with Ray/PyTorch distributed mode.
- gyohng/spark-vllm-compose - Run vLLM on DGX Spark with Docker Compose.
- jleighfields/vllm-dgx-spark - Tools for hosting vLLM on DGX Spark.
- jordanovski/overdrive - Async TUI, CLI, and web orchestrator for launching, monitoring, and benchmarking concurrent vLLM instances on DGX Spark via NGC containers.
- mark-ramsey-ri/vllm-dgx-spark - Run vLLM on 1-to-N DGX Spark servers (single Spark, 2 via direct cable, or 3+ via switched fabric) to serve or benchmark LLMs.
- omnia-projetcs/spark-dgx - Interactive vLLM Docker launcher for DGX Spark with per-model NVFP4/FP8 configs across a 20+ model catalog.
- phuongncn/asus-gx10-qwen35-speed-hack - One-shell-script hybrid INT4+FP8 + MTP vLLM setup for Qwen3.5 on ASUS GX10 / DGX Spark, taking the 35B model from 30 to 112 tok/s.
- spark-arena/sparkrun - Launch, manage, and stop LLM inference workloads on DGX Spark systems.
- technigmaai/dgx-spark - vLLM serving recipes for Qwen3.6 (PrismaQuant and NVFP4) on DGX Spark, with chat-template fixes and benchmark notes.
- timothystewart6/vllm-gb10 - Docker image for DGX Spark with the full vLLM stack pinned by commit SHA or digest.
- croll83/llama.cpp-dgx - Fork of llama.cpp optimized for DGX Spark with NVFP4, TurboQuant, and DFlash MTP.
- DandinPower/llama.cpp_bench - Benchmarking scripts and performance reports for llama.cpp on DGX Spark.
- phuongncn/qwen3.6-27b-speedhack-gx10-dgx-spark - DFlash block-diffusion spec-decode llama.cpp for Qwen3.6-27B on DGX Spark (GB10), raising coding throughput from 7 to 38 tok/s via p_min drafting.
- shamily/gemma4-llama-dgx-spark - Dockerized Gemma 4 inference with llama.cpp for GB10 (ARM64 + CUDA 13).
- ZengboJamesWang/Qwen3.5-35B-A3B-openclaw-dgx-spark - Run Qwen3.5-35B-A3B with llama.cpp and openclaw on DGX Spark (GB10).
- BTankut/dgx-spark-sglang-moe-configs - SGLang MoE kernel configs for DGX Spark (GLM-4.7-FP8).
- mark-ramsey-ri/sglang-dgx-spark - Run SGLang on 1-to-N DGX Spark servers (single Spark, 2 via direct cable, or 3+ via switched fabric) to serve or benchmark LLMs.
- ridanuae/dgx-spark-sglang-qwen35 - Run Qwen3.5-35B-A3B on DGX Spark with SGLang (Docker image and guide).
- scottgl9/sglang-spark-gb10-optimizations - SGLang fork that gets NVFP4 models running on SM121 (Marlin FP4 path around broken CUTLASS FP4) plus GB10 unified-memory tuning, with MTP decode benchmarks.
- antirez/ds4 - DeepSeek 4 Flash local inference engine in C with a dedicated cuda-spark build target and published GB10 benchmarks.
- Avarok-Cybersecurity/atlas - Pure-Rust LLM inference engine with a dedicated GB10/Spark hardware target, KV-cache quantization, and a pluggable model and hardware abstraction.
- calico88x/DGX-Model-Manager - Single-file web UI for managing Ollama, SGLang, vLLM, llama.cpp, LocalAI, and ComfyUI on DGX Spark.
- dataforgex/dgx_spark - Multi-model LLM serving with vLLM, web UI, and tool calling.
- jdaln/dgx-spark-inference-stack - Docker serving stack for a single DGX Spark with on-demand model loading, automatic idle shutdown, and a unified API gateway.
- joshhu/meetaclawtaipei - Three concurrent NVFP4 vLLM models on one DGX Spark with a 3-LLM voice-clone roommate demo.
- kshetrajna12/sparkstation - LLM gateway for DGX Spark fronting vLLM, SGLang, and TRT-LLM under one OpenAI-compatible API, with auto-suspend and thermal protection.
- mark-ramsey-ri/trt-dgx-spark - TensorRT-LLM serving on 1-to-N DGX Spark with an arm64 nvcr 1.2.1 container and tensor-parallel auto-scaling to cluster size.
- rdoiron/mimo-mods-for-dgx-spark - Ten vLLM runtime patches for MiMo-V2.5 on sm_121a, with a CUTLASS block-FP8 bypass and a backported tool-call corruption fix (PR #42969).
- re-cinq/minimax-m2.5-nvidia-dgx - MiniMax-M2.5 (230B-A10B) GGUF inference server for DGX Spark via llama.cpp Docker Compose, with an OpenCode agent frontend.
- Th0rgal/dgx-spark-router - Zero-dependency OpenAI-compatible router for DGX Spark that swaps llama.cpp and vLLM NVFP4 backends in-place to fit 128 GB unified memory.
- wshobson/minimax-dgx-spark - MiniMax M2 inference server for DGX Spark.
- albond/DGX_Spark_Unsloth_Lossless_Speedup - Unsloth optimizations for Qwen3.5 fine-tuning on DGX Spark, reaching 7.67x LoRA / 8.35x full fine-tune speedups with a bit-identical loss curve.
- alicankiraz1/DGX-Spark-Asus-Ascent-Nvidia-GB10-SFT-Finetuner - No-code SFT fine-tuning tool for DGX Spark.
- haven-jeon/unsloth-vllm-gb10 - Unsloth training and vLLM inference Docker image for DGX Spark GB10 with source-built xformers and Triton.
- kreuzhofer/dgx-spark-unsloth-qwen3.5-training - BF16 LoRA fine-tuning of Qwen3.5-35B-A3B on a single DGX Spark with unsloth.
- NvMayMay/nvfp4-lora-spark - NVFP4-aware LoRA training and serving for Nemotron-3 MoE on one GB10, Super-120B at 93 GB peak with loss 1.00 vs BF16 0.98.
- riomus/dgx-spark-unsloth - Unsloth usage on DGX Spark using UV and NVIDIA's Docker image.
- waybarrios/dgx-spark-finetune-llm - LLM fine-tuning with LoRA + NVFP4/MXFP8 on DGX Spark.
GB10's Blackwell architecture supports NVFP4 (4-bit floating point) in hardware. It runs faster than INT4 at similar quality.
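As a rough illustration of what "4-bit floating point" means, the sketch below quantizes one block of values to the E2M1 grid that NVFP4 uses for its elements, with one shared scale per 16-element block. It is a simplified model, not the hardware path or any library's API: the function names are hypothetical, the block scale is kept as a Python float rather than rounded to FP8 (E4M3), and the second-level per-tensor scale is omitted.

```python
# Magnitudes representable by a 4-bit E2M1 float
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale factor across 16 elements


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed E2M1 values.

    Hypothetical helper for illustration only.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(block)}")
    amax = max(abs(x) for x in block)
    # Map the largest magnitude onto 6.0, the largest E2M1 value.
    # Simplification: real NVFP4 stores this scale in FP8 (E4M3).
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        # Round the scaled magnitude to the nearest representable value.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from the shared scale and E2M1 codes."""
    return [scale * c for c in codes]
```

Because the grid is denser near zero than near the block maximum, small values keep more relative precision than they would on a uniform INT4 grid, which is one reason the two formats can differ in quality at the same bit width.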
- AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4 - NVFP4 Gemma 4 26B MoE on DGX Spark with DFlash speculative decoding, 39-155 tok/s single-stream.
- AEON-7/Gemma-4-31B-Uncensored-NVFP4-DFlash - vLLM image for DGX Spark serving NVFP4 Gemma 4 31B (Deckard Heretic) with z-lab DFlash speculative decoding.
- AEON-7/Nemotron-3-Nano-Omni-AEON-Ultimate-Uncensored - Source-built vLLM image for DGX Spark serving abliterated Nemotron-3-Nano-Omni multimodal in BF16 and NVFP4.
- AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash - Prebuilt vLLM container for DGX Spark with abliterated Qwen3.6-27B (NVFP4 + DFlash), sm_121a-patched for 37.6 tok/s vs 10.5 raw.
- AEON-7/Qwen3.6-NVFP4-DFlash - Source-built vLLM image with 7 sm_121a patches serving NVFP4 Qwen3.6-35B-A3B at 84 tok/s with DFlash speculative decoding.
- AEON-7/supergemma4-26b-abliterated-multimodal-nvfp4 - NVFP4 (AWQ) SuperGemma4-26B abliterated multimodal for DGX Spark, as a prebuilt vLLM container.
- BioInfo/turboquant-dgx - TurboQuant KV-cache quantization on GB10 with 3.88x compression and 8.4x Triton kernel speedup.
- localai-org/apex-quant - MoE-aware mixed-precision GGUF quant recipe, quality and throughput benchmarked on GB10.
- Logos-Flux/optimized-CUDA-GB10 - CUDA kernels (RMSNorm, GELU) for GB10 sm_121, the first sm_121 kernels on the Hugging Face Kernel Hub.
- mitkox/sparser-faster-llms - GB10 sm_121 CUDA-core TwELL sparse-kernel port of SakanaAI's sparser-faster-llms for DGX Spark builds without Hopper WGMMA.
- Plaaasma/FlashQLA-Blackwell - Qwen's FlashQLA TileLang Gated Delta Net kernels ported to GB10 (sm_121), dropping into vLLM as a faster prefill kernel for Qwen3.6 linear-attention models.
- r0b0tlab/gemma4-26b-a4b-nvfp4-gb10-native-cutlass - Gemma-4-26B-A4B NVFP4 for GB10 via native VLLM_CUTLASS MoE backend on CUDA-13 nightly, 260 tok/s at concurrency 8.
- r0b0tlab/gemma4-31b-it-nvfp4-gb10 - Gemma-4-31B-IT NVFP4 reproducibility pack for GB10 with native FlashInfer/CUTLASS FP4 GEMM, 54 tok/s at concurrency 8.
- r0b0tlab/nemotron3-super-120b-a12b-nvfp4-gb10-native-mtp - Nemotron-3-Super-120B-A12B NVFP4 for GB10 on SGLang native MTP, 21.64 tok/s and +45.8% over baseline.
- r0b0tlab/qwen36-27b-nvfp4-gb10-native-mtp - Qwen3.6-27B-Text NVFP4 reproducibility pack for GB10 (sm_121) with vLLM native MTP, 93 tok/s at concurrency 4.
- r0b0tlab/qwen36-35b-a3b-nvfp4-gb10-native-mtp - Qwen3.6-35B-A3B NVFP4 for GB10 (sm_121) on SGLang native MTP with a GDN-attention loader patch, 174 tok/s at concurrency 4.
- secYOUre/nvfp4bench - NVFP4 peak-throughput CLI for GB10 sm_121a, 1022 TFLOPS sparse and 511 dense via packed mxf4nvf4 MMA.
- adadrag/qwen3.5-dgx-spark - Guide to running Qwen3.5-35B-A3B on DGX Spark (GB10) with vLLM: installation, benchmarks, vision features, and troubleshooting.
- albond/DGX_Spark_Qwen3.5-122B-A10B-AR-INT4 - Qwen3.5-122B-A10B on DGX Spark, tuned from 28.3 to 51 tok/s (+80%).
- Avarok-Cybersecurity/atlas-recipes - Sparkrun recipe registry for the Atlas engine on GB10, 15+ NVFP4 models with validated KV/MoE settings and per-model tok/s.
- bigs/deepseek-v4-flash-dgx-spark - Runtime experiments and serving harness for DeepSeek-V4-Flash on a single DGX Spark.
- casualcomputer/rtx_pro_6000_vs_dgx_spark - DGX Spark vs RTX PRO 6000 inference benchmark with memory-bandwidth analysis across batch sizes.
- DanTup/spark-evals - Accuracy evals (BFCL, BigCodeBench, IFEvalCode) for models and quantizations that fit on a single DGX Spark, as a leaderboard.
- elsung/dgx-spark-deepseek-v4-flash - Cross-machine DeepSeek-V4-Flash benchmark for dual DGX Spark, with a 10-model GGUF suite and 500K long-context decay curves.
- Entrpi/ds4-on-spark - Single-Spark deployment of antirez's ds4 (DeepSeek-V4-Flash) with measured benchmarks and a memory-bandwidth roofline analysis, documenting a CUDA MTP-parity gap.
- jeremy-newhouse/dgx-spark-nemotron-super-bench - Single-stream decode benchmark of Nemotron-3-Super-120B-A12B-NVFP4 on one GB10, ~26-27 tok/s realistic with MTP vs ~33.6 microbench.
- Kleybrink/dgx-spark-bench - Benchmarking framework measuring throughput, latency, VRAM, and accuracy with LLM-as-a-Judge.
- lmxxf/deepseek-v4-deployment-on-dgx-spark - DeepSeek-V4 deployment guide for DGX Spark.
- marksunner/dgx-spark-ds4-benchmark - DeepSeek-V4-Flash distributed across two DGX Sparks with antirez's ds4 engine (pipeline parallel), benchmarked across context lengths.
- marksunner/dgx-spark-step37-flash - Notes on running StepFun's Step 3.7 Flash (198B MoE) on a single DGX Spark with llama.cpp at ~27 tok/s and 128K context.
- martimramos/dgx-spark-ml-guide - Guide to running PyTorch and ML workloads on DGX Spark.
- Memoriant/dgx-spark-kv-cache-benchmark - KV cache quantization on GB10: dequantization cliff (q4_0 −37% gen tps at 110K), unified-memory paradox, prefill immunity.
- nabe2030/dense-27b-31b-dgx-spark - Benchmark of Qwen 3.5/3.6-27B and Gemma 4-31B on DGX Spark.
- nabe2030/gemma4-vs-qwen35-dgx-spark - Gemma 4 vs Qwen 3.5 MoE benchmark with llama.cpp.
- r0b0tlab/deepseek-v4-flash-nvfp4-gb10-benchmark - DeepSeek-V4-Flash FP8 benchmark on dual DGX Spark (sm_121a, TP=2, RoCE, MTP), 7.5x to 38 tok/s from build-commit pinning.
- r0b0tlab/diffusiongemma-26b-nvfp4-sm121-vllm - vLLM container and benchmark for DiffusionGemma 26B-A4B NVFP4 on GB10 via native CUTLASS FP4 MoE.
- r0b0tlab/minimax-m27-nvfp4-gb10-benchmark - MiniMax-M2.7 NVFP4 benchmark on dual GB10 (sm_121) via vLLM FlashInfer-CUTLASS, 25.06 tok/s tg128 with an arm64 image.
- r0b0tlab/nex-n2-mini-nvfp4 - NVFP4 vLLM container for Nex-N2-mini (Qwen3.5-MoE-35B) on GB10, 185 tok/s aggregate at concurrency 8.
- r0b0tlab/step37-flash-nvfp4-sm121-vllm-docker - vLLM container for StepFun's Step 3.7 Flash NVFP4 (198B MoE VLM) on dual GB10 TP=2, with verified native-CUTLASS sm_121 execution at 16.49 tok/s.
- rossingram/Spark-DGX-Benchmark - Benchmark script testing compute, memory bandwidth, diffusion, and LLM throughput on DGX Spark.
- wengzhiwen/DeepSeek-OCR-DGX-Spark - DeepSeek OCR on DGX Spark (ARM64 + CUDA 13.0).
- Weschera/spark-bench - LLM benchmark for DGX Spark across 64 scenarios and 11 domains, with agentic multi-turn workflows and a calibration score for prompt-injection resistance and over-refusal.
- yunusshin/DGX_Spark_Qwen3.5-35B-A3B-Optimized - Qwen3.5-35B-A3B optimizations for DGX Spark: INT8 lm_head and MTP-2, going from 64 to 113 tok/s.
You can connect two DGX Spark units directly over 200 Gb/s QSFP for double the memory and compute.
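A back-of-the-envelope way to see when a second unit helps is to estimate the weight footprint alone. The sketch below is only that: the function names are hypothetical, and `usable_fraction` is an assumed allowance for the OS, KV cache, and activations, not a measured figure.

```python
import math

UNIT_MEMORY_GB = 128  # unified memory per DGX Spark, as stated in this list


def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def units_needed(params_billion, bits_per_weight, usable_fraction=0.9):
    """Smallest number of units whose pooled memory holds the weights.

    usable_fraction is a hypothetical headroom allowance, not a measurement.
    """
    usable_per_unit = UNIT_MEMORY_GB * usable_fraction
    return math.ceil(weight_gb(params_billion, bits_per_weight) / usable_per_unit)
```

For example, `units_needed(120, 4)` gives 1 (about 60 GB of 4-bit weights), while `units_needed(70, 16)` gives 2, because roughly 140 GB of 16-bit weights exceeds a single unit's memory. Real deployments also need room for the KV cache at the target context length, so treat the result as a lower bound.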
- ArgentAIOS/dgx-spark-cluster - 2-node setup with EXO inference, NCCL tuning, NVMe-TCP storage, and 200 Gb/s fabric.
- bird/GLM-spark - GLM-5.2 469B (REAP-pruned from 753B, NVFP4) served across three DGX Spark nodes with vLLM pipeline-parallel, 256K context at ~4.4 tok/s decode.
- bkrabach/dgx-spark-cluster - Dual-node LLM cluster setup kit with Ray + vLLM.
- cesarb-ai/dgx-spark-cluster-compass - Guide to clustering DGX Spark nodes for multi-node vLLM inference (NCCL, RoCE, Ray).
- CosmicRaisins/glm-5.2-gb10 - GLM-5.2 (744B MoE) on a 4-node GB10 cluster, porting the Hopper-only sparse-MLA attention to sm_121 with custom Triton kernels at 256K context.
- digchick/dgx-spark-200g-link-fix - Troubleshooting playbook for the 200G ConnectX-7 link failing to train between two Sparks (CX7 hotplug power-saving), with the fix and NCCL/RoCE verification.
- hazyumps/deepseek-v4-flash-gb10 - Recipe and patches to serve DeepSeek-V4-Flash across two GB10 Sparks with vLLM (tensor + expert parallel over RoCE) at 384K context.
- HeNryous/mimo-spark-optimized - MiMo-V2.5 NVFP4 on two DGX Spark TP=2, with a custom WMMA tensor-core flash-decode kernel and 4-bit NVFP4 KV-cache that doubles KV capacity.
- idonati/spark-vllm-docker-festr2 - vLLM patches for festr2 MiMo-V2.5 NVFP4/MXFP8 on an 8-node sm_121 cluster, with a fused-QKV loader fix for Q mis-slotted as K/V on 7 of 8 ranks.
- makiisthenes/dgx-spark-multinode-vllm-ray - Dual-DGX Spark vLLM deployment with NVIDIA vLLM 26.04, Ray, and 200 GbE QSFP.
- pfn/spark-vllm-compose - Multi-node Docker Compose configuration for vLLM on DGX Spark.
- rajsinghtechbot/dgx-spark-vllm-k8s - Kubernetes cookbook for DeepSeek-V4-Flash on dual DGX Spark, with Multus/Spiderpool RDMA over RoCEv2, UMA-aware container memory limits, and Prometheus monitoring.
- RustRunner/DGX-Llama-Cluster - Three-node llama.cpp cluster for DGX Spark over ConnectX-7 RDMA, 384 GB pooled unified memory.
- tomsti/guides - GB10 cluster guide for DGX Spark over ConnectX-7 RoCE, covering NCCL rail pinning, the duplicate-MAC workaround, and MikroTik 400G switching.
- tonyd2wild/deepseek-v4-flash-2x-spark-1m - DeepSeek-V4-Flash at 1M-token context on dual DGX Spark, 45.5 tok/s decode and 786 tok/s prefill on 800K prompts.
- tonyd2wild/MiMo-V2.5-TP3-NVFP4-KV-3xDGX-Spark - MiMo V2.5 Omni (310B MoE, text/image/video/audio) at tensor-parallel 3 across three DGX Sparks, with 4-bit NVFP4 KV cache for a ~10.6M-token KV pool at 1M context.
- tonyd2wild/Minimax-M3-NVFP-3x-DGX-Sparks-TP-3 - MiniMax-M3 NVFP4 (428B-A23B) served at tensor-parallel 3 across three DGX Sparks, with head-padding and RoCE fixes.
- vroomfondel/dgxarley - Ansible playbooks for a K3s cluster of four DGX Spark nodes and an x86 control plane, running distributed SGLang inference.
- ZD-AI-Lab/Triple-GB10 - Three-node GB10 RoCE ring (QSFP, no switch) for Ray + vLLM pipeline-parallel across 3 Sparks.
- AEON-7/comfyui-aeon-spark - ComfyUI Docker for DGX Spark with SageAttention v3 compiled for sm_121a, CUDA 13, NVFP4, and Flux 2 / LTX 2.3 pre-bundled.
- CoconutMacaroon/blender-arm64 - Blender build for GB10 aarch64 with CUDA, OptiX, and Vulkan, shipping a prebuilt DGX Spark binary release.
- dr-vij/Hunyuan3D-2.1-DGX-Spark-Docker - Hunyuan3D-2.1 3D generation on DGX Spark via Docker Compose, building custom_rasterizer and DifferentiableRenderer CUDA components on-box.
- dr-vij/Trellis2-DGX-Spark-Docker - Trellis2 3D generation on DGX Spark.
- ecarmen16/SparkyUI - ComfyUI + SageAttention for DGX Spark (ARM64, sm_121).
- HurbaLurba/DGX-SPARK-COMFYUI-DOCKER - Ansible deployment of ComfyUI on DGX Spark with SageAttention, FP16, and unified-memory CUDA tuning.
- joeynyc/cosmos-locateanything-dgx - Two-stage DGX Spark pipeline: Cosmos 3 video generation, then NVIDIA LocateAnything object grounding.
- kabilankb/cosmos3-nano-gb10 - Cosmos3-Nano (16B) video and image generation on GB10 with auto-applied ARM Blackwell patches.
- luix93/DGX-Spark-ComfyUI - Setup for running ComfyUI on DGX Spark.
- mmartial/ComfyUI-Nvidia-Docker - Multi-platform ComfyUI Docker (x86_64, Blackwell, DGX Spark) with notes for compiling SageAttention on sm_121a.
- mvalancy/blender-nvidia-gb10 - Blender 5.0.1 source build for GB10 aarch64 with Cycles CUDA-13 GPU rendering, via 8 sm_121/CUDA-13 patches.
- phaserblast/ComfyUI-DGXSparkSafetensorsLoader - Zero-copy model loader for ComfyUI on DGX Spark using the fastsafetensors library.
- raibid-entertainment/dgx-pixels - Stable Diffusion + LoRA pipeline for pixel art generation on DGX Spark.
- Triplany/comfyui-dgx-spark - Updater kit for an existing ComfyUI install on DGX Spark, adding sm_121 SageAttention and an aarch64 cu13 ONNX Runtime wheel.
- AEON-7/qwen3-asr-server - OpenAI /v1/audio/transcriptions server for Qwen3-ASR-0.6B, vLLM-native with sm_120 flash-attn 2, hot-path RTF 16x.
- AEON-7/qwen3-tts-server - OpenAI /v1/audio/speech server for Qwen3-TTS-1.7B-VoiceDesign with sm_120 flash-attn 2, hot-path RTF 1.30x and free-form voice conditioning.
- jxlarrea/homeassistant-voice-recipes - Home Assistant voice stack for DGX Spark with GB10 ARM64 ASR fork and Gemma-4-26B on llama.cpp.
- kedarpotdar-nv/spark-realtime-chatbot - On-device voice and vision assistant for DGX Spark, ~320 ms voice latency on one GB10 with Qwen3-VL and faster-whisper.
- Logos-Flux/spark-voice-pipeline - Real-time voice assistant on DGX Spark achieving ~766 ms latency to first audio.
- mARTin-B78/dgx-spark-faster-qwen3-tts - Faster-Qwen3-TTS on DGX Spark (GB10) as an OpenAI-compatible TTS API with CUDA-graph acceleration and four voice backends.
- Mekopa/whisperx-blackwell - GPU-accelerated WhisperX on Blackwell (sm_121) for DGX Spark.
- rappdw/transcribe-dgx - Audio transcription with speaker diarization for DGX Spark using WhisperX.
Beyond LLMs, GB10's unified memory and aarch64 stack run scientific compute: protein folding, biomolecular prediction, and RAN simulation.
- adrian-greenneuron/openfold3-DGX-Spark - OpenFold3 protein-structure prediction on DGX Spark with DeepSpeed sm_121 patches.
- rcbarke/ai-ran-dgx-spark - NVIDIA Aerial and Sionna 5G/6G RAN simulation on DGX Spark over multi-node fabric.
- sanjyotshenoy/boltz-gb10-spark - Boltz-2 biomolecular-interaction prediction on DGX Spark with Triton-nightly sm_121 codegen.
- eelbaz/dgx-spark-headless-sunshine - Headless remote desktop setup for DGX Spark using Sunshine streaming.
- seanGSISG/dgx-spark-sunshine-setup - Headless 4K remote desktop to DGX Spark over Sunshine.
- amer8/pulsebar - Unofficial macOS menu bar monitor that streams GPU and memory telemetry from the DGX Spark dashboard.
- antheas/spark_hwmon - Linux hwmon kernel driver exposing GB10 system power telemetry (per-rail power, energy counters, temperatures) and PL1/PL2 power-cap controls via sysfs.
- ateska/dgx-spark-prometheus - Prometheus metrics exporter for DGX Spark clusters.
- chappa-ai-llc/spark-smi - System-monitor TUI for DGX Spark with unified-memory and Grace P/E-core awareness, MT2910 200 Gb/s NIC bandwidth, and mixed sm_121 + sm_86 GPU support.
- chronosolidus/dgxsparkmonitor - Cyberpunk-themed real-time monitoring dashboard for DGX Spark over SSH.
- CINOAdam/nvml-unified-shim - NVML LD_PRELOAD shim for GB10 unified memory with /proc and CUDA fallback when NVML reports NVML_ERROR_NOT_SUPPORTED.
- DanTup/dgx_dashboard - Simple monitoring dashboard for DGX Spark.
- dorangao/dgx-spark-toolkit - Validation scripts for DGX Spark hardware and networking: RoCE checks, NCCL 200 GbE tests, RDMA pods.
- engineering87/sparkfit - Memory capacity planner for DGX Spark: 128 GB unified-memory split, roofline tok/s estimate, and quantization advisor.
- GigCoder-ai/dgxtop - Terminal hardware monitor for DGX Spark with GB10 GPU, CPU, memory, and per-drive I/O speeds.
- hoesing/spark-gpu-throttle-check - Throttle test for DGX Spark that loads the GB10 with cuBLAS matmuls to detect sub-850 MHz USB-PD power-delivery throttling.
- jasonacox/dgx-spark - Tools for the NVIDIA DGX Spark AI personal supercomputer.
- joeynyc/spark-doctor - Diagnostic CLI for DGX Spark that flags the GB10 14 W power cap, unified-memory pressure, and thermal risk, and validates vLLM/Ollama/SGLang recipes.
- lynx-lee/lynx-ollama - Ollama manager for DGX Spark with GB10 unified-memory detection and auto-tuned concurrency.
- mcampa/sparkrun-ui - Web UI for sparkrun on DGX Spark with launch wizard, live log tail, and cluster monitor.
- mchenetz/sparkd - Localhost dashboard for a DGX Spark fleet, with HF browsing, Claude-generated vLLM recipes, and single-box or Ray-cluster launch.
- parallelArchitect/sparkview - Terminal GPU monitor with GB10-aware unified-memory reporting, memory-pressure (PSI) and power-rail readouts, and an anomaly auto-logger.
- paul-aviles/NVIDIA-DGX-Spark-Dashboard - Browser-based monitoring dashboard for DGX Spark nodes.
- thx0701/dgx-spark-status - Real-time system monitoring dashboard built with SvelteKit and SSE.
- vybe/sparky - Vue 3 web UI for DGX Spark with ComfyUI generation, Ollama chat, voice, and container control.
- wentbackward/nv-monitor - Terminal monitor and Prometheus exporter for DGX Spark in one zero-dependency C binary, with HugePages-correct unified memory and Grace big.LITTLE core labels.
- graham33/nixos-dgx-spark - Nix and NixOS on DGX Spark with USB images and flake templates.
- scitrera/cuda-containers - CUDA container builds for version consistency and reproducibility on DGX Spark.
- straylight-software/nixos-dgx-spark - NixOS configuration for DGX Spark (GB10).
- AEON-7/AEON-7 - Index of AEON-7's DGX Spark releases: NVFP4 model packs, prebuilt vLLM images, and a voice-AI stack.
- jeremyeder/dgx-agentskills - Claude Code integration for DGX Spark: local model serving, GPU monitoring, and VM management.
- odnodn/dgx-spark - Curated collection of NVIDIA DGX Spark resources and self-hosted AI projects.
Contributions are welcome. Read the contribution guidelines before opening a pull request.