bidual/awesome-dgx-spark

Awesome DGX Spark

A curated list of awesome tools, guides, playbooks, and resources for the NVIDIA DGX Spark, the GB10 Grace Blackwell personal AI supercomputer.

DGX Spark is a desktop machine built on the GB10 Grace Blackwell Superchip (SM 12.1 / sm_121), with 128 GB of unified CPU+GPU memory. You can link two units over 200 Gb/s networking to run larger models. This list collects community projects for setting it up, serving models, fine-tuning, benchmarking, and day-to-day operation.

Platform essentials: aarch64 · CUDA 13.x · sm_121 · 128 GB unified memory · 200 Gb/s ConnectX-7 networking

Official

  • NVIDIA/dgx-spark-playbooks - Official step-by-step playbooks for AI/ML workloads on DGX Spark: vLLM, SGLang, Ollama, unsloth, ComfyUI, FLUX, multi-node, and more.

Setup & Configuration

Inference & Serving

vLLM

llama.cpp

SGLang

Other Engines

  • antirez/ds4 - DeepSeek 4 Flash local inference engine in C with a dedicated cuda-spark build target and published GB10 benchmarks.
  • Avarok-Cybersecurity/atlas - Pure-Rust LLM inference engine with a dedicated GB10/Spark hardware target, KV-cache quantization, and a pluggable model and hardware abstraction.
  • calico88x/DGX-Model-Manager - Single-file web UI for managing Ollama, SGLang, vLLM, llama.cpp, LocalAI, and ComfyUI on DGX Spark.
  • dataforgex/dgx_spark - Multi-model LLM serving with vLLM, web UI, and tool calling.
  • jdaln/dgx-spark-inference-stack - Docker serving stack for a single DGX Spark with on-demand model loading, automatic idle shutdown, and a unified API gateway.
  • joshhu/meetaclawtaipei - Three concurrent NVFP4 vLLM models on one DGX Spark with a 3-LLM voice-clone roommate demo.
  • kshetrajna12/sparkstation - LLM gateway for DGX Spark fronting vLLM, SGLang, and TRT-LLM under one OpenAI-compatible API, with auto-suspend and thermal protection.
  • mark-ramsey-ri/trt-dgx-spark - TensorRT-LLM serving on 1-to-N DGX Spark with an arm64 nvcr 1.2.1 container and tensor-parallel auto-scaling to cluster size.
  • rdoiron/mimo-mods-for-dgx-spark - Ten vLLM runtime patches for MiMo-V2.5 on sm_121a, with a CUTLASS block-FP8 bypass and a backported tool-call corruption fix (PR #42969).
  • re-cinq/minimax-m2.5-nvidia-dgx - MiniMax-M2.5 (230B-A10B) GGUF inference server for DGX Spark via llama.cpp Docker Compose, with an OpenCode agent frontend.
  • Th0rgal/dgx-spark-router - Zero-dependency OpenAI-compatible router for DGX Spark that swaps llama.cpp and vLLM NVFP4 backends in-place to fit 128 GB unified memory.
  • wshobson/minimax-dgx-spark - MiniMax M2 inference server for DGX Spark.

Fine-tuning

Quantization & NVFP4

GB10's Blackwell architecture supports NVFP4 (4-bit floating point) natively in hardware, so NVFP4 models generally run faster than INT4 ones at similar quality.
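
To show why 4-bit formats matter on a 128 GB unified-memory machine, here is a rough sketch that estimates weight storage at different precisions. It assumes the NVFP4 layout of 4-bit values plus one 8-bit scale per 16-element block (about 4.5 effective bits per weight) and ignores the small per-tensor scale, the KV cache, activations, and runtime overhead. The `weight_gb` helper and the 100B-parameter model are hypothetical, chosen only for illustration.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Assumption: NVFP4 = 4-bit values + one 8-bit scale shared by 16 elements.
NVFP4_BITS = 4 + 8 / 16  # 4.5 effective bits per weight

UNIFIED_MEMORY_GB = 128  # DGX Spark unified CPU+GPU memory
params = 100e9           # hypothetical 100B-parameter model

for name, bits in [("BF16", 16), ("FP8", 8), ("NVFP4", NVFP4_BITS)]:
    size = weight_gb(params, bits)
    verdict = "fits" if size < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{name:>5}: {size:6.1f} GB of weights ({verdict} in {UNIFIED_MEMORY_GB} GB)")
```

Treat the result as a lower bound only: a real deployment also has to fit the KV cache, the serving engine, and the operating system in the same unified memory pool.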

Models & Benchmarks

Multi-node

You can connect two DGX Spark units directly over 200 Gb/s QSFP for double the memory and compute.

  • ArgentAIOS/dgx-spark-cluster - 2-node setup with EXO inference, NCCL tuning, NVMe-TCP storage, and 200 Gb/s fabric.
  • bird/GLM-spark - GLM-5.2 469B (REAP-pruned from 753B, NVFP4) served across three DGX Spark nodes with vLLM pipeline-parallel, 256K context at ~4.4 tok/s decode.
  • bkrabach/dgx-spark-cluster - Dual-node LLM cluster setup kit with Ray + vLLM.
  • cesarb-ai/dgx-spark-cluster-compass - Guide to clustering DGX Spark nodes for multi-node vLLM inference (NCCL, RoCE, Ray).
  • CosmicRaisins/glm-5.2-gb10 - GLM-5.2 (744B MoE) on a 4-node GB10 cluster, porting the Hopper-only sparse-MLA attention to sm_121 with custom Triton kernels at 256K context.
  • digchick/dgx-spark-200g-link-fix - Troubleshooting playbook for the 200G ConnectX-7 link failing to train between two Sparks (CX7 hotplug power-saving), with the fix and NCCL/RoCE verification.
  • hazyumps/deepseek-v4-flash-gb10 - Recipe and patches to serve DeepSeek-V4-Flash across two GB10 Sparks with vLLM (tensor + expert parallel over RoCE) at 384K context.
  • HeNryous/mimo-spark-optimized - MiMo-V2.5 NVFP4 on two DGX Spark TP=2, with a custom WMMA tensor-core flash-decode kernel and 4-bit NVFP4 KV-cache that doubles KV capacity.
  • idonati/spark-vllm-docker-festr2 - vLLM patches for festr2 MiMo-V2.5 NVFP4/MXFP8 on an 8-node sm_121 cluster, with a fused-QKV loader fix for Q mis-slotted as K/V on 7 of 8 ranks.
  • makiisthenes/dgx-spark-multinode-vllm-ray - Dual-DGX Spark vLLM deployment with NVIDIA vLLM 26.04, Ray, and 200 GbE QSFP.
  • pfn/spark-vllm-compose - Multi-node Docker Compose configuration for vLLM on DGX Spark.
  • rajsinghtechbot/dgx-spark-vllm-k8s - Kubernetes cookbook for DeepSeek-V4-Flash on dual DGX Spark, with Multus/Spiderpool RDMA over RoCEv2, UMA-aware container memory limits, and Prometheus monitoring.
  • RustRunner/DGX-Llama-Cluster - Three-node llama.cpp cluster for DGX Spark over ConnectX-7 RDMA, 384 GB pooled unified memory.
  • tomsti/guides - GB10 cluster guide for DGX Spark over ConnectX-7 RoCE, covering NCCL rail pinning, the duplicate-MAC workaround, and MikroTik 400G switching.
  • tonyd2wild/deepseek-v4-flash-2x-spark-1m - DeepSeek-V4-Flash at 1M-token context on dual DGX Spark, 45.5 tok/s decode and 786 tok/s prefill on 800K prompts.
  • tonyd2wild/MiMo-V2.5-TP3-NVFP4-KV-3xDGX-Spark - MiMo V2.5 Omni (310B MoE, text/image/video/audio) at tensor-parallel 3 across three DGX Sparks, with 4-bit NVFP4 KV cache for a ~10.6M-token KV pool at 1M context.
  • tonyd2wild/Minimax-M3-NVFP-3x-DGX-Sparks-TP-3 - MiniMax-M3 NVFP4 (428B-A23B) served at tensor-parallel 3 across three DGX Sparks, with head-padding and RoCE fixes.
  • vroomfondel/dgxarley - Ansible playbooks for a K3s cluster of four DGX Spark nodes and an x86 control plane, running distributed SGLang inference.
  • ZD-AI-Lab/Triple-GB10 - Three-node GB10 RoCE ring (QSFP, no switch) for Ray + vLLM pipeline-parallel across 3 Sparks.

Image & Media Generation

Audio & Speech

Science & HPC

Beyond LLMs, GB10's unified memory and aarch64 stack run scientific compute: protein folding, biomolecular prediction, and RAN simulation.

Remote Access & Desktop

Tools & Monitoring

  • amer8/pulsebar - Unofficial macOS menu bar monitor that streams GPU and memory telemetry from the DGX Spark dashboard.
  • antheas/spark_hwmon - Linux hwmon kernel driver exposing GB10 system power telemetry (per-rail power, energy counters, temperatures) and PL1/PL2 power-cap controls via sysfs.
  • ateska/dgx-spark-prometheus - Prometheus metrics exporter for DGX Spark clusters.
  • chappa-ai-llc/spark-smi - System-monitor TUI for DGX Spark with unified-memory and Grace P/E-core awareness, MT2910 200 Gb/s NIC bandwidth, and mixed sm_121 + sm_86 GPU support.
  • chronosolidus/dgxsparkmonitor - Cyberpunk-themed real-time monitoring dashboard for DGX Spark over SSH.
  • CINOAdam/nvml-unified-shim - NVML LD_PRELOAD shim for GB10 unified memory with /proc and CUDA fallback when NVML reports NVML_ERROR_NOT_SUPPORTED.
  • DanTup/dgx_dashboard - Simple monitoring dashboard for DGX Spark.
  • dorangao/dgx-spark-toolkit - Validation scripts for DGX Spark hardware and networking: RoCE checks, NCCL 200 GbE tests, RDMA pods.
  • engineering87/sparkfit - Memory capacity planner for DGX Spark: 128 GB unified-memory split, roofline tok/s estimate, and quantization advisor.
  • GigCoder-ai/dgxtop - Terminal hardware monitor for DGX Spark with GB10 GPU, CPU, memory, and per-drive I/O speeds.
  • hoesing/spark-gpu-throttle-check - Throttle test for DGX Spark that loads the GB10 with cuBLAS matmuls to detect sub-850 MHz USB-PD power-delivery throttling.
  • jasonacox/dgx-spark - Tools for the NVIDIA DGX Spark AI personal supercomputer.
  • joeynyc/spark-doctor - Diagnostic CLI for DGX Spark that flags the GB10 14 W power cap, unified-memory pressure, and thermal risk, and validates vLLM/Ollama/SGLang recipes.
  • lynx-lee/lynx-ollama - Ollama manager for DGX Spark with GB10 unified-memory detection and auto-tuned concurrency.
  • mcampa/sparkrun-ui - Web UI for sparkrun on DGX Spark with launch wizard, live log tail, and cluster monitor.
  • mchenetz/sparkd - Localhost dashboard for a DGX Spark fleet, with HF browsing, Claude-generated vLLM recipes, and single-box or Ray-cluster launch.
  • parallelArchitect/sparkview - Terminal GPU monitor with GB10-aware unified-memory reporting, memory-pressure (PSI) and power-rail readouts, and an anomaly auto-logger.
  • paul-aviles/NVIDIA-DGX-Spark-Dashboard - Browser-based monitoring dashboard for DGX Spark nodes.
  • thx0701/dgx-spark-status - Real-time system monitoring dashboard built with SvelteKit and SSE.
  • vybe/sparky - Vue 3 web UI for DGX Spark with ComfyUI generation, Ollama chat, voice, and container control.
  • wentbackward/nv-monitor - Terminal monitor and Prometheus exporter for DGX Spark in one zero-dependency C binary, with HugePages-correct unified memory and Grace big.LITTLE core labels.

Operating Systems & Containers

Community & Resource Collections

  • AEON-7/AEON-7 - Index of AEON-7's DGX Spark releases: NVFP4 model packs, prebuilt vLLM images, and a voice-AI stack.
  • jeremyeder/dgx-agentskills - Claude Code integration for DGX Spark: local model serving, GPU monitoring, and VM management.
  • odnodn/dgx-spark - Curated collection of NVIDIA DGX Spark resources and self-hosted AI projects.

Contributing

Contributions are welcome. Read the contribution guidelines before opening a pull request.
