Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10 — optimization stack, PD firmware wedge diagnosis, bench results
cuda inference moe quantization mtp runbook blackwell llm-serving vllm llm-inference speculative-decoding autoround flashinfer qwen3 gb10 dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b asus-gx10
Updated Jun 18, 2026