# qwen3-5-122b-a10b
Here are 3 public repositories matching this topic...
Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10 — optimization stack, PD firmware wedge diagnosis, bench results
cuda inference moe quantization mtp runbook blackwell llm-serving vllm llm-inference speculative-decoding autoround flashinfer qwen3 gb10 dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b asus-gx10
Updated Jun 18, 2026
Cogni-Brain on DGX Spark: Qwen3.5-122B-A10B INT4+FP8 hybrid, DFlash speculative decoding, 262K context, ~54 tok/s, 100/100 Tool-Eval, vLLM.
benchmark telegram-bot nvidia reasoning tool-use long-context fp8 vllm llm-agent local-llm local-ai qwen speculative-decoding openai-compatible dgx-spark dflash qwen3-5-122b-a10b
Updated Jun 30, 2026 - Python