qwen3-5-122b-a10b

Here are 3 public repositories matching this topic...

Qwen3.5-122B-A10B on DGX Spark: 28.3 → 51 tok/s (+80%)

cuda lossless mtp speedup performance-optimization vllm autoround dgx-spark qwen3-5 sm121 qwen3-5-122b-a10b

Production runbook for Qwen3.5-122B hybrid INT4+FP8 on NVIDIA DGX Spark GB10: optimization stack, PD firmware wedge diagnosis, benchmark results

Cogni-Brain on DGX Spark: Qwen3.5-122B-A10B INT4+FP8 hybrid, DFlash speculative decoding, 262K context, ~54 tok/s, 100/100 Tool-Eval, vLLM.

benchmark telegram-bot nvidia reasoning tool-use long-context fp8 vllm llm-agent local-llm local-ai qwen speculative-decoding openai-compatible dgx-spark dflash qwen3-5-122b-a10b
