feat(qwen3.6): add Qwen3.6-35B-A3B-NVFP4 + MTP Atlas recipe#2

Merged
AzeezIsh merged 1 commit into main from feat/qwen3.6-35b-nvfp4 on May 11, 2026

@tbraun96
Contributor

Summary

Adds the Atlas recipe for RedHatAI/Qwen3.6-35B-A3B-NVFP4 with the upstream-bundled MTP K=2 draft head. It mirrors the existing qwen3.5-35b-a3b-nvfp4 recipe with two changes: the model id (the qwen3_5_moe architecture is identical) and max_model_len, which is raised to 131072 so that the full spark-arena-v2 depth sweep, up to 100k, fits.
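
As a rough sanity check on the max_model_len choice, the context budget for the deepest cell can be worked out by hand. The sketch below assumes (the PR does not state this) that every sweep cell uses the headline prompt and generation sizes (pp=2048, tg=128) and that "100k" means at most 102,400 tokens; the function name `tokens_needed` is hypothetical.

```python
# Assumptions, not confirmed by the PR: each cell uses pp=2048 and tg=128,
# and the deepest "100k" cell is no larger than 102,400 tokens.
MAX_MODEL_LEN = 131072   # value set in the recipe
PROMPT_TOKENS = 2048     # pp
GEN_TOKENS = 128         # tg
MAX_DEPTH = 102_400      # generous reading of "100k"


def tokens_needed(depth, prompt=PROMPT_TOKENS, gen=GEN_TOKENS):
    """Total context one cell occupies: prefilled depth + prompt + generated tokens."""
    return depth + prompt + gen


needed = tokens_needed(MAX_DEPTH)
headroom = MAX_MODEL_LEN - needed
print(f"needed={needed} headroom={headroom} fits={needed <= MAX_MODEL_LEN}")
```

Under these assumptions the deepest cell leaves headroom below 131072, so the limit is not the binding constraint for the sweep.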

Result (live-validated on a single GB10)

sparkrun benchmark run recipes/qwen3.6/qwen3.6-35b-a3b-nvfp4-atlas.yaml --profile spark-arena-v2 --port 8888

  • Decode @ tg=128, pp=2048, depth=0, concurrency=1: 214.59 tok/s (4.66 ms TPOT)
  • vs current Spark Arena leaderboard #6 (Qwen3.6-35B-A3B-NVFP4 on vLLM): 77.07 tok/s → 2.78× speedup
  • 25/28 cells in the heat-aware schedule produced clean arena-valid run JSONs
  • 3 cells (d=4096, c=1 among them) hit an Atlas-side hang on `avarok/atlas-gb10:latest`; they are recoverable via `sparkrun benchmark resume bench_b9efdca5fa68`
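
The headline figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes TPOT at concurrency 1 is simply the reciprocal of decode throughput; the variable names are illustrative and are not part of sparkrun.

```python
# Numbers copied from the results above.
decode_tok_s = 214.59      # Atlas, tg=128, pp=2048, depth=0, concurrency=1
baseline_tok_s = 77.07     # leaderboard #6, same model on vLLM
completed, scheduled = 25, 28

# Assumption: at concurrency 1, time per output token = 1 / throughput.
tpot_ms = 1000.0 / decode_tok_s
speedup = decode_tok_s / baseline_tok_s
completion = completed / scheduled

print(f"TPOT    = {tpot_ms:.2f} ms")
print(f"speedup = {speedup:.2f}x")
print(f"cells   = {completed}/{scheduled} ({completion:.1%})")
```

The computed TPOT and speedup agree with the reported 4.66 ms and 2.78× to two decimal places.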

Reproducibility

  • Image: `avarok/atlas-gb10:latest` (stock public)
  • Model: `RedHatAI/Qwen3.6-35B-A3B-NVFP4` (stock public)
  • Sparkrun: v0.2.31 (PyPI)
  • No engine patches required

Test plan

  • `sparkrun show recipes/qwen3.6/qwen3.6-35b-a3b-nvfp4-atlas.yaml` parses
  • `sparkrun benchmark run … --profile spark-arena-v2` produces consolidated.json with 25 cells
  • Result beats every NVFP4 entry on the current Spark Arena leaderboard

🤖 Generated with Claude Code

Mirrors the qwen3.5-35b-a3b-nvfp4 recipe but uses RedHatAI's NVFP4
quantization of Qwen3.6-35B-A3B (model_type=qwen3_5_moe). MTP K=2
draft head retained from upstream; KV cache stays NVFP4 for the full
quant pipeline.

Live-validated via `sparkrun benchmark run --profile spark-arena-v2`
on a single GB10: 25/28 cells produced clean arena-valid JSON in
/workspace/.cache/sparkrun/benchmarks/bench_b9efdca5fa68/.
Headline: c=1 d=0 decode = 214.6 tok/s (4.66 ms TPOT), ~4x faster
than the same-size Qwen3.6-35B-A3B-FP8 vLLM+MTP path.

Co-Authored-By: Azeez Ishaqui <debaterishaqui@gmail.com>

2 participants