[NV] qwen35 b200 MTP update sglang config#1065
qwen3.5-fp8-b200-sglang-mtp:
- image: lmsysorg/sglang:v0.5.9-cu130
+ image: lmsysorg/sglang:v0.5.10.post1-cu130
Please change the image to:
image: lmsysorg/sglang:nightly-dev-20260422-de962f32
Also, please add a TP8 point to the search space:
- isl: 1024
  osl: 1024
  search-space:
  - { tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp }  <-- add
- isl: 8192
  osl: 1024
  search-space:
  - { tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp }  <-- add
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--context-length $CONTEXT_LENGTH > $SERVER_LOG 2>&1 &
Please change this to:
--scheduler-recv-interval $( [[ $CONC -gt 4 ]] && echo 30 || echo 10 )
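
The requested expression picks the interval from the concurrency level. Below is a minimal sketch of how it evaluates, wrapped in a hypothetical helper named `recv_interval` (not part of the PR) so it can be tried without launching a server. It assumes bash, since `[[ ... ]]` is a bash construct.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): mirrors the reviewer's inline expression.
recv_interval() {
  local conc=$1
  # Prints 30 when concurrency exceeds 4, otherwise 10.
  [[ $conc -gt 4 ]] && echo 30 || echo 10
}

# Show the value the flag would receive at a few concurrency levels.
for conc in 4 8 64; do
  echo "CONC=$conc -> --scheduler-recv-interval $(recv_interval "$conc")"
done
```

The `A && B || C` form is safe here only because `echo 30` is not expected to fail; if the middle command could fail, an explicit `if`/`else` would be the more robust choice.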

@claude - add description

Claude finished @hshrivastava-droid's task in 42s. Adding PR Description
PR description has been added summarizing all changes.

functionstackx
left a comment
recipe pr plz & validation run plz

SGLang cookbook recipe: sgl-project/sgl-cookbook#263

@claude rebase plz and resolve conflicts

Claude finished @functionstackx's task in 20s. Rebase Request
I'm unable to perform git rebase operations due to my current capabilities: I can only create and push commits, not rebase, merge, or perform other branch operations. To rebase this branch and resolve conflicts, please run locally:
git checkout nv/qwen35-mtp-b200-v2
git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease origin nv/qwen35-mtp-b200-v2
You can find more information and potential workarounds in the FAQ.

@claude i am pretty sure u have perms to rebase... rebase plz and resolve conflicts

Claude finished @functionstackx's task in 10m 4s. Rebasing Branch onto Main
Conflicts resolved:
Branch successfully rebased and force-pushed.
bfa4845 to 08dcd39

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25233017816

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25237087682

Thanks for the contribution! For vLLM and SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
functionstackx
left a comment
lgtm besides recipe PR plz

SGLang cookbook recipe: sgl-project/sgl-cookbook#263

functionstackx
left a comment
thanks! lgtm, feel free to merge as long as you had validation pass

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25342582662

* update sglang config and image
* update configs
Summary
Update the Qwen 3.5 FP8 B200 MTP (Multi-Token Prediction) SGLang benchmark configuration and script to align with the latest SGLang nightly image and B300 flag conventions.
Changes
Image Update
- v0.5.9-cu130 → nightly-dev-20260422-de962f32
Config Changes (nvidia-master.yaml)
- TP8 search-space points for 1k1k and 8k1k sequence lengths alongside the existing TP4 points
Benchmark Script Overhaul (qwen3.5_fp8_b200_mtp.sh)
- --enable-symm-mem flag
- --ep-size to --expert-parallel-size (SGLang convention)
- --tokenizer-path pointing to model
- --max-prefill-tokens and --chunked-prefill-size from 32768 → 16384
- --scheduler-recv-interval: 30 when CONC > 4, 10 otherwise
- --stream-interval from 30 → 50
- --fp8-gemm-backend=flashinfer_trtllm and --enable-flashinfer-allreduce-fusion
- Environment variables: NCCL_NVLS_ENABLE, SGLANG_ENABLE_JIT_DEEPGEMM, PYTHONUNBUFFERED
Perf Changelog
- perf-changelog.yaml entry documenting all changes for qwen3.5-fp8-b200-sglang-mtp
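
The flag values named under Benchmark Script Overhaul can be sketched as an argument array. This is an illustration under stated assumptions, not the PR's actual script: the function name `build_server_args` and the default values for `MODEL`, `EP_SIZE`, `CONC`, and `CONTEXT_LENGTH` are placeholders. It includes only flags whose new value is stated in this thread; `--enable-symm-mem`, `--fp8-gemm-backend`, `--enable-flashinfer-allreduce-fusion`, and the environment variables are left out because the summary text does not show whether they were added or removed. The sketch only prints the arguments and does not launch a server.

```shell
#!/usr/bin/env bash
# Sketch only: collects flag values stated in the PR thread into an array.
# Assumed placeholders (not from the PR): the defaults below and the
# function name build_server_args.
MODEL=${MODEL:-/path/to/model}
EP_SIZE=${EP_SIZE:-1}
CONC=${CONC:-4}
CONTEXT_LENGTH=${CONTEXT_LENGTH:-2048}

build_server_args() {
  local recv_interval
  # 30 when CONC > 4, 10 otherwise (as requested in review).
  recv_interval=$( [[ $CONC -gt 4 ]] && echo 30 || echo 10 )
  SERVER_ARGS=(
    --tokenizer-path "$MODEL"
    --expert-parallel-size "$EP_SIZE"
    --max-prefill-tokens 16384
    --chunked-prefill-size 16384
    --scheduler-recv-interval "$recv_interval"
    --stream-interval 50
    --speculative-eagle-topk 1
    --speculative-num-draft-tokens 4
    --context-length "$CONTEXT_LENGTH"
  )
}

build_server_args
# Print the assembled arguments on one line for inspection.
printf '%s\n' "${SERVER_ARGS[*]}"
```

Keeping the flags in an array rather than a single string avoids word-splitting problems if a value such as the model path contains spaces.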