[NV] qwen35 b200 MTP update sglang config by hshrivastava-droid · Pull Request #1065 · SemiAnalysisAI/InferenceX

hshrivastava-droid · 2026-04-17T16:36:47Z

Summary

Update the Qwen 3.5 FP8 B200 MTP (Multi-Token Prediction) SGLang benchmark configuration and script to align with the latest SGLang nightly image and B300 flag conventions.

Changes

Image Update

Bump the SGLang image from v0.5.9-cu130 to nightly-dev-20260422-de962f32

Config Changes (nvidia-master.yaml)

Add TP8 search-space points (tp: 8, ep: 1, concurrency 4) for both the 1k1k (ISL 1024, OSL 1024) and 8k1k (ISL 8192, OSL 1024) sequence lengths, alongside the existing TP4 points

Benchmark Script Overhaul (qwen3.5_fp8_b200_mtp.sh)

Enable SGLANG_ENABLE_SPEC_V2=1 for speculative decoding v2
Add --enable-symm-mem flag
Switch from --ep-size to --expert-parallel-size (SGLang convention)
Add --tokenizer-path pointing to model
Reduce --max-prefill-tokens and --chunked-prefill-size from 32768 → 16384
Set --scheduler-recv-interval dynamically: 30 when CONC > 4, 10 otherwise
Increase --stream-interval from 30 → 50
Remove --fp8-gemm-backend=flashinfer_trtllm and --enable-flashinfer-allreduce-fusion
Remove unused env vars (NCCL_NVLS_ENABLE, SGLANG_ENABLE_JIT_DEEPGEMM, PYTHONUNBUFFERED)
Simplify script by inlining constants instead of intermediate variables
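
The flag changes listed above can be pictured as one argument list. This is a minimal sketch, not the actual contents of qwen3.5_fp8_b200_mtp.sh: the variable names MODEL, EP_SIZE, and CONC, their placeholder defaults, and the helper name server_args are assumptions for illustration. Only flags named in this description appear, and the sketch prints the arguments instead of launching a server.

```shell
#!/bin/sh
# Sketch only: MODEL, EP_SIZE, and CONC are hypothetical names with
# placeholder defaults; the real script may define them differently.
MODEL="${MODEL:-/path/to/model}"
EP_SIZE="${EP_SIZE:-1}"
CONC="${CONC:-4}"

# Speculative decoding v2 is switched on through the environment.
export SGLANG_ENABLE_SPEC_V2=1

# Print one flag (with its value, if any) per line.
server_args() {
  # Scheduler receive interval depends on the benchmark concurrency.
  if [ "$CONC" -gt 4 ]; then
    recv_interval=30
  else
    recv_interval=10
  fi
  printf '%s\n' \
    "--tokenizer-path $MODEL" \
    "--expert-parallel-size $EP_SIZE" \
    "--enable-symm-mem" \
    "--max-prefill-tokens 16384" \
    "--chunked-prefill-size 16384" \
    "--scheduler-recv-interval $recv_interval" \
    "--stream-interval 50"
}

server_args
```

Printing the list keeps the sketch runnable without an SGLang install; in a real launch script these lines would be appended to the server command.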

Perf Changelog

Added perf-changelog.yaml entry documenting all changes for qwen3.5-fp8-b200-sglang-mtp

gracehonv · 2026-04-29T21:01:50Z

-
 qwen3.5-fp8-b200-sglang-mtp:
-  image: lmsysorg/sglang:v0.5.9-cu130
+  image: lmsysorg/sglang:v0.5.10.post1-cu130


Please change the image to:
image: lmsysorg/sglang:nightly-dev-20260422-de962f32
Also, please add a TP8 point to the search space for both sequence lengths:

isl: 1024
osl: 1024
search-space:

{ tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp } <--- add

isl: 8192
osl: 1024
search-space:

{ tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp } <-- add

gracehonv · 2026-04-29T21:03:31Z

+--speculative-eagle-topk 1 \
+--speculative-num-draft-tokens 4 \
+--context-length $CONTEXT_LENGTH > $SERVER_LOG 2>&1 &



Please change the scheduler receive interval to:
--scheduler-recv-interval $( [[ $CONC -gt 4 ]] && echo 30 || echo 10 )
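
The requested expression picks the interval with a command substitution and the `A && B || C` idiom. A small sketch of the same selection written as an explicit if/else makes the boundary at concurrency 4 easier to see; the helper name recv_interval is hypothetical and does not come from the benchmark script.

```shell
# recv_interval is a hypothetical helper that mirrors the requested
# expression: 30 when the concurrency is above 4, 10 otherwise.
recv_interval() {
  # $1: benchmark concurrency (CONC in the script)
  if [ "$1" -gt 4 ]; then
    echo 30
  else
    echo 10
  fi
}

# Show the selected value on both sides of the boundary.
for conc in 1 4 5 64; do
  echo "CONC=$conc -> --scheduler-recv-interval $(recv_interval "$conc")"
done
```

Because the comparison is strictly greater than 4, the concurrency-4 TP8 points requested in this review fall on the 10 side.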

hshrivastava-droid · 2026-04-30T23:45:33Z

@claude - add description

Klaud-Cold · 2026-04-30T23:46:01Z

Claude finished @hshrivastava-droid's task in 42s

Adding PR Description

Analyze PR changes
Write and update PR description

PR description has been added summarizing all changes:

SGLang image bump to nightly-dev-20260422-de962f32
TP8 search-space points added for 1k1k and 8k1k
Benchmark script overhaul: spec v2, symm-mem, reduced prefill tokens, dynamic scheduler interval, removed unused flags/env vars
Perf changelog entry added

gracehonv

LGTM

functionstackx

recipe pr plz & validation run plz

jgangani

LGTM

hshrivastava-droid · 2026-05-01T20:39:28Z

SGLang cookbook recipe: sgl-project/sgl-cookbook#263

functionstackx · 2026-05-01T20:41:27Z

@claude rebase plz and resolve conflicts

Klaud-Cold · 2026-05-01T20:41:56Z

Claude finished @functionstackx's task in 20s

Rebase Request

I'm unable to perform git rebase operations due to my current capabilities — I can only create and push commits, not rebase, merge, or perform other branch operations.

To rebase this branch and resolve conflicts, please run locally:

git checkout nv/qwen35-mtp-b200-v2
git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease origin nv/qwen35-mtp-b200-v2

You can find more information and potential workarounds in the FAQ.

functionstackx · 2026-05-01T20:51:29Z

@claude i am pretty sure u have perms to rebase... rebase plz and resolve conflicts

Klaud-Cold · 2026-05-01T20:51:53Z

Claude finished @functionstackx's task in 10m 4s

Rebasing Branch onto Main

Fetch latest main and check for conflicts
Rebase nv/qwen35-mtp-b200-v2 onto origin/main (2 commits rebased, was 137 commits behind)
Resolve conflicts in perf-changelog.yaml and .github/configs/nvidia-master.yaml
Push rebased branch

Conflicts resolved:

perf-changelog.yaml: Kept all new entries from main, appended this PR's entries for qwen3.5-fp8-b200-sglang and qwen3.5-fp8-b200-sglang-mtp
.github/configs/nvidia-master.yaml: Adopted main's scenarios: fixed-seq-len: format while keeping the TP8 search-space points from this PR

Branch successfully rebased and force-pushed.

github-actions · 2026-05-01T22:05:25Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25233017816
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25233017816

github-actions · 2026-05-01T23:11:24Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25233017816
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25233017816

hshrivastava-droid · 2026-05-01T23:18:37Z

@functionstackx
Ongoing run link: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25237087682?pr=1065

github-actions · 2026-05-01T23:35:00Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25237087682
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25237087682

github-actions · 2026-05-01T23:58:23Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25237087682
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25237087682

github-actions · 2026-05-02T00:26:14Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

github-actions · 2026-05-02T00:46:03Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25237087682
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25237087682

functionstackx

lgtm besides recipe PR plz

hshrivastava-droid · 2026-05-04T20:45:26Z

SGLang cookbook recipe: sgl-project/sgl-cookbook#263
@functionstackx

functionstackx

thanks! lgtm, feel free to merge as long as you had validation pass

hshrivastava-droid · 2026-05-04T20:52:24Z

Validation passed: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25237087682?pr=1065

github-actions · 2026-05-04T21:07:39Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25342582662
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25342582662

* update sglang config and image
* update configs

hshrivastava-droid requested a review from a team April 17, 2026 16:36

hshrivastava-droid requested review from jgangani and kedarpotdar-nv as code owners April 17, 2026 16:36

github-project-automation Bot added this to InferenceMAX Board Apr 17, 2026

hshrivastava-droid added NVIDIA sweep-enabled labels Apr 17, 2026

gracehonv reviewed Apr 29, 2026

hshrivastava-droid changed the title ~~[WIP][NV] qwen35 b200 MTP update sglang config~~ [NV] qwen35 b200 MTP update sglang config Apr 30, 2026

hshrivastava-droid requested a review from gracehonv April 30, 2026 23:45

gracehonv approved these changes Apr 30, 2026

functionstackx requested changes May 1, 2026

faradawn mentioned this pull request May 1, 2026

Update Qwen3.5 B200 FP8 MTP SGLang recipe sgl-project/sgl-cookbook#263

Open

kedarpotdar-nv approved these changes May 1, 2026

jgangani approved these changes May 1, 2026

hshrivastava-droid added 2 commits May 1, 2026 21:00

update sglang config and image

7a12cb1

update configs

08dcd39

Klaud-Cold force-pushed the nv/qwen35-mtp-b200-v2 branch from bfa4845 to 08dcd39 Compare May 1, 2026 21:02

Merge branch 'main' into nv/qwen35-mtp-b200-v2

ea65806

hshrivastava-droid added full-sweep-enabled and removed sweep-enabled labels May 1, 2026

hshrivastava-droid requested a review from functionstackx May 2, 2026 00:26

hshrivastava-droid changed the title ~~[NV] qwen35 b200 MTP update sglang config~~ [DO NOT MERGE][NV] qwen35 b200 MTP update sglang config May 2, 2026

hshrivastava-droid changed the title ~~[DO NOT MERGE][NV] qwen35 b200 MTP update sglang config~~ [NV] qwen35 b200 MTP update sglang config May 4, 2026

functionstackx requested changes May 4, 2026

Merge branch 'main' into nv/qwen35-mtp-b200-v2

cc1bbbb

hshrivastava-droid requested a review from functionstackx May 4, 2026 20:48

functionstackx approved these changes May 4, 2026

hshrivastava-droid merged commit a68d253 into main May 4, 2026
14 of 30 checks passed

hshrivastava-droid deleted the nv/qwen35-mtp-b200-v2 branch May 4, 2026 20:53

github-project-automation Bot moved this to Done in InferenceMAX Board May 4, 2026

xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026

[NV] qwen35 b200 MTP update sglang config (SemiAnalysisAI#1065)

9210168

* update sglang config and image
* update configs
