[NV] qwen35 b200 MTP update sglang config#1065

Merged
hshrivastava-droid merged 4 commits into main from nv/qwen35-mtp-b200-v2
May 4, 2026

Conversation

@hshrivastava-droid
Collaborator

@hshrivastava-droid hshrivastava-droid commented Apr 17, 2026

Summary

Update the Qwen 3.5 FP8 B200 MTP (Multi-Token Prediction) SGLang benchmark configuration and script to align with the latest SGLang nightly image and B300 flag conventions.

Changes

Image Update

  • Bump SGLang image from v0.5.9-cu130 → nightly-dev-20260422-de962f32

Config Changes (nvidia-master.yaml)

  • Add TP8 search-space points (conc 4) for both 1k1k and 8k1k sequence lengths alongside the existing TP4 points

Benchmark Script Overhaul (qwen3.5_fp8_b200_mtp.sh)

  • Enable SGLANG_ENABLE_SPEC_V2=1 for speculative decoding v2
  • Add --enable-symm-mem flag
  • Switch from --ep-size to --expert-parallel-size (SGLang convention)
  • Add --tokenizer-path pointing to model
  • Reduce --max-prefill-tokens and --chunked-prefill-size from 32768 → 16384
  • Dynamic --scheduler-recv-interval: 30 when CONC > 4, 10 otherwise
  • Increase --stream-interval from 30 → 50
  • Remove --fp8-gemm-backend=flashinfer_trtllm and --enable-flashinfer-allreduce-fusion
  • Remove unused env vars (NCCL_NVLS_ENABLE, SGLANG_ENABLE_JIT_DEEPGEMM, PYTHONUNBUFFERED)
  • Simplify script by inlining constants instead of intermediate variables
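
The dynamic --scheduler-recv-interval rule listed above can be sketched as a small shell helper. This is a minimal illustration under stated assumptions, not the script's actual code: the function name recv_interval is hypothetical, and CONC is assumed to be supplied by the benchmark harness. Only the threshold (CONC > 4) and the two values (30 and 10) come from the description.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): choose the scheduler receive
# interval from the benchmark concurrency.
# Rule taken from the description: 30 when CONC > 4, 10 otherwise.
recv_interval() {
  if [ "$1" -gt 4 ]; then
    echo 30
  else
    echo 10
  fi
}

# CONC is assumed to be exported by the harness; default to 4 for this demo.
CONC="${CONC:-4}"
echo "--scheduler-recv-interval $(recv_interval "$CONC")"
```

An explicit if/else is used here so that the fallback value is emitted only when the comparison is false, which keeps the two branches easy to read and test.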

Perf Changelog

  • Added perf-changelog.yaml entry documenting all changes for qwen3.5-fp8-b200-sglang-mtp


qwen3.5-fp8-b200-sglang-mtp:
- image: lmsysorg/sglang:v0.5.9-cu130
+ image: lmsysorg/sglang:v0.5.10.post1-cu130

@gracehonv gracehonv Apr 29, 2026

Please change image to:
image: lmsysorg/sglang:nightly-dev-20260422-de962f32
Also please add TP8 point for search space:

  • isl: 1024
    osl: 1024
    search-space:
    • { tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp } <--- add
  • isl: 8192
    osl: 1024
    search-space:
    • { tp: 8, ep: 1, conc-start: 4, conc-end: 4, spec-decoding: mtp } <-- add

--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--context-length $CONTEXT_LENGTH > $SERVER_LOG 2>&1 &

Please change:
--scheduler-recv-interval $( [[ $CONC -gt 4 ]] && echo 30 || echo 10 )

@hshrivastava-droid
Collaborator Author

@claude - add description

@hshrivastava-droid hshrivastava-droid changed the title [WIP][NV] qwen35 b200 MTP update sglang config [NV] qwen35 b200 MTP update sglang config Apr 30, 2026
@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 30, 2026

Claude finished @hshrivastava-droid's task in 42s


Adding PR Description

  • Analyze PR changes
  • Write and update PR description

PR description has been added summarizing all changes:

  • SGLang image bump to nightly-dev-20260422-de962f32
  • TP8 search-space points added for 1k1k and 8k1k
  • Benchmark script overhaul: spec v2, symm-mem, reduced prefill tokens, dynamic scheduler interval, removed unused flags/env vars
  • Perf changelog entry added

@gracehonv gracehonv left a comment

LGTM

Contributor

@functionstackx functionstackx left a comment

recipe pr plz & validation run plz

Collaborator

@jgangani jgangani left a comment

LGTM

@hshrivastava-droid
Collaborator Author

SGLang cookbook recipe: sgl-project/sgl-cookbook#263

@functionstackx
Contributor

@claude rebase plz and resolve conflicts

@Klaud-Cold
Collaborator

Klaud-Cold commented May 1, 2026

Claude finished @functionstackx's task in 20s


Rebase Request

I'm unable to perform git rebase operations due to my current capabilities — I can only create and push commits, not rebase, merge, or perform other branch operations.

To rebase this branch and resolve conflicts, please run locally:

git checkout nv/qwen35-mtp-b200-v2
git fetch origin main
git rebase origin/main
# Resolve any conflicts, then:
git push --force-with-lease origin nv/qwen35-mtp-b200-v2

You can find more information and potential workarounds in the FAQ.


@functionstackx
Contributor

@claude i am pretty sure u have perms to rebase... rebase plz and resolve conflicts

@Klaud-Cold
Collaborator

Klaud-Cold commented May 1, 2026

Claude finished @functionstackx's task in 10m 4s


Rebasing Branch onto Main

  • Fetch latest main and check for conflicts
  • Rebase nv/qwen35-mtp-b200-v2 onto origin/main (2 commits rebased, was 137 commits behind)
  • Resolve conflicts in perf-changelog.yaml and .github/configs/nvidia-master.yaml
  • Push rebased branch

Conflicts resolved:

  • perf-changelog.yaml: Kept all new entries from main, appended this PR's entries for qwen3.5-fp8-b200-sglang and qwen3.5-fp8-b200-sglang-mtp
  • .github/configs/nvidia-master.yaml: Adopted main's scenarios: fixed-seq-len: format while keeping the TP8 search-space points from this PR

Branch successfully rebased and force-pushed.

@Klaud-Cold Klaud-Cold force-pushed the nv/qwen35-mtp-b200-v2 branch from bfa4845 to 08dcd39 Compare May 1, 2026 21:02

@github-actions
Contributor

github-actions Bot commented May 2, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@hshrivastava-droid hshrivastava-droid changed the title [NV] qwen35 b200 MTP update sglang config [DO NOT MERGE][NV] qwen35 b200 MTP update sglang config May 2, 2026

@hshrivastava-droid hshrivastava-droid changed the title [DO NOT MERGE][NV] qwen35 b200 MTP update sglang config [NV] qwen35 b200 MTP update sglang config May 4, 2026
Contributor

@functionstackx functionstackx left a comment

lgtm besides recipe PR plz

@hshrivastava-droid
Collaborator Author

hshrivastava-droid commented May 4, 2026

SGLang cookbook recipe: sgl-project/sgl-cookbook#263
@functionstackx

Contributor

@functionstackx functionstackx left a comment

thanks! lgtm, feel free to merge as long as you had validation pass

@hshrivastava-droid hshrivastava-droid merged commit a68d253 into main May 4, 2026
14 of 30 checks passed
@hshrivastava-droid hshrivastava-droid deleted the nv/qwen35-mtp-b200-v2 branch May 4, 2026 20:53

xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026
* update sglang config and image

* update configs

6 participants