[NV] Update Qwen3.5 FP4 B200 SGLang #1018

Merged
Ankur-singh merged 1 commit into main from nv/qwen3.5-fp4-b200-sglang on May 1, 2026

Conversation

@Ankur-singh
Collaborator

@Ankur-singh Ankur-singh commented Apr 10, 2026

Summary

Update Qwen3.5 FP4 B200 SGLang benchmark configuration and launch script for improved throughput sweeps and tuned server parameters.

Config changes (.github/configs/nvidia-master.yaml)

  • Image update: lmsysorg/sglang:v0.5.10.post1-cu130 → lmsysorg/sglang:nightly-dev-20260422-de962f32
  • Search space: Add tp: 2, ep: 1, conc: 4-128 sweep for both 1k1k and 8k1k sequence lengths
  • Baseline: Keep tp: 4, ep: 1, conc: 4 as low-latency baseline
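
As a rough illustration of the search space above, the sketch below expands a conc-start/conc-end pair into concrete concurrency values. The helper name `expand_conc` and the doubling step are assumptions made for illustration only; the PR does not state how the sweep steps between 4 and 128.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): expand conc-start/conc-end
# into a list of concurrency values.
# ASSUMPTION: the sweep doubles at each step; the actual step rule
# used by the benchmark harness is not given in this PR.
expand_conc() {
  local start=$1 end=$2 c out=()
  # Guard against a non-positive start, which would never terminate when doubling.
  (( start >= 1 )) || return 1
  for ((c = start; c <= end; c *= 2)); do
    out+=("$c")
  done
  echo "${out[*]}"
}

# The two entries described in the config changes.
echo "tp=2 ep=1 conc: $(expand_conc 4 128)"
echo "tp=4 ep=1 conc: $(expand_conc 4 4)"
```

Under the doubling assumption, the tp: 4 baseline collapses to a single run at concurrency 4, while the tp: 2 entry covers the full throughput sweep.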

Script changes (benchmarks/single_node/qwen3.5_fp4_b200.sh)

  • Remove --max-running-requests $CONC (let SGLang auto-manage)
  • Reduce --max-prefill-tokens and --chunked-prefill-size from 81920 → 16384
  • Dynamic --scheduler-recv-interval: 30 when CONC > 4, 10 otherwise
  • Existing flags retained: --enable-symm-mem, --tokenizer-path, --mem-fraction-static 0.8, --stream-interval 50
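
The flag changes above can be sketched roughly as follows. This is a minimal illustration, not the contents of qwen3.5_fp4_b200.sh: the function name `recv_interval`, the `SERVER_ARGS` array, and the default for `CONC` are assumptions, and the sketch only prints the arguments instead of launching a server. `--tokenizer-path` is left out because the PR does not give its value.

```shell
#!/usr/bin/env bash
# Sketch only: names below are illustrative, not taken from the real script.
CONC="${CONC:-4}"   # assumed default; the benchmark harness normally sets CONC

# Choose --scheduler-recv-interval as described in the PR summary:
# 30 when CONC > 4, 10 otherwise.
recv_interval() {
  if (( $1 > 4 )); then
    echo 30
  else
    echo 10
  fi
}

SCHEDULER_RECV_INTERVAL="$(recv_interval "$CONC")"

# Flags named in the PR summary. --max-running-requests is intentionally
# absent so that SGLang manages it automatically.
SERVER_ARGS=(
  --max-prefill-tokens 16384
  --chunked-prefill-size 16384
  --scheduler-recv-interval "$SCHEDULER_RECV_INTERVAL"
  --mem-fraction-static 0.8
  --stream-interval 50
  --enable-symm-mem
)

# Print instead of launching, so the sketch runs without SGLang installed.
printf '%s\n' "${SERVER_ARGS[*]}"
```

Running it with `CONC=64` should select an interval of 30, and with `CONC=4` an interval of 10, matching the rule in the bullet list.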

Changelog (perf-changelog.yaml)

  • Updated entry for qwen3.5-fp4-b200-sglang reflecting all above changes

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

@Ankur-singh changed the title from "[WIP] Update Qwen3.5 FP4 B200 SGLang: add tp2 sweep, update server launch args" to "[WIP] Update Qwen3.5 FP4 B200 SGLang" on Apr 10, 2026
Contributor

@claude claude Bot left a comment

This PR is marked [WIP] — deferring for human review once it's ready.

Extended reasoning...

Overview

The PR modifies two files: the nvidia-master.yaml CI config (adding a tp=2 search space entry and narrowing tp=4 to conc=4 only) and the qwen3.5_fp4_b200.sh benchmark launch script (removing deprecated env exports and conditional TP8 logic, updating server flags including --expert-parallel-size, --enable-symm-mem, --tokenizer-path, and tuning memory/prefill parameters).

Security Risks

No security risks. These are benchmark configuration and shell script changes with no auth, crypto, or data-access implications.

Level of Scrutiny

Low-risk changes — benchmark scripts and CI sweep configs are not production-critical paths. The changes are consistent in intent and follow established patterns in the repo.

Other Factors

The PR title includes [WIP], indicating the author considers it incomplete. No bugs were found. A human should confirm the PR is finished before merging.

Comment thread .github/configs/nvidia-master.yaml Outdated
search-space:
- { tp: 4, ep: 1, conc-start: 4, conc-end: 128 }
- { tp: 4, ep: 1, conc-start: 4, conc-end: 4 }
- { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }

Please change:

  • { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }
    to:
  • { tp: 2, ep: 1, conc-start: 4, conc-end: 128 }

Comment thread .github/configs/nvidia-master.yaml Outdated
search-space:
- { tp: 4, ep: 1, conc-start: 4, conc-end: 128 }
- { tp: 4, ep: 1, conc-start: 4, conc-end: 4 }
- { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }

{ tp: 2, ep: 2, conc-start: 4, conc-end: 128 }
to:
{ tp: 2, ep: 1, conc-start: 4, conc-end: 128 }

Comment thread benchmarks/single_node/qwen3.5_fp4_b200.sh
Comment thread .github/configs/nvidia-master.yaml
@hshrivastava-droid changed the title from "[WIP] Update Qwen3.5 FP4 B200 SGLang" to "[NV] Update Qwen3.5 FP4 B200 SGLang" on Apr 30, 2026
@hshrivastava-droid
Collaborator

@claude - add PR description

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 30, 2026

Claude finished @hshrivastava-droid's task in 1m 3s


Adding PR Description

  • Gather context on PR changes
  • Write and update PR description

PR description has been added with a summary of all changes across the three modified files:

  • Config: image update + TP2 sweep search space
  • Script: tuned server launch flags (prefill, scheduler-recv-interval, removed max-running-requests)
  • Changelog: updated perf-changelog.yaml entry

@gracehonv gracehonv left a comment

LGTM

Contributor

@functionstackx functionstackx left a comment

lgtm. ty for the recipe sgl-project/sgl-cookbook#264

Comment thread perf-changelog.yaml Outdated
Comment on lines -1349 to -1355

- config-keys:
- qwen3.5-fp4-mi355x-sglang
description:
- "TP2/TP4 seach space exploration for Qwen3.5 fp4 on SGL"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1022
Contributor

@Ankur-singh one small nit, dont remove whitespace plz or it will break

Collaborator

@jgangani jgangani left a comment

LGTM

@Ankur-singh Ankur-singh force-pushed the nv/qwen3.5-fp4-b200-sglang branch from f5dd427 to ca9fc7d Compare May 1, 2026 20:55
@Ankur-singh Ankur-singh merged commit 9189c18 into main May 1, 2026
27 of 43 checks passed
@Ankur-singh Ankur-singh deleted the nv/qwen3.5-fp4-b200-sglang branch May 1, 2026 21:03
@github-actions
Contributor

github-actions Bot commented May 1, 2026

xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026

7 participants