[NV] Update Qwen3.5 FP4 B200 SGLang by Ankur-singh · Pull Request #1018 · SemiAnalysisAI/InferenceX

Ankur-singh · 2026-04-10T05:03:18Z

Summary

Update Qwen3.5 FP4 B200 SGLang benchmark configuration and launch script for improved throughput sweeps and tuned server parameters.

Config changes (`.github/configs/nvidia-master.yaml`)

Image update: lmsysorg/sglang:v0.5.10.post1-cu130 → lmsysorg/sglang:nightly-dev-20260422-de962f32
Search space: Add tp: 2, ep: 1, conc: 4-128 sweep for both 1k1k and 8k1k sequence lengths
Baseline: Keep tp: 4, ep: 1, conc: 4 as low-latency baseline
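To make the two search-space entries concrete, here is a minimal sketch that expands a `conc-start`/`conc-end` pair into individual concurrency values. It assumes the sweep doubles concurrency at each step, which this PR does not state; the function name `expand_conc` is hypothetical and is not part of the repository.

```shell
#!/bin/sh
# Expand a { conc-start, conc-end } pair into concrete concurrency values.
# ASSUMPTION: concurrency doubles at each step; the PR does not confirm this.
# conc-start must be at least 1, otherwise the loop would never advance.
expand_conc() {
  conc=$1
  end=$2
  out=""
  while [ "$conc" -le "$end" ]; do
    out="${out:+$out }$conc"   # append, space-separated
    conc=$((conc * 2))
  done
  echo "$out"
}

# The two entries described in the summary above.
echo "tp=4 ep=1 conc: $(expand_conc 4 4)"
echo "tp=2 ep=1 conc: $(expand_conc 4 128)"
```

Under that doubling assumption, the tp: 4 entry collapses to a single point (conc 4), while the tp: 2 entry covers 4, 8, 16, 32, 64, and 128.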

Script changes (`benchmarks/single_node/qwen3.5_fp4_b200.sh`)

Remove --max-running-requests $CONC (let SGLang auto-manage)
Reduce --max-prefill-tokens and --chunked-prefill-size from 81920 → 16384
Dynamic --scheduler-recv-interval: 30 when CONC > 4, 10 otherwise
Existing flags retained: --enable-symm-mem, --tokenizer-path, --mem-fraction-static 0.8, --stream-interval 50
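The flag changes listed above could be sketched roughly as follows. This is not the actual contents of `qwen3.5_fp4_b200.sh`: the function name `build_server_args` and the model path are hypothetical, and only the flag names and values are taken from the PR description. The sketch only prints the flags rather than launching a server.

```shell
#!/bin/sh
# Print the tuned server flags for a given concurrency.
# ASSUMPTION: function name and tokenizer path are illustrative only.
build_server_args() {
  conc=$1
  model=$2
  # Dynamic interval: 30 when CONC > 4, 10 otherwise.
  recv_interval=10
  if [ "$conc" -gt 4 ]; then
    recv_interval=30
  fi
  # --max-running-requests is deliberately omitted so SGLang manages it.
  echo "--tokenizer-path $model" \
       "--enable-symm-mem" \
       "--mem-fraction-static 0.8" \
       "--stream-interval 50" \
       "--max-prefill-tokens 16384" \
       "--chunked-prefill-size 16384" \
       "--scheduler-recv-interval $recv_interval"
}

build_server_args 4 /path/to/model    # low-concurrency case
build_server_args 64 /path/to/model   # high-concurrency case
```

Keeping the interval choice in one place means the same script can serve both the conc: 4 baseline and the higher-concurrency sweep points.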

Changelog (`perf-changelog.yaml`)

Updated entry for qwen3.5-fp4-b200-sglang reflecting all above changes

github-actions · 2026-04-10T05:03:26Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first so that we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

claude

This PR is marked [WIP] — deferring for human review once it's ready.

Extended reasoning...

Overview

The PR modifies two files: the nvidia-master.yaml CI config (adding a tp=2 search space entry and narrowing tp=4 to conc=4 only) and the qwen3.5_fp4_b200.sh benchmark launch script (removing deprecated env exports and conditional TP8 logic, updating server flags including --expert-parallel-size, --enable-symm-mem, --tokenizer-path, and tuning memory/prefill parameters).

Security Risks

No security risks. These are benchmark configuration and shell script changes with no auth, crypto, or data-access implications.

Level of Scrutiny

Low-risk changes — benchmark scripts and CI sweep configs are not production-critical paths. The changes are consistent in intent and follow established patterns in the repo.

Other Factors

The PR title includes [WIP], indicating the author considers it incomplete. No bugs were found. A human should confirm the PR is finished before merging.

gracehonv · 2026-04-29T20:42:37Z

    search-space:
-    - { tp: 4, ep: 1, conc-start: 4, conc-end: 128 }
+    - { tp: 4, ep: 1, conc-start: 4, conc-end: 4 }
+    - { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }


Please change:

{ tp: 2, ep: 2, conc-start: 4, conc-end: 128 }
to:

{ tp: 2, ep: 1, conc-start: 4, conc-end: 128 }
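The requested one-field change can be sketched as a plain substitution. This is only an illustration run against a temporary file; the function name `apply_ep_fix` is hypothetical, and the real edit was made directly in `.github/configs/nvidia-master.yaml`.

```shell
#!/bin/sh
# Rewrite the tp: 2 entry from ep: 2 to ep: 1 and print the result to stdout.
# ASSUMPTION: the entry appears exactly as quoted in the review comment.
apply_ep_fix() {
  sed 's/{ tp: 2, ep: 2, /{ tp: 2, ep: 1, /' "$1"
}

# Demonstrate on a scratch copy of the two search-space lines.
tmp=$(mktemp)
printf '%s\n' \
  '    - { tp: 4, ep: 1, conc-start: 4, conc-end: 4 }' \
  '    - { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }' > "$tmp"
apply_ep_fix "$tmp"
rm -f "$tmp"
```

Only the tp: 2 line is touched; the tp: 4 baseline entry passes through unchanged.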

gracehonv · 2026-04-29T20:42:49Z

    search-space:
-    - { tp: 4, ep: 1, conc-start: 4, conc-end: 128 }
+    - { tp: 4, ep: 1, conc-start: 4, conc-end: 4 }
+    - { tp: 2, ep: 2, conc-start: 4, conc-end: 128 }


{ tp: 2, ep: 2, conc-start: 4, conc-end: 128 }
to:
{ tp: 2, ep: 1, conc-start: 4, conc-end: 128 }

hshrivastava-droid · 2026-04-30T23:46:35Z

@claude - add PR description

Klaud-Cold · 2026-04-30T23:47:00Z

Claude finished @hshrivastava-droid's task in 1m 3s

Adding PR Description

Gather context on PR changes
Write and update PR description

PR description has been added with a summary of all changes across the three modified files:

Config: image update + TP2 sweep search space
Script: tuned server launch flags (prefill, scheduler-recv-interval, removed max-running-requests)
Changelog: updated perf-changelog.yaml entry

gracehonv

LGTM

functionstackx

lgtm. ty for the recipe sgl-project/sgl-cookbook#264

functionstackx · 2026-05-01T18:50:50Z

-  
+
 - config-keys:
    - qwen3.5-fp4-mi355x-sglang
  description:
    - "TP2/TP4 seach space exploration for Qwen3.5 fp4 on SGL"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1022
-  


@Ankur-singh one small nit: don't remove the whitespace plz, or it will break

jgangani

LGTM

github-actions · 2026-05-01T21:05:43Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=25232812945
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=25232812945

Ankur-singh requested a review from a team April 10, 2026 05:03

Ankur-singh requested review from jgangani and kedarpotdar-nv as code owners April 10, 2026 05:03

github-project-automation Bot added this to InferenceMAX Board Apr 10, 2026

Ankur-singh changed the title ~~[WIP] Update Qwen3.5 FP4 B200 SGLang: add tp2 sweep, update server launch args~~ [WIP] Update Qwen3.5 FP4 B200 SGLang Apr 10, 2026

Ankur-singh added the sweep-enabled label Apr 10, 2026

claude Bot reviewed Apr 10, 2026

gracehonv reviewed Apr 29, 2026

Comment thread benchmarks/single_node/qwen3.5_fp4_b200.sh

gracehonv reviewed Apr 29, 2026

Comment thread .github/configs/nvidia-master.yaml

hshrivastava-droid changed the title ~~[WIP] Update Qwen3.5 FP4 B200 SGLang~~ [NV] Update Qwen3.5 FP4 B200 SGLang Apr 30, 2026

gracehonv approved these changes Apr 30, 2026

faradawn mentioned this pull request May 1, 2026

Update Qwen3.5 B200 FP4 SGLang recipe sgl-project/sgl-cookbook#264 (Open)

kedarpotdar-nv approved these changes May 1, 2026

functionstackx approved these changes May 1, 2026

functionstackx reviewed May 1, 2026

functionstackx added approved and removed sweep-enabled labels May 1, 2026

jgangani approved these changes May 1, 2026

bench: update Qwen3.5 FP4 B200 SGLang

ca9fc7d

Ankur-singh force-pushed the nv/qwen3.5-fp4-b200-sglang branch from f5dd427 to ca9fc7d Compare May 1, 2026 20:55

Ankur-singh added the full-sweep-enabled label May 1, 2026

Ankur-singh merged commit 9189c18 into main May 1, 2026
27 of 43 checks passed

Ankur-singh deleted the nv/qwen3.5-fp4-b200-sglang branch May 1, 2026 21:03

github-project-automation Bot moved this to Done in InferenceMAX Board May 1, 2026

xiaohuguo2023 pushed a commit to xiaohuguo2023/InferenceX that referenced this pull request May 6, 2026

Update Qwen3.5 FP4 B200 SGLang (SemiAnalysisAI#1018)

d030fe1

Ankur-singh merged 1 commit into main from nv/qwen3.5-fp4-b200-sglang

7 participants
