Update Qwen3.5 B200 FP4 SGLang recipe #264

Open: faradawn wants to merge 1 commit into sgl-project:main from faradawn:qwen35-b200-fp4-1018

Conversation

faradawn (Contributor) commented May 1, 2026

Update the Qwen3.5 B200 FP4 SGLang recipe to match the latest validated benchmark, based on SemiAnalysisAI/InferenceX#1018:

- Switch B200 FP4 to tp=2, mem=0.8.
- Drop `--fp4-gemm-backend` and `--max-running-requests`.
- Add `--enable-symm-mem` and `--mamba-ssm-dtype bfloat16`.
- Lower prefill/chunked sizes to 16384.
- Bump `--stream-interval` to 50.
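
For readers who want to see what the updated settings might look like as a launch command, here is a minimal sketch that assembles an argument list from the values in the description. It is not the recipe file from this PR. The flags `--enable-symm-mem`, `--mamba-ssm-dtype`, and `--stream-interval` come from the description; mapping "tp" to `--tp`, "mem" to `--mem-fraction-static`, and the "prefill/chunked sizes" to `--max-prefill-tokens` and `--chunked-prefill-size` is an assumption, and `<model-path>` is a hypothetical placeholder because the description does not name the checkpoint.

```python
import shlex

# Values stated in the PR description.
RECIPE = {
    "tp": 2,
    "mem": 0.8,
    "prefill_size": 16384,
    "stream_interval": 50,
    "mamba_ssm_dtype": "bfloat16",
}

# Hypothetical placeholder: the PR description does not name the checkpoint.
MODEL_PATH = "<model-path>"


def build_launch_args(model_path, recipe):
    """Assemble an SGLang server command line from the recipe values."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Assumed flag mapping for "tp=2 mem=0.8".
        "--tp", str(recipe["tp"]),
        "--mem-fraction-static", str(recipe["mem"]),
        # Assumed flag mapping for "prefill/chunked sizes".
        "--max-prefill-tokens", str(recipe["prefill_size"]),
        "--chunked-prefill-size", str(recipe["prefill_size"]),
        # Flags named explicitly in the PR description.
        "--enable-symm-mem",
        "--mamba-ssm-dtype", recipe["mamba_ssm_dtype"],
        "--stream-interval", str(recipe["stream_interval"]),
        # --fp4-gemm-backend and --max-running-requests are intentionally
        # absent, since the PR drops them.
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for inspection; nothing is launched.
    print(shlex.join(build_launch_args(MODEL_PATH, RECIPE)))
```

Building the list in one function keeps the dropped flags visibly absent and makes the values easy to compare against the benchmark configuration before anything is run.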

Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>