Qwen3next flashinfer allreduce auto enable by BBuf · Pull Request #22664 · sgl-project/sglang

BBuf · 2026-04-13T05:31:03Z

Made with @codex

Summary

Enable FlashInfer allreduce fusion by default for Qwen3NextForCausalLM on supported single-node SM90/SM100 TP runs.

Why

Qwen/Qwen3-Coder-Next was running with enable_flashinfer_allreduce_fusion=false on H100, and profiler traces showed prefill time dominated by unfused cross-device reduce kernels.

Change

Add Qwen3NextForCausalLM to the existing FlashInfer allreduce auto-enable whitelist.
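
The gating described in the Summary and Change sections can be pictured roughly as follows. This is a minimal sketch, not SGLang's actual code: the names AUTO_ENABLE_ARCHS, SUPPORTED_SM and should_auto_enable_fusion, and the reading of "TP run" as tp_size > 1, are assumptions made for illustration. Only the architecture name and the single-node SM90/SM100 conditions come from this PR.

```python
# Hypothetical sketch of a whitelist-based auto-enable check.
# None of these names are taken from the SGLang code base.

# Architectures for which fusion is switched on automatically.
# The PR adds Qwen3NextForCausalLM; the existing entries are not listed in the PR text.
AUTO_ENABLE_ARCHS = {"Qwen3NextForCausalLM"}

# Compute capabilities named in the PR summary (SM90 and SM100).
SUPPORTED_SM = {90, 100}


def should_auto_enable_fusion(arch: str, sm: int, tp_size: int, nnodes: int) -> bool:
    """Return True when every condition from the PR summary holds."""
    return (
        arch in AUTO_ENABLE_ARCHS
        and sm in SUPPORTED_SM
        and tp_size > 1   # assumption: a "TP run" means more than one rank
        and nnodes == 1   # single-node only
    )


if __name__ == "__main__":
    # The H100 (SM90) setup from this PR: --tp 4 on one node.
    print(should_auto_enable_fusion("Qwen3NextForCausalLM", 90, 4, 1))
```

Keeping the model list as a set makes the PR's change a one-line addition, which matches the small diff described above.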

H100 Evidence

Model: Qwen/Qwen3-Coder-Next

Command:

python -m sglang.launch_server --model-path Qwen/Qwen3-Coder-Next --tp 4 --port 31080

Server args:

baseline: enable_flashinfer_allreduce_fusion=false
patch: enable_flashinfer_allreduce_fusion=true

Benchmark (sglang.bench_serving, random 2048/256, 128 prompts, max_concurrency=32):

Metric	Baseline	Patch	Delta
Request throughput (req/s)	5.49	9.41	+71.4%
Mean TTFT (ms)	456.24	167.54	-63.3%
Mean TPOT (ms)	50.41	25.49	-49.4%
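
The Delta column can be reproduced from the Baseline and Patch columns. The short sketch below recomputes it as a relative change; the helper name pct_change is illustrative and not part of sglang.bench_serving.

```python
# Recompute the Delta column of the benchmark table from its raw values.

def pct_change(baseline: float, patch: float) -> float:
    """Relative change from baseline to patch, in percent."""
    return (patch - baseline) / baseline * 100.0


# (baseline, patch) pairs copied from the table above.
ROWS = {
    "Request throughput (req/s)": (5.49, 9.41),
    "Mean TTFT (ms)": (456.24, 167.54),
    "Mean TPOT (ms)": (50.41, 25.49),
}

if __name__ == "__main__":
    for name, (baseline, patch) in ROWS.items():
        print(f"{name}: {pct_change(baseline, patch):+.1f}%")
```

Throughput is higher-is-better, while TTFT and TPOT are lower-is-better, so all three rows move in the favorable direction.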

Full accuracy:

Eval	Baseline	Patch	Delta
MMLU (14042)	0.8745905	0.8714571	-0.31 pp
GSM8K (1314)	0.9627093	0.9687976	+0.61 pp
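
One rough way to judge whether these accuracy deltas exceed sampling noise is to compare them with a binomial standard error. The sketch below assumes each eval behaves like n independent pass/fail trials and treats the two runs as independent; neither assumption is stated in the PR, and since both runs answer the same questions a paired comparison would be more appropriate. The helper names are illustrative.

```python
import math


def pp_delta(baseline: float, patch: float) -> float:
    """Accuracy difference in percentage points."""
    return (patch - baseline) * 100.0


def diff_stderr_pp(baseline: float, patch: float, n: int) -> float:
    """Standard error of the difference of two independent binomial
    proportions, in percentage points (a simplifying assumption)."""
    var = baseline * (1.0 - baseline) / n + patch * (1.0 - patch) / n
    return math.sqrt(var) * 100.0


# (baseline, patch, sample count) copied from the table above.
EVALS = {
    "MMLU": (0.8745905, 0.8714571, 14042),
    "GSM8K": (0.9627093, 0.9687976, 1314),
}

if __name__ == "__main__":
    for name, (baseline, patch, n) in EVALS.items():
        delta = pp_delta(baseline, patch)
        stderr = diff_stderr_pp(baseline, patch, n)
        print(f"{name}: delta {delta:+.2f} pp, approx. std. error {stderr:.2f} pp")
```

Under these assumptions both deltas are smaller than one standard error, which is consistent with the patch leaving accuracy unchanged, although it does not prove it.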

Profiler (TP-0 EXTEND):

baseline: cross_device_reduce_2stage 136.171 ms (21.89%)
patch: allreduce_fusion_kernel_oneshot_lamport 57.661 ms (10.41%)

This change activates the fused allreduce path and removes the previous dominant unfused reduce hotspot.
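
The profiler figures can be put in perspective with a little arithmetic. The sketch below computes how much the reduce kernel time dropped and, assuming the percentages are each kernel's share of the total profiled TP-0 EXTEND time (an interpretation the PR does not spell out), back-derives the implied totals. Variable and function names are illustrative.

```python
# Kernel time (ms) and share of the profile, copied from the profiler lines above.
BASELINE_MS, BASELINE_SHARE = 136.171, 0.2189   # cross_device_reduce_2stage
PATCH_MS, PATCH_SHARE = 57.661, 0.1041          # allreduce_fusion_kernel_oneshot_lamport


def reduction_pct(before_ms: float, after_ms: float) -> float:
    """Percentage drop in kernel time from before to after."""
    return (before_ms - after_ms) / before_ms * 100.0


def implied_total_ms(kernel_ms: float, share: float) -> float:
    """Total profiled time implied by a kernel's time and its share.
    Assumes the share is a fraction of the whole profile."""
    return kernel_ms / share


if __name__ == "__main__":
    print(f"reduce kernel time drop: {reduction_pct(BASELINE_MS, PATCH_MS):.1f}%")
    print(f"implied baseline total: {implied_total_ms(BASELINE_MS, BASELINE_SHARE):.1f} ms")
    print(f"implied patch total: {implied_total_ms(PATCH_MS, PATCH_SHARE):.1f} ms")
```

The two kernels are not a like-for-like pair, since the fused kernel may also cover work that ran in separate kernels in the baseline, so the comparison is indicative only.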

Validation

H100 before/after throughput benchmark
H100 before/after sglang.profiler
H100 before/after full MMLU and full GSM8K server eval

Checklist

Format your code according to the "Format code with pre-commit" guide.
Add unit tests according to the "Run and add unit tests" guide.
Update documentation according to the "Write documentations" guide.
Provide accuracy and speed benchmark results according to the "Test the accuracy" and "Benchmark the speed" guides.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

BBuf · 2026-04-13T11:37:57Z

/tag-and-rerun-ci

BBuf · 2026-04-13T15:02:38Z

/rerun-failed-ci

ispobock · 2026-04-16T15:17:18Z

/rerun-test test_qwen3_next_models.py test_qwen3_next_models_mtp.py

github-actions · 2026-04-16T15:17:51Z

✅ 4-gpu-h100 (2 tests): View workflow run

cd test/ && python3 registered/4-gpu-models/test_qwen3_next_models.py
cd test/ && python3 registered/4-gpu-models/test_qwen3_next_models_mtp.py

ispobock · 2026-04-16T15:18:14Z

cc: @yizhang2077

A bisect of nightly-test-general-4-gpu-h100 :: TestFlashInferDeterministic.test_prefix_with_logprobs points to commit c6a45fa (PR sgl-project#22664, 'Qwen3next flashinfer allreduce auto enable'). Last pass: 9c47bba (2026-04-18). First fail: 2a327f0 (2026-04-19). The test was still failing on main as of 2026-05-07.

Running TestFlashInferDeterministic.test_prefix_with_logprobs at the parent commit 4839cec: PASS (118.4s, '+++ identical across all batch sizes'). Running the same test at c6a45fa (PR sgl-project#22664): FAIL (102.6s, 244 per-sample mismatches with the same -2.355271339416504 vs -2.3723394870758057 fingerprint seen in CI run 24971499389).
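
The kind of check behind a message like 'identical across all batch sizes' can be sketched as a bitwise comparison of per-sample logprobs. This is a hypothetical illustration, not the actual test code: count_mismatches is an invented helper, and the two floats are the fingerprint values quoted above, with no claim about which one came from which run.

```python
from typing import Sequence


def count_mismatches(reference: Sequence[float], candidate: Sequence[float]) -> int:
    """Count positions where two logprob sequences differ.

    Exact equality is used on purpose: a determinism check expects
    bit-identical results, not results within a tolerance.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(1 for r, c in zip(reference, candidate) if r != c)


# The two fingerprint values quoted in the bisect report.
VALUE_A = -2.355271339416504
VALUE_B = -2.3723394870758057

if __name__ == "__main__":
    print(count_mismatches([VALUE_A], [VALUE_A]))  # identical runs
    print(count_mismatches([VALUE_A], [VALUE_B]))  # diverging runs
    print(f"absolute gap: {abs(VALUE_A - VALUE_B):.6f}")
```

A gap of this size is far larger than floating-point rounding in the last bit, which is why the report treats it as a recognizable fingerprint rather than noise.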

BBuf added 2 commits April 13, 2026 09:32

server: auto-enable flashinfer allreduce for qwen3next

7b95e92

Drop Qwen3 Next server args test coverage from PR

f2176f6

github-actions Bot added the run-ci label Apr 13, 2026

ispobock approved these changes Apr 16, 2026

yizhang2077 approved these changes Apr 16, 2026

BBuf merged commit c6a45fa into main Apr 18, 2026
220 of 263 checks passed

BBuf deleted the bbuf/qwen3next-flashinfer-allreduce-auto-enable branch April 18, 2026 14:32

jmamou pushed a commit to jmamou/sglang that referenced this pull request Apr 20, 2026

Qwen3next flashinfer allreduce auto enable (sgl-project#22664)

90615b2

zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026

Qwen3next flashinfer allreduce auto enable (sgl-project#22664)

627a6ae

kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026

Qwen3next flashinfer allreduce auto enable (sgl-project#22664)

661dbb2

timzsu mentioned this pull request Apr 27, 2026

[RFC]: Kernel Optimization for Diffusion DiT and MoE LLM vllm-project/vllm-omni#3186

Open

1 task

BBuf mentioned this pull request Apr 29, 2026

SGLang AI Agent Performance Optimization PRs (2026-01-29 to 2026-04-29) BBuf/AI-Infra-Auto-Driven-SKILLS#46

Open

Jiminator mentioned this pull request May 8, 2026

[Fix] Disable FlashInfer allreduce fusion under deterministic inference #24629

Merged

5 tasks
