[Dependency] Flashinfer 0.6.8post1 -> 0.6.11 by b8zhong · Pull Request #24452 · sgl-project/sglang

b8zhong · 2026-05-05T22:56:44Z

Commits of interest:

https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.6.10
https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.6.9

Note: if this is not cherry-picked, we should probably just use 0.6.11, because otherwise it will break main and hurt the performance of this feature.

Confirming that we are not falling back from TRTLLM allreduce fusion:

CUDA_VISIBLE_DEVICES=4,5,6,7 pytest test/registered/4-gpu-models/test_qwen3_30b.py
========================================================================= test session starts ==========================================================================
platform linux -- Python 3.12.3, pytest-9.0.3, pluggy-1.6.0
rootdir: /sgl-workspace/sglang/test
configfile: pytest.ini
plugins: anyio-4.13.0, typeguard-4.5.1
collected 2 items                                                                                                                                                      

test/registered/4-gpu-models/test_qwen3_30b.py ..                                                                                                                [100%]

=========================================================================== warnings summary ===========================================================================
../../usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1434
  /usr/local/lib/python3.12/dist-packages/_pytest/config/__init__.py:1434: PytestConfigWarning: Unknown config option: asyncio_mode
  
    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================== 2 passed, 3 warnings in 244.72s (0:04:04) ===============================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

b8zhong · 2026-05-06T16:18:29Z

It's due to a hang in Flashinfer allreduce fusion. This test passes with --enforce-disable-flashinfer-allreduce-fusion

b8zhong · 2026-05-06T16:24:36Z

/rerun-stage stage-c-test-4-gpu-h100

github-actions · 2026-05-06T16:25:26Z

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies). View workflow run

b8zhong · 2026-05-06T16:27:29Z

/rerun-stage stage-c-test-8-gpu-h200

github-actions · 2026-05-06T16:28:01Z

✅ Triggered stage-c-test-4-gpu-h100 to run independently (skipping dependencies). View workflow run

github-actions · 2026-05-06T16:28:14Z

✅ Triggered stage-c-test-8-gpu-h200 to run independently (skipping dependencies). View workflow run

## 📌 Description

Caused by #2955. Currently, it's causing a bug in SGLang: when the `group=` parameter is missing (for example, with 4 devices and world size = 2), the rendezvous expects all 4 to respond, which causes a hang in warmup.

## 🔍 Related Issues

#2955

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

sgl-project/sglang#24452

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added optional process group parameter for AllReduce fusion on TRTLLM backends, enabling users to configure symmetric memory rendezvous behavior.
* **Documentation**
  * Updated documentation to describe the new parameter and its default behavior.
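The hang described above can be pictured with a small simulation. This is only a sketch using Python's `threading.Barrier` as a stand-in for the rendezvous; it does not call FlashInfer or `torch.distributed`, and the function name `rendezvous` and its parameters are hypothetical. It assumes the rendezvous behaves like a barrier sized either to all visible devices (no group passed) or to the actual process group.

```python
import threading


def rendezvous(num_expected: int, num_arriving: int, timeout_s: float = 0.5) -> bool:
    """Simulate a rendezvous: True if every arriving rank gets through.

    num_expected: how many ranks the barrier waits for (hypothetical stand-in
                  for "all devices" vs. "members of the process group").
    num_arriving: how many ranks actually show up (the real world size).
    """
    barrier = threading.Barrier(num_expected)
    results = []
    lock = threading.Lock()

    def rank_fn() -> None:
        try:
            # A real hang would block forever; the timeout makes it observable.
            barrier.wait(timeout=timeout_s)
            ok = True
        except threading.BrokenBarrierError:
            ok = False
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=rank_fn) for _ in range(num_arriving)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(results)


# 4 devices visible, world size 2, barrier sized to all 4: never completes.
print(rendezvous(num_expected=4, num_arriving=2))
# Barrier sized to the 2-rank group: completes.
print(rendezvous(num_expected=2, num_arriving=2))
```

In this toy model, sizing the barrier to the group rather than to every visible device is what lets warmup proceed, which is the role the new optional process group parameter appears to play.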

b8zhong · 2026-05-07T15:02:46Z

Thanks Alex @aleozlx. Just bumped it; I think we should be good to go after NV CI passes.

Swipe4057 · 2026-05-09T12:09:10Z

https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.6.11

b8zhong · 2026-05-12T13:52:10Z

github.com/sgl-project/sglang/actions/runs/25711744448/job/75567004924?pr=24452 H20 is failing on main

Fridge003 · 2026-05-12T21:37:36Z

@b8zhong The failing H20 test is unrelated. Let me merge this PR.

Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>

Root cause: d5f3254 (sgl-project#24452, Flashinfer 0.6.8.post1 -> 0.6.11) introduced a strict shape check on `globalScale` in `flashinfer.fp4_quantize` that rejects the per-expert tensor sglang's `compressed_tensors_w4a4_nvfp4_moe.apply_weights` passes at line 315.

Confirms regression by local A/B: 13afe8a (flashinfer 0.6.8.post1, sgl-kernel 0.4.1.post1+cu130, torch 2.9.1+cu130) passes with gsm8k 0.951; 34c0029 (flashinfer 0.6.11.post1, torch 2.11.0+cu130) fails with "RuntimeError: shape '[1]' is invalid for input of size 128" in nvfp4_quantize_cute_dsl.

Patching fp4_utils.py to backend="cuda" reveals the underlying flashinfer-side assertion ("globalScale should have shape [1] or [num_tokens]"), confirming the kernel - not the cute-dsl wrapper - is the true gate.

Refutes the previous session's hypothesis (51a9403, 28758d3): 51a9403 is a patch-version bump (0.6.11 -> 0.6.11.post1) with no relevant kernel change; 28758d3 is SM90+MXFP4 only and never runs on B200 NVFP4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
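To make the quoted assertion concrete, here is a minimal sketch of a shape gate of that form. The function `global_scale_shape_ok` is hypothetical and written only from the assertion text ("globalScale should have shape [1] or [num_tokens]"); it is not FlashInfer's code. The 128-element shape is taken from the error message, and the token count is an arbitrary illustrative value.

```python
def global_scale_shape_ok(scale_shape, num_tokens: int) -> bool:
    """Hypothetical stand-in for the quoted assertion, not FlashInfer code.

    Accepts only a single global scale, shape (1,), or one scale per token,
    shape (num_tokens,).
    """
    return tuple(scale_shape) in {(1,), (num_tokens,)}


num_tokens = 16              # illustrative value, not from the source
per_tensor_scale = (1,)      # one scale for the whole input
per_token_scale = (num_tokens,)
per_expert_scale = (128,)    # 128 elements, as in the reported RuntimeError

print(global_scale_shape_ok(per_tensor_scale, num_tokens))   # True
print(global_scale_shape_ok(per_token_scale, num_tokens))    # True
print(global_scale_shape_ok(per_expert_scale, num_tokens))   # False
```

Under this reading, a per-expert scale tensor matches neither accepted shape unless the token count happens to equal its length, which is consistent with the regression appearing only after the stricter check was introduced.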

b8zhong requested review from CatherineSue, ispobock and slin1237 as code owners May 5, 2026 22:56

b8zhong added the run-ci label May 5, 2026

b8zhong requested review from Fridge003, HaiShaw, JustinTong0323, ishandhanani, merrymercy and yctseng0211 as code owners May 5, 2026 22:56

b8zhong changed the title ~~[Dependency] Flashinfer 0.6.10~~ [Dependency] Flashinfer 0.6.8post1 -> 0.6.10 May 5, 2026

github-actions Bot added the dependencies Pull requests that update a dependency file label May 5, 2026

b8zhong self-assigned this May 6, 2026

b8zhong requested review from BBuf, Edwardf0t1, Ying1123 and ch-wan as code owners May 6, 2026 18:06

b8zhong mentioned this pull request May 6, 2026

fix hang in allreduce comms in SGL flashinfer-ai/flashinfer#3247

Merged

5 tasks

b8zhong marked this pull request as draft May 7, 2026 04:06

b8zhong added the high priority label May 7, 2026

b8zhong marked this pull request as ready for review May 7, 2026 11:06

b8zhong changed the title ~~[Dependency] Flashinfer 0.6.8post1 -> 0.6.10~~ [Dependency] Flashinfer 0.6.8post1 -> 0.6.10post1 May 7, 2026

b8zhong added bypass-fastfail labels May 7, 2026

b8zhong changed the title ~~[Dependency] Flashinfer 0.6.8post1 -> 0.6.10post1~~ [Dependency] Flashinfer 0.6.8post1 -> 0.6.11 May 9, 2026

b8zhong changed the title ~~[Dependency] Flashinfer 0.6.8post1 -> 0.6.11~~ [Dependency] Flashinfer 0.6.8post1 -> 0.6.10.post1 May 9, 2026

b8zhong changed the title ~~[Dependency] Flashinfer 0.6.8post1 -> 0.6.10.post1~~ [Dependency] Flashinfer 0.6.8post1 -> 0.6.11 May 9, 2026

This was referenced May 10, 2026

[FlashInfer v0.6.11] [RL] Support FlashInfer per-token NVFP4 MoE #22918

Merged

[FlashInfer v0.6.10] [RL] [DSv32] [GLM-5] Add --dsa-topk-backend and integrate FlashInfer and pytorch topk #22851

Open

yuan-luo mentioned this pull request May 10, 2026

Add FlashInfer SM90 cutlass MXFP4 MoE backend (W4A16) for GPT-OSS + DeepSeek-V4 #24816

Merged

b8zhong requested review from AniZpZ and FlamingoPg as code owners May 11, 2026 23:49

b8zhong and others added 7 commits May 11, 2026 19:58

more

cca1713

more

8980765

mor

cfb2bce

more

6629b9b

more

67f8ddb

more

0002db5

fixx api change from fi

3760bb6

b8zhong force-pushed the dep/flashinfer-0610 branch from 5d0e28a to 3760bb6 Compare May 11, 2026 23:58

b8zhong added 2 commits May 12, 2026 03:32

more

7bcdf7a

more

1095152

Fridge003 merged commit d5f3254 into main May 12, 2026
249 of 273 checks passed

Fridge003 deleted the dep/flashinfer-0610 branch May 12, 2026 21:38

xjpang pushed a commit to xjpang/sglang that referenced this pull request May 13, 2026

[Dependency] Flashinfer 0.6.8post1 -> 0.6.11 (sgl-project#24452)

ce183cb

Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>

hnyls2002 mentioned this pull request May 14, 2026

revert flashinfer 0.6.11 bumps #25310

Merged

Fridge003 merged 9 commits into main from dep/flashinfer-0610

4 participants