[NVIDIA] Relax previous restraints for using flashinfer gdn decode by kaixih · Pull Request #22818 · sgl-project/sglang

kaixih · 2026-04-14T18:13:44Z

The combination of --linear-attn-decode-backend flashinfer and --mamba-scheduler-strategy no_buffer was previously blocked by a hard ValueError in server_args.py. The root cause was a bug in the FlashInfer bf16 CuTe-DSL kernel (SM100+/Blackwell): CUDA graph decode uses padded batches whose padding slots have initial_state_indices = -1, but the kernel had no guard against negative indices. FlashInfer has since fixed this (flashinfer-ai/flashinfer#2810, and later flashinfer-ai/flashinfer#2679), so the restriction can be relaxed.
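To illustrate the kind of guard that was missing, here is a minimal pure-Python sketch. It is not the FlashInfer kernel or its API: `decode_step`, `state_pool`, `updates`, and `PAD_INDEX` are hypothetical names, and the "state update" is a toy addition. The only detail taken from the description above is that padded slots carry initial_state_indices = -1 and must not touch any state.

```python
from typing import List

PAD_INDEX = -1  # sentinel for padded CUDA-graph slots, as described in this PR


def decode_step(
    state_pool: List[float],
    initial_state_indices: List[int],
    updates: List[float],
) -> List[float]:
    """Toy per-slot state update that skips padded batch slots.

    Hypothetical illustration only; the real kernel operates on GPU tensors.
    """
    outputs = []
    for slot, idx in enumerate(initial_state_indices):
        if idx < 0:
            # Padded slot: skip the state read/write entirely.
            # Without this guard, Python would silently touch the LAST
            # pool entry (index -1); a GPU kernel would instead read or
            # write out of bounds.
            outputs.append(0.0)
            continue
        state_pool[idx] += updates[slot]
        outputs.append(state_pool[idx])
    return outputs


# A batch of 3 slots padded for graph replay; the middle slot is padding.
pool = [1.0, 2.0, 3.0]
out = decode_step(pool, [0, PAD_INDEX, 2], [0.5, 9.0, 0.25])
```

In this sketch the padded slot produces a dummy output and leaves every pool entry untouched, which is the behavior a fixed-size CUDA graph batch needs from its padding.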

Also added tests and got the results below on 8xB200:

============================================================
Qwen3.5-397B-A17B-NVFP4 Results Summary
Dataset: gsm8k
Baseline: 0.95
============================================================

Model 1: nvidia/Qwen3.5-397B-A17B-NVFP4
  Accuracy: PASS
  Score: 0.990

Model 2: nvidia/Qwen3.5-397B-A17B-NVFP4
  Accuracy: PASS
  Score: 0.990

Model 3: nvidia/Qwen3.5-397B-A17B-NVFP4
  Accuracy: PASS
  Score: 0.995

============================================================
OVERALL: ALL TESTS PASSED
============================================================

.
----------------------------------------------------------------------
Ran 1 test in 940.944s

OK

gemini-code-assist · 2026-04-14T18:13:49Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

kaixih · 2026-04-14T18:15:02Z

cc. @Fridge003 @hlu1

kaixih · 2026-04-14T18:29:44Z

Sorry, #21861 has already fixed this issue.

Closing.

Relax previous restraints for using flashinfer gdn decode

4cf8d52

kaixih changed the title ~~Relax previous restraints for using flashinfer gdn decode~~ [NVIDIA] Relax previous restraints for using flashinfer gdn decode Apr 14, 2026

kaixih requested a review from Qiaolin-Yu April 14, 2026 18:18

kaixih closed this Apr 14, 2026

kaixih wants to merge 1 commit into sgl-project:main from kaixih:improve_flashinfer_gdn_decode