[Fix] Fix gpt oss triton kernels and upgrade flashinfer back to 0.6.11.post1 by Fridge003 · Pull Request #25335 · sgl-project/sglang

Fridge003 · 2026-05-15T02:14:19Z

Motivation

co-author: @b8zhong @mmangkad
Modified upon #25312

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

- Format your code according to "Format code with pre-commit".
- Add unit tests according to "Run and add unit tests".
- Update documentation according to "Write documentations".
- Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
- Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

This commit updates the sglang-kernel version across SGLang files to match the version defined in sgl-kernel/pyproject.toml.

Files updated:
- docker/Dockerfile
- python/pyproject.toml
- python/sglang/srt/entrypoints/engine.py

🤖 Generated with GitHub Actions

This reverts commit 22dfcda.

gemini-code-assist · 2026-05-15T02:14:22Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Fridge003 · 2026-05-15T02:16:37Z

/tag-and-rerun-ci

Fridge003 · 2026-05-15T02:18:05Z

/rerun-test test_gpt_oss_4gpu.py

github-actions · 2026-05-15T02:18:23Z

🚀 4-gpu-h100 (1 test): ✅ View workflow run

cd test/ && python3 registered/4-gpu-models/test_gpt_oss_4gpu.py

mmangkad · 2026-05-15T02:18:49Z

@Fridge003 what about https://github.com/flashinfer-ai/flashinfer/releases/tag/v0.6.11.post2 with some more bugfixes on FI side?

Fridge003 · 2026-05-15T02:19:34Z

@mmangkad we can open another PR for that

This reverts commit 1913cb4.

Fridge003 · 2026-05-15T07:02:40Z

/rerun-test test_gpt_oss_4gpu.py

github-actions · 2026-05-15T07:03:18Z

🚀 4-gpu-h100 (1 test): ✅ View workflow run

cd test/ && python3 registered/4-gpu-models/test_gpt_oss_4gpu.py

…1.post1 (#25335)

Co-authored-by: sglang-bot <sglang-bot@users.noreply.github.com>
Co-authored-by: b8zhong <b8zhong@users.noreply.github.com>
Co-authored-by: mmangkad <mmangkad@users.noreply.github.com>

…2c1034

Two findings appended to the bisect report:

1. PR sgl-project#25335 ("Fix gpt oss triton kernels and upgrade flashinfer back to 0.6.11.post1") re-bumped flashinfer past PR sgl-project#25310's revert. The one-line fix in fp4_utils.py:22 (cute-dsl -> cuda) is therefore no longer sufficient on latest main: experiment G reproduces the strict cuda-side check from fp4Quantize.cpp:64 ("globalScale should have shape [1] or [num_tokens]"), identical to experiment C. The proper fix is now at the call site in compressed_tensors_w4a4_nvfp4_moe.py:315: collapse layer.w13_input_scale_quant (shape [num_experts]) to scalar [1] or per-token [num_tokens] before passing as global_scale.

2. The TP8+MTP variant has its own separate pre-existing regression, bisected to d2c1034 ("[Gemma 4] Adding MTP support", PR sgl-project#24436). That PR added _resolve_speculative_algorithm_alias in server_args.py:318-342 which unconditionally calls AutoConfig.from_pretrained on the draft path to detect Gemma4 drafts. It crashes on any draft in Mistral native format (params.json, no HF config.json), even when --speculative-algorithm is already explicit EAGLE.

Empirical proof for (2):
- d2c1034 + TP8+MTP-only test: FAIL with "Unrecognized model in ...Eagle. Should have a model_type key in its config.json", total wall time 60.7s (crashes before model load).
- f1395af (parent of d2c1034) + same test: PASS, gsm8k 0.949.

Both with flashinfer 0.6.8.post1, sglang-kernel 0.4.2.post1+cu130, torch 2.11.0+cu130, SGLANG_IS_IN_CI=true, SGLANG_ENABLE_JIT_DEEPGEMM=0, SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1.

Minimal fix for (2): wrap the AutoConfig.from_pretrained call in _resolve_speculative_algorithm_alias with try/except, or short-circuit when speculative_algorithm is already explicit and the user did not request NEXTN aliasing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
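The minimal fix proposed for finding (2) in the commit message above could look roughly like the sketch below. It is not the code in server_args.py: the function name resolve_alias_safely and the injected callables load_config and pick_alias are hypothetical stand-ins, used so the example runs without transformers installed. In a real patch, load_config would presumably be AutoConfig.from_pretrained, and the caught exception types (ValueError, OSError) are an assumption about what that call raises when the draft has no usable config.json.

```python
from typing import Callable, Optional

# Assumption: these are the exception types raised when a draft checkpoint
# has no usable HF config.json (e.g. Mistral native format with params.json).
_CONFIG_ERRORS = (ValueError, OSError)


def resolve_alias_safely(
    algorithm: Optional[str],
    draft_path: Optional[str],
    load_config: Callable[[str], object],
    pick_alias: Callable[[object, Optional[str]], Optional[str]],
) -> Optional[str]:
    """Hypothetical stand-in for _resolve_speculative_algorithm_alias."""
    # Short-circuit: an explicit algorithm other than NEXTN needs no
    # inspection of the draft checkpoint, so the loader is never called.
    if algorithm not in (None, "NEXTN") or draft_path is None:
        return algorithm
    try:
        config = load_config(draft_path)
    except _CONFIG_ERRORS:
        # The draft config could not be read: keep the requested algorithm
        # instead of crashing before model load.
        return algorithm
    # pick_alias is hypothetical: it encapsulates whatever draft detection
    # the real function performs on the loaded config.
    return pick_alias(config, algorithm)


def _failing_loader(path: str) -> object:
    # Simulates a draft directory without an HF config.json.
    raise ValueError(f"Unrecognized model in {path}.")


def _unused_alias(config: object, algorithm: Optional[str]) -> Optional[str]:
    return algorithm


# An explicit EAGLE request is returned unchanged and the loader is skipped.
print(resolve_alias_safely("EAGLE", "/drafts/native", _failing_loader, _unused_alias))
```

The sketch applies both suggested options at once: the short-circuit keeps an explicit EAGLE request from touching the draft path at all, and the try/except covers the NEXTN and unset cases where the draft still has to be inspected.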

sglang-bot and others added 3 commits May 15, 2026 02:10

mor

55ffba3

Revert "revert flashinfer 0.6.11 bumps (#25310)"

be5922c

This reverts commit 22dfcda.

Fridge003 requested review from AniZpZ, BBuf, CatherineSue, Edwardf0t1, FlamingoPg, HaiShaw, JustinTong0323, Ying1123, ch-wan, ishandhanani, ispobock, merrymercy, slin1237 and yctseng0211 as code owners May 15, 2026 02:14

github-actions Bot added the dependencies Pull requests that update a dependency file label May 15, 2026

Fridge003 changed the title ~~[Fix] Fix gpt oss triton kernels and upgrade flashinfer back to 0.6.11.post2~~ [Fix] Fix gpt oss triton kernels and upgrade flashinfer back to 0.6.11.post1 May 15, 2026

Fridge003 added the high priority label May 15, 2026

github-actions Bot added the run-ci label May 15, 2026

b8zhong approved these changes May 15, 2026

View reviewed changes

Revert "Skip CI tests added in #24816 (broken on main) (#25329)"

c61ceea

This reverts commit 1913cb4.

github-actions Bot added the deepseek label May 15, 2026

Fridge003 added the run-ci-extra label May 15, 2026

upd

35459b9

Fridge003 merged commit 0c19540 into main May 15, 2026
123 of 139 checks passed

Fridge003 deleted the fix_flashinfer branch May 15, 2026 08:04

AliceChenyy mentioned this pull request May 18, 2026

feat: SM120 (Blackwell Desktop) support for DeepSeek-V4 inference #24692

Open

8 tasks

Fridge003 merged 5 commits into main from fix_flashinfer

4 participants
