[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm by Fridge003 · Pull Request #24268 · sgl-project/sglang

Fridge003 · 2026-05-02T09:04:43Z

Motivation

Ref:
sgl-project/DeepGEMM#26
https://pypi.org/project/sgl-deep-gemm/
#20745

Do the following in order:

1. Remove DeepGEMM from the CMakeLists.txt of sglang-kernel and bump the kernel version (chore: bump sgl-kernel version to 0.4.2.post1 #24457).
2. Apply the new wheel and the new sglang-kernel to the main branch at the same time (this PR).

DeepGEMM will be built as a standalone wheel (sgl-deep-gemm) for sglang, rather than compiled together with sglang-kernel.
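
As a rough illustration of the packaging split described above, the sketch below checks whether the standalone wheel and the kernel package are both installed. The distribution name `sgl-deep-gemm` comes from the PyPI link in the references, and `sgl-kernel` from the title of the version-bump PR; the helper name `installed_version` is hypothetical and not part of sglang.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions taken from the PR text, not verified here.
    for name in ("sgl-deep-gemm", "sgl-kernel"):
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Running it in an environment prints one line per distribution, which may help confirm that both packages were upgraded together.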

Modifications

Accuracy Tests

Speed Tests and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request removes the DeepGEMM JIT build and installation logic from the CMake configuration. It also introduces a fix in the FP8 utilities to restore TMA-aligned strides for tensors where size-1 trailing dimensions have their strides collapsed during DLPack conversion in the DeepGEMM path. I have no feedback to provide.
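
The stride issue mentioned in the review can be sketched in plain Python, without claiming to reproduce the actual fix. The sketch assumes a 2D MN-major tensor of shape `[mn, k]` whose trailing dimension has size 1, a 16-byte TMA alignment, and strides counted in elements; all function names are hypothetical, and the layout assumptions are not taken from this PR.

```python
def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def tma_aligned_extent(mn, elem_bytes, align_bytes=16):
    """Round mn up so that mn * elem_bytes is a multiple of align_bytes.

    Assumes elem_bytes divides align_bytes (true for 1, 2, 4, 8, 16).
    """
    per_alignment = align_bytes // elem_bytes
    return ceil_div(mn, per_alignment) * per_alignment


def restore_trailing_stride(shape, strides, elem_bytes):
    """Put an aligned stride back on a size-1 trailing dimension.

    A size-1 dimension never advances, so a conversion layer may rewrite its
    stride freely. This helper recomputes the value an alignment check would
    expect; tensors whose trailing dimension is larger than 1 are left as-is.
    """
    mn, k = shape
    if k != 1:
        return tuple(strides)
    return (strides[0], tma_aligned_extent(mn, elem_bytes))


# Example: a float32 tensor of shape (6, 1) whose strides were collapsed to (1, 1).
fixed = restore_trailing_stride((6, 1), (1, 1), elem_bytes=4)
```

With these inputs the trailing stride becomes 8 elements, since 6 float32 values round up to 32 bytes under a 16-byte alignment.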

Fridge003 · 2026-05-02T10:35:42Z

/tag-and-rerun-ci

Fridge003 · 2026-05-02T10:39:48Z

/tag-and-rerun-ci

…tage-b Switch all stage-c-test-* jobs' `wait-for-stage-b` dependency to `wait-for-stage-a` so stage-c does not block on stage-b completion. The final aggregator still requires `wait-for-stage-b`, so PR success gating is unchanged — only the start gate is relaxed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
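
The gating change in this commit message can be modeled as a small dependency rewrite. This is only a sketch over a plain dictionary, not the workflow file itself; the job names `stage-c-test-example` and `final-aggregator` and the helper `retarget_start_gate` are hypothetical.

```python
def retarget_start_gate(jobs, prefix, old_gate, new_gate):
    """Return a copy of `jobs` where matching jobs wait on `new_gate` instead of `old_gate`.

    `jobs` maps a job name to the list of jobs it depends on. Only jobs whose
    name starts with `prefix` are changed; every other job keeps its dependencies.
    """
    updated = {}
    for name, needs in jobs.items():
        if name.startswith(prefix):
            needs = [new_gate if dep == old_gate else dep for dep in needs]
        updated[name] = list(needs)
    return updated


# Hypothetical job graph: one stage-c job and the final aggregator.
jobs = {
    "stage-c-test-example": ["wait-for-stage-b"],
    "final-aggregator": ["wait-for-stage-b", "stage-c-test-example"],
}
relaxed = retarget_start_gate(jobs, "stage-c-test-", "wait-for-stage-b", "wait-for-stage-a")
```

In the result the stage-c job starts after stage-a, while the aggregator still lists `wait-for-stage-b`, so the overall success condition is the same.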

DeepGEMM's fp8_paged_mqa_logits asserts that context_lens [B, next_n] matches q [B, next_n, H, D] (csrc/apis/attention.hpp:355). q_fp8 in the indexer is unsqueeze(1)'d to [N_total, 1, H, D], so context_lens must also be [N_total, 1]. Switch the indexer reshape to unsqueeze(-1), matching the precompute path in nsa_backend.py. Verified end-to-end with test_dsa_models_mtp.py::TestDeepseekV32TPMTP (8x H200): 2 passed in 276s, gsm8k complete and bs=1 speed run reports acc_length=2.97 speed=177.06 tok/s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
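
The shape requirement described in this commit message can be checked with shape arithmetic alone. The sketch below works on shape tuples instead of real tensors; `unsqueeze` mimics the axis-insertion semantics of `torch.unsqueeze`, and the sizes used (5 tokens, 64 heads, head dimension 128) are hypothetical.

```python
def unsqueeze(shape, dim):
    """Return the shape obtained by inserting a size-1 axis at `dim`.

    Negative values count from the end, as with torch.unsqueeze.
    """
    if dim < 0:
        dim += len(shape) + 1
    return shape[:dim] + (1,) + shape[dim:]


def context_lens_matches_q(q_shape, context_lens_shape):
    """Check that context_lens is [B, next_n] for a q of shape [B, next_n, H, D]."""
    return len(q_shape) == 4 and tuple(context_lens_shape) == tuple(q_shape[:2])


n_total, heads, head_dim = 5, 64, 128               # hypothetical sizes
q_shape = unsqueeze((n_total, heads, head_dim), 1)  # [N_total, 1, H, D]
lens_shape = unsqueeze((n_total,), -1)              # [N_total, 1]

ok = context_lens_matches_q(q_shape, lens_shape)
# A [1, N_total] layout, shown only for comparison, does not satisfy the check.
bad = context_lens_matches_q(q_shape, unsqueeze((n_total,), 0))
```

Here `ok` is true and `bad` is false, which is the distinction the assertion in DeepGEMM is described as enforcing.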

* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py

Fridge003 added 2 commits May 2, 2026 01:57

fix tvm-ffi bug

4267eca

remove deepgemm from sgl-kernel

00a3d96

github-actions Bot added the sgl-kernel label May 2, 2026

upd

e241876

gemini-code-assist Bot reviewed May 2, 2026

Fridge003 changed the title ~~Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm~~ [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm May 2, 2026

upd

dbdcaa9

github-actions Bot added the dependencies Pull requests that update a dependency file label May 2, 2026

github-actions Bot added the run-ci label May 2, 2026

Fridge003 marked this pull request as ready for review May 2, 2026 10:39

Fridge003 requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg, HaiShaw, Ying1123, b8zhong, ch-wan, ispobock, merrymercy and yizhang2077 as code owners May 2, 2026 10:39

Fridge003 added 2 commits May 4, 2026 14:56

upd

f2475a0

Merge branch 'main' into dg-wheel

38e49e0

Fridge003 changed the title ~~[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm~~ [DO NOT MERGE] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm May 4, 2026

Fridge003 and others added 2 commits May 4, 2026 23:17

fix

22281bc

Fridge003 requested review from Kangyan-Zhou and bingxche as code owners May 5, 2026 22:52

Fridge003 requested review from Qiaolin-Yu, hebiao064 and hlu1 as code owners May 6, 2026 02:45

Fridge003 added 3 commits May 5, 2026 19:48

fix

146caaa

Merge branch 'main' into dg-wheel

5eccdb2

fix

1f728cb

Fridge003 added the high priority label May 6, 2026

Fridge003 changed the title ~~[DO NOT MERGE] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm~~ [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm May 6, 2026

fixes

17d2aeb

Fridge003 requested review from CatherineSue, JustinTong0323, ishandhanani, slin1237 and yctseng0211 as code owners May 6, 2026 06:56

Fridge003 mentioned this pull request May 6, 2026

[Feature] Refactor SGLang DeepGemm with tvm-ffi interfaces #20745

Open

4 tasks

Fridge003 and others added 4 commits May 6, 2026 13:03

Merge branch 'main' into dg-wheel

8c373ad

[CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (#…

7e9d937

…24279) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci: bump test_mimo_models.py est_time 330 → 610 (#24551)

cfe02ab

small fix weight checker test group

d193764

Fridge003 requested a review from sundar24295s as a code owner May 6, 2026 21:45

Fridge003 added 2 commits May 6, 2026 15:04

Merge branch 'main' into dg-wheel

6fabeab

skip some tests

79725c4

github-actions Bot added the deepseek label May 7, 2026

Fridge003 merged commit ecb786c into main May 7, 2026
27 of 65 checks passed

Fridge003 deleted the dg-wheel branch May 7, 2026 01:59

LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026

[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-…

40ad58e

…deep-gemm (sgl-project#24268)

LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026

[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-…

1c4afbb

…deep-gemm (sgl-project#24268)

Fridge003 mentioned this pull request May 12, 2026

DeepSeek V4 Roadmap #23602

Open

33 tasks

Fridge003 merged 25 commits into main from dg-wheel

3 participants
