[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm#24268

Merged
Fridge003 merged 25 commits into main from dg-wheel on May 7, 2026

@Fridge003 (Collaborator) commented May 2, 2026

Motivation

Ref:
sgl-project/DeepGEMM#26
https://pypi.org/project/sgl-deep-gemm/
#20745

Plan: build DeepGEMM as a standalone wheel (sgl-deep-gemm) for SGLang, rather than compiling it as part of sgl-kernel.
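
Since DeepGEMM moves to its own wheel, a quick way to see whether that wheel is present in an environment is to query the installed distribution metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of SGLang, and the only name taken from this PR is the PyPI project name `sgl-deep-gemm`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of SGLang or DeepGEMM.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "sgl-deep-gemm" is the PyPI project name referenced in this PR.
    version = installed_version("sgl-deep-gemm")
    if version is None:
        print("sgl-deep-gemm is not installed")
    else:
        print(f"sgl-deep-gemm {version} is installed")
```

This checks the distribution name only; it does not import the package or confirm that its kernels load.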

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request removes the DeepGEMM JIT build and installation logic from the CMake configuration. It also introduces a fix in the FP8 utilities to restore TMA-aligned strides for tensors where size-1 trailing dimensions have their strides collapsed during DLPack conversion in the DeepGEMM path. I have no feedback to provide.
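
The stride issue mentioned in the review can be illustrated without any GPU library. For a dimension of size 1, every stride value addresses the same elements, so a conversion step may report a different stride than the producer set; a consumer that validates strides then sees a mismatch. The sketch below is plain Python on shape and stride tuples. The function names and the example values (a `(128, 1)` tensor with expected strides `(1, 128)`) are assumptions for illustration and are not taken from the actual fix in this PR.

```python
from typing import Sequence, Tuple


def same_layout(shape: Sequence[int], a: Sequence[int], b: Sequence[int]) -> bool:
    """True if stride tuples a and b agree on every dimension of size > 1.

    Strides of size-0 and size-1 dimensions never affect which elements
    are addressed, so they are ignored in the comparison.
    """
    if not (len(shape) == len(a) == len(b)):
        raise ValueError("shape and strides must have the same length")
    return all(size <= 1 or x == y for size, x, y in zip(shape, a, b))


def restore_size1_strides(
    shape: Sequence[int], strides: Sequence[int], expected: Sequence[int]
) -> Tuple[int, ...]:
    """Replace the stride of each size-1 dimension with the expected value.

    Hypothetical helper: strides of all other dimensions are left untouched.
    """
    if not (len(shape) == len(strides) == len(expected)):
        raise ValueError("shape and strides must have the same length")
    return tuple(
        exp if size == 1 else stride
        for size, stride, exp in zip(shape, strides, expected)
    )


# Illustrative values only: trailing dimension of size 1 whose stride was
# reported as 1 after conversion, while the consumer expects 128.
shape = (128, 1)
reported = (1, 1)
expected = (1, 128)
restored = restore_size1_strides(shape, reported, expected)
```

Here `same_layout(shape, reported, expected)` holds, which is why rewriting the stride is safe: both tuples describe the same memory, and only the metadata differs.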

@Fridge003 changed the title from "Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" to "[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" on May 2, 2026
@github-actions (Bot) added the "dependencies" label (Pull requests that update a dependency file) on May 2, 2026
@Fridge003 (Collaborator, Author) commented

/tag-and-rerun-ci

@Fridge003 (Collaborator, Author) commented

/tag-and-rerun-ci
@Fridge003 changed the title from "[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" to "[DO NOT MERGE] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" on May 4, 2026
Fridge003 and others added 2 commits May 4, 2026 23:17
…tage-b

Switch all stage-c-test-* jobs' `wait-for-stage-b` dependency to
`wait-for-stage-a` so stage-c does not block on stage-b completion.
The final aggregator still requires `wait-for-stage-b`, so PR success
gating is unchanged — only the start gate is relaxed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DeepGEMM's fp8_paged_mqa_logits asserts that context_lens [B, next_n]
matches q [B, next_n, H, D] (csrc/apis/attention.hpp:355). q_fp8 in the
indexer is unsqueeze(1)'d to [N_total, 1, H, D], so context_lens must
also be [N_total, 1]. Switch the indexer reshape to unsqueeze(-1),
matching the precompute path in nsa_backend.py.

Verified end-to-end with test_dsa_models_mtp.py::TestDeepseekV32TPMTP
(8x H200): 2 passed in 276s, gsm8k complete and bs=1 speed run reports
acc_length=2.97 speed=177.06 tok/s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
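
The shape reasoning in the commit message above can be checked with a shape-only sketch in plain Python (no torch required). `unsqueeze_shape` follows the same dimension convention as `torch.unsqueeze`; `context_lens_matches_q` is a hypothetical stand-in for the assertion described in the commit message, not DeepGEMM's actual code. The sizes `N_total=6`, `H=64`, `D=128` are made up, and the sketch assumes `context_lens` starts out one-dimensional.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def unsqueeze_shape(shape: Shape, dim: int) -> Shape:
    """Shape after inserting a size-1 axis at `dim` (torch.unsqueeze convention)."""
    ndim = len(shape)
    if not -(ndim + 1) <= dim <= ndim:
        raise IndexError(f"dim {dim} out of range for {ndim}-d shape")
    if dim < 0:
        dim += ndim + 1
    return shape[:dim] + (1,) + shape[dim:]


def context_lens_matches_q(context_lens_shape: Shape, q_shape: Shape) -> bool:
    """Hypothetical check: context_lens is [B, next_n] for q of [B, next_n, H, D]."""
    return len(q_shape) == 4 and context_lens_shape == q_shape[:2]


# Made-up sizes for illustration.
n_total, heads, head_dim = 6, 64, 128

# q_fp8 is unsqueeze(1)'d: [N_total, H, D] -> [N_total, 1, H, D].
q_shape = unsqueeze_shape((n_total, heads, head_dim), 1)

# Assumed 1-D context_lens: [N_total] -> [N_total, 1] via unsqueeze(-1).
context_lens_shape = unsqueeze_shape((n_total,), -1)
```

With these shapes the check passes for `[N_total, 1]` and fails for the unreshaped `[N_total]`, which matches the constraint the commit message describes.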
@Fridge003 changed the title from "[DO NOT MERGE] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" to "[Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm" on May 6, 2026
@Fridge003 Fridge003 requested a review from sundar24295s as a code owner May 6, 2026 21:45
@Fridge003 Fridge003 merged commit ecb786c into main May 7, 2026
27 of 65 checks passed
@Fridge003 Fridge003 deleted the dg-wheel branch May 7, 2026 01:59
ltcs11 added a commit to ltcs11/sglang that referenced this pull request May 7, 2026
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request May 8, 2026
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request May 12, 2026
@Fridge003 Fridge003 mentioned this pull request May 12, 2026
33 tasks
Labels

deepseek, dependencies, high priority, run-ci, sgl-kernel

3 participants