Code Review
This pull request removes the DeepGEMM JIT build and installation logic from the CMake configuration. It also introduces a fix in the FP8 utilities to restore TMA-aligned strides for tensors where size-1 trailing dimensions have their strides collapsed during DLPack conversion in the DeepGEMM path. I have no feedback to provide.
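The stride issue mentioned in the review can be illustrated with a small, pure-Python sketch. It is not the code from this pull request: the helper names, the 2-D column-major layout, and the 16-byte alignment are assumptions chosen for illustration. The idea is that a size-1 dimension's stride never affects addressing, so a conversion step may reset it to 1, and a consumer that expects an aligned stride then has to re-impose it.

```python
def ceil_align(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def restore_trailing_stride(shape, strides, elem_size, align_bytes=16):
    """Re-impose an aligned stride on a collapsed size-1 trailing dimension.

    Hypothetical helper. Assumes a 2-D, column-major tensor whose trailing
    stride should equal the row count rounded up to align_bytes
    (the 16-byte default is an assumption). Strides are in elements.
    """
    rows, cols = shape
    if cols != 1:
        # Only a size-1 dimension can have its stride rewritten harmlessly.
        return tuple(strides)
    aligned_rows = ceil_align(rows, align_bytes // elem_size)
    return (strides[0], aligned_rows)


# A float32 tensor of shape (130, 1), column-major with an aligned column stride.
original = (1, ceil_align(130, 4))
# After a conversion that normalizes size-1 strides, the alignment is lost.
collapsed = (1, 1)
restored = restore_trailing_stride((130, 1), collapsed, elem_size=4)
print(original, collapsed, restored)
```

The restored strides address the same elements as the collapsed ones; only the metadata seen by the alignment check differs.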
Collaborator
Author
/tag-and-rerun-ci
…tage-b

Switch all stage-c-test-* jobs' `wait-for-stage-b` dependency to `wait-for-stage-a` so that stage-c does not block on stage-b completion. The final aggregator still requires `wait-for-stage-b`, so PR success gating is unchanged; only the start gate is relaxed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
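The effect of relaxing the start gate can be sketched as a small dependency check. This is only a model of the idea: the job name `stage-c-test-example` and the aggregator name `final-aggregator` are hypothetical, and the real workflow files may list additional dependencies.

```python
def ready_jobs(needs, completed):
    """Return jobs whose dependencies have all completed, in sorted order."""
    return sorted(
        job
        for job, deps in needs.items()
        if job not in completed and set(deps) <= completed
    )


# Before the change: stage-c waits for stage-b (job names are hypothetical).
needs_before = {
    "stage-c-test-example": ["wait-for-stage-b"],
    "final-aggregator": ["wait-for-stage-b", "stage-c-test-example"],
}

# After the change: stage-c only waits for stage-a; the aggregator is unchanged.
needs_after = {
    "stage-c-test-example": ["wait-for-stage-a"],
    "final-aggregator": ["wait-for-stage-b", "stage-c-test-example"],
}

completed = {"wait-for-stage-a"}  # stage-a done, stage-b still running
print(ready_jobs(needs_before, completed))
print(ready_jobs(needs_after, completed))
```

In both configurations the aggregator cannot run until `wait-for-stage-b` has completed, which is why the success gating stays the same.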
DeepGEMM's fp8_paged_mqa_logits asserts that context_lens [B, next_n] matches q [B, next_n, H, D] (csrc/apis/attention.hpp:355). In the indexer, q_fp8 is unsqueeze(1)'d to [N_total, 1, H, D], so context_lens must also be [N_total, 1]. Switch the indexer reshape to unsqueeze(-1), matching the precompute path in nsa_backend.py.

Verified end-to-end with test_dsa_models_mtp.py::TestDeepseekV32TPMTP (8x H200): 2 passed in 276s; gsm8k completes, and the bs=1 speed run reports acc_length=2.97 and speed=177.06 tok/s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
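The shape requirement described in this commit message can be mimicked without a GPU. The sketch below uses nested Python lists in place of tensors; `unsqueeze_last`, `shape_of`, and `check_context_lens` are hypothetical helpers, and the sizes are illustrative rather than taken from the model.

```python
def unsqueeze_last(values):
    """Pure-Python analogue of tensor.unsqueeze(-1): shape [N] -> [N, 1]."""
    return [[v] for v in values]


def shape_of(nested):
    """Shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def check_context_lens(q_shape, context_lens_shape):
    """Mimic the described check: context_lens must be [B, next_n]."""
    return tuple(context_lens_shape) == tuple(q_shape[:2])


n_total, heads, dim = 5, 4, 8          # illustrative sizes only
q_shape = (n_total, 1, heads, dim)     # q after unsqueeze(1)
context_lens = [3, 7, 2, 9, 4]         # flat, shape [N_total]

flat_ok = check_context_lens(q_shape, shape_of(context_lens))
fixed = unsqueeze_last(context_lens)   # shape [N_total, 1]
fixed_ok = check_context_lens(q_shape, shape_of(fixed))
print(flat_ok, fixed_ok)
```

The flat vector fails the check because it has one dimension where two are expected; adding the trailing size-1 dimension lines it up with the leading two dimensions of q.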
…24279) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ltcs11 added a commit to ltcs11/sglang that referenced this pull request on May 7, 2026:

* main: (894 commits)
  [Bug Fix] Fix RunAI streamer: corrupted weights, missing quant init, and broken URIs for multimodal models (sgl-project#22715)
  [Kernel] Deprecate DeepGemm in sgl kernel and apply custom wheel sgl-deep-gemm (sgl-project#24268)
  propagate pytest exit code from test __main__ entries (sgl-project#24487)
  [R3] Avoid implicit CUDA sync in routed experts DP slicing (sgl-project#24550)
  Add ChatCompletionRequest-style support to /v1/tokenize (sgl-project#23981)
  Support Triton MLA FP8 KV cache (sgl-project#20479)
  [diffusion] chore: align LTX-2 with official (sgl-project#24313)
  Expand support matrix for pypi wheel release (sgl-project#24565)
  [codex] Optimize Z-Image packed QKV (sgl-project#24117)
  [Misc] Fix breaking weight checker test (sgl-project#24553)
  [LoRA] Fix qkv_proj LoRA buffer sizing when tp_size > num_key_value_heads (sgl-project#24420)
  ci: bump test_mimo_models.py est_time 330 → 610 (sgl-project#24551)
  [CI] Temporarily disable marco/mcdse-2b-v1 in test_embedding_models (sgl-project#24279)
  Improve metrics, observability, and PD deploy tooling (sgl-project#24521)
  Fix diffusion fallback guards and validation (sgl-project#23335)
  [PD] Prevent update_status to Failed from cleared entries (sgl-project#24539)
  [CP] Register KV cache allgather buffer with symmetric memory (sgl-project#24040)
  Support getting checksums in weight checker (sgl-project#24537)
  Refactor buffer patterns in weight checker (sgl-project#24538)
  Add unit and end-to-end tests for weight checker (sgl-project#24536)
  ...

# Conflicts:
#   python/sglang/srt/managers/scheduler.py
#   python/sglang/srt/model_executor/model_runner.py
LLThomas pushed a commit to LLThomas/sglang that referenced this pull request on May 8, 2026.
LucQueen pushed a commit to LucQueen/sglang that referenced this pull request on May 12, 2026.
Motivation
Ref:
sgl-project/DeepGEMM#26
https://pypi.org/project/sgl-deep-gemm/
#20745
Do the following one by one:
We will build a single, standalone wheel for DeepGEMM in sglang, rather than compiling it together with sglang-kernel.
Modifications
Accuracy Tests
Speed Tests and Profiling
Checklist
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci