deepgemm update enhance groupgemm #6893
Conversation
Hello @zhangxiaolei123456, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by @zhangxiaolei123456, appears to focus on updating the integration with the DeepGEMM library. Based on the title and the code changes, the primary goal is to update the DeepGEMM dependency to a newer version and adapt the local code to changes in the library's JIT compilation API, specifically for FP8 GEMM kernels.
Highlights
- DeepGEMM Dependency Update: The pull request updates the git tag for the DeepGEMM dependency in the CMake configuration (`sgl-kernel/CMakeLists.txt`) to a newer commit (`8dfa3298...`), bringing in the latest changes from the DeepGEMM repository.
- Kernel Compilation Logic Refactor: The Python code responsible for compiling DeepGEMM kernels (`python/sglang/srt/layers/quantization/deep_gemm.py`) has been modified. Instead of using `jit_tuner.compile_and_tune`, the code now explicitly calls `FP8GemmRuntime.generate` to get the kernel code and then uses the newly imported `build` function from `deep_gemm.jit` to compile it. This change applies to the compilation of masked, contiguous, and normal FP8 GEMM kernels.
- Updated Compilation Parameters: The parameters passed to the kernel compilation functions (`_compile_grouped_gemm_nt_f8f8bf16_masked_one`, `_compile_grouped_gemm_nt_f8f8bf16_contig_one`, `_compile_gemm_nt_f8f8bf16_one`) have been updated. More kernel configuration parameters, such as `N`, `K`, `BLOCK_M`, and `BLOCK_N`, are now included directly in the `kwargs` dictionary used for generating and building the kernel code.
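To make the refactor concrete, here is a hedged sketch of the old versus new compilation pattern. The function names `FP8GemmRuntime.generate`, `build`, and `jit_tuner.compile_and_tune` come from this PR's description; the helper `make_kernel_kwargs`, the kernel name string, and all concrete values are illustrative placeholders, not real configurations from the codebase.

```python
# Sketch of the JIT-compilation change described above. FP8GemmRuntime.generate
# and build are the deep_gemm.jit entry points this PR switches to; the shapes
# and block sizes below are made-up placeholders.

def make_kernel_kwargs(n, k, block_m, block_n, block_k, num_groups):
    """Assemble the expanded kwargs dict now passed to both code generation
    and the build step (hypothetical helper for illustration)."""
    return {
        "N": n,
        "K": k,
        "BLOCK_M": block_m,
        "BLOCK_N": block_n,
        "BLOCK_K": block_k,
        "NUM_GROUPS": num_groups,
    }

kwargs = make_kernel_kwargs(n=4096, k=7168, block_m=64, block_n=128,
                            block_k=128, num_groups=1)

# Old pattern (removed by this PR):
#   runtime = jit_tuner.compile_and_tune(..., kwargs)
# New pattern (per the PR description):
#   code = FP8GemmRuntime.generate(kwargs)
#   runtime = build("gemm_fp8_fp8_bf16_nt", code, FP8GemmRuntime, kwargs)
print(sorted(kwargs.keys()))
```

The key point is that the kernel configuration is now assembled once and handed to both `generate` and `build`, rather than being tuned implicitly.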
Changelog
- `python/sglang/srt/layers/quantization/deep_gemm.py`
  - Added import for `build` from `deep_gemm.jit`.
  - Replaced calls to `jit_tuner.compile_and_tune` with calls to `FP8GemmRuntime.generate` followed by `build` for FP8 GEMM kernel compilation.
  - Modified the `kwargs` dictionary passed to the kernel compilation functions to include additional kernel configuration parameters.
- `sgl-kernel/CMakeLists.txt`
  - Updated the `GIT_TAG` for the DeepGEMM dependency to commit `8dfa3298274bfe6b242f6f8a3e6f3eff2707dd9f`.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code changes flow in,
New library version pulled,
Kernels compile now.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution. ↩
Code Review
This pull request updates the integration with the deep_gemm library, primarily by changing how JIT compilation is invoked for GEMM kernels and updating the deep_gemm dependency version in CMake.
The changes in python/sglang/srt/layers/quantization/deep_gemm.py adapt to a new API, replacing jit_tuner.compile_and_tune with FP8GemmRuntime.generate and deep_gemm.jit.build. While this adaptation seems to follow a new pattern, there's a critical concern regarding the handling of NUM_GROUPS for masked grouped GEMM, and a medium concern about an unused parameter for contiguous grouped GEMM.
The PR description is currently a template. It would be beneficial to fill it out with the motivation and a summary of modifications to help reviewers understand the changes better. Also, the checklist is not completed.
Summary of Findings
- Potential incorrect `NUM_GROUPS` handling for masked grouped GEMM: In `_compile_grouped_gemm_nt_f8f8bf16_masked_one`, `NUM_GROUPS` is hardcoded to 1 in the `kwargs` passed to `FP8GemmRuntime.generate` and `build`. This might be incorrect if the actual `num_groups` (passed as a function parameter) can be greater than 1 and is required for correct kernel compilation.
- Unused parameter in contiguous grouped GEMM compilation: The `num_groups` parameter in `_compile_grouped_gemm_nt_f8f8bf16_contig_one` appears to be unused, as the logic consistently uses `NUM_GROUPS = 1` for this kernel type. The parameter should be marked as unused (e.g., renamed to `_`) for clarity.
- PR Description and Checklist: The pull request description is a template and the checklist is not filled out. Providing details about the motivation and changes would be helpful for reviewers.
Merge Readiness
This pull request updates the deep_gemm integration, which is a significant change. There is a critical concern regarding the handling of NUM_GROUPS in _compile_grouped_gemm_nt_f8f8bf16_masked_one that needs to be addressed to ensure correctness. Additionally, there's a point of code clarity regarding an unused parameter in another function.
I recommend that these issues, especially the critical one, be resolved before merging. As an AI, I am not authorized to approve pull requests; please ensure further review and approval from team members.
```python
"NUM_MATH_THREADS_PER_GROUP": num_math_threads_per_group,
"BLOCK_K": block_k,
'N': n, 'K': k,
'NUM_GROUPS': 1,
```
It appears NUM_GROUPS is hardcoded to 1 here. However, the function _compile_grouped_gemm_nt_f8f8bf16_masked_one receives a num_groups parameter, which is passed down from _maybe_compile_deep_gemm_one_type_all and is used in configure_func (which calls get_best_configs with this num_groups).
If FP8GemmRuntime.generate or deep_gemm.jit.build expect the actual num_groups for kernel generation (e.g., as a template parameter), hardcoding it to 1 could lead to incorrect kernel compilation or runtime behavior when num_groups is greater than 1.
Could you clarify if this is an intentional change due to the deep_gemm API update, or if the num_groups variable from the function signature should be used here?
```python
'NUM_GROUPS': num_groups,  # Use the num_groups parameter from the function signature
```
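To illustrate the reviewer's concern, here is a minimal, hypothetical sketch (the helper function and its signature are invented for demonstration; only the `NUM_GROUPS` key mirrors this PR's diff) contrasting the hardcoded value with threading the parameter through:

```python
# Hypothetical reduction of the concern above: if the kwargs used for kernel
# generation hardcode NUM_GROUPS to 1, any caller with num_groups > 1 gets a
# kernel built for the wrong group count.

def masked_kernel_kwargs(num_groups, hardcode=False):
    """Build a simplified, illustrative kwargs dict for the masked grouped GEMM."""
    return {"NUM_GROUPS": 1 if hardcode else num_groups}

# Hardcoded variant silently drops the caller's group count:
assert masked_kernel_kwargs(4, hardcode=True)["NUM_GROUPS"] == 1
# Suggested variant preserves it:
assert masked_kernel_kwargs(4)["NUM_GROUPS"] == 4
```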
This line should be removed.
@zhyncs Let's update this PR when the next version of sgl-kernel is bumped.
Thanks for your contribution!
Motivation
deepseek-ai/DeepGEMM@8dfa329
Modifications
Checklist