[sgl-kernel] Support PDL for activatons #6722
Conversation
Hello @Edenzzzz, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR focuses on upgrading the FlashInfer dependency within sgl-kernel to version 0.2.5. The primary motivation, as stated in the description, is to enable support for Programmatic Dependent Launch (PDL) in the activation functions. This involves updating the FlashInfer submodule reference and modifying the kernel implementations and their Python bindings to accept and utilize an enable_pdl flag instead of the raw CUDA stream pointer.
Highlights
- FlashInfer Upgrade: The FlashInfer dependency is updated to a new commit (`592b110a78725e034daf3a3c978053e7dae92e5c`) in `sgl-kernel/CMakeLists.txt`, moving from the previous commit (`9220fb3443b5a5d274f00ca5552f798e225239b7`). This upgrade is necessary to gain access to FlashInfer features that support PDL.
- PDL Integration in Activation Kernels: The core activation kernels (`silu_and_mul`, `gelu_tanh_and_mul`, `gelu_and_mul`) in `sgl-kernel/csrc/elementwise/activation.cu` are modified to use `cudaLaunchKernelEx` with the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute. This attribute is conditionally enabled based on a new `bool enable_pdl` parameter passed to the functions, and replaces the previous method of explicitly passing and using a `cuda_stream` pointer. (A minimal sketch of this launch pattern follows this list.)
- API Signature Changes: The C++ function signatures for the activation kernels (`silu_and_mul`, `gelu_tanh_and_mul`, `gelu_and_mul`) are updated in `sgl-kernel/include/sgl_kernel_ops.h`, along with their PyTorch bindings in `sgl-kernel/csrc/common_extension.cc`. The `int64_t cuda_stream` parameter is replaced with `bool enable_pdl`.
- Python Wrapper Updates: The Python wrapper functions for the activation kernels in `sgl-kernel/python/sgl_kernel/elementwise.py` are updated to accept an optional `enable_pdl` boolean argument (defaulting to `False`). This flag is then passed down to the underlying C++ kernel calls via `torch.ops`.
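To make the PDL integration concrete, here is a minimal, self-contained sketch of the `cudaLaunchKernelEx` launch pattern described above. The trivial copy kernel, block size, and function names are illustrative assumptions rather than the actual activation.cu code; only the attribute handling mirrors the change being summarized.

```cpp
// Minimal sketch of launching a kernel with cudaLaunchKernelEx and the PDL
// attribute (requires CUDA 11.8+ to compile with nvcc; PDL itself is a
// Hopper / SM 9.0 feature). The copy kernel and 256-thread block size are
// illustrative only.
#include <cuda_runtime.h>

__global__ void copy_kernel(float* out, const float* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i];
}

void launch_with_pdl(float* out, const float* in, int n,
                     cudaStream_t stream, bool enable_pdl) {
  if (n == 0) return;  // avoid a zero grid dimension

  cudaLaunchConfig_t config = {};
  config.gridDim = dim3((n + 255) / 256);
  config.blockDim = dim3(256);
  config.dynamicSmemBytes = 0;
  config.stream = stream;

  // When enabled, this attribute allows the kernel's launch to overlap with
  // the tail of the previous kernel on the same stream (Programmatic
  // Dependent Launch); when disabled, the launch behaves as usual.
  cudaLaunchAttribute attr;
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  config.attrs = &attr;
  config.numAttrs = 1;

  cudaLaunchKernelEx(&config, copy_kernel, out, in, n);
}
```

On pre-Hopper GPUs the flag would typically just be left off, which is consistent with the Python wrappers defaulting `enable_pdl` to `False`.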
Changelog
- `sgl-kernel/CMakeLists.txt`
  - Updated the FlashInfer `GIT_TAG` from `9220fb3443b5a5d274f00ca5552f798e225239b7` to `592b110a78725e034daf3a3c978053e7dae92e5c` (line 64).
- `sgl-kernel/csrc/common_extension.cc`
  - Changed the signature of `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in the `TORCH_LIBRARY_FRAGMENT` definition from accepting `int cuda_stream` to `bool enable_pdl` (lines 73, 76, 79). A hypothetical sketch of this schema change follows this changelog.
- `sgl-kernel/csrc/elementwise/activation.cu`
  - Modified the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` implementations to accept `bool enable_pdl` instead of `int64_t cuda_stream` (lines 36, 68, 100).
  - Updated kernel launches from the `<<<grid, block, 0, stream>>>` syntax to `cudaLaunchKernelEx` with `cudaLaunchConfig_t` (lines 45-46, 60-61, 77-78, 90-91, 109-110, 120-121).
  - Added the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute to the launch configuration, controlled by the `enable_pdl` parameter (lines 50-52, 82-84, 114-116).
  - Added error checking after kernel launches using `cudaGetLastError` and `TORCH_CHECK` (lines 61-62, 93-94, 124-125).
- `sgl-kernel/include/sgl_kernel_ops.h`
  - Updated the function declarations for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to accept `bool enable_pdl` instead of `int64_t cuda_stream` (lines 110, 111, 112).
- `sgl-kernel/python/sgl_kernel/elementwise.py`
  - Added an `enable_pdl: bool = False` parameter to the Python wrapper functions `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` (lines 160, 177, 194).
  - Passed the `enable_pdl` parameter to the underlying `torch.ops.sgl_kernel` calls (lines 172, 189, 206).
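As a rough illustration of the binding change summarized above for `common_extension.cc`, the updated operator schemas might look like the sketch below. The exact schema strings and argument lists are assumptions based on this summary, not the actual file contents.

```cpp
// Hypothetical sketch: the trailing int cuda_stream argument in each schema
// becomes bool enable_pdl. Actual schemas in common_extension.cc may differ.
#include <torch/library.h>

TORCH_LIBRARY_FRAGMENT(sgl_kernel, m) {
  // Previously (roughly): "silu_and_mul(Tensor! out, Tensor input, int cuda_stream) -> ()"
  m.def("silu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_tanh_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
}
```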
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Kernel code, a deep dive,
PDL helps threads thrive.
FlashInfer updated,
Performance elevated.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Hello @Edenzzzz, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by Edenzzzz, focuses on upgrading the FlashInfer dependency within the sgl-kernel module to version 0.2.5. The primary motivation behind this upgrade is to enable the use of Programmatic Dependent Launch (PDL) within activation functions, specifically silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul. The changes involve updating the FlashInfer GIT tag in the CMake configuration, modifying the C++ kernel implementations to use cudaLaunchKernelEx with the PDL attribute, and updating the corresponding Python wrappers to expose an enable_pdl flag.
Highlights
- FlashInfer Upgrade: The FlashInfer dependency is updated to a newer commit (from `9220fb3...` to `592b110...`), which is necessary to access features enabling PDL.
- PDL Support in Activations: The `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` activation functions are modified to accept an `enable_pdl` boolean flag. Their underlying CUDA kernel launches are updated to use `cudaLaunchKernelEx` and set the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute based on this flag.
- API Changes: The C++ and Python interfaces for the affected activation functions are updated to replace the `cuda_stream` integer argument with the new `enable_pdl` boolean argument.
Changelog
- `sgl-kernel/CMakeLists.txt`
  - Updated the FlashInfer `GIT_TAG` to `592b110a78725e034daf3a3c978053e7dae92e5c`.
- `sgl-kernel/csrc/common_extension.cc`
  - Changed the function signatures for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to accept `bool enable_pdl` instead of `int cuda_stream`.
- `sgl-kernel/csrc/elementwise/activation.cu`
  - Modified the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` implementations to use `cudaLaunchKernelEx`.
  - Added logic to set the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute based on the `enable_pdl` parameter.
  - Replaced retrieval of the stream from `int64_t cuda_stream` with `at::cuda::getCurrentCUDAStream()`.
- `sgl-kernel/include/sgl_kernel_ops.h`
  - Updated the function declarations for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to match the new `bool enable_pdl` parameter.
- `sgl-kernel/python/sgl_kernel/elementwise.py`
  - Added an `enable_pdl: bool = False` parameter to the Python wrappers for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul`.
  - Passed the `enable_pdl` flag to the underlying C++ kernel calls.
Library updated,
Kernels launch with new control,
PDL now enabled.
Code Review
This pull request effectively upgrades the FlashInfer dependency to version 0.2.5 and integrates Programmatic Dependent Launch (PDL) capabilities into the activation functions. The changes are well-contained and consistent across the C++, Python, and build files.
I've identified a couple of areas for improvement, primarily concerning Python docstring updates and a suggestion for refactoring duplicated CUDA kernel launch logic to enhance maintainability. Addressing these would further improve the quality of the codebase.
Also, a minor typo in the pull request title: "activaiton" should be "activation".
Summary of Findings
- Code Duplication in CUDA Kernel Launch: The CUDA kernel launch configuration and execution logic using `cudaLaunchKernelEx` is duplicated across the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` functions in `sgl-kernel/csrc/elementwise/activation.cu`. This was suggested to be refactored into a helper function to improve maintainability. (Severity: medium)
- Missing Python Docstring Updates: The Python docstrings for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in `sgl-kernel/python/sgl_kernel/elementwise.py` have not been updated to include the new `enable_pdl` parameter. (Severity: medium)
- PR Title Typo: Minor typo in the PR title: "activaiton" should be "activation". (Not commented on directly; mentioned in general feedback.)
Merge Readiness
The pull request is well-structured and successfully upgrades FlashInfer to enable Programmatic Dependent Launch (PDL). However, there are a couple of medium-severity issues that should be addressed:
- Python docstrings for the modified activation functions need to be updated to reflect the new `enable_pdl` parameter.
- There's an opportunity to refactor duplicated CUDA kernel launch code in `activation.cu` for better maintainability.
I recommend addressing these points, especially the docstring updates, before merging to ensure API clarity and codebase quality. As an AI reviewer, I am not authorized to approve pull requests; please ensure further review and approval by team members.
Code Review
This pull request upgrades FlashInfer to version 0.2.5 and refactors activation function kernels (silu_and_mul, gelu_tanh_and_mul, gelu_and_mul) to support Programmatic Dependent Launch (PDL). The changes in CMakeLists.txt, common_extension.cc, sgl_kernel_ops.h, and elementwise.py are consistent with this goal and seem correct.
The core changes in csrc/elementwise/activation.cu correctly implement the mechanism to enable/disable PDL using cudaLaunchKernelEx and cudaLaunchAttributeProgrammaticStreamSerialization.
However, there are a few critical issues in the kernel launch configuration within activation.cu that need addressing, related to potential zero values for blockDim and gridDim. Additionally, there's an opportunity to reduce code duplication in that file.
No specific style guide was provided, so feedback is based on common C++/CUDA and Python best practices (e.g., Google C++ Style Guide, PEP 8 for Python).
Summary of Findings
- CUDA Kernel Launch Configuration: Zero Block Dimension: In `sgl-kernel/csrc/elementwise/activation.cu`, the `blockDim` for the CUDA kernels can be calculated as 0 if `d < vec_size`. A block dimension of 0 is invalid and will cause `cudaLaunchKernelEx` to fail. This affects `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul`.
- CUDA Kernel Launch Configuration: Zero Grid Dimension: In `sgl-kernel/csrc/elementwise/activation.cu`, if `input.numel()` is 0, `num_tokens` (used for `gridDim`) will be 0. A grid dimension of 0 is invalid and will cause `cudaLaunchKernelEx` to fail. This affects all three activation functions.
- Code Duplication in Kernel Launch Logic: The logic for setting up `cudaLaunchConfig_t`, PDL attributes, launching the kernel, and error checking is largely duplicated across the three activation functions in `sgl-kernel/csrc/elementwise/activation.cu`. Refactoring into a helper function would improve maintainability (see the sketch after this list).
- Python Docstring Update (Low Severity): The Python docstrings for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in `sgl-kernel/python/sgl_kernel/elementwise.py` should be updated to include the new `enable_pdl` parameter. (Not commented inline due to severity filter settings.)
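To make the refactoring suggestion concrete, a shared launcher along the following lines could address both the duplication and the zero-dimension findings. This is only a sketch: the helper name, template shape, and guard conditions are hypothetical and not the actual activation.cu code.

```cpp
// Hypothetical shared launcher: guards against empty launches, sets the PDL
// attribute once, and centralizes error checking. Compile with nvcc
// (CUDA 11.8+); names and guards here are illustrative assumptions.
#include <cuda_runtime.h>
#include <c10/util/Exception.h>  // TORCH_CHECK

template <typename Kernel, typename... Args>
void launch_activation_kernel(Kernel kernel, dim3 grid, dim3 block,
                              cudaStream_t stream, bool enable_pdl,
                              Args... args) {
  // Skip the launch for empty inputs instead of handing a zero grid or block
  // dimension to cudaLaunchKernelEx, which would fail.
  if (grid.x == 0 || block.x == 0) return;

  cudaLaunchConfig_t config = {};
  config.gridDim = grid;
  config.blockDim = block;
  config.dynamicSmemBytes = 0;
  config.stream = stream;

  cudaLaunchAttribute attr;
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  config.attrs = &attr;
  config.numAttrs = 1;

  cudaLaunchKernelEx(&config, kernel, args...);
  cudaError_t err = cudaGetLastError();
  TORCH_CHECK(err == cudaSuccess, "kernel launch failed: ", cudaGetErrorString(err));
}
```

Each activation function would then compute its grid and block dimensions and call this helper with its kernel and arguments; the real kernel symbols and argument lists in activation.cu may differ from what this sketch assumes.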
Merge Readiness
The pull request makes good progress in enabling PDL for activation functions by upgrading FlashInfer and modifying the kernel launch mechanism. However, there are critical issues in sgl-kernel/csrc/elementwise/activation.cu related to CUDA kernel launch configurations (blockDim and gridDim potentially being zero) that must be addressed before merging. Additionally, refactoring the duplicated kernel launch logic is recommended for better maintainability.
I am unable to approve pull requests, but I recommend that these changes, especially the critical ones, be made and reviewed before this PR is merged.
I see, so this modification helps some activation kernels use PDL? @Edenzzzz
Yes, this should further reduce launch overhead when combined with PDL in norm.
Build successful on H100
Co-authored-by: Chang Su <[email protected]>
Co-authored-by: Chunyuan WU <[email protected]>
Co-authored-by: Thien Tran <[email protected]>
Signed-off-by: storyicon <[email protected]>
@Fridge003 Have you seen the DeepEP failure elsewhere? It looks like a kernel launch failure, but I didn't change any launch parameters.
Yes, this is basically migrating the launch logic from flashinfer while supporting ROCm in the other path.
Motivation
Depends on #5981; upgrades flashinfer in sgl-kernel to enable PDL in activation functions.
Will update to verify compilation
Modifications
Checklist