[sgl-kernel] Support PDL for activatons #6722
Conversation
Hello @Edenzzzz, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR focuses on upgrading the FlashInfer dependency within sgl-kernel to version 0.2.5. The primary motivation, as stated in the description, is to enable support for Programmatic Dependent Launch (PDL) in the activation functions. This involves updating the FlashInfer submodule reference and modifying the kernel implementations and their Python bindings to accept and utilize an enable_pdl flag instead of the raw CUDA stream pointer.
Highlights
- FlashInfer Upgrade: The FlashInfer dependency is updated to a new commit (`592b110a78725e034daf3a3c978053e7dae92e5c`) in `sgl-kernel/CMakeLists.txt`, moving from the previous commit (`9220fb3443b5a5d274f00ca5552f798e225239b7`). This upgrade is necessary to gain access to FlashInfer features that support PDL.
- PDL Integration in Activation Kernels: The core activation kernels (`silu_and_mul`, `gelu_tanh_and_mul`, `gelu_and_mul`) in `sgl-kernel/csrc/elementwise/activation.cu` are modified to use `cudaLaunchKernelEx` with the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute. This attribute is conditionally enabled based on a new `bool enable_pdl` parameter passed to the functions, and replaces the previous method of explicitly passing and using a `cuda_stream` pointer. (A minimal sketch of this launch pattern follows this list.)
- API Signature Changes: The C++ function signatures for the activation kernels (`silu_and_mul`, `gelu_tanh_and_mul`, `gelu_and_mul`) are updated in `sgl-kernel/include/sgl_kernel_ops.h`, along with their PyTorch bindings in `sgl-kernel/csrc/common_extension.cc`. The `int64_t cuda_stream` parameter is replaced with `bool enable_pdl`.
- Python Wrapper Updates: The Python wrapper functions for the activation kernels in `sgl-kernel/python/sgl_kernel/elementwise.py` are updated to accept an optional `enable_pdl` boolean argument (defaulting to `False`). This flag is then passed down to the underlying C++ kernel calls via `torch.ops`.
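To make the PDL integration concrete, here is a minimal, self-contained sketch of the `cudaLaunchKernelEx` launch pattern described above. The trivial copy kernel, block size, and function names are illustrative assumptions rather than the actual activation.cu code; only the attribute handling mirrors the change being summarized.

```cpp
// Minimal sketch of launching a kernel with cudaLaunchKernelEx and the PDL
// attribute (requires CUDA 11.8+ to compile with nvcc; PDL itself is a
// Hopper / SM 9.0 feature). The copy kernel and 256-thread block size are
// illustrative only.
#include <cuda_runtime.h>

__global__ void copy_kernel(float* out, const float* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i];
}

void launch_with_pdl(float* out, const float* in, int n,
                     cudaStream_t stream, bool enable_pdl) {
  if (n == 0) return;  // avoid a zero grid dimension

  cudaLaunchConfig_t config = {};
  config.gridDim = dim3((n + 255) / 256);
  config.blockDim = dim3(256);
  config.dynamicSmemBytes = 0;
  config.stream = stream;

  // When enabled, this attribute allows the kernel's launch to overlap with
  // the tail of the previous kernel on the same stream (Programmatic
  // Dependent Launch); when disabled, the launch behaves as usual.
  cudaLaunchAttribute attr;
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  config.attrs = &attr;
  config.numAttrs = 1;

  cudaLaunchKernelEx(&config, copy_kernel, out, in, n);
}
```

On pre-Hopper GPUs the flag would typically just be left off, which is consistent with the Python wrappers defaulting `enable_pdl` to `False`.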
Changelog
- `sgl-kernel/CMakeLists.txt`
  - Updated the FlashInfer `GIT_TAG` from `9220fb3443b5a5d274f00ca5552f798e225239b7` to `592b110a78725e034daf3a3c978053e7dae92e5c` (line 64).
- `sgl-kernel/csrc/common_extension.cc`
  - Changed the signature of `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in the `TORCH_LIBRARY_FRAGMENT` definition from accepting `int cuda_stream` to `bool enable_pdl` (lines 73, 76, 79). A hypothetical sketch of this schema change follows this changelog.
- `sgl-kernel/csrc/elementwise/activation.cu`
  - Modified the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` implementations to accept `bool enable_pdl` instead of `int64_t cuda_stream` (lines 36, 68, 100).
  - Updated kernel launches from the `<<<grid, block, 0, stream>>>` syntax to `cudaLaunchKernelEx` with `cudaLaunchConfig_t` (lines 45-46, 60-61, 77-78, 90-91, 109-110, 120-121).
  - Added the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute to the launch configuration, controlled by the `enable_pdl` parameter (lines 50-52, 82-84, 114-116).
  - Added error checking after kernel launches using `cudaGetLastError` and `TORCH_CHECK` (lines 61-62, 93-94, 124-125).
- `sgl-kernel/include/sgl_kernel_ops.h`
  - Updated the function declarations for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to accept `bool enable_pdl` instead of `int64_t cuda_stream` (lines 110, 111, 112).
- `sgl-kernel/python/sgl_kernel/elementwise.py`
  - Added an `enable_pdl: bool = False` parameter to the Python wrapper functions `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` (lines 160, 177, 194).
  - Passed the `enable_pdl` parameter to the underlying `torch.ops.sgl_kernel` calls (lines 172, 189, 206).
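As a rough illustration of the binding change summarized above for `common_extension.cc`, the updated operator schemas might look like the sketch below. The exact schema strings and argument lists are assumptions based on this summary, not the actual file contents.

```cpp
// Hypothetical sketch: the trailing int cuda_stream argument in each schema
// becomes bool enable_pdl. Actual schemas in common_extension.cc may differ.
#include <torch/library.h>

TORCH_LIBRARY_FRAGMENT(sgl_kernel, m) {
  // Previously (roughly): "silu_and_mul(Tensor! out, Tensor input, int cuda_stream) -> ()"
  m.def("silu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_tanh_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
}
```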
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Kernel code, a deep dive,
PDL helps threads thrive.
FlashInfer updated,
Performance elevated.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Hello @Edenzzzz, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by Edenzzzz, focuses on upgrading the FlashInfer dependency within the sgl-kernel module to version 0.2.5. The primary motivation behind this upgrade is to enable the use of Programmatic Dependent Launch (PDL) within activation functions, specifically silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul. The changes involve updating the FlashInfer GIT tag in the CMake configuration, modifying the C++ kernel implementations to use cudaLaunchKernelEx with the PDL attribute, and updating the corresponding Python wrappers to expose an enable_pdl flag.
Highlights
- FlashInfer Upgrade: The FlashInfer dependency is updated to a newer commit (from `9220fb3...` to `592b110...`), which is necessary to access features enabling PDL.
- PDL Support in Activations: The `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` activation functions are modified to accept an `enable_pdl` boolean flag. Their underlying CUDA kernel launches are updated to use `cudaLaunchKernelEx` and set the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute based on this flag.
- API Changes: The C++ and Python interfaces for the affected activation functions are updated to replace the `cuda_stream` integer argument with the new `enable_pdl` boolean argument.
Changelog
- `sgl-kernel/CMakeLists.txt`
  - Updated the FlashInfer `GIT_TAG` to `592b110a78725e034daf3a3c978053e7dae92e5c`.
- `sgl-kernel/csrc/common_extension.cc`
  - Changed the function signatures for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to accept `bool enable_pdl` instead of `int cuda_stream`.
- `sgl-kernel/csrc/elementwise/activation.cu`
  - Modified the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` implementations to use `cudaLaunchKernelEx`.
  - Added logic to set the `cudaLaunchAttributeProgrammaticStreamSerialization` attribute based on the `enable_pdl` parameter.
  - Replaced retrieval of the stream from `int64_t cuda_stream` with `at::cuda::getCurrentCUDAStream()`.
- `sgl-kernel/include/sgl_kernel_ops.h`
  - Updated the function declarations for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` to match the new `bool enable_pdl` parameter.
- `sgl-kernel/python/sgl_kernel/elementwise.py`
  - Added an `enable_pdl: bool = False` parameter to the Python wrappers for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul`.
  - Passed the `enable_pdl` flag to the underlying C++ kernel calls.
Library updated,
Kernels launch with new control,
PDL now enabled.
Code Review
This pull request effectively upgrades the FlashInfer dependency to version 0.2.5 and integrates Programmatic Dependent Launch (PDL) capabilities into the activation functions. The changes are well-contained and consistent across the C++, Python, and build files.
I've identified a couple of areas for improvement, primarily concerning Python docstring updates and a suggestion for refactoring duplicated CUDA kernel launch logic to enhance maintainability. Addressing these would further improve the quality of the codebase.
Also, a minor typo in the pull request title: "activaiton" should be "activation".
Summary of Findings
- Code Duplication in CUDA Kernel Launch: The CUDA kernel launch configuration and execution logic using `cudaLaunchKernelEx` is duplicated across the `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` functions in `sgl-kernel/csrc/elementwise/activation.cu`. This was suggested to be refactored into a helper function to improve maintainability. (Severity: medium)
- Missing Python Docstring Updates: The Python docstrings for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in `sgl-kernel/python/sgl_kernel/elementwise.py` have not been updated to include the new `enable_pdl` parameter. (Severity: medium)
- PR Title Typo: Minor typo in the PR title: "activaiton" should be "activation". (Not commented on directly; mentioned in general feedback.)
Merge Readiness
The pull request is well-structured and successfully upgrades FlashInfer to enable Programmatic Dependent Launch (PDL). However, there are a couple of medium-severity issues that should be addressed:
- Python docstrings for the modified activation functions need to be updated to reflect the new `enable_pdl` parameter.
- There's an opportunity to refactor duplicated CUDA kernel launch code in `activation.cu` for better maintainability.
I recommend addressing these points, especially the docstring updates, before merging to ensure API clarity and codebase quality. As an AI reviewer, I am not authorized to approve pull requests; please ensure further review and approval by team members.
Code Review
This pull request upgrades FlashInfer to version 0.2.5 and refactors activation function kernels (silu_and_mul, gelu_tanh_and_mul, gelu_and_mul) to support Programmatic Dependent Launch (PDL). The changes in CMakeLists.txt, common_extension.cc, sgl_kernel_ops.h, and elementwise.py are consistent with this goal and seem correct.
The core changes in csrc/elementwise/activation.cu correctly implement the mechanism to enable/disable PDL using cudaLaunchKernelEx and cudaLaunchAttributeProgrammaticStreamSerialization.
However, there are a few critical issues in the kernel launch configuration within activation.cu that need addressing, related to potential zero values for blockDim and gridDim. Additionally, there's an opportunity to reduce code duplication in that file.
No specific style guide was provided, so feedback is based on common C++/CUDA and Python best practices (e.g., Google C++ Style Guide, PEP 8 for Python).
Summary of Findings
- CUDA Kernel Launch Configuration: Zero Block Dimension: In `sgl-kernel/csrc/elementwise/activation.cu`, the `blockDim` for the CUDA kernels can be calculated as 0 if `d < vec_size`. A block dimension of 0 is invalid and will cause `cudaLaunchKernelEx` to fail. This affects `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul`.
- CUDA Kernel Launch Configuration: Zero Grid Dimension: In `sgl-kernel/csrc/elementwise/activation.cu`, if `input.numel()` is 0, `num_tokens` (used for `gridDim`) will be 0. A grid dimension of 0 is invalid and will cause `cudaLaunchKernelEx` to fail. This affects all three activation functions.
- Code Duplication in Kernel Launch Logic: The logic for setting up `cudaLaunchConfig_t`, PDL attributes, launching the kernel, and error checking is largely duplicated across the three activation functions in `sgl-kernel/csrc/elementwise/activation.cu`. Refactoring into a helper function would improve maintainability (see the sketch after this list).
- Python Docstring Update (Low Severity): The Python docstrings for `silu_and_mul`, `gelu_tanh_and_mul`, and `gelu_and_mul` in `sgl-kernel/python/sgl_kernel/elementwise.py` should be updated to include the new `enable_pdl` parameter. (Not commented inline due to severity filter settings.)
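To make the refactoring suggestion concrete, a shared launcher along the following lines could address both the duplication and the zero-dimension findings. This is only a sketch: the helper name, template shape, and guard conditions are hypothetical and not the actual activation.cu code.

```cpp
// Hypothetical shared launcher: guards against empty launches, sets the PDL
// attribute once, and centralizes error checking. Compile with nvcc
// (CUDA 11.8+); names and guards here are illustrative assumptions.
#include <cuda_runtime.h>
#include <c10/util/Exception.h>  // TORCH_CHECK

template <typename Kernel, typename... Args>
void launch_activation_kernel(Kernel kernel, dim3 grid, dim3 block,
                              cudaStream_t stream, bool enable_pdl,
                              Args... args) {
  // Skip the launch for empty inputs instead of handing a zero grid or block
  // dimension to cudaLaunchKernelEx, which would fail.
  if (grid.x == 0 || block.x == 0) return;

  cudaLaunchConfig_t config = {};
  config.gridDim = grid;
  config.blockDim = block;
  config.dynamicSmemBytes = 0;
  config.stream = stream;

  cudaLaunchAttribute attr;
  attr.id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attr.val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  config.attrs = &attr;
  config.numAttrs = 1;

  cudaLaunchKernelEx(&config, kernel, args...);
  cudaError_t err = cudaGetLastError();
  TORCH_CHECK(err == cudaSuccess, "kernel launch failed: ", cudaGetErrorString(err));
}
```

Each activation function would then compute its grid and block dimensions and call this helper with its kernel and arguments; the real kernel symbols and argument lists in activation.cu may differ from what this sketch assumes.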
Merge Readiness
The pull request makes good progress in enabling PDL for activation functions by upgrading FlashInfer and modifying the kernel launch mechanism. However, there are critical issues in sgl-kernel/csrc/elementwise/activation.cu related to CUDA kernel launch configurations (blockDim and gridDim potentially being zero) that must be addressed before merging. Additionally, refactoring the duplicated kernel launch logic is recommended for better maintainability.
I am unable to approve pull requests, but I recommend that these changes, especially the critical ones, be made and reviewed before this PR is merged.
I see, so this modification helps some activation kernels use PDL? @Edenzzzz
Yes, this should further reduce launch overhead when combined with PDL in norm.
Build successful on H100
Co-authored-by: Chang Su <[email protected]>
Co-authored-by: Chunyuan WU <[email protected]>
Co-authored-by: Thien Tran <[email protected]>
Signed-off-by: storyicon <[email protected]>
@Fridge003 Have you seen the DeepEP failure elsewhere? It looks like a kernel launch failure, but I didn't change any launch parameters.
Yes, this is basically migrating the launch logic from flashinfer while supporting ROCm in the other path.
Motivation
Depends on #5981; upgrades flashinfer in sgl-kernel to enable PDL in activation functions.
Will update to verify compilation
Modifications
Checklist