
Conversation

@Edenzzzz (Contributor) commented May 29, 2025

Motivation

Depends on #5981; upgrades FlashInfer in sgl-kernel to enable PDL in the activation functions.
Will update this PR once compilation has been verified.

Modifications

Checklist

@Edenzzzz changed the title from "sgl-kernel: upgrade to flashinfer 0.2.5 to support PDL in activaiton" to "sgl-kernel: upgrade to flashinfer 0.2.5 to enable PDL in activaiton" on May 29, 2025
@gemini-code-assist (bot) left a comment

Hello @Edenzzzz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR focuses on upgrading the FlashInfer dependency within sgl-kernel to version 0.2.5. The primary motivation, as stated in the description, is to enable support for Programmatic Dependent Launch (PDL) in the activation functions. This involves updating the FlashInfer submodule reference and modifying the kernel implementations and their Python bindings to accept and utilize an enable_pdl flag instead of the raw CUDA stream pointer.

Highlights

  • FlashInfer Upgrade: The FlashInfer dependency is updated to a new commit (592b110a78725e034daf3a3c978053e7dae92e5c) in sgl-kernel/CMakeLists.txt, moving from the previous commit (9220fb3443b5a5d274f00ca5552f798e225239b7). This upgrade is necessary to gain access to FlashInfer features that support PDL.
  • PDL Integration in Activation Kernels: The core activation kernels (silu_and_mul, gelu_tanh_and_mul, gelu_and_mul) in sgl-kernel/csrc/elementwise/activation.cu are modified to use cudaLaunchKernelEx with the cudaLaunchAttributeProgrammaticStreamSerialization attribute. This attribute is conditionally enabled based on a new bool enable_pdl parameter passed to the functions. This replaces the previous method of explicitly passing and using a cuda_stream pointer. A minimal sketch of this launch pattern follows the list.
  • API Signature Changes: The C++ function signatures for the activation kernels (silu_and_mul, gelu_tanh_and_mul, gelu_and_mul) are updated in sgl-kernel/include/sgl_kernel_ops.h and their PyTorch bindings in sgl-kernel/csrc/common_extension.cc. The int64_t cuda_stream parameter is replaced with bool enable_pdl.
  • Python Wrapper Updates: The Python wrapper functions for the activation kernels in sgl-kernel/python/sgl_kernel/elementwise.py are updated to accept an optional enable_pdl boolean argument (defaulting to False). This flag is then passed down to the underlying C++ kernel calls via torch.ops.
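
For readers less familiar with the launch attribute mentioned above, here is a minimal, self-contained CUDA sketch of the general pattern: a kernel launched through cudaLaunchKernelEx with cudaLaunchAttributeProgrammaticStreamSerialization toggled by an enable_pdl flag. The kernel and helper names (example_silu_and_mul, launch_with_optional_pdl) are illustrative assumptions, not the actual sgl-kernel/FlashInfer code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative elementwise kernel; the real activation kernels live in FlashInfer.
// Input is laid out as [x (d floats) | y (d floats)]; output is silu(x) * y.
__global__ void example_silu_and_mul(float* out, const float* in, int d) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < d) {
    float x = in[i];
    float y = in[i + d];
    out[i] = (x / (1.0f + expf(-x))) * y;
  }
}

// Launch via cudaLaunchKernelEx; PDL is opt-in through the launch attribute and
// only takes effect on GPUs/toolchains that support programmatic dependent launch.
cudaError_t launch_with_optional_pdl(float* out, const float* in, int d,
                                     cudaStream_t stream, bool enable_pdl) {
  if (d <= 0) return cudaSuccess;  // nothing to launch

  cudaLaunchConfig_t config = {};
  config.gridDim = dim3((d + 255) / 256);
  config.blockDim = dim3(256);
  config.dynamicSmemBytes = 0;
  config.stream = stream;

  cudaLaunchAttribute attrs[1];
  attrs[0].id = cudaLaunchAttributeProgrammaticStreamSerialization;
  attrs[0].val.programmaticStreamSerializationAllowed = enable_pdl ? 1 : 0;
  config.attrs = attrs;
  config.numAttrs = 1;

  // Kernel arguments are forwarded after the function pointer.
  cudaError_t status = cudaLaunchKernelEx(&config, example_silu_and_mul, out, in, d);
  if (status != cudaSuccess) {
    fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(status));
  }
  return status;
}
```

In the PR, the equivalent flag is plumbed from the Python wrappers down to launches of this shape via the new enable_pdl argument.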

Changelog

  • sgl-kernel/CMakeLists.txt
    • Updated FlashInfer GIT_TAG from 9220fb3443b5a5d274f00ca5552f798e225239b7 to 592b110a78725e034daf3a3c978053e7dae92e5c (line 64).
  • sgl-kernel/csrc/common_extension.cc
    • Changed the signature of silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul in the TORCH_LIBRARY_FRAGMENT definition from accepting int cuda_stream to bool enable_pdl (lines 73, 76, 79); a schema sketch follows this changelog.
  • sgl-kernel/csrc/elementwise/activation.cu
    • Modified silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul implementations to accept bool enable_pdl instead of int64_t cuda_stream (lines 36, 68, 100).
    • Updated kernel launches from <<<grid, block, 0, stream>>> syntax to cudaLaunchKernelEx with cudaLaunchConfig_t (lines 45-46, 60-61, 77-78, 90-91, 109-110, 120-121).
    • Added cudaLaunchAttributeProgrammaticStreamSerialization attribute to the launch configuration, controlled by the enable_pdl parameter (lines 50-52, 82-84, 114-116).
    • Added error checking after kernel launches using cudaGetLastError and TORCH_CHECK (lines 61-62, 93-94, 124-125).
  • sgl-kernel/include/sgl_kernel_ops.h
    • Updated function declarations for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul to accept bool enable_pdl instead of int64_t cuda_stream (lines 110, 111, 112).
  • sgl-kernel/python/sgl_kernel/elementwise.py
    • Added enable_pdl: bool = False parameter to the Python wrapper functions silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul (lines 160, 177, 194).
    • Passed the enable_pdl parameter to the underlying torch.ops.sgl_kernel calls (lines 172, 189, 206).
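
As a rough illustration of the schema change called out above, the registered operator signatures swap the trailing integer stream handle for a boolean flag. The namespace and exact schema strings below are assumptions, not copied from the repository:

```cuda
#include <torch/library.h>

// Hypothetical fragment mirroring the described change.
TORCH_LIBRARY_FRAGMENT(sgl_kernel_example, m) {
  // before: m.def("silu_and_mul(Tensor! out, Tensor input, int cuda_stream) -> ()");
  m.def("silu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_tanh_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
  m.def("gelu_and_mul(Tensor! out, Tensor input, bool enable_pdl) -> ()");
}
```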

@gemini-code-assist (bot) left a comment

Hello @Edenzzzz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, authored by Edenzzzz, focuses on upgrading the FlashInfer dependency within the sgl-kernel module to version 0.2.5. The primary motivation behind this upgrade is to enable the use of Programmatic Dependent Launch (PDL) within activation functions, specifically silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul. The changes involve updating the FlashInfer GIT tag in the CMake configuration, modifying the C++ kernel implementations to use cudaLaunchKernelEx with the PDL attribute, and updating the corresponding Python wrappers to expose an enable_pdl flag.

Highlights

  • FlashInfer Upgrade: The FlashInfer dependency is updated to a newer commit (from 9220fb3... to 592b110...), which is necessary to access features enabling PDL.
  • PDL Support in Activations: The silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul activation functions are modified to accept an enable_pdl boolean flag. Their underlying CUDA kernel launches are updated to use cudaLaunchKernelEx and set the cudaLaunchAttributeProgrammaticStreamSerialization attribute based on this flag.
  • API Changes: The C++ and Python interfaces for the affected activation functions are updated to replace the cuda_stream integer argument with the new enable_pdl boolean argument.

Changelog

  • sgl-kernel/CMakeLists.txt
    • Updated FlashInfer GIT_TAG to 592b110a78725e034daf3a3c978053e7dae92e5c.
  • sgl-kernel/csrc/common_extension.cc
    • Changed the function signatures for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul to accept bool enable_pdl instead of int cuda_stream.
  • sgl-kernel/csrc/elementwise/activation.cu
    • Modified silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul implementations to use cudaLaunchKernelEx.
    • Added logic to set the cudaLaunchAttributeProgrammaticStreamSerialization attribute based on the enable_pdl parameter.
    • Replaced retrieval of the stream from int64_t cuda_stream with at::cuda::getCurrentCUDAStream() (a short sketch follows this changelog).
  • sgl-kernel/include/sgl_kernel_ops.h
    • Updated function declarations for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul to match the new bool enable_pdl parameter.
  • sgl-kernel/python/sgl_kernel/elementwise.py
    • Added enable_pdl: bool = False parameter to the Python wrappers for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul.
    • Passed the enable_pdl flag to the underlying C++ kernel calls.
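
A small, hedged sketch of the stream-retrieval change noted above (the helper name current_stream_for is illustrative): instead of receiving the raw stream as an int64_t from Python, the wrapper asks ATen for the current stream on the tensor's device and hands that to the launch config.

```cuda
#include <ATen/ATen.h>
#include <ATen/cuda/CUDAContext.h>
#include <cuda_runtime.h>

// Illustrative helper: fetch the current CUDA stream for the input tensor's device.
inline cudaStream_t current_stream_for(const at::Tensor& input) {
  return at::cuda::getCurrentCUDAStream(input.device().index());
}
```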

@gemini-code-assist (bot) left a comment

Code Review

This pull request effectively upgrades the FlashInfer dependency to version 0.2.5 and integrates Programmatic Dependent Launch (PDL) capabilities into the activation functions. The changes are well-contained and consistent across the C++, Python, and build files.

I've identified a couple of areas for improvement, primarily concerning Python docstring updates and a suggestion for refactoring duplicated CUDA kernel launch logic to enhance maintainability. Addressing these would further improve the quality of the codebase.

Also, a minor typo in the pull request title: "activaiton" should be "activation".

Summary of Findings

  • Code Duplication in CUDA Kernel Launch: The CUDA kernel launch configuration and execution logic using cudaLaunchKernelEx is duplicated across silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul functions in sgl-kernel/csrc/elementwise/activation.cu. This was suggested to be refactored into a helper function to improve maintainability. (Severity: medium)
  • Missing Python Docstring Updates: The Python docstrings for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul in sgl-kernel/python/sgl_kernel/elementwise.py have not been updated to include the new enable_pdl parameter. (Severity: medium)
  • PR Title Typo: Minor typo in PR title: "activaiton" should be "activation". (Not commented on directly, mentioned in general feedback)

Merge Readiness

The pull request is well-structured and successfully upgrades FlashInfer to enable Programmatic Dependent Launch (PDL). However, there are a couple of medium-severity issues that should be addressed:

  1. Python docstrings for the modified activation functions need to be updated to reflect the new enable_pdl parameter.
  2. There's an opportunity to refactor duplicated CUDA kernel launch code in activation.cu for better maintainability.

I recommend addressing these points, especially the docstring updates, before merging to ensure API clarity and codebase quality. As an AI reviewer, I am not authorized to approve pull requests; please ensure further review and approval by team members.

@gemini-code-assist (bot) left a comment

Code Review

This pull request upgrades FlashInfer to version 0.2.5 and refactors activation function kernels (silu_and_mul, gelu_tanh_and_mul, gelu_and_mul) to support Programmatic Dependent Launch (PDL). The changes in CMakeLists.txt, common_extension.cc, sgl_kernel_ops.h, and elementwise.py are consistent with this goal and seem correct.

The core changes in csrc/elementwise/activation.cu correctly implement the mechanism to enable/disable PDL using cudaLaunchKernelEx and cudaLaunchAttributeProgrammaticStreamSerialization.

However, there are a few critical issues in the kernel launch configuration within activation.cu that need addressing, related to potential zero values for blockDim and gridDim. Additionally, there's an opportunity to reduce code duplication in that file.

No specific style guide was provided, so feedback is based on common C++/CUDA and Python best practices (e.g., Google C++ Style Guide, PEP 8 for Python).

Summary of Findings

  • CUDA Kernel Launch Configuration: Zero Block Dimension: In sgl-kernel/csrc/elementwise/activation.cu, the blockDim for CUDA kernels can be calculated as 0 if d < vec_size. A block dimension of 0 is invalid and will cause cudaLaunchKernelEx to fail. This affects silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul.
  • CUDA Kernel Launch Configuration: Zero Grid Dimension: In sgl-kernel/csrc/elementwise/activation.cu, if input.numel() is 0, num_tokens (used for gridDim) will be 0. A grid dimension of 0 is invalid and will cause cudaLaunchKernelEx to fail. This affects all three activation functions. (A sketch of a possible guard follows this list.)
  • Code Duplication in Kernel Launch Logic: The logic for setting up cudaLaunchConfig_t, PDL attributes, launching the kernel, and error checking is largely duplicated across the three activation functions in sgl-kernel/csrc/elementwise/activation.cu. Refactoring into a helper function would improve maintainability.
  • Python Docstring Update (Low Severity): The Python docstrings for silu_and_mul, gelu_tanh_and_mul, and gelu_and_mul in sgl-kernel/python/sgl_kernel/elementwise.py should be updated to include the new enable_pdl parameter. (Not commented inline due to severity filter settings).
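
To make the two zero-dimension findings concrete, here is a hedged sketch of one possible guard; the function name and the dimension formula are assumptions, not the sgl-kernel code. The idea is to skip the launch entirely for empty inputs and never let the block dimension round down to zero.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cstdint>

// Returns false when there is nothing to launch; otherwise fills grid/block with
// dimensions guaranteed to be at least 1.
inline bool compute_launch_dims(int64_t num_tokens, int64_t d, int64_t vec_size,
                                dim3* grid, dim3* block) {
  if (num_tokens == 0 || d == 0) {
    return false;  // launching with a zero grid or block dimension is invalid
  }
  int64_t threads = std::min<int64_t>(1024, std::max<int64_t>(1, d / vec_size));
  *grid = dim3(static_cast<unsigned int>(num_tokens));
  *block = dim3(static_cast<unsigned int>(threads));
  return true;
}
```

Callers would return early when this reports false, before building the cudaLaunchConfig_t.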

Merge Readiness

The pull request makes good progress in enabling PDL for activation functions by upgrading FlashInfer and modifying the kernel launch mechanism. However, there are critical issues in sgl-kernel/csrc/elementwise/activation.cu related to CUDA kernel launch configurations (blockDim and gridDim potentially being zero) that must be addressed before merging. Additionally, refactoring the duplicated kernel launch logic is recommended for better maintainability.

I am unable to approve pull requests, but I recommend that these changes, especially the critical ones, be made and reviewed before this PR is merged.

@FlamingoPg (Collaborator) commented:

I see, so this modification lets some of the activation kernels use PDL? @Edenzzzz

@Edenzzzz (Contributor, Author) commented May 29, 2025

I see, so this modification lets some of the activation kernels use PDL? @Edenzzzz

Yes, this should further reduce launch overhead when combined with PDL in the norm kernels.
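
For context on why this cuts launch overhead: with programmatic dependent launch, a consumer kernel launched with the serialization attribute can have its blocks scheduled while the producer is still running, and it waits for the producer's results with a device-side synchronization. The following is a minimal sketch with illustrative kernels (not the FlashInfer code); it assumes a PDL-capable GPU and toolchain (Hopper-class, CUDA 11.8+).

```cuda
#include <cuda_runtime.h>

__global__ void producer(float* buf, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) buf[i] = 2.0f * i;
  // Optionally signal, earlier than kernel exit, that dependent launches may proceed.
  cudaTriggerProgrammaticLaunchCompletion();
}

__global__ void consumer(const float* buf, float* out, int n) {
  // Work that does not depend on the producer could already run here.
  // This call waits until the upstream grid's results are safe to read.
  cudaGridDependencySynchronize();
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = buf[i] + 1.0f;
}
```

The consumer would be launched with the programmatic-stream-serialization attribute set, as in the launch sketch earlier in this thread.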

@Edenzzzz (Contributor, Author) commented May 29, 2025

Build successful on H100

@Edenzzzz changed the title from "[sgl-kernel] Support PDL in activatons" to "[sgl-kernel] Support PDL for activatons" on Oct 31, 2025
@Edenzzzz (Contributor, Author) commented Nov 4, 2025

@Fridge003 Have you seen the DeepEP failure elsewhere? It looks like a kernel launch failure, but I didn't change the launch parameters.

@b8zhong (Collaborator) commented Nov 7, 2025

To my understanding, this PR is still needed because we currently use the C++ interface for the activations from FlashInfer, right? I found that for other kernels from FlashInfer, PDL is determined automatically.

@Edenzzzz (Contributor, Author) commented Nov 8, 2025

To my understanding, this PR is still needed because we currently use the C++ interface for the activations from FlashInfer, right? I found that for other kernels from FlashInfer, PDL is determined automatically.

Yes, this basically migrates the launch logic from FlashInfer while keeping ROCm supported in the other code path.


Labels

amd, ready-to-merge (The PR is ready to merge after the CI is green), run-ci, sgl-kernel
