Support compile sgl-kernel on cuda 13.0 #9721
Conversation
Summary of Changes
Hello @rainj-me, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces comprehensive support for CUDA 13.0, enabling compatibility with newer NVIDIA GPU architectures and features. It updates the build system to incorporate specific CUDA 13.0 flags and integrates a custom version of FlashInfer. Additionally, it refactors the Marlin MoE kernel compilation process for improved organization and addresses CUB library compatibility issues with CUDA 13.0 by adopting newer CUDA functional APIs for reduction operations.
Highlights
- **CUDA 13.0 Compatibility:** This PR introduces comprehensive support for CUDA 13.0, enabling the project to leverage the latest NVIDIA GPU architectures and features. This includes adding the `sm_110` and `sm_121` compute capabilities and the `--compress-mode=size` flag for CUDA 13.0 builds, while ensuring backward compatibility for older CUDA versions.
- **FlashInfer Dependency Update:** The FlashInfer dependency has been updated to a specific commit (`018b551825c8e5579206e6eb9d3229fa679202b3`), indicating the integration of a custom or updated version of FlashInfer that is likely optimized or required for CUDA 13.0 compatibility.
- **Marlin MoE Kernel Refactoring:** The Marlin MoE kernel compilation process has been refactored. Individual `.cu` kernel files are now renamed to `.cuh` and consolidated into a single generated header file, `kernel_marlin.cuh`. This streamlines the build process and improves code organization by centralizing kernel includes.
- **CUB API Modernization for CUDA 13.0:** To maintain compatibility and address deprecations in CUDA 13.0, the `moe_topk_softmax_kernels.cu` file has been updated to use `cuda::functional` APIs (`cuda::maximum`, `cuda::minimum`) for reduction operations when building with CUDA 13.0 or newer. This ensures the code remains robust with the latest CUDA toolkit.
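The version-gated architecture and compression flags described in the first highlight could be selected along these lines in a build helper. Note this is an illustrative sketch: the function name `get_nvcc_flags`, the base architecture list, and the exact gencode strings are assumptions, not the PR's actual build code.

```python
# Illustrative sketch of version-gated nvcc flag selection; the function
# name, base architecture list, and flag strings are hypothetical and not
# taken from the sgl-kernel build scripts.
def get_nvcc_flags(cuda_version: tuple) -> list:
    # Architectures assumed to be supported by every targeted toolkit.
    archs = ["80", "89", "90"]
    flags = []
    if cuda_version >= (13, 0):
        # CUDA 13.0 adds newer compute capabilities and supports
        # size-oriented compression of the embedded device code.
        archs += ["110", "121"]
        flags.append("--compress-mode=size")
    flags += [f"-gencode=arch=compute_{a},code=sm_{a}" for a in archs]
    return flags
```

Keeping the gate in one helper means older toolkits never see flags they would reject, which is the backward-compatibility concern the highlight mentions.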
Code Review
This pull request adds support for CUDA 13.0. The changes include updating the flashinfer dependency, adding new CUDA architecture flags for compilation, and refactoring the marlin MoE kernels to be included as headers. It also addresses API deprecations in CUB for CUDA 13.0. The changes look good and are well-motivated. My feedback focuses on improving the maintainability and portability of the build scripts and code comments.
```diff
 def remove_old_kernels():
-    for filename in glob.glob(os.path.dirname(__file__) + "/kernel_*.cu"):
+    for filename in glob.glob(os.path.dirname(__file__) + "/kernel_*.cuh"):
         subprocess.call(["rm", "-f", filename])
```
Using subprocess.call with rm is not platform-independent. It's better to use Python's os module for file operations to ensure portability. os.path.join should also be used for constructing paths. This change improves the robustness and portability of the script.
```diff
 def remove_old_kernels():
-    for filename in glob.glob(os.path.dirname(__file__) + "/kernel_*.cuh"):
-        subprocess.call(["rm", "-f", filename])
+    for filename in glob.glob(os.path.join(os.path.dirname(__file__), "kernel_*.cuh")):
+        try:
+            os.remove(filename)
+        except OSError as e:
+            print(f"Error removing file {filename}: {e}")
```
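The other half of the refactoring this thread touches, consolidating the renamed `kernel_*.cuh` files into a single generated `kernel_marlin.cuh`, might be sketched as follows. The function name `write_consolidated_header` and the exact header layout are illustrative assumptions, not the PR's actual generator code.

```python
import glob
import os

# Hypothetical sketch of a generator step that collects per-configuration
# Marlin kernel headers into one kernel_marlin.cuh; names and layout are
# illustrative, not copied from the sgl-kernel generator.
def write_consolidated_header(kernel_dir: str, out_name: str = "kernel_marlin.cuh") -> str:
    # Gather the generated headers, excluding the consolidated header
    # itself so the step is safe to re-run.
    headers = sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(kernel_dir, "kernel_*.cuh"))
        if os.path.basename(p) != out_name
    )
    out_path = os.path.join(kernel_dir, out_name)
    with open(out_path, "w") as f:
        f.write("#pragma once\n")
        for h in headers:
            f.write(f'#include "{h}"\n')
    return out_path
```

Compiling one translation unit that includes this header is what lets the build list a single file instead of every generated `.cu` kernel.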
```cpp
// Define reduction operators based on CUDA version
// CUDA 13 (12.9+) deprecated cub::Max/Min in favor of cuda::maximum/minimum
```
The comment explaining the CUDA version check is a bit unclear. The version 12090 is likely for a pre-release of CUDA 13.0. A more explicit comment would improve clarity for future maintainers.
```cpp
// Define reduction operators based on CUDA version.
// In CUDA 13.0, cub::Max/Min were deprecated in favor of cuda::maximum/minimum.
// The version check for 12090 handles pre-release versions of CUDA 13.0.
```
**FlamingoPg** left a comment
LGTM, @zhyncs any other comment?
@rainj-me need rebase master
Motivation
#9490
PR 9495
Test
```diff
diff --git a/python/pyproject.toml b/python/pyproject.toml
index c23efbc2e..b29789d45 100644
--- a/python/pyproject.toml
+++ b/python/pyproject.toml
@@ -49,7 +49,7 @@ runtime_common = [
     "scipy",
     "timm==1.0.16",
     "tiktoken",
-    "torchao==0.9.0",
+    "torchao==0.12.0+git",
     "transformers==4.55.2",
     "uvicorn",
     "uvloop",
@@ -59,21 +59,19 @@ runtime_common = [
 srt = [
     "sglang[runtime_common]",
     "sgl-kernel==0.3.5",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
 blackwell = [
     "sglang[runtime_common]",
     "sgl-kernel",
-    "torch==2.8.0",
-    "torchaudio==2.8.0",
+    "torch==2.8.0a0+34c6371d24.nv25.8",
     "torchvision",
     "cuda-python",
-    "flashinfer_python==0.2.11.post3",
+    "flashinfer_python==0.2.14.post1",
 ]
```
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist