[MUSA] [8/N] Port CUDA kernels that are compatible with MUSA #17946
Conversation
Summary of Changes
Hello @yafengio, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly advances support for Moore Threads GPUs by porting a wide array of existing CUDA kernels to the MUSA architecture. The changes introduce MUSA-specific implementations for critical operations like all-reduce and fused normalization, ensuring performance and compatibility. This work is a step toward enabling efficient Large Language Model (LLM) inference on Moore Threads hardware while maintaining a unified codebase through conditional compilation.
Highlights
Code Review
This pull request ports a significant number of CUDA kernels to the MUSA architecture, enabling GPU acceleration on Moore Threads hardware. The changes include adding MUSA-specific implementations using preprocessor guards, updating build configurations, and introducing a custom AllReduce implementation for MUSA. The porting effort is extensive and seems well-executed. I've found one critical issue related to a missing kernel dispatch path for the float data type and a few minor maintainability issues. Overall, great work on this large porting task.
Force-pushed from 0d68ad3 to 607c88f
Force-pushed from 607c88f to 3dd4964
/tag-and-rerun-ci
Please also rebase onto
Force-pushed from 30bfc6c to ce7e6a4
Force-pushed from 3ccac0f to 4226d4c
/rerun-failed-ci
/rerun-failed-ci
Force-pushed from 4226d4c to 43fe4c0
/rerun-failed-ci (+1 similar comment)
Force-pushed from d890804 to 7753f29
/rerun-failed-ci (+1 similar comment)
Force-pushed from 7753f29 to 2470b79
/rerun-failed-ci (+1 similar comment)
Force-pushed from 2470b79 to 121ebf8
/rerun-failed-ci (+1 similar comment)
Force-pushed from 121ebf8 to 0d7bd1d
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
Force-pushed from 0d7bd1d to 8c7dd1e
/rerun-failed-ci
Is this just CI flakiness, or are there issues blocking the merge?
We've been unable to get all NVIDIA CI checks to pass despite multiple attempts. Could you please help merge this? Thanks!
/rerun-failed-ci (+2 similar comments)
…ect#17946)
Signed-off-by: yafeng.li <yafeng.li@mthreads.com>
Co-authored-by: Alex Nails <alex.nails@radixark.ai>
Motivation
This PR continues the ongoing effort (tracked in #16565) to add full support for Moore Threads GPUs in SGLang by leveraging MUSA (Meta-computing Unified System Architecture) for LLM inference.
The primary goal of this submission is to enable core kernel functionality on MUSA by porting CUDA kernels that are compatible with the MUSA programming model, while keeping the codebase unified across CUDA, ROCm, and MUSA backends.
What’s Changed
This PR focuses on the following areas:
1. CUDA Kernel Porting to MUSA
Ported a set of CUDA kernels that are compatible with MUSA to native MUSA implementations.
Covered kernel categories include:
Conditional compilation via `USE_MUSA` is used to preserve multi-backend compatibility.
2. Custom AllReduce for MUSA
Introduced `custom_all_reduce_2shot`.
3. Build System Integration
Added and updated MUSA-specific build configuration:
`setup_musa.py`
`pyproject_musa.toml`
Updated dependencies to ensure compatibility (e.g. bumped `torchada`).
Verified that `sgl-kernel` can be built and installed in a clean `torch_musa` container.
Testing Done
Tested in a clean torch_musa container.
Launch the server:
Benchmarking and Profiling
Checklist
Review Process
`/tag-run-ci-label`, `/rerun-failed-ci`, `/tag-and-rerun-ci`