[AMD] Integrate aiter's fused_topk for softmax scoring in topk function (#21421)
HaiShaw merged 2 commits into sgl-project:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request aims to significantly improve the performance of Mixture-of-Experts (MoE) TopK operations, especially on ROCm/HIP platforms, by leveraging the aiter library's fused softmax+topk kernels. The changes introduce conditional logic to use aiter's optimized functions when available, providing an auto-dispatch mechanism for efficient computation while maintaining a graceful fallback for environments where aiter is not enabled.
Code Review
This pull request integrates aiter's fused_topk into the fused_topk function for softmax scoring when _use_aiter is enabled. The changes add an import of topk_softmax and conditionally call aiter.fused_moe.fused_topk. Review feedback: the newly added topk_softmax import is unused and should be removed, and the aiter.fused_moe.fused_topk import, currently placed inside the fused_topk function, should be moved to the top-level try-except block so that aiter imports stay centralized.
…hancing performance with auto-dispatch capabilities. Fall back to topk_softmax if aiter is not available.
…softmax scoring implementation, ensuring compatibility with aiter's features.
sgl-project#21421) Co-authored-by: Chen, Todd <zhenchen@amd.com>
Motivation
Enable aiter-backed paths on ROCm/HIP that fuse softmax+topk for MoE TopK.
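The fused routing being dispatched to can be sketched as below. This is an illustrative pure-Python model of the softmax+topk routing logic, not the actual sglang code; the aiter import path and call signature shown are assumptions (the real aiter kernel operates on GPU tensors on ROCm/HIP and takes additional arguments such as a renormalize flag).

```python
import math

def topk_softmax(gating, k):
    # Fallback path: numerically stable softmax over the gate logits,
    # then select the top-k expert probabilities and their indices.
    m = max(gating)
    exps = [math.exp(x - m) for x in gating]
    total = sum(exps)
    ids = sorted(range(len(gating)), key=lambda i: exps[i], reverse=True)[:k]
    return [exps[i] / total for i in ids], ids

def fused_topk(gating, k, use_aiter=False):
    # Auto-dispatch: prefer aiter's fused softmax+topk kernel when enabled,
    # otherwise fall back to the separate softmax + topk implementation.
    if use_aiter:
        # Hypothetical call shape for illustration only.
        from aiter.fused_moe import fused_topk as aiter_fused_topk
        return aiter_fused_topk(gating, k)
    return topk_softmax(gating, k)
```

For example, `fused_topk([1.0, 3.0, 2.0], 2)` routes to experts 1 and 2, the two highest softmax probabilities.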
Modifications
When aiter is enabled, use aiter.fused_topk by default; fall back to topk_softmax if aiter is not available.

Accuracy Tests
Before
After
Comparison of different implementations of topk_softmax.
Baseline: sgl-kernel.
SUMMARY — deepseek-ai/DeepSeek-V3 (E=[256], topk=[8])
SUMMARY — Qwen/Qwen3.5-397B-A17B (E=[512], topk=[10])
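A comparison like the summaries above can be reproduced with a small timing harness. The sketch below is a hypothetical pure-Python stand-in: the actual PR benchmarks GPU kernels (sgl-kernel vs. aiter) on torch tensors, whereas here only the fallback-style softmax+topk is timed at the two MoE shapes from the summaries.

```python
import math
import timeit

def topk_softmax(gating, k):
    # Numerically stable softmax over gate logits, then top-k selection.
    m = max(gating)
    exps = [math.exp(x - m) for x in gating]
    total = sum(exps)
    ids = sorted(range(len(gating)), key=lambda i: exps[i], reverse=True)[:k]
    return [exps[i] / total for i in ids], ids

# MoE shapes from the summaries: E experts, top-k routed experts per token.
for experts, topk in [(256, 8), (512, 10)]:
    gating = [math.sin(i) for i in range(experts)]  # synthetic logits
    t = timeit.timeit(lambda: topk_softmax(gating, topk), number=100)
    print(f"E={experts} topk={topk}: {t * 1e4:.1f} us/iter")
```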
Benchmark
bs=64, 1K input / 1K output
Before
After
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci