Fix moe fp8 failure for sm121 #2061
Conversation
Walkthrough

The SM check in `get_candidate_tiles` for FP8 with GROUPED_GEMM was extended to include SM 121 alongside SM 89 and SM 120, so the function now returns the same candidate `CutlassTileConfig` options for SM 121 as for SM 89/SM 120.
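A minimal sketch of the shape of the change, assuming the structure implied by the walkthrough and the review comments below; the enum values, function signature, and surrounding logic are illustrative placeholders, not the verbatim contents of cutlass_heuristic.cpp:

```cpp
#include <vector>

// Illustrative stand-in for the real tile-config enum in cutlass_heuristic.cpp.
enum class CutlassTileConfig { kPlaceholderTile /* real configs elided */ };

// Simplified view of the FP8 + GROUPED_GEMM branch of get_candidate_tiles.
// Before this PR the check read `sm == 89 || sm == 120`, so SM 121 fell
// through and received an empty candidate set.
std::vector<CutlassTileConfig> get_candidate_tiles_fp8_grouped(int sm) {
  if (sm == 89 || sm == 120 || sm == 121) {
    // SM 121 now gets the same candidate tile configs as SM 89 / SM 120.
    return {CutlassTileConfig::kPlaceholderTile /* ... */};
  }
  return {};  // other SMs: no FP8 grouped-GEMM candidates from this branch
}
```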
Summary of Changes

Hello @yongwww, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a critical issue where FP8 Mixture-of-Experts (MoE) operations were failing on SM121 GPU architectures. The fix updates the internal Cutlass heuristic configurations to correctly recognize and support SM121 for grouped GEMM operations, thereby stabilizing performance and preventing pipeline failures on these devices.
Actionable comments posted: 0
🧹 Nitpick comments (1)
csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1)
161-161: LGTM! Correct fix for SM 121 support. The addition of `|| sm == 121` properly enables FP8 GROUPED_GEMM tile configurations for SM 121 (Blackwell architecture). Without this change, SM 121 would fall through to the else block and return an empty configuration set, causing the failure referenced in the PR.

Optional observation: there's a minor style inconsistency between line 161 (explicit enumeration: `sm == 89 || sm == 120 || sm == 121`) and line 174 (range check: `sm == 89 || sm >= 120`). Both are functionally correct for excluding SM 90, but consider standardizing the approach for maintainability.
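As a side note on that style point, the two predicates could be folded into one helper. This is a hypothetical refactor sketch, not code from the PR; the helper name is invented here:

```cpp
// Hypothetical helper, not part of the PR: unifies the two predicate
// styles contrasted in the review comment above.
inline bool supports_fp8_grouped_gemm(int sm) {
  // Explicit enumeration (the style at line 161): trivially auditable,
  // but must be edited for every new SM variant.
  //   return sm == 89 || sm == 120 || sm == 121;

  // Range check (the style at line 174): automatically covers future
  // SM 12x parts while still excluding SM 90.
  return sm == 89 || sm >= 120;
}
```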
Code Review
This pull request adds support for the sm121 architecture for FP8 grouped GEMM operations. The change is minimal and correct, extending an existing condition to include the new architecture. This seems to be a targeted fix for a new hardware variant, and it aligns with how other parts of the codebase appear to handle architecture-specific configurations. The change is approved.
/bot run
yzh119 left a comment
Thanks for working on the fix, @yongwww!
/bot run
This PR was created before #2020 was merged, and it seems #2020 has already updated the condition to: which addresses the CI failure. But this PR (changing the condition to
Thanks, @yzh119, for the insights!

I agree with @yzh119 that being explicit about
📌 Description
Fix the FP8 MoE failure for SM121 in the CI pipeline.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- All tests are passing (`unittest`, etc.).

Reviewer Notes