Fix moe fp8 failure for sm121 #2061
Conversation
Walkthrough

The SM check in `get_candidate_tiles` for FP8 with GROUPED_GEMM was extended to include SM 121 alongside SM 89 and SM 120, so the function now returns the same candidate `CutlassTileConfig` options for SM 121 as for SM 89/SM 120.
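A minimal sketch of the shape of the change, assuming the structure implied by the walkthrough and the review comments below; the enum values, function signature, and surrounding logic are illustrative placeholders, not the verbatim contents of cutlass_heuristic.cpp:

```cpp
#include <vector>

// Illustrative stand-in for the real tile-config enum in cutlass_heuristic.cpp.
enum class CutlassTileConfig { kPlaceholderTile /* real configs elided */ };

// Simplified view of the FP8 + GROUPED_GEMM branch of get_candidate_tiles.
// Before this PR the check read `sm == 89 || sm == 120`, so SM 121 fell
// through and received an empty candidate set.
std::vector<CutlassTileConfig> get_candidate_tiles_fp8_grouped(int sm) {
  if (sm == 89 || sm == 120 || sm == 121) {
    // SM 121 now gets the same candidate tile configs as SM 89 / SM 120.
    return {CutlassTileConfig::kPlaceholderTile /* ... */};
  }
  return {};  // other SMs: no FP8 grouped-GEMM candidates from this branch
}
```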
Summary of Changes

Hello @yongwww, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a critical issue where FP8 Mixture-of-Experts (MoE) operations were failing on SM121 GPU architectures. The fix updates the internal Cutlass heuristic configurations to correctly recognize and support SM121 for grouped GEMM operations, thereby stabilizing performance and preventing pipeline failures on these devices.
Actionable comments posted: 0
🧹 Nitpick comments (1)
csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1)
161-161: LGTM! Correct fix for SM 121 support. The addition of `|| sm == 121` properly enables FP8 GROUPED_GEMM tile configurations for SM 121 (Blackwell architecture). Without this change, SM 121 would fall through to the else block and return an empty configuration set, causing the failure referenced in the PR.

Optional observation: there's a minor style inconsistency between line 161 (explicit enumeration: `sm == 89 || sm == 120 || sm == 121`) and line 174 (range check: `sm == 89 || sm >= 120`). Both are functionally correct for excluding SM 90, but consider standardizing the approach for maintainability.
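As a side note on that style point, the two predicates could be folded into one helper. This is a hypothetical refactor sketch, not code from the PR; the helper name is invented here:

```cpp
// Hypothetical helper, not part of the PR: unifies the two predicate
// styles contrasted in the review comment above.
inline bool supports_fp8_grouped_gemm(int sm) {
  // Explicit enumeration (the style at line 161): trivially auditable,
  // but must be edited for every new SM variant.
  //   return sm == 89 || sm == 120 || sm == 121;

  // Range check (the style at line 174): automatically covers future
  // SM 12x parts while still excluding SM 90.
  return sm == 89 || sm >= 120;
}
```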
Code Review
This pull request adds support for the sm121 architecture for FP8 grouped GEMM operations. The change is minimal and correct, extending an existing condition to include the new architecture. This seems to be a targeted fix for a new hardware variant, and it aligns with how other parts of the codebase appear to handle architecture-specific configurations. The change is approved.
/bot run
yzh119 left a comment
Thanks for working on the fix, @yongwww!
/bot run
This PR was created before #2020 was merged, and it seems #2020 has already updated the condition to: which addresses the CI failure. But this PR (changing the condition to
Thanks, @yzh119, for the insights!

I agree with @yzh119 that being explicit about
📌 Description
Fix the FP8 MoE failure for SM121 in the CI pipeline.
🔍 Related Issues
🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.
✅ Pre-commit Checks
- I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- I have installed the hooks with `pre-commit install`.
- I have run `pre-commit run --all-files` and fixed any reported issues.

🧪 Tests

- All tests are passing (`unittest`, etc.).

Reviewer Notes