
Conversation

@yongwww
Member

@yongwww yongwww commented Nov 7, 2025

📌 Description

Fix the FP8 MoE failure on SM121 in the CI pipeline.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or via another preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

Summary by CodeRabbit

  • Bug Fixes
    • Extended FP8 grouped matrix-multiplication support to an additional GPU architecture (SM121), giving it the same optimized tile configuration options as the previously supported SM variants and broadening hardware compatibility for FP8 workloads.

@coderabbitai
Contributor

coderabbitai bot commented Nov 7, 2025

Walkthrough

The SM check in get_candidate_tiles for FP8 with GROUPED_GEMM was extended to include SM 121 alongside SM 89 and SM 120, so the function now returns the same candidate CutlassTileConfig options for SM121 as for SM89/SM120.

Changes

Cohort / File(s): CUTLASS heuristic adjustment (csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp)
Change Summary: Extended the SM check in get_candidate_tiles for FP8 + GROUPED_GEMM to also match sm == 121, aligning SM121 with the existing SM89/SM120 candidate tile selections.
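
To make the change concrete, here is a minimal, self-contained C++ sketch of the branch described above; the enum values and function name are illustrative assumptions, not the verbatim contents of cutlass_heuristic.cpp:

```cpp
#include <cstdio>
#include <vector>

// Illustrative stand-in for the real CutlassTileConfig enum in
// cutlass_heuristic.cpp; the actual tile-shape names differ.
enum class CutlassTileConfig { CtaShapeA, CtaShapeB };

// Hedged sketch of the FP8 + GROUPED_GEMM branch in get_candidate_tiles.
std::vector<CutlassTileConfig> fp8_grouped_gemm_candidates(int sm) {
  if (sm == 89 || sm == 120 || sm == 121) {  // this PR adds "|| sm == 121"
    // SM121 now receives the same candidate tiles as SM89/SM120.
    return {CutlassTileConfig::CtaShapeA, CutlassTileConfig::CtaShapeB};
  }
  // Before the fix, SM121 fell past this branch and ended up with an
  // empty candidate set, which surfaced as the pipeline failure.
  return {};
}

int main() {
  std::printf("sm121 candidates: %zu\n",
              fp8_grouped_gemm_candidates(121).size());
  return 0;
}
```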

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

  • Verify SM121 compatibility with selected tile configs and any SM-specific assumptions in get_candidate_tiles.
  • Check downstream code paths that consume CutlassTileConfig for SM-specific behavior and run relevant tests/benchmarks.

Suggested reviewers

  • yongwww
  • cyx-6
  • wenscarl
  • djmmoss

Poem

🐰 I hopped through lines with eager eyes,
SM121 found its ties,
Tiles now match, a gentle tune,
Kernel hops beneath the moon,
Crunching numbers — carrot prize! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
Description check (❓ Inconclusive): The PR description is incomplete. While it follows the template structure, the Description section only contains a vague reference to a CI pipeline failure without explaining what was fixed, why the fix was needed, or how it addresses the issue. Resolution: expand the Description section to explain the root cause of the SM121 failure and how the code change (matching SM121 with SM89/120 tile configurations) resolves it, giving reviewers better context for the fix.
✅ Passed checks (1 passed)
Title check (✅ Passed): The title "Fix moe fp8 failure for sm121" directly relates to the main change in the PR, which adds SM121 support to the FP8 GROUPED_GEMM configuration in CutlassTileConfig.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 266c639 and 4c7bfaa.

📒 Files selected for processing (1)
  • csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp


@gemini-code-assist
Contributor

Summary of Changes

Hello @yongwww, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where FP8 Mixture-of-Experts (MoE) operations were failing on SM121 GPU architectures. The fix involves updating the internal Cutlass heuristic configurations to correctly recognize and support SM121 for grouped GEMM operations, thereby stabilizing performance and preventing pipeline failures on these devices.

Highlights

  • Fix for SM121 FP8 MoE Failures: Resolved a pipeline failure specifically affecting SM121 GPUs when performing FP8 Mixture-of-Experts (MoE) operations.
  • Cutlass Heuristic Configuration Update: Extended the Cutlass heuristic to include SM121 architecture, ensuring proper selection of candidate tiles for FP8 grouped GEMM operations.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1)

161-161: LGTM! Correct fix for SM 121 support.

The addition of || sm == 121 properly enables FP8 GROUPED_GEMM tile configurations for SM 121 (Blackwell architecture). Without this change, SM 121 would fall through to the else block and return an empty configuration set, causing the failure referenced in the PR.

Optional observation: There's a minor style inconsistency between line 161 (explicit enumeration: sm == 89 || sm == 120 || sm == 121) and line 174 (range check: sm == 89 || sm >= 120). Both are functionally correct for excluding SM 90, but consider standardizing the approach for maintainability.
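
To illustrate the trade-off between the two styles, here is a standalone sketch (not the FlashInfer code; the predicate names are invented for this example). The same trade-off resurfaces in the discussion further down this thread:

```cpp
#include <iostream>

// Explicit enumeration, as in this PR: only architectures that have been
// validated for FP8 grouped GEMM are matched.
bool supports_fp8_grouped_gemm_explicit(int sm) {
  return sm == 89 || sm == 120 || sm == 121;
}

// Range check, as on line 174 of the file: also matches any future
// sm >= 120, whether or not it has been validated.
bool supports_fp8_grouped_gemm_range(int sm) {
  return sm == 89 || sm >= 120;
}

int main() {
  for (int sm : {89, 90, 120, 121, 130}) {
    std::cout << "sm" << sm
              << " explicit=" << supports_fp8_grouped_gemm_explicit(sm)
              << " range=" << supports_fp8_grouped_gemm_range(sm) << '\n';
  }
  // The two styles agree on all architectures known today and diverge only
  // for a hypothetical future sm such as 130, which the range check would
  // silently opt in.
  return 0;
}
```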

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between adcc5dd and 7beaad0.

📒 Files selected for processing (1)
  • csrc/nv_internal/tensorrt_llm/kernels/cutlass_kernels/cutlass_heuristic.cpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Deploy Docs

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the sm121 architecture for FP8 grouped GEMM operations. The change is minimal and correct, extending an existing condition to include the new architecture. This seems to be a targeted fix for a new hardware variant, and it aligns with how other parts of the codebase appear to handle architecture-specific configurations. The change is approved.

@yzh119
Collaborator

yzh119 commented Nov 7, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !120 has been created, and the CI pipeline #38057358 is currently running. I'll report back once the pipeline job completes.

Collaborator

@yzh119 yzh119 left a comment


Thanks for working on the fix, @yongwww!

@yzh119
Collaborator

yzh119 commented Nov 7, 2025

/bot run

@flashinfer-bot
Collaborator

GitLab MR !120 has been updated with latest changes, and the CI pipeline #38069122 is currently running. I'll report back once the pipeline job completes.

@yzh119
Collaborator

yzh119 commented Nov 7, 2025

This PR was created before #2020 was merged, and it seems #2020 has updated the condition to:

if (sm == 89 || sm >= 120) {

which addresses the CI failure.

But this PR (changing the condition to if (sm == 89 || sm == 120 || sm == 121) {) still seems reasonable to me if you consider future architectures.

@yongwww
Member Author

yongwww commented Nov 7, 2025

Thanks @yzh119 for the insights! if (sm == 89 || sm >= 120) { should fix the issue for SM121 now; that said, if (sm == 89 || sm == 120 || sm == 121) { makes more sense to me (I don't have a strong opinion). Feel free to close this PR if we'd like to go with if (sm == 89 || sm >= 120) {.

@bkryu
Collaborator

bkryu commented Nov 7, 2025

Thanks @yzh119 for the insights! if (sm == 89 || sm >= 120) { should fix the issue for SM121 now; that said, if (sm == 89 || sm == 120 || sm == 121) { makes more sense to me (I don't have a strong opinion). Feel free to close this PR if we'd like to go with if (sm == 89 || sm >= 120) {.

I agree with @yzh119 that being explicit with sm == 89 || sm == 120 || sm == 121 is preferred over sm == 89 || sm >= 120 for future SM archs.

@yongwww yongwww merged commit e450c7d into flashinfer-ai:main Nov 7, 2025
4 checks passed
@yongwww yongwww deleted the moe_fp8 branch November 7, 2025 22:54