
docs: Fix incorrect column-major scale layout in FP8 GEMM docstrings#2614

Open
bledden wants to merge 2 commits into flashinfer-ai:main from bledden:fix/fp8-scale-docstring-layout

Conversation


@bledden bledden commented Feb 21, 2026

Summary

Fixes the a_scale parameter docstrings in three FP8 GEMM functions, which incorrectly described the scale tensor layout as "Column-major"; the kernel actually expects standard contiguous (row-major) tensors.

Functions fixed:

  • gemm_fp8_nt_groupwise
  • group_gemm_fp8_nt_groupwise
  • group_gemm_mxfp8_mxfp4_nt_groupwise

Validation

I verified the correct layout by cross-referencing three sources:

  1. quantize_fp8 in flashinfer/testing/utils.py — produces standard contiguous PyTorch tensors (row-major) for the scales; the x_scale it returns is never transposed.

  2. Test suite (tests/gemm/test_groupwise_scaled_gemm_fp8.py) — creates a_scale via quantize_fp8() and passes the resulting contiguous tensor directly to the GEMM functions without any transposition. These tests pass, confirming row-major is the correct layout.

  3. Existing b_scale docs — already correctly say "Row-major scale tensor for b" in the same docstrings. The a_scale description was the only inconsistency.
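For readers unfamiliar with the distinction the PR corrects, the difference between row-major (C-contiguous) and column-major (Fortran-order) layout can be sketched in plain NumPy. The shapes and block size below are illustrative only and are not tied to flashinfer's actual API:

```python
import numpy as np

# Illustrative sizes only; not tied to flashinfer's actual API.
m, k, block_size = 128, 256, 32

# A blockwise scale tensor as a standard C-contiguous (row-major) array,
# the layout quantize_fp8-style helpers return and the kernels expect.
a_scale = np.ones((m, k // block_size), dtype=np.float32)
assert a_scale.flags["C_CONTIGUOUS"]  # row-major: the last axis is densest

# The same values in column-major (Fortran) order have different strides;
# this is the layout the old docstring incorrectly implied.
a_scale_col = np.asfortranarray(a_scale)
assert a_scale_col.flags["F_CONTIGUOUS"]
assert a_scale.strides != a_scale_col.strides  # (32, 4) vs (4, 512)
```

Passing the Fortran-ordered variant where a kernel assumes C-contiguous memory would silently read the wrong elements, which is why the docstring wording matters.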

Fixes #2147

Summary by CodeRabbit

  • Documentation
    • Clarified docstrings for FP8/FP4 groupwise GEMM routines: a_scale and b_scale are Row-major when scale_major_mode is "K"; tensor shape conventions otherwise remain unchanged. No functional behavior changed.

… docstrings

The a_scale parameter docstrings in gemm_fp8_nt_groupwise,
group_gemm_fp8_nt_groupwise, and group_gemm_mxfp8_mxfp4_nt_groupwise
incorrectly described the scale tensor as "Column-major". The kernel
actually expects standard contiguous (row-major) tensors, consistent
with what quantize_fp8 produces and the test suite passes.

Changed "Column-major" to "Row-major" in all three a_scale descriptions
to match the b_scale docs, which already correctly say "Row-major".

Fixes flashinfer-ai#2147

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
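The behavior the commit message relies on (a quantizer returning contiguous, row-major scales) can be illustrated with a toy blockwise quantizer. This is a minimal sketch under stated assumptions, not flashinfer's actual quantize_fp8, whose signature, dtypes, and FP8 numerics differ:

```python
import numpy as np

def blockwise_quantize(x: np.ndarray, block_size: int = 32):
    """Toy blockwise quantizer: one scale per block along the K axis.

    Illustrative only; flashinfer's quantize_fp8 quantizes to real
    FP8 dtypes and has a different signature.
    """
    m, k = x.shape
    blocks = x.reshape(m, k // block_size, block_size)
    # Per-block max-abs scale (448 is the FP8 E4M3 max), floored to
    # avoid division by zero. Reductions over the trailing axis of a
    # C-contiguous array yield a C-contiguous (row-major) result.
    scale = np.maximum(np.abs(blocks).max(axis=-1) / 448.0, 1e-12)
    xq = blocks / scale[..., None]  # "quantized" values (toy)
    return xq.reshape(m, k), scale

x = np.random.default_rng(0).standard_normal((64, 128)).astype(np.float32)
xq, x_scale = blockwise_quantize(x)
# The scale comes out row-major, matching the corrected docstrings.
assert x_scale.flags["C_CONTIGUOUS"]
assert x_scale.shape == (64, 128 // 32)
```

The point of the sketch is that nothing in a straightforward quantization path produces a column-major scale; a caller would have to transpose explicitly, which neither the utility nor the tests do.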
@gemini-code-assist
Contributor

Summary of Changes

Hello @bledden, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an inconsistency in the documentation for several FP8 GEMM functions. It rectifies the description of the a_scale parameter, clarifying that it expects a row-major tensor rather than a column-major one. This correction ensures that users correctly understand the required input format, preventing potential issues arising from misinterpretation of the docstrings.

Highlights

  • Docstring Correction for FP8 GEMM a_scale: Updated the a_scale parameter docstrings in gemm_fp8_nt_groupwise, group_gemm_fp8_nt_groupwise, and group_gemm_mxfp8_mxfp4_nt_groupwise functions from "Column-major" to "Row-major" to accurately reflect the expected tensor layout.


Changelog
  • flashinfer/gemm/gemm_base.py
    • Updated the a_scale docstring in gemm_fp8_nt_groupwise to specify "Row-major" instead of "Column-major".
    • Updated the a_scale docstring in group_gemm_fp8_nt_groupwise to specify "Row-major" instead of "Column-major".
    • Updated the a_scale docstring in group_gemm_mxfp8_mxfp4_nt_groupwise to specify "Row-major" instead of "Column-major".

@coderabbitai
Contributor

coderabbitai bot commented Feb 21, 2026

No actionable comments were generated in the recent review. 🎉


📝 Walkthrough

Walkthrough

Docstring corrections: updated scale-tensor memory-layout descriptions for several FP8/FP4 groupwise GEMM functions in flashinfer/gemm/gemm_base.py to specify Row-major layout when scale_major_mode is "K". No executable code changed.

Changes

  • flashinfer/gemm/gemm_base.py (FP8/FP4 groupwise GEMM docstrings): Updated docstrings for gemm_fp8_nt_groupwise, group_gemm_fp8_nt_groupwise, and group_gemm_mxfp8_mxfp4_nt_groupwise to describe a_scale/b_scale as Row-major when scale_major_mode is "K"; shape descriptions for other modes are preserved. No logic changes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes


Suggested reviewers

  • nvmbreughe
  • cyx-6
  • yzh119

Poem

🐰 I hopped through docstrings, tidy and spry,
Row-major scales now catch the right eye.
Three functions aligned, no code to rewrite,
Just clearer directions — precise and polite. 🥕

🚥 Pre-merge checks: ✅ 5 passed
  • Title check: ✅ Passed. The title accurately describes the primary change: fixing incorrect docstring documentation for column-major scale layouts in FP8 GEMM functions.
  • Description check: ✅ Passed. The PR description comprehensively explains the fix, the validation methodology across three sources, the affected functions, and the link to the related issue #2147.
  • Linked Issues check: ✅ Passed. The PR addresses issue #2147 by correcting a_scale docstrings from column-major to row-major, matching the implementation and test behavior verified against the quantize_fp8 utility and test suite.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to docstring corrections in FP8 GEMM functions; no unrelated code modifications or functional changes are present.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.



Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request correctly addresses a documentation inaccuracy in the FP8 GEMM functions. The a_scale parameter was previously described as having a column-major layout, which is incorrect as the underlying kernels expect standard row-major (contiguous) PyTorch tensors. This fix ensures that the documentation is consistent with the implementation and the b_scale parameter description. The changes are applied to gemm_fp8_nt_groupwise, group_gemm_fp8_nt_groupwise, and group_gemm_mxfp8_mxfp4_nt_groupwise.

Contributor

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)

4213-4215: ⚠️ Potential issue | 🟡 Minor

Pre-existing typo in adjacent b_scale docstring: scale_major_k should be scale_major_mode.

While not introduced by this PR, fixing it here keeps the docstring fully consistent since scale_major_k is not a valid parameter name.

📝 Proposed fix
-        Row-major scale tensor for b, shape ``(n // block_size, k // block_size)`` if scale_major_k is ``K``
+        Row-major scale tensor for b, shape ``(n // block_size, k // block_size)`` if scale_major_mode is ``K``

Per CodeRabbit feedback, the adjacent b_scale docstring had
scale_major_k instead of scale_major_mode — fixing while I'm
already editing these docstrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Development

Successfully merging this pull request may close these issues.

[DOC] FlashInfer Blockwise FP8 Scale Layout Issue

1 participant