docs: Fix incorrect column-major scale layout in FP8 GEMM docstrings #2614
bledden wants to merge 2 commits into flashinfer-ai:main
Conversation
… docstrings

The `a_scale` parameter docstrings in `gemm_fp8_nt_groupwise`, `group_gemm_fp8_nt_groupwise`, and `group_gemm_mxfp8_mxfp4_nt_groupwise` incorrectly described the scale tensor as "Column-major". The kernel actually expects standard contiguous (row-major) tensors, consistent with what `quantize_fp8` produces and what the test suite passes.

Changed "Column-major" to "Row-major" in all three `a_scale` descriptions to match the `b_scale` docs, which already correctly say "Row-major".

Fixes flashinfer-ai#2147

Signed-off-by: Blake Ledden <bledden@users.noreply.github.com>
Code Review
The pull request correctly addresses a documentation inaccuracy in the FP8 GEMM functions. The `a_scale` parameter was previously described as having a column-major layout, which is incorrect as the underlying kernels expect standard row-major (contiguous) PyTorch tensors. This fix ensures that the documentation is consistent with the implementation and the `b_scale` parameter description. The changes are applied to `gemm_fp8_nt_groupwise`, `group_gemm_fp8_nt_groupwise`, and `group_gemm_mxfp8_mxfp4_nt_groupwise`.
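As an illustrative sketch (not the actual flashinfer code), the distinction the docstrings describe is just array C-contiguity. The shapes, block size, and the fp8 e4m3 maximum of 448.0 below are assumptions for the example, not values taken from the library:

```python
import numpy as np

# Hypothetical shapes for illustration only.
m, k, block_size = 4, 256, 128
rng = np.random.default_rng(0)
a = rng.standard_normal((m, k)).astype(np.float32)

# One scale per (row, k-block): shape (m, k // block_size).
# 448.0 is the fp8 e4m3 max value, assumed here for illustration.
a_scale = np.abs(a).reshape(m, k // block_size, block_size).max(axis=-1) / 448.0

# A freshly computed array is C-contiguous (row-major): the layout the
# corrected docstrings describe, matching the existing b_scale docs.
assert a_scale.shape == (m, k // block_size)
assert a_scale.flags["C_CONTIGUOUS"]

# A transpose is the column-major view the old docstring incorrectly implied.
assert not a_scale.T.flags["C_CONTIGUOUS"]
```

The same check applies to PyTorch tensors via `Tensor.is_contiguous()`.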
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
flashinfer/gemm/gemm_base.py (1)
Lines 4213-4215: ⚠️ Potential issue | 🟡 Minor

Pre-existing typo in the adjacent `b_scale` docstring: `scale_major_k` → `scale_major_mode`. While not introduced by this PR, fixing it here keeps the docstring fully consistent, since `scale_major_k` is not a valid parameter name.

📝 Proposed fix

```diff
- Row-major scale tensor for b, shape ``(n // block_size, k // block_size)`` if scale_major_k is ``K``
+ Row-major scale tensor for b, shape ``(n // block_size, k // block_size)`` if scale_major_mode is ``K``
```
Per CodeRabbit feedback, the adjacent `b_scale` docstring had `scale_major_k` instead of `scale_major_mode`; fixing it while I'm already editing these docstrings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Fixes the `a_scale` parameter docstrings in three FP8 GEMM functions that incorrectly described the scale tensor layout as "Column-major" when the kernel actually expects standard contiguous (row-major) tensors.

Functions fixed:

- `gemm_fp8_nt_groupwise`
- `group_gemm_fp8_nt_groupwise`
- `group_gemm_mxfp8_mxfp4_nt_groupwise`

Validation

I verified the correct layout by cross-referencing three sources:

- `quantize_fp8` in `flashinfer/testing/utils.py` produces standard contiguous PyTorch tensors (row-major) for the scales. The returned `x_scale` is never transposed before being returned.
- The test suite (`tests/gemm/test_groupwise_scaled_gemm_fp8.py`) creates `a_scale` via `quantize_fp8()` and passes the resulting contiguous tensor directly to the GEMM functions without any transposition. These tests pass, confirming row-major is the correct layout.
- The existing `b_scale` docs already correctly say "Row-major scale tensor for b" in the same docstrings. The `a_scale` description was the only inconsistency.

Fixes #2147
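The validation argument above rests on the quantization helper returning its scale tensor untransposed. A minimal, hypothetical sketch of that pattern (the function name, block size, and the e4m3 maximum of 448.0 are assumptions for illustration, not flashinfer's actual helper):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # assumed e4m3 dynamic range; illustration only

def quantize_fp8_like(x, block_size=128):
    """Hypothetical stand-in for the quantize_fp8 test helper: one scale
    per (row, k-block), returned without any transpose (row-major)."""
    m, k = x.shape
    blocks = x.reshape(m, k // block_size, block_size)
    scale = np.abs(blocks).max(axis=-1) / FP8_E4M3_MAX
    q = blocks / scale[..., None]          # values scaled into fp8 range
    return q.reshape(m, k), scale          # scale is never transposed

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512)).astype(np.float32)
xq, x_scale = quantize_fp8_like(x)

# The scale tensor comes back contiguous (row-major), so passing it
# straight to the GEMM wrappers matches the corrected docstrings.
assert x_scale.flags["C_CONTIGUOUS"]
assert x_scale.shape == (8, 4)

# Round trip: re-applying the scales recovers the input (up to rounding).
deq = (xq.reshape(8, 4, 128) * x_scale[..., None]).reshape(8, 512)
assert np.allclose(deq, x, rtol=1e-5, atol=1e-6)
```

If the kernel truly wanted a column-major scale, the round trip above would require a transpose somewhere; the tests passing without one is the point of this PR.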