
fix: snap weight_scale_vec_size to handle block_scale_interleave padding for SM120 (#2898)

Open
samuellees wants to merge 7 commits into flashinfer-ai:main from samuellees:fix/sm120-mxfp4-moe-weight-scale

Conversation

Collaborator

@samuellees samuellees commented Mar 26, 2026

Fixes #2847

SM120 GPUs (RTX PRO 6000, RTX 5090) cannot use MXFP4 MoE CUTLASS kernels because the weight_scale_vec_size check in
trtllm_fp4_block_scale_moe rejects valid inputs.

Root cause: block_scale_interleave pads scale columns via round_up(cols, 4), inflating
gemm1_weights_scale.numel(). When hidden_size / sf_block_size is not a multiple of 4 (e.g. gpt-oss-120b:
hidden_size=2880, sf_block_size=32 → 90 scale cols padded to 92), the reverse-computed weight_scale_vec_size becomes 31
instead of 32, failing the strict equality check.

Fix: Replace the hard-coded equality check with snap-to-nearest-valid (16 or 32) plus a round-trip validation
ensuring the actual scale tensor numel is at least as large as the unpadded expectation.
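The snap-and-validate logic described above can be sketched in Python as follows. This is an illustrative model, not the actual C++ in the kernel launcher; the function name and signature are assumptions.

```python
def round_up(x: int, y: int) -> int:
    return (x + y - 1) // y * y


def snap_weight_scale_vec_size(hidden_size: int, scale_cols: int) -> int:
    """Recover the scale vec size from a possibly padded scale-column count."""
    # Reverse-compute from the (possibly padded) column count; padding makes
    # this come out slightly too small, e.g. 31 instead of 32.
    raw = hidden_size // scale_cols
    # Snap to the nearest valid vec size (16 for NVFP4, 32 for MXFP4).
    snapped = 16 if abs(raw - 16) <= abs(raw - 32) else 32
    # Round-trip validation: the actual column count must be at least the
    # unpadded expectation.
    assert scale_cols >= hidden_size // snapped, "scale tensor too small"
    return snapped


# gpt-oss-120b case: hidden_size=2880, sf_block_size=32 -> 90 cols padded to
# 92, so the reverse-computed value is 2880 // 92 == 31, which snaps to 32.
padded_cols = round_up(2880 // 32, 4)
```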

Changes

  • csrc/trtllm_fused_moe_kernel_launcher.cu: Replace weight_scale_vec_size check logic in trtllm_fp4_block_scale_moe
  • tests/moe/test_trtllm_cutlass_fused_moe.py: Add test_moe_nvfp4_unaligned_hidden_size and
    test_moe_mxfp8_mxfp4_unaligned_hidden_size with hidden_size=2880 to cover the padding scenario

Test plan

  • Existing MoE tests pass without regression: pytest tests/moe/test_trtllm_cutlass_fused_moe.py -x
  • New unaligned hidden_size tests pass (requires SM100/SM110/SM120 GPU)
  • pre-commit lint passes

Summary by CodeRabbit

  • Bug Fixes

    • More tolerant fused MoE weight/scale sizing with vec-size snapped to 16/32 and clearer validation errors for unaligned hidden sizes.
    • FP4 quantization scale layout now accounts for padded/interleaved columns (aligned to blocks of 4) so scale buffers are interpreted correctly.
  • Tests

    • Added CUDA regression test for fused MoE with an unaligned hidden size under NVFP4 quantization.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a critical compatibility issue affecting NVIDIA SM120 GPUs when executing MXFP4 Mixture-of-Experts (MoE) CUTLASS kernels. The fix addresses an erroneous calculation of the weight_scale_vec_size that arose from padding in the block_scale_interleave process, which previously rejected valid inputs. By implementing a more robust scale vector size determination and validation, this change ensures correct and efficient operation of MoE kernels across supported GPU architectures, particularly for models with specific hidden size configurations that previously triggered the bug.

Highlights

  • Issue Resolution: Fixed a bug preventing SM120 GPUs from using MXFP4 MoE CUTLASS kernels due to an incorrect weight_scale_vec_size check.
  • Root Cause Addressed: Corrected the miscalculation of weight_scale_vec_size which was caused by block_scale_interleave padding scale columns, leading to an inflated numel().
  • Improved Calculation Logic: Replaced the strict equality check for weight_scale_vec_size with a snap-to-nearest-valid (16 or 32) approach, coupled with a round-trip validation to ensure tensor size integrity.
  • Enhanced Test Coverage: Introduced new test cases (test_moe_nvfp4_unaligned_hidden_size and test_moe_mxfp8_mxfp4_unaligned_hidden_size) specifically designed to validate the fix for scenarios involving unaligned hidden sizes (e.g., hidden_size=2880).



@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adjusted FP4/MXFP MoE scale-vector sizing in the TRT-LLM CUDA launcher to compute a raw vec size, snap it to 16 or 32, and validate it against padded host scale tensors; added a CUDA-only regression test for an unaligned hidden size; and made FP4 scale-factor reshaping padding-aware.

Changes

  • Kernel weight-scale sizing — csrc/trtllm_fused_moe_kernel_launcher.cu: compute weight_scale_vec_size_raw, snap it to 16 or 32, perform round-trip validation (expected_unpadded) against gemm1_weights_scale.numel(), and select mDtypeWeights based on the snapped vec size (removing the strict-equality ICHECK).
  • MoE CUDA regression tests — tests/moe/test_trtllm_cutlass_fused_moe.py: import the SM12x support check and add a CUDA-only regression test, test_moe_nvfp4_unaligned_hidden_size, which constructs padded FP4 expert weights/scales (hidden_size=2880) and validates the fused_moe output against a dequantized host reference.
  • FP4 quantization reshape — flashinfer/quantization/fp4_quantization.py: make the returned scale-factor tensor's reshape layout-aware: when is_sf_swizzled_layout is true, compute padded columns (sf_cols = round_up(..., 4)) and reshape sf to reflect the 4-column padding; retain the existing transpose logic for column-major outputs.
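The layout-aware reshape can be modeled as a shape computation. The sketch below is a simplified Python model of the behavior described in this thread, not the library code itself; the function name is an assumption, and the 128-row padding follows the swizzled block_scale_interleave layout discussed in the commit messages later in this conversation.

```python
def round_up(x, y):
    return (x + y - 1) // y * y


def sf_reshape_shape(sf_numel, m, k, sf_vec_size, is_sf_swizzled_layout):
    """Target 2-D shape for the returned scale-factor buffer."""
    if is_sf_swizzled_layout:
        # Swizzled output is padded: columns to a multiple of 4 and rows to a
        # multiple of 128, so fix the (padded) column count and infer the
        # padded row count from the actual element count.
        sf_cols = round_up(k // sf_vec_size, 4)
        assert sf_numel % sf_cols == 0
        return (sf_numel // sf_cols, sf_cols)
    # Linear layout has exactly m * (k // sf_vec_size) elements, no padding.
    return (m, k // sf_vec_size)


# hidden_size=2880, sf_vec_size=32, m=48: the swizzled buffer holds
# 128 rows x 92 cols rather than 48 x 90.
shape = sf_reshape_shape(128 * 92, 48, 2880, 32, True)
```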

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • cyx-6
  • yzh119
  • bkryu
  • nv-yunzheq
  • djmmoss
  • jimmyzho

Poem

🐰 I hopped through columns, snapped them neat,
From raw to sixteen or thirty-two,
Padded scales and tests made paths complete,
Host and kernel now agree anew.
🥕 Hooray — MoE hops quick and true!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Linked Issues check — ⚠️ Warning: The PR addresses issue #2847's weight_scale_vec_size handling requirement (#3) by replacing the strict equality check with snap-to-nearest logic and round-trip validation, and adds test coverage for unaligned hidden_size scenarios, matching the issue's intent. However, it only implements requirement #3 (the weight_scale_vec_size padding fix) and omits requirements #1 (JIT compilation filter to include major 12) and #2 (arch assertion updates), which must be added to fully enable SM120 CUTLASS MXFP4 MoE support.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 75.00%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title clearly describes the main fix, snapping weight_scale_vec_size to handle block_scale_interleave padding for SM120 GPUs, directly matching the core change in the code.
  • Description check — ✅ Passed: The description provides comprehensive coverage: root-cause analysis, specific fix details, file changes, and a test plan. All required template sections are addressed adequately.
  • Out of Scope Changes check — ✅ Passed: All file changes are in scope: the trtllm kernel launcher changes handle the padding logic, the test file adds regression tests for unaligned hidden_size, and the fp4_quantization layout-dependent reshaping supports the scale-padding scenario. No unrelated changes detected.




Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a bug (issue #2847) where the weight_scale_vec_size was incorrectly calculated in fused MoE kernels when the hidden_size was not aligned to sf_block_size * 4. The fix in csrc/trtllm_fused_moe_kernel_launcher.cu introduces a more robust calculation for weight_scale_vec_size, snapping it to the nearest valid value (16 or 32) and adding a validation check for the scale tensor's numel(). To verify this, two new tests, test_moe_nvfp4_unaligned_hidden_size and test_moe_mxfp8_mxfp4_unaligned_hidden_size, have been added to tests/moe/test_trtllm_cutlass_fused_moe.py to specifically target and confirm the correction under these unaligned conditions. There are no review comments to provide feedback on.

@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been created, and the CI pipeline #47049224 is currently running. I'll report back once the pipeline job completes.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)

1848-1848: Optional: Replace lambda with def to satisfy Ruff E731.

Static analysis flags the lambda assignment. While this is consistent with line 504 in the same file, consider using a local function for clarity.

Suggested fix
-    round_up = lambda x, y: (x + y - 1) // y * y
+    def round_up(x, y):
+        return (x + y - 1) // y * y
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/moe/test_trtllm_cutlass_fused_moe.py` at line 1848, The inline lambda
assigned to round_up triggers Ruff E731; replace the lambda with a local
function definition (e.g., def round_up(x, y): ...) to satisfy the linter and
improve clarity—update any subsequent references to round_up unchanged and keep
the same behavior ((x + y - 1) // y * y) inside the new function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a03b96fe-45b6-49ec-926e-efd493379b15

📥 Commits

Reviewing files that changed from the base of the PR and between d426b18 and 2b4b8f4.

📒 Files selected for processing (2)
  • csrc/trtllm_fused_moe_kernel_launcher.cu
  • tests/moe/test_trtllm_cutlass_fused_moe.py

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47049224: 8/20 passed

@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been created, and the CI pipeline #47074816 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47074816: 7/20 passed

@samuellees
Collaborator Author

samuellees commented Mar 28, 2026

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been updated with latest changes, and the CI pipeline #47192512 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47192512: 8/20 passed

samuellees and others added 3 commits March 30, 2026 06:37
…ing (flashinfer-ai#2847)

block_scale_interleave pads scale columns to a multiple of 4 via
round_up(cols, 4), which inflates gemm1_weights_scale.numel(). When
hidden_size / sf_block_size is not a multiple of 4 (e.g. gpt-oss-120b
with hidden_size=2880, sf_block_size=32 → 90 scale cols padded to 92),
the reverse-computed weight_scale_vec_size becomes 31 instead of 32,
failing the strict equality check and blocking SM120 MXFP4 MoE kernels.

Replace the hard-coded equality check with snap-to-nearest-valid (16 or
32) plus a round-trip validation ensuring the actual scale tensor is at
least as large as the unpadded expectation.

Add regression tests with hidden_size=2880 for both NVFP4 and
MXFP8xMXFP4 MoE paths.

Closes flashinfer-ai#2847
…e padding

block_scale_interleave pads scale columns to a multiple of 4. For
hidden_size=2880 (sf_cols=90, padded to 92), the kernel returns
sf.numel()=rows*92, and sf.reshape((-1, 90)) fails because 92*rows
is not divisible by 90.

Fix: swap to sf.reshape((input.shape[-2], -1)) so the row count is
fixed and the column count is inferred from the actual (possibly padded)
tensor size.

AI-assisted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
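The failure the commit above describes is plain divisibility arithmetic. Taking the 128 padded rows that block_scale_interleave produces for this shape as an example:

```python
# 90 scale cols are padded to 92; with 128 padded rows the buffer holds
# 128 * 92 elements, which is not divisible by the unpadded column count 90,
# so sf.reshape((-1, 90)) cannot succeed.
numel = 128 * 92          # 11776
assert numel % 90 != 0    # reshape((-1, 90)) would raise
assert numel % 92 == 0    # fixing the padded column count works
```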
block_scale_interleave pads in both dimensions:
- K/sf_vec_size is rounded up to the next multiple of 4 (scale cols)
- M is rounded up to the next multiple of 128 (scale rows)

The previous fix sf.reshape((input.shape[-2], -1)) handles padded
columns but fails when M is not a multiple of 128 (e.g. M=48 → padded
to 128 by kernel, but 1024/48 is not an integer).

Fix: use sf_cols = round_up(K//sf_vec_size, 4) as the fixed column
dimension and let -1 infer the (possibly padded) row count.  This
handles both unaligned K (e.g. hidden_size=2880: 90→92 cols) and
unaligned M (e.g. 48 rows → 128 in sf).

AI-assisted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@samuellees samuellees force-pushed the fix/sm120-mxfp4-moe-weight-scale branch from 06de617 to 2ed47f5 on March 30, 2026 at 13:38
Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)

1972-1972: Consider using def instead of lambda for consistency with linting rules.

Static analysis flags this lambda assignment. However, this pattern is already used elsewhere in the file (line 505), so keeping it consistent with the existing codebase is also reasonable.

♻️ Optional: Replace lambda with def
-    round_up = lambda x, y: (x + y - 1) // y * y
+    def round_up(x, y):
+        return (x + y - 1) // y * y
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/moe/test_trtllm_cutlass_fused_moe.py` at line 1972, The lambda assigned
to round_up should be replaced with a regular function to satisfy linting and
consistency: implement def round_up(x, y): return (x + y - 1) // y * y and keep
all existing call sites unchanged; ensure the new function name round_up is used
in place of the lambda and no other behavior changes are introduced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7663e963-68b1-4fe5-9588-c0c7a153fe6f

📥 Commits

Reviewing files that changed from the base of the PR and between 06de617 and 2ed47f5.

📒 Files selected for processing (3)
  • csrc/trtllm_fused_moe_kernel_launcher.cu
  • flashinfer/quantization/fp4_quantization.py
  • tests/moe/test_trtllm_cutlass_fused_moe.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/quantization/fp4_quantization.py

@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been updated with latest changes, and the CI pipeline #47256853 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47256853: 7/20 passed

…ble MXFP4 test

The sf reshape fix must be conditional on is_sf_swizzled_layout:
- Swizzled: sf includes row/column padding from block_scale_interleave,
  so use round_up(K/sf_vec_size, 4) as column count
- Non-swizzled: sf has exactly M * (K/sf_vec_size) elements, no padding,
  so use K/sf_vec_size directly (original behavior)

Remove test_moe_mxfp8_mxfp4_unaligned_hidden_size because the MXFP4 MoE
kernel requires hidden_size % 128 == 0, and when that constraint holds,
hidden_size/32 is always a multiple of 4, making the column-padding
scenario impossible.

AI-assisted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)

2084-2087: Avoid ambiguous Unicode multiplication sign in comments.

Replace × with x to satisfy RUF003 and avoid encoding/search inconsistencies in tooling.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/moe/test_trtllm_cutlass_fused_moe.py` around lines 2084 - 2087, The
comment string containing "MXFP8×MXFP4" uses a Unicode multiplication sign which
triggers RUF003; edit the comment that starts "NOTE: No MXFP8×MXFP4
unaligned-hidden_size test here..." (reference the MXFP8×MXFP4 text) and replace
the Unicode × with a plain ASCII 'x' so it reads "MXFP8xMXFP4", ensuring the
rest of the comment remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/moe/test_trtllm_cutlass_fused_moe.py`:
- Around line 1944-1947: Replace the direct torch.cuda.get_device_capability()
check in the pytest.mark.skipif with the flashinfer.utils capability helpers:
import get_compute_capability and/or is_sm100a_supported from flashinfer.utils
and use those to determine support for NVFP4 (e.g., use is_sm100a_supported() or
compare get_compute_capability() via that helper) in the skipif predicate
instead of torch.cuda.get_device_capability(); update the skip reason text if
needed to reflect the helper usage and keep the skip guard consistent with other
tests.
- Line 1972: Replace the lambda assignment "round_up = lambda x, y: (x + y - 1)
// y * y" with a local named function to satisfy Ruff E731: define a function
named round_up with parameters x and y that returns (x + y - 1) // y * y, and
update any usages to call round_up(x, y); reference the existing symbol
"round_up" and ensure the function is defined in the same scope where the lambda
was previously assigned.

---

Nitpick comments:
In `@tests/moe/test_trtllm_cutlass_fused_moe.py`:
- Around line 2084-2087: The comment string containing "MXFP8×MXFP4" uses a
Unicode multiplication sign which triggers RUF003; edit the comment that starts
"NOTE: No MXFP8×MXFP4 unaligned-hidden_size test here..." (reference the
MXFP8×MXFP4 text) and replace the Unicode × with a plain ASCII 'x' so it reads
"MXFP8xMXFP4", ensuring the rest of the comment remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58ed0de1-db3d-4239-81de-ff5da2658ea1

📥 Commits

Reviewing files that changed from the base of the PR and between 2ed47f5 and 95423e8.

📒 Files selected for processing (2)
  • flashinfer/quantization/fp4_quantization.py
  • tests/moe/test_trtllm_cutlass_fused_moe.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • flashinfer/quantization/fp4_quantization.py

@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been updated with latest changes, and the CI pipeline #47349540 is currently running. I'll report back once the pipeline job completes.

- Use flashinfer.utils helpers (is_sm100a_supported, is_sm12x_supported)
  instead of torch.cuda.get_device_capability() in skipif
- Replace lambda with def for round_up to satisfy Ruff E731
- Replace Unicode multiplication sign with ASCII 'x' (RUF003)

AI-assisted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/moe/test_trtllm_cutlass_fused_moe.py`:
- Around line 1971-1975: Pre-commit formatting failed for this test file; run
the project's formatter and commit only the formatting changes: run `pre-commit
run --all-files` (or `ruff format`), which will reformat the block containing
torch.manual_seed(42), quant_blocksize = 16, the round_up(...) function, and e =
num_experts, then stage and commit the resulting formatting-only diff.
- Line 1972: The test uses quant_blocksize = 16 which with hidden_size = 2880
yields k//quant_blocksize = 180 and no padding, so the padded-scale-columns path
isn't exercised; change quant_blocksize (or hidden_size) in the test to a value
that forces non-multiple-of-4 blocks (e.g., quant_blocksize = 32 so
k//quant_blocksize becomes 90 and round_up(90, 4) -> 92) to hit the padding
codepath referenced in the docstring; update all related occurrences (the
variables quant_blocksize, the calculation using round_up, and any assertions in
the same test function) so the test actually validates the padded columns
behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7e91d0da-a2f7-4c95-8242-606e92c38758

📥 Commits

Reviewing files that changed from the base of the PR and between 95423e8 and a62280e.

📒 Files selected for processing (1)
  • tests/moe/test_trtllm_cutlass_fused_moe.py

Comment on lines +1971 to +1975

    torch.manual_seed(42)
    quant_blocksize = 16

    def round_up(x, y):
        return (x + y - 1) // y * y

    e = num_experts
Contributor


⚠️ Potential issue | 🟡 Minor

Pre-commit formatting is still failing on this file.

CI reports ruff-format changed this file. Please run pre-commit run --all-files (or ruff format) and commit the formatting-only delta.


@flashinfer-bot
Collaborator

[FAILED] Pipeline #47349540: 6/20 passed

samuellees and others added 2 commits April 1, 2026 05:18
hidden_size=2880 with quant_blocksize=16 gives 2880/16=180 scale cols,
which is already divisible by 4 -- no padding occurs, so the test did
not exercise the weight_scale_vec_size snap fix.

Change to hidden_size=288: 288/16=18 cols, round_up(18,4)=20, which
triggers the block_scale_interleave padding path.

Also fix ruff-format: add blank lines around nested def.

AI-assisted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
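The arithmetic behind the commit above, as a quick check:

```python
def round_up(x, y):
    return (x + y - 1) // y * y


# hidden_size=2880, quant_blocksize=16: 180 scale cols, already a multiple
# of 4, so block_scale_interleave adds no column padding and the snap fix
# is never exercised.
assert 2880 // 16 == 180 and round_up(180, 4) == 180
# hidden_size=288: 18 cols, padded to 20 -> triggers the padding path.
assert 288 // 16 == 18 and round_up(18, 4) == 20
```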
@samuellees
Collaborator Author

/bot run

@flashinfer-bot
Collaborator

GitLab MR !464 has been updated with latest changes, and the CI pipeline #47443267 is currently running. I'll report back once the pipeline job completes.

@flashinfer-bot
Collaborator

[FAILED] Pipeline #47443267: 5/20 passed



Development

Successfully merging this pull request may close these issues.

[SM120] Enable MXFP4 MoE kernels for Blackwell workstation GPUs (RTX PRO 6000, RTX 5090)

3 participants