fix: snap weight_scale_vec_size to handle block_scale_interleave padding for SM120 #2898
samuellees wants to merge 7 commits into flashinfer-ai:main
Conversation
Summary of Changes (Gemini Code Assist): This pull request resolves a compatibility issue affecting NVIDIA SM120 GPUs when executing MXFP4 Mixture-of-Experts (MoE) CUTLASS kernels. The fix addresses an erroneous calculation of the weight_scale_vec_size.
📝 Walkthrough
Adjusted FP4/MXFP MoE scale-vector sizing in the TRT-LLM CUDA launcher to compute a raw vec size, snap it to 16 or 32, and validate against padded host scale tensors; added a CUDA-only regression test for an unaligned hidden size; and made FP4 scale-factor reshaping padding-aware.
Code Review
This pull request fixes a bug (issue #2847) where the weight_scale_vec_size was incorrectly calculated in fused MoE kernels when the hidden_size was not aligned to sf_block_size * 4. The fix in csrc/trtllm_fused_moe_kernel_launcher.cu introduces a more robust calculation for weight_scale_vec_size, snapping it to the nearest valid value (16 or 32) and adding a validation check for the scale tensor's numel(). To verify this, two new tests, test_moe_nvfp4_unaligned_hidden_size and test_moe_mxfp8_mxfp4_unaligned_hidden_size, have been added to tests/moe/test_trtllm_cutlass_fused_moe.py to specifically target and confirm the correction under these unaligned conditions. There are no review comments to provide feedback on.
/bot run
🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)
Line 1848: Optional: Replace lambda with def to satisfy Ruff E731. Static analysis flags the lambda assignment. While this is consistent with line 504 in the same file, consider using a local function for clarity.
Suggested fix:

```diff
- round_up = lambda x, y: (x + y - 1) // y * y
+ def round_up(x, y):
+     return (x + y - 1) // y * y
```
📒 Files selected for processing (2):
- csrc/trtllm_fused_moe_kernel_launcher.cu
- tests/moe/test_trtllm_cutlass_fused_moe.py
[FAILED] Pipeline #47049224: 8/20 passed

/bot run

[FAILED] Pipeline #47074816: 7/20 passed

/bot run

[FAILED] Pipeline #47192512: 8/20 passed
…ing (flashinfer-ai#2847)

block_scale_interleave pads scale columns to a multiple of 4 via round_up(cols, 4), which inflates gemm1_weights_scale.numel(). When hidden_size / sf_block_size is not a multiple of 4 (e.g. gpt-oss-120b with hidden_size=2880, sf_block_size=32 → 90 scale cols padded to 92), the reverse-computed weight_scale_vec_size becomes 31 instead of 32, failing the strict equality check and blocking SM120 MXFP4 MoE kernels.

Replace the hard-coded equality check with snap-to-nearest-valid (16 or 32) plus a round-trip validation ensuring the actual scale tensor is at least as large as the unpadded expectation. Add regression tests with hidden_size=2880 for both NVFP4 and MXFP8xMXFP4 MoE paths.

Closes flashinfer-ai#2847
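The snap logic described in the commit above can be sketched numerically. This is a minimal illustration using the gpt-oss-120b numbers from the commit message; `snap_vec_size` and the variable names are invented for the sketch, not the kernel's actual identifiers:

```python
def round_up(x, y):
    return (x + y - 1) // y * y

def snap_vec_size(raw):
    # Valid scale-vector sizes are 16 and 32; pick the nearest one.
    return min((16, 32), key=lambda v: abs(v - raw))

# gpt-oss-120b numbers from the commit message:
hidden_size, sf_block_size = 2880, 32
cols = hidden_size // sf_block_size   # 90 scale columns
padded_cols = round_up(cols, 4)       # padded to 92 by block_scale_interleave
raw = hidden_size // padded_cols      # 2880 // 92 = 31: fails a strict == 32 check
vec = snap_vec_size(raw)              # snaps back to 32

# Round-trip validation: the actual (padded) column count must be at least
# as large as the unpadded expectation for the snapped vec size.
assert padded_cols >= hidden_size // vec  # 92 >= 90
```

With aligned shapes the padding is a no-op and `raw` already equals a valid value, so the snap changes nothing there.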
…e padding

block_scale_interleave pads scale columns to a multiple of 4. For hidden_size=2880 (sf_cols=90, padded to 92), the kernel returns sf.numel() = rows * 92, and sf.reshape((-1, 90)) fails because 92 * rows is not divisible by 90.

Fix: swap to sf.reshape((input.shape[-2], -1)) so the row count is fixed and the column count is inferred from the actual (possibly padded) tensor size.

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
block_scale_interleave pads in both dimensions:
- K/sf_vec_size is rounded up to the next multiple of 4 (scale cols)
- M is rounded up to the next multiple of 128 (scale rows)

The previous fix sf.reshape((input.shape[-2], -1)) handles padded columns but fails when M is not a multiple of 128 (e.g. M=48 → padded to 128 by kernel, but 1024/48 is not an integer).

Fix: use sf_cols = round_up(K // sf_vec_size, 4) as the fixed column dimension and let -1 infer the (possibly padded) row count. This handles both unaligned K (e.g. hidden_size=2880: 90 → 92 cols) and unaligned M (e.g. 48 rows → 128 in sf).

AI-assisted
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
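The reshape behavior above can be demonstrated with plain tensors. This is a sketch under the shapes named in the commit message (M=48, K=2880, sf_vec_size=32); the zero-filled tensor stands in for the kernel's real scale output:

```python
import torch

def round_up(x, y):
    return (x + y - 1) // y * y

M, K, sf_vec_size = 48, 2880, 32
sf_cols = round_up(K // sf_vec_size, 4)  # 90 -> 92 (column padding)
sf_rows = round_up(M, 128)               # 48 -> 128 (row padding)
sf = torch.zeros(sf_rows * sf_cols, dtype=torch.uint8)  # stand-in for kernel output

# Fixing the row count fails: 128 * 92 elements are not divisible by 48.
try:
    sf.reshape((M, -1))
except RuntimeError:
    pass  # expected for this shape

# Fixing the padded column count and letting -1 infer the rows works:
view = sf.reshape((-1, sf_cols))
assert view.shape == (sf_rows, sf_cols)  # (128, 92)
```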
Force-pushed 06de617 to 2ed47f5 (compare)
🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)
Line 1972: Consider using def instead of lambda for consistency with linting rules. Static analysis flags this lambda assignment. However, this pattern is already used elsewhere in the file (line 505), so keeping it consistent with the existing codebase is also reasonable.
♻️ Optional: Replace lambda with def

```diff
- round_up = lambda x, y: (x + y - 1) // y * y
+ def round_up(x, y):
+     return (x + y - 1) // y * y
```
📒 Files selected for processing (3):
- csrc/trtllm_fused_moe_kernel_launcher.cu
- flashinfer/quantization/fp4_quantization.py
- tests/moe/test_trtllm_cutlass_fused_moe.py

🚧 Files skipped from review as they are similar to previous changes (1):
- flashinfer/quantization/fp4_quantization.py
/bot run

[FAILED] Pipeline #47256853: 7/20 passed
…ble MXFP4 test

The sf reshape fix must be conditional on is_sf_swizzled_layout:
- Swizzled: sf includes row/column padding from block_scale_interleave, so use round_up(K/sf_vec_size, 4) as column count
- Non-swizzled: sf has exactly M * (K/sf_vec_size) elements, no padding, so use K/sf_vec_size directly (original behavior)

Remove test_moe_mxfp8_mxfp4_unaligned_hidden_size because the MXFP4 MoE kernel requires hidden_size % 128 == 0, and when that constraint holds, hidden_size/32 is always a multiple of 4, making the column-padding scenario impossible.

AI-assisted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
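The conditional described in this commit could look roughly like the following. This is a sketch, not the library's actual code; `reshape_sf` and its signature are invented for illustration:

```python
import torch

def round_up(x, y):
    return (x + y - 1) // y * y

def reshape_sf(sf, m, k, sf_vec_size, is_sf_swizzled_layout):
    """Padding-aware view of a scale-factor tensor (illustrative sketch)."""
    if is_sf_swizzled_layout:
        # Swizzled: block_scale_interleave pads cols to a multiple of 4 and
        # rows to a multiple of 128, so fix the padded column count and let
        # -1 absorb the (possibly padded) row count.
        sf_cols = round_up(k // sf_vec_size, 4)
        return sf.reshape((-1, sf_cols))
    # Non-swizzled: exactly m * (k // sf_vec_size) elements, no padding.
    return sf.reshape((m, k // sf_vec_size))

# Swizzled case: 48 rows padded to 128, 90 cols padded to 92.
swizzled = torch.zeros(128 * 92, dtype=torch.uint8)
assert reshape_sf(swizzled, 48, 2880, 32, True).shape == (128, 92)

# Non-swizzled case: exactly 48 * 90 elements, original behavior.
plain = torch.zeros(48 * 90, dtype=torch.uint8)
assert reshape_sf(plain, 48, 2880, 32, False).shape == (48, 90)
```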
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/moe/test_trtllm_cutlass_fused_moe.py (1)
Lines 2084-2087: Avoid ambiguous Unicode multiplication sign in comments. Replace × with x to satisfy RUF003 and avoid encoding/search inconsistencies in tooling.
🤖 Review comments

In tests/moe/test_trtllm_cutlass_fused_moe.py:

- Lines 1944-1947: Replace the direct torch.cuda.get_device_capability() check in the pytest.mark.skipif with the flashinfer.utils capability helpers (get_compute_capability and/or is_sm100a_supported) to determine NVFP4 support; update the skip reason text if needed and keep the skip guard consistent with other tests.
- Line 1972: Replace the lambda assignment round_up = lambda x, y: (x + y - 1) // y * y with a named function (def round_up(x, y): return (x + y - 1) // y * y) to satisfy Ruff E731, updating usages in the same scope.
📒 Files selected for processing (2):
- flashinfer/quantization/fp4_quantization.py
- tests/moe/test_trtllm_cutlass_fused_moe.py

🚧 Files skipped from review as they are similar to previous changes (1):
- flashinfer/quantization/fp4_quantization.py
/bot run
- Use flashinfer.utils helpers (is_sm100a_supported, is_sm12x_supported) instead of torch.cuda.get_device_capability() in skipif
- Replace lambda with def for round_up to satisfy Ruff E731
- Replace Unicode multiplication sign with ASCII 'x' (RUF003)

AI-assisted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Review comments

In tests/moe/test_trtllm_cutlass_fused_moe.py:

- Lines 1971-1975: Pre-commit formatting failed for this file; run `pre-commit run --all-files` (or `ruff format`), which will reformat the block containing torch.manual_seed(42), quant_blocksize = 16, the round_up(...) function, and e = num_experts, then commit the formatting-only diff.
- Line 1972: The test uses quant_blocksize = 16, which with hidden_size = 2880 yields k // quant_blocksize = 180 and no padding, so the padded-scale-columns path is never exercised. Change quant_blocksize (or hidden_size) to force a non-multiple-of-4 column count (e.g. quant_blocksize = 32, so k // quant_blocksize becomes 90 and round_up(90, 4) → 92) to hit the padding codepath referenced in the docstring; update the related variables and assertions so the test actually validates the padded-columns behavior.
📒 Files selected for processing (1):
- tests/moe/test_trtllm_cutlass_fused_moe.py
```python
torch.manual_seed(42)
quant_blocksize = 16

def round_up(x, y):
    return (x + y - 1) // y * y

e = num_experts
```
Pre-commit formatting is still failing on this file. CI reports ruff-format changed it; please run `pre-commit run --all-files` (or `ruff format`) and commit the formatting-only delta.
[FAILED] Pipeline #47349540: 6/20 passed
hidden_size=2880 with quant_blocksize=16 gives 2880/16 = 180 scale cols, which is already divisible by 4 -- no padding occurs, so the test did not exercise the weight_scale_vec_size snap fix. Change to hidden_size=288: 288/16 = 18 cols, round_up(18, 4) = 20, which triggers the block_scale_interleave padding path.

Also fix ruff-format: add blank lines around nested def.

AI-assisted
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
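The arithmetic behind this commit is easy to verify with the same round_up helper the tests use:

```python
def round_up(x, y):
    return (x + y - 1) // y * y

# hidden_size=2880, quant_blocksize=16: 180 scale cols, already a multiple
# of 4, so block_scale_interleave adds no padding and the snap path is idle.
assert 2880 // 16 == 180
assert round_up(180, 4) == 180

# hidden_size=288, quant_blocksize=16: 18 cols, padded to 20, which
# does exercise the weight_scale_vec_size snap fix.
assert 288 // 16 == 18
assert round_up(18, 4) == 20
```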
/bot run

[FAILED] Pipeline #47443267: 5/20 passed
Fixes #2847

SM120 GPUs (RTX PRO 6000, RTX 5090) cannot use MXFP4 MoE CUTLASS kernels because the weight_scale_vec_size check in trtllm_fp4_block_scale_moe rejects valid inputs.

Root cause: block_scale_interleave pads scale columns via round_up(cols, 4), inflating gemm1_weights_scale.numel(). When hidden_size / sf_block_size is not a multiple of 4 (e.g. gpt-oss-120b: hidden_size=2880, sf_block_size=32 → 90 scale cols padded to 92), the reverse-computed weight_scale_vec_size becomes 31 instead of 32, failing the strict equality check.

Fix: Replace the hard-coded equality check with snap-to-nearest-valid (16 or 32) plus a round-trip validation ensuring the actual scale tensor numel is at least as large as the unpadded expectation.

Changes
- csrc/trtllm_fused_moe_kernel_launcher.cu: Replace the weight_scale_vec_size check logic in trtllm_fp4_block_scale_moe
- tests/moe/test_trtllm_cutlass_fused_moe.py: Add test_moe_nvfp4_unaligned_hidden_size and test_moe_mxfp8_mxfp4_unaligned_hidden_size with hidden_size=2880 to cover the padding scenario

Test plan
pytest tests/moe/test_trtllm_cutlass_fused_moe.py -x