
[Bugfix] Fix FusedMoE weight loading with padded hidden dimensions#37010

Open
SandishKumarHN wants to merge 1 commit into vllm-project:main from SandishKumarHN:issue-36926

Conversation


@SandishKumarHN SandishKumarHN commented Mar 13, 2026

Summary

Fixes #36926

When DeepEP/NIXL EP backends round up hidden_size for alignment (e.g., 2688 → 3072), FusedMoE weight parameters are allocated with the padded size but
checkpoint weights have the original size. This causes RuntimeError: The size of tensor a (3072) must match the size of tensor b (2688) during
expert_data.copy_(loaded_weight) in weight loading.
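The mismatch is easy to reproduce in isolation (the sizes below are taken from the issue; this is an illustrative standalone snippet, not vLLM code):

```python
import torch

# Parameter allocated with the padded hidden size...
param = torch.zeros(3072, 512)
# ...while the checkpoint weight keeps the original hidden size.
ckpt = torch.ones(2688, 512)

try:
    # A direct copy fails: dim 0 differs (3072 vs 2688) and cannot broadcast.
    param.copy_(ckpt)
except RuntimeError as e:
    print(e)
```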

  • Added _narrow_expert_data_for_padding() static method that narrows padded parameter dimensions to match checkpoint weights before copying
  • Applied to _load_w2, _load_w13, and _load_per_channel_weight_scale (3 paths)
  • Excluded BitsAndBytes w2 path — BnB params are flat packed-integer tensors where copy_() is intercepted by __torch_function__ for in-flight quantization;
    shapes are intentionally different
  • When hidden_size is not padded (common case), the helper is a no-op since all dimensions already match
  • Follows the existing narrowing pattern used by the mxfp4 quantization path (lines 1069-1078)
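A minimal sketch of what such a helper might look like, based on the description above (the method name is from this PR, but the body here is an illustrative reconstruction, not the merged code):

```python
import torch

def _narrow_expert_data_for_padding(
    expert_data: torch.Tensor, loaded_weight: torch.Tensor
) -> torch.Tensor:
    """Narrow padded parameter dims down to the checkpoint weight's shape.

    narrow() returns a view into the same storage, so the subsequent
    copy_() still writes into the original (padded) parameter. When no
    dimension is padded, the loop never narrows and this is a no-op.
    """
    for d in range(min(expert_data.ndim, loaded_weight.ndim)):
        if expert_data.shape[d] != loaded_weight.shape[d]:
            expert_data = expert_data.narrow(d, 0, loaded_weight.shape[d])
    return expert_data

# hidden_size padded 2688 -> 3072 for alignment; checkpoint keeps 2688
param = torch.zeros(3072, 512)
ckpt = torch.ones(2688, 512)
view = _narrow_expert_data_for_padding(param, ckpt)
view.copy_(ckpt)  # fits: writes land in the first 2688 rows of param
```

Because the narrowed tensor is a view, the padded tail rows of the parameter are simply left untouched.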

Not a duplicate: Checked open PRs — #34285 and #30647 address roundup refactoring and forward-pass padding, not weight loading.

Test plan

  • New unit tests for _narrow_expert_data_for_padding (7 cases: matching shapes, w2/w13 dims, 3D tensors, 1D scales, scalar weights, storage sharing)
  • New integration tests for padded weight loading (w2, w13, no-padding no-op)
  • python -m pytest tests/kernels/moe/test_moe_weight_loading_padded.py -v — 10/10 pass
  • python -m pytest tests/kernels/moe/ -v -k "not deepep" — existing MoE tests

@mergify bot added the bug (Something isn't working) label on Mar 13, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request addresses a bug in FusedMoE weight loading where padded hidden dimensions cause shape mismatches. The solution introduces a _narrow_expert_data_for_padding static method to correctly slice the parameter tensor before copying weights, which is a sound approach. The changes are applied to all relevant weight loading paths and are accompanied by a comprehensive set of new tests. My review found a potential IndexError in the new helper function if the input tensors have different ranks. I've provided a suggestion to make the implementation more robust. Overall, this is a good fix for the issue.

Comment on lines +951 to +953:

    for d in range(loaded_weight.ndim):
        if expert_data.shape[d] != loaded_weight.shape[d]:
            expert_data = expert_data.narrow(d, 0, loaded_weight.shape[d])

Severity: high

The loop for d in range(loaded_weight.ndim): is not entirely safe. If expert_data.ndim < loaded_weight.ndim, this will raise an IndexError when accessing expert_data.shape[d]. While this might not be a typical scenario for padding issues, making the code more robust against such cases would prevent potential crashes.

I recommend iterating up to min(expert_data.ndim, loaded_weight.ndim). This ensures the loop doesn't go out of bounds. If the tensor ranks are different, it's likely not a simple padding problem, and the subsequent copy_ operation will correctly fail with a shape mismatch error, providing a clearer signal of the underlying issue.

Suggested change:

    # before
    for d in range(loaded_weight.ndim):
        if expert_data.shape[d] != loaded_weight.shape[d]:
            expert_data = expert_data.narrow(d, 0, loaded_weight.shape[d])

    # after
    for d in range(min(expert_data.ndim, loaded_weight.ndim)):
        if expert_data.shape[d] != loaded_weight.shape[d]:
            expert_data = expert_data.narrow(d, 0, loaded_weight.shape[d])
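To see why the bound matters, a quick standalone illustration with hypothetical shapes (not taken from the PR): with a 1-D parameter and a 2-D loaded weight, the unbounded loop indexes past the parameter's rank, while the bounded loop simply falls through.

```python
import torch

expert_data = torch.zeros(4)      # 1-D parameter (e.g. a per-channel scale)
loaded_weight = torch.ones(4, 2)  # higher-rank loaded weight

# range(loaded_weight.ndim) would reach d=1 and raise IndexError on
# expert_data.shape[1]; bounding by min() avoids the crash and lets a
# later copy_() report the real shape mismatch with a clearer error.
for d in range(min(expert_data.ndim, loaded_weight.ndim)):
    if expert_data.shape[d] != loaded_weight.shape[d]:
        expert_data = expert_data.narrow(d, 0, loaded_weight.shape[d])
```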

… backends round up hidden_size (e.g., 2688 -> 3072)

Signed-off-by: SandishKumarHN <sandish@fb.com>

Successfully merging this pull request may close these issues.

[Bug]: nemotron_h does not work with DeepEP all2all backends due to hidden dim rounding
