
[Bugfix] Add regression test for MoE quant_config under torch.compile#34335

Merged
DarkLight1337 merged 1 commit into vllm-project:main from
mgehre-amd:matthias.fix_moe_quant_compile
Feb 20, 2026


Conversation

@mgehre-amd
Contributor

@mgehre-amd mgehre-amd commented Feb 11, 2026

Update: The fix landed separately via #34371 and this PR only adds the regression test.

Summary

After the MoE Refactor (#32344), w4a16 models fail with AssertionError: Hidden size mismatch 2048 != 1024 under torch.compile.
This is because ensure_moe_quant_config_init() is called in FusedMoE.forward_native(). When torch.compile is active, forward_native is traced by Dynamo, but the side effect of setting self.quant_method.moe_quant_config (an attribute mutation) is not replayed at runtime. This causes moe_quant_config to remain None when DefaultMoERunner.forward_impl executes inside the moe_forward custom op at runtime.

For W4A16-quantized MoE models (e.g. AWQ 4-bit), this means use_int4_w4a16 is False instead of True, causing the assertion hidden_states.size(1) == w1.size(2) to fail because packed 4-bit weights have half the expected dimension.
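
The arithmetic behind the `2048 != 1024` message can be sketched in a few lines of plain Python (the shapes are illustrative, not taken from the actual model):

```python
# Two 4-bit values are packed into each byte, so a packed int4 weight's
# hidden dimension is half the logical hidden size. If the kernel is
# wrongly told the weights are unpacked (use_int4_w4a16=False), its
# shape check compares the logical size against the packed one.
hidden_size = 2048           # logical hidden size (illustrative)
bits = 4
values_per_byte = 8 // bits  # 2 int4 values share one uint8
packed_dim = hidden_size // values_per_byte

print(f"Hidden size mismatch {hidden_size} != {packed_dim}")
# → Hidden size mismatch 2048 != 1024
```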

Fix: Call layer.ensure_moe_quant_config_init() at the start of DefaultMoERunner.forward_impl(), which runs inside the moe_forward custom op and is therefore not compiled by Dynamo.
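
The side-effect-loss mechanism can be sketched without torch at all. Below, `toy_compile` is a stand-in for Dynamo tracing (it runs the traced function on a throwaway copy of the module, so attribute mutations made during tracing are discarded), and all class and function names are illustrative rather than vLLM's real ones:

```python
import copy

class QuantMethod:
    def __init__(self):
        self.moe_quant_config = None  # lazily initialized

class MoELayer:
    def __init__(self):
        self.quant_method = QuantMethod()

    def ensure_moe_quant_config_init(self):
        if self.quant_method.moe_quant_config is None:
            self.quant_method.moe_quant_config = {"use_int4_w4a16": True}

def forward_impl(layer, init_here=False):
    if init_here:  # the fix: initialize inside the eagerly-run custom op
        layer.ensure_moe_quant_config_init()
    cfg = layer.quant_method.moe_quant_config
    if cfg is None:
        raise AssertionError("Hidden size mismatch (config never set)")
    return cfg

def forward_native(layer, init_here=False):
    layer.ensure_moe_quant_config_init()  # traced away under compile
    return forward_impl(layer, init_here)

def toy_compile(layer, init_here=False):
    """Stand-in for torch.compile: trace forward_native on a deep copy
    (discarding attribute mutations), then execute only forward_impl."""
    forward_native(copy.deepcopy(layer), init_here)  # side effect lost
    return forward_impl(layer, init_here)            # runtime execution

# Buggy placement: the config is still None at runtime.
try:
    toy_compile(MoELayer())
except AssertionError as exc:
    print(exc)

# Fixed placement: init runs inside forward_impl, so it takes effect.
print(toy_compile(MoELayer(), init_here=True))
```

With `init_here=False` the only initialization happens during the discarded trace, mirroring the eager-vs-compiled behavior described above.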

Reproducer:

vllm serve RedHatAI/Qwen3-30B-A3B-Instruct-2507.w4a16 \
  --max-model-len 1024 --gpu-memory-utilization 0.85
# Without --enforce-eager, this hits:
# AssertionError: Hidden size mismatch 2048 != 1024
# With --enforce-eager it works because forward_native runs eagerly
# and the attribute mutation takes effect.

Test plan

  • Added regression test test_w4a16_moe_torch_compile that loads a tiny W4A16 MoE model (nm-testing/tinysmokeqwen3moe-W4A16-first-only-CTstable) with enforce_eager=False and verifies inference succeeds.
  • Verified the test fails without the fix (AssertionError: Hidden size mismatch) and passes with it.
  • Ran tests/kernels/moe/test_moe.py (537 passed, 1 unrelated OOM skip).

@mergify mergify bot added the bug (Something isn't working) label Feb 11, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug where the MoE quantization configuration was not being initialized at runtime when using torch.compile. The issue stems from Dynamo not replaying attribute mutation side effects from a traced function. The fix, which involves moving the initialization call into DefaultMoERunner.forward_impl (a function executed eagerly within a custom op), is sound and directly addresses the problem. The addition of a targeted regression test (test_w4a16_moe_torch_compile) is excellent, as it successfully reproduces the failure and validates the fix. The changes are minimal, well-commented, and effective.

# forward_native is compiled by torch.compile/dynamo, and the attribute
# mutation side effect is not replayed at runtime. forward_impl runs
# inside the moe_forward custom op, so it is not compiled by dynamo.
layer.ensure_moe_quant_config_init()
Collaborator
Can you try putting this in _moe_forward and _moe_forward_shared instead?

@bnellnm
Collaborator

bnellnm commented Feb 12, 2026

This should be fixed on main now.

@mgehre-amd
Contributor Author

> This should be fixed on main now.

Right, the fix landed on main via #34371 (31d992d).
However, #34371 did not add a regression test. This PR's test_w4a16_moe_torch_compile test case is still valuable to prevent future regressions.
@bnellnm, should I rebase this PR onto main (dropping the now-redundant code change) and merge it for just the test?

@bnellnm
Collaborator

bnellnm commented Feb 12, 2026

> This should be fixed on main now.
>
> Right, the fix landed on main via #34371 (31d992d). However, #34371 did not add a regression test. This PR's test_w4a16_moe_torch_compile test case is still valuable to prevent future regressions. @bnellnm, should I rebase this PR onto main (dropping the now-redundant code change) and merge it for just the test?

Sure, sounds good to me.

The code fix landed via vllm-project#34371 (31d992d). This adds a regression test
to prevent future regressions: test_w4a16_moe_torch_compile loads a
W4A16 MoE model with enforce_eager=False and verifies inference
succeeds without the "Hidden size mismatch" assertion error.

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
@mgehre-amd mgehre-amd force-pushed the matthias.fix_moe_quant_compile branch from b95bc01 to 8b88807 on February 18, 2026 09:49
@mgehre-amd mgehre-amd changed the title [Bugfix] Fix MoE quant_config not initialized under torch.compile [Bugfix] Add regression test for MoE quant_config under torch.compile Feb 18, 2026
@mgehre-amd mgehre-amd requested a review from bnellnm February 18, 2026 09:51
@mgehre-amd
Contributor Author

> This should be fixed on main now.
>
> Right, the fix landed on main via #34371 (31d992d). However, #34371 did not add a regression test. This PR's test_w4a16_moe_torch_compile test case is still valuable to prevent future regressions. @bnellnm, should I rebase this PR onto main (dropping the now-redundant code change) and merge it for just the test?

> Sure, sounds good to me.

Done. This PR now only brings in the regression test.

Member

@yewentao256 yewentao256 left a comment


LGTM, thanks for the work!

@yewentao256 yewentao256 added the ready (ONLY add when PR is ready to merge/full CI is needed) label Feb 18, 2026
@DarkLight1337 DarkLight1337 merged commit 4e2c7ca into vllm-project:main Feb 20, 2026
17 of 18 checks passed
yugong333 pushed a commit to yugong333/vllm that referenced this pull request Feb 22, 2026
jmamou pushed a commit to jmamou/vllm that referenced this pull request Feb 23, 2026
llsj14 pushed a commit to llsj14/vllm that referenced this pull request Mar 1, 2026
tunglinwood pushed a commit to tunglinwood/vllm that referenced this pull request Mar 4, 2026
askliar pushed a commit to askliar/vllm that referenced this pull request Mar 9, 2026
Copilot AI pushed a commit to machov/vllm that referenced this pull request Mar 10, 2026
EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 1, 2026
liuchenbing2026 pushed a commit to liuchenbing2026/vllm that referenced this pull request Apr 4, 2026

4 participants