[Bugfix] Add regression test for MoE quant_config under torch.compile#34335
Conversation
Code Review
This pull request correctly fixes a bug where the MoE quantization configuration was not being initialized at runtime under `torch.compile`. The issue stems from Dynamo not replaying attribute-mutation side effects from a traced function. The fix, moving the initialization call into `DefaultMoERunner.forward_impl` (a function executed eagerly within a custom op), is sound and directly addresses the problem. The targeted regression test (`test_w4a16_moe_torch_compile`) is excellent: it reproduces the failure and validates the fix. The changes are minimal, well-commented, and effective.
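To make the "attribute mutation not replayed" failure mode concrete, here is a stdlib-only analogy (all names hypothetical, not vLLM or Dynamo code): a toy "compiler" that traces a forward pass once and then replays only the recorded dataflow, discarding the side effect that initializes the config attribute.

```python
# Toy analogy of the bug: a tracer records the arithmetic of forward(),
# but the attribute mutation performed inside it is not replayed at runtime.

class Layer:
    def __init__(self):
        self.quant_config = None  # analogous to moe_quant_config

    def ensure_init(self):
        # Side effect: an attribute mutation, not part of the dataflow.
        if self.quant_config is None:
            self.quant_config = {"use_int4": True}

    def forward(self, x):
        self.ensure_init()  # side effect
        return x * 2        # dataflow: the only thing a tracer records


def toy_compile(layer):
    """Toy 'compiler': trace forward once, then replay only the dataflow.

    The replayed graph multiplies by 2 but never re-runs ensure_init,
    mirroring how the traced function's mutation is lost at runtime."""
    layer.forward(0)           # tracing pass
    layer.quant_config = None  # fresh runtime state: trace-time mutation not kept
    return lambda x: x * 2     # replayed graph: dataflow only


eager = Layer()
eager.forward(3)
assert eager.quant_config == {"use_int4": True}  # eager call initializes config

compiled_layer = Layer()
run = toy_compile(compiled_layer)
assert run(3) == 6                         # arithmetic replays fine...
assert compiled_layer.quant_config is None  # ...but the config was never set
```

Moving the initialization into code that always runs eagerly (here, anything outside the traced region; in the PR, `forward_impl` inside the custom op) sidesteps this entirely.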
```python
# is compiled by torch.compile/dynamo, and the attribute mutation
# side effect is not replayed at runtime. forward_impl runs inside
# the moe_forward custom op, so it is not compiled by dynamo.
layer.ensure_moe_quant_config_init()
```
Can you try putting this in `_moe_forward` and `_moe_forward_shared` instead?
This should be fixed on main now.
Right, the fix landed on main via #34371 (31d992d).
Sure, sounds good to me.
The code fix landed via vllm-project#34371 (31d992d). This PR adds the regression test: `test_w4a16_moe_torch_compile` loads a W4A16 MoE model with `enforce_eager=False` and verifies inference succeeds without the "Hidden size mismatch" assertion error. Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Done. This PR now only brings in the regression test.
yewentao256 left a comment
LGTM, thanks for the work!
Update: The fix landed separately via #34371 and this PR only adds the regression test.
Summary
After the MoE refactor (#32344), W4A16 models fail with `AssertionError: Hidden size mismatch 2048 != 1024` under torch.compile.

This is because `ensure_moe_quant_config_init()` is called in `FusedMoE.forward_native()`. When torch.compile is active, `forward_native` is traced by Dynamo, but the side effect of setting `self.quant_method.moe_quant_config` (an attribute mutation) is not replayed at runtime. This causes `moe_quant_config` to remain `None` when `DefaultMoERunner.forward_impl` executes inside the `moe_forward` custom op at runtime.

For W4A16-quantized MoE models (e.g. AWQ 4-bit), this means `use_int4_w4a16` is `False` instead of `True`, causing the assertion `hidden_states.size(1) == w1.size(2)` to fail because packed 4-bit weights have half the expected dimension.
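The specific numbers in the error (2048 vs. 1024) follow directly from 4-bit packing: two int4 values fit in one byte, so the packed dimension is half the logical one. A stdlib-only illustration (the `pack_int4` helper is hypothetical, not vLLM code):

```python
# Why packed 4-bit weights have half the expected size along the packed
# dimension: two int4 values (0..15) are stored per byte, so 2048 logical
# weights occupy only 1024 bytes.

def pack_int4(values):
    """Pack pairs of 4-bit values into single bytes (high nibble first)."""
    assert len(values) % 2 == 0
    return bytes((values[i] << 4) | values[i + 1]
                 for i in range(0, len(values), 2))

hidden_size = 2048
weights = [i % 16 for i in range(hidden_size)]  # toy int4 weight row
packed = pack_int4(weights)

assert len(packed) == 1024  # 2048 != 1024: the "Hidden size mismatch"
# A kernel that believes the weights are unpacked (use_int4_w4a16=False)
# compares the activation hidden size 2048 against the packed dimension
# 1024 and trips the assertion.
```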
Fix: call `layer.ensure_moe_quant_config_init()` at the start of `DefaultMoERunner.forward_impl()`, which runs inside the `moe_forward` custom op and is therefore not compiled by Dynamo.

Reproducer:
Test plan

- Added `test_w4a16_moe_torch_compile`, which loads a tiny W4A16 MoE model (`nm-testing/tinysmokeqwen3moe-W4A16-first-only-CTstable`) with `enforce_eager=False` and verifies that inference succeeds.
- Verified the test fails without the code fix (`AssertionError: Hidden size mismatch`) and passes with it.
- Ran `tests/kernels/moe/test_moe.py` (537 passed, 1 unrelated OOM skip).