
[Bugfix] Fix qwen-moe packed_modules_mapping#26634

Merged
Isotr0py merged 4 commits into vllm-project:main from jeejeelee:fix-qwen-moe-mapping
Oct 11, 2025
Conversation

@jeejeelee
Collaborator

@jeejeelee jeejeelee commented Oct 11, 2025

Purpose

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee jeejeelee requested a review from sighingnow as a code owner October 11, 2025 11:43
@mergify mergify bot added the qwen Related to Qwen models label Oct 11, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to fix an issue with packed_modules_mapping for Qwen MoE models. The change conditionally adds gate_up_proj to the mapping. However, the implementation introduces a critical bug by modifying a class attribute (packed_modules_mapping) from an instance, which can cause state to leak between different model instances. Additionally, the condition used to determine the existence of dense MLP layers is not robust and can fail for certain model configurations. I've provided comments with suggested fixes for both qwen2_moe.py and qwen3_moe.py to address these issues by creating an instance-specific copy of the mapping and using a more accurate condition.

Comment on lines +546 to +553
# Only perform the following mapping when Qwen2MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
Contributor


critical

This change introduces two critical issues:

  1. Modification of a class attribute: self.packed_modules_mapping is modified in-place. Since packed_modules_mapping is a class attribute, this modification will affect all other instances of Qwen2MoeForCausalLM, which can lead to unexpected behavior if multiple models with different configurations are used in the same process. An instance-specific copy should be created before modification.

  2. Incorrect condition for MLP existence: The condition if getattr(config, "mlp_only_layers", []) is not sufficient to determine if Qwen2MoeMLP layers (and thus gate_up_proj) exist. For example, a model with decoder_sparse_step > 1 and an empty mlp_only_layers list will have dense MLP layers, but this condition will be false, incorrectly omitting gate_up_proj from the mapping.

A more robust approach is to check if not all layers are sparse MoE layers. This is the case if mlp_only_layers is non-empty, or if there are no experts, or if decoder_sparse_step is not 1. The suggested change below addresses both issues.

        # Create a copy of the mapping to avoid modifying the class attribute.
        self.packed_modules_mapping = self.packed_modules_mapping.copy()
        # Conditionally add gate_up_proj if dense MLP layers exist. A model has
        # dense MLP layers if not all layers are sparse MoE layers.
        if (bool(getattr(config, "mlp_only_layers", [])) or
                getattr(config, "num_experts", 0) == 0 or
                getattr(config, "decoder_sparse_step", 1) != 1):
            self.packed_modules_mapping["gate_up_proj"] = [
                "gate_proj",
                "up_proj",
            ]
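The class-attribute pitfall described above can be reproduced in isolation. The sketch below uses a hypothetical Model class (not vLLM code) to show why mutating a class-level dict in __init__ leaks state into every other instance, and how rebinding to a .copy() avoids it:

```python
class Model:
    # Class attribute shared by every instance, analogous to
    # packed_modules_mapping on the vLLM model classes.
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def __init__(self, has_dense_mlp: bool):
        # Rebind to an instance-level copy before mutating. Without this
        # copy, the assignment below would mutate the shared class dict
        # and every instance would see the extra entry.
        self.packed_modules_mapping = self.packed_modules_mapping.copy()
        if has_dense_mlp:
            self.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]

a = Model(has_dense_mlp=True)
b = Model(has_dense_mlp=False)
assert "gate_up_proj" in a.packed_modules_mapping
assert "gate_up_proj" not in b.packed_modules_mapping      # no leak between instances
assert "gate_up_proj" not in Model.packed_modules_mapping  # class dict untouched
```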

Comment on lines +648 to +655
# Only perform the following mapping when Qwen3MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
Contributor


critical

This change introduces two critical issues:

  1. Modification of a class attribute: self.packed_modules_mapping is modified in-place. Since packed_modules_mapping is a class attribute, this modification will affect all other instances of Qwen3MoeForCausalLM, which can lead to unexpected behavior if multiple models with different configurations are used in the same process. An instance-specific copy should be created before modification.

  2. Incorrect condition for MLP existence: The condition if getattr(config, "mlp_only_layers", []) is not sufficient to determine if Qwen3MoeMLP layers (and thus gate_up_proj) exist. For example, a model with decoder_sparse_step > 1 and an empty mlp_only_layers list will have dense MLP layers, but this condition will be false, incorrectly omitting gate_up_proj from the mapping.

A more robust approach is to check if not all layers are sparse MoE layers. This is the case if mlp_only_layers is non-empty, or if there are no experts, or if decoder_sparse_step is not 1. The suggested change below addresses both issues.

Suggested change
# Only perform the following mapping when Qwen3MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )

# Create a copy of the mapping to avoid modifying the class attribute.
self.packed_modules_mapping = self.packed_modules_mapping.copy()
# Conditionally add gate_up_proj if dense MLP layers exist. A model has
# dense MLP layers if not all layers are sparse MoE layers.
if (bool(getattr(config, "mlp_only_layers", [])) or
        getattr(config, "num_experts", 0) == 0 or
        getattr(config, "decoder_sparse_step", 1) != 1):
    self.packed_modules_mapping["gate_up_proj"] = [
        "gate_proj",
        "up_proj",
    ]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines +548 to +552
self.packed_modules_mapping["gate_up_proj"] = (
    [
        "gate_proj",
        "up_proj",
    ],


P1: Assign gate_up_proj mapping as list, not tuple

The new conditional adds gate_up_proj using self.packed_modules_mapping["gate_up_proj"] = (["gate_proj", "up_proj"],). Because of the parentheses and trailing comma this stores a tuple whose only element is a list, while the rest of the quantization helpers expect dict[str, list[str]]. When the mapping is consumed (e.g., get_layer_partition_names or LoRA utilities), the tuple is iterated and the list itself is passed to string operations such as removesuffix/replace, raising a TypeError. Any model with mlp_only_layers set will fail during packed-module handling. Assign the list directly without wrapping it in a tuple.

Useful? React with 👍 / 👎.
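The failure mode Codex describes can be shown in a few lines. The consumer loop below is a hypothetical stand-in for the mapping's real downstream users (the actual helpers live in vLLM's quantization and LoRA code):

```python
# The trailing comma inside parentheses makes the value a one-element tuple
# containing a list, not the list[str] that consumers of the mapping expect.
bad = {"gate_up_proj": (["gate_proj", "up_proj"],)}  # tuple[list[str]] -- the bug
good = {"gate_up_proj": ["gate_proj", "up_proj"]}    # list[str] -- as expected

# Downstream helpers iterate the value and use each element as a string,
# e.g. to build full layer names (illustrative consumer, not vLLM code):
resolved = ["model.layers.0.mlp." + name for name in good["gate_up_proj"]]

# With the tuple-wrapped value, iteration yields the whole inner list,
# and the string operation fails:
caught = None
try:
    ["model.layers.0.mlp." + name for name in bad["gate_up_proj"]]
except TypeError as exc:
    caught = exc
assert isinstance(caught, TypeError)
```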

Comment on lines +649 to +653
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",


P1: Avoid tuple-wrapping gate_up_proj mapping

Same issue as above: self.packed_modules_mapping["gate_up_proj"] is assigned (["gate_proj", "up_proj"],), producing a tuple instead of the list that downstream quantization and LoRA helpers expect. Iterating this mapping yields the list itself and causes type errors when string concatenation is attempted, so models with mlp_only_layers enabled will crash when retrieving partition names or applying packed transforms. Assign a plain list here.

Useful? React with 👍 / 👎.

Member

@DarkLight1337 DarkLight1337 left a comment


Hmm, packed_modules_mapping is supposed to be a ClassVar. Would editing it after initialization cause problems?

@jeejeelee
Collaborator Author

I checked this previously: packed_modules_mapping is only used after instantiation, so I think it should be safe.

@DarkLight1337
Member

Can you update the interface definition then?

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@jeejeelee
Collaborator Author

Done in bbf7ef7

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 11, 2025 12:44
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 11, 2025
Comment on lines +546 to +552
# Only perform the following mapping when Qwen2MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
Member


I think this condition doesn't really fit Qwen2MoE's case, because Qwen2MoE also has a shared expert that needs packing inside the sparse MoE block:

if config.shared_expert_intermediate_size > 0:
    self.shared_expert = Qwen2MoeMLP(
        hidden_size=config.hidden_size,
        intermediate_size=config.shared_expert_intermediate_size,
        hidden_act=config.hidden_act,
        quant_config=quant_config,
        reduce_results=False,
        expert_gate=self.shared_expert_gate,
        prefix=f"{prefix}.shared_expert",
    )
else:
    self.shared_expert = None
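One hedged reading of the combined review feedback: gate_up_proj packing is needed whenever any Qwen2MoeMLP is constructed, which happens both for dense layers and for the shared expert inside sparse MoE blocks. The helper below is hypothetical (not the merged vLLM code); the config field names mirror the HF Qwen2MoE config referenced in the quoted snippet:

```python
from types import SimpleNamespace

def needs_gate_up_proj(config) -> bool:
    # Dense MLP layers exist if not all layers are sparse MoE layers.
    has_dense_mlp = (
        bool(getattr(config, "mlp_only_layers", []))
        or getattr(config, "num_experts", 0) == 0
        or getattr(config, "decoder_sparse_step", 1) != 1
    )
    # A shared expert is also a Qwen2MoeMLP, even when all layers are sparse.
    has_shared_expert = getattr(config, "shared_expert_intermediate_size", 0) > 0
    return has_dense_mlp or has_shared_expert

# All layers sparse, but a shared expert exists -> packing is still needed.
cfg = SimpleNamespace(mlp_only_layers=[], num_experts=60,
                      decoder_sparse_step=1,
                      shared_expert_intermediate_size=5632)
assert needs_gate_up_proj(cfg)
```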

Collaborator Author


Good catch, fixed in 2b0ae9a

Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
@Isotr0py Isotr0py enabled auto-merge (squash) October 11, 2025 13:34
@Isotr0py Isotr0py merged commit f0a30a0 into vllm-project:main Oct 11, 2025
54 checks passed
@jeejeelee jeejeelee deleted the fix-qwen-moe-mapping branch October 11, 2025 16:06
@hmellor
Member

hmellor commented Oct 12, 2025

The Codex review was correct. You have assigned a tuple[list[str]] to the value in packed_modules_mapping when it should be just list[str].

This PR causes pytest tests/evals/gsm8k/test_gsm8k_correctness.py::test_gsm8k_correctness_param[Qwen1.5-MoE-W4A16-CT-tp1] to be unrunnable.

@hmellor
Member

hmellor commented Oct 12, 2025

I have fixed it in #26633, but I still can't seem to run that eval test. I get:

[core.py:792] EngineCore encountered a fatal error.
[core.py:792] Traceback (most recent call last):
[core.py:792]   File "/home/harry/vllm/vllm/v1/engine/core.py", line 783, in run_engine_core
[core.py:792]     engine_core.run_busy_loop()
[core.py:792]   File "/home/harry/vllm/vllm/v1/engine/core.py", line 810, in run_busy_loop
[core.py:792]     self._process_engine_step()
[core.py:792]   File "/home/harry/vllm/vllm/v1/engine/core.py", line 839, in _process_engine_step
[core.py:792]     outputs, model_executed = self.step_fn()
[core.py:792]                               ^^^^^^^^^^^^^^
[core.py:792]   File "/home/harry/vllm/vllm/v1/engine/core.py", line 320, in step
[core.py:792]     scheduler_output = self.scheduler.schedule()
[core.py:792]                        ^^^^^^^^^^^^^^^^^^^^^^^^^
[core.py:792]   File "/home/harry/vllm/vllm/v1/core/sched/scheduler.py", line 256, in schedule
[core.py:792]     new_blocks = self.kv_cache_manager.allocate_slots(
[core.py:792]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[core.py:792]   File "/home/harry/vllm/vllm/v1/core/kv_cache_manager.py", line 317, in allocate_slots
[core.py:792]     self.coordinator.cache_blocks(request, num_tokens_to_cache)
[core.py:792]   File "/home/harry/vllm/vllm/v1/core/kv_cache_coordinator.py", line 138, in cache_blocks
[core.py:792]     manager.cache_blocks(request, num_computed_tokens)
[core.py:792]   File "/home/harry/vllm/vllm/v1/core/single_type_kv_cache_manager.py", line 156, in cache_blocks
[core.py:792]     self.block_pool.cache_full_blocks(
[core.py:792]   File "/home/harry/vllm/vllm/v1/core/block_pool.py", line 232, in cache_full_blocks
[core.py:792]     assert blk.block_hash is None
[core.py:792]            ^^^^^^^^^^^^^^^^^^^^^^
[core.py:792] AssertionError
[async_llm.py:518] AsyncLLM output_handler failed.
[async_llm.py:518] Traceback (most recent call last):
[async_llm.py:518]   File "/home/harry/vllm/vllm/v1/engine/async_llm.py", line 472, in output_handler
[async_llm.py:518]     outputs = await engine_core.get_output_async()
[async_llm.py:518]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[async_llm.py:518]   File "/home/harry/vllm/vllm/v1/engine/core_client.py", line 882, in get_output_async
[async_llm.py:518]     raise self._format_exception(outputs) from None
[async_llm.py:518] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.

But can't reproduce it outside of this test.
