[Bugfix] Fix qwen-moe packed_modules_mapping #26634
Isotr0py merged 4 commits into vllm-project:main
Conversation
Code Review
This pull request aims to fix an issue with packed_modules_mapping for Qwen MoE models. The change conditionally adds gate_up_proj to the mapping. However, the implementation introduces a critical bug by modifying a class attribute (packed_modules_mapping) from an instance, which can cause state to leak between different model instances. Additionally, the condition used to determine the existence of dense MLP layers is not robust and can fail for certain model configurations. I've provided comments with suggested fixes for both qwen2_moe.py and qwen3_moe.py to address these issues by creating an instance-specific copy of the mapping and using a more accurate condition.
```python
# Only perform the following mapping when Qwen2MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
```
This change introduces two critical issues:

1. **Modification of a class attribute:** `self.packed_modules_mapping` is modified in-place. Since `packed_modules_mapping` is a class attribute, this modification will affect all other instances of `Qwen2MoeForCausalLM`, which can lead to unexpected behavior if multiple models with different configurations are used in the same process. An instance-specific copy should be created before modification.

2. **Incorrect condition for MLP existence:** The condition `if getattr(config, "mlp_only_layers", [])` is not sufficient to determine whether `Qwen2MoeMLP` layers (and thus `gate_up_proj`) exist. For example, a model with `decoder_sparse_step > 1` and an empty `mlp_only_layers` list will have dense MLP layers, but this condition will be false, incorrectly omitting `gate_up_proj` from the mapping.

A more robust approach is to check whether not all layers are sparse MoE layers. This is the case if `mlp_only_layers` is non-empty, if there are no experts, or if `decoder_sparse_step` is not 1. The suggested change below addresses both issues.
```python
# Create a copy of the mapping to avoid modifying the class attribute.
self.packed_modules_mapping = self.packed_modules_mapping.copy()
# Conditionally add gate_up_proj if dense MLP layers exist. A model has
# dense MLP layers if not all layers are sparse MoE layers.
if (bool(getattr(config, "mlp_only_layers", [])) or
        getattr(config, "num_experts", 0) == 0 or
        getattr(config, "decoder_sparse_step", 1) != 1):
    self.packed_modules_mapping["gate_up_proj"] = [
        "gate_proj",
        "up_proj",
    ]
```

```python
# Only perform the following mapping when Qwen3MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
```
This change introduces two critical issues:

1. **Modification of a class attribute:** `self.packed_modules_mapping` is modified in-place. Since `packed_modules_mapping` is a class attribute, this modification will affect all other instances of `Qwen3MoeForCausalLM`, which can lead to unexpected behavior if multiple models with different configurations are used in the same process. An instance-specific copy should be created before modification.

2. **Incorrect condition for MLP existence:** The condition `if getattr(config, "mlp_only_layers", [])` is not sufficient to determine whether `Qwen3MoeMLP` layers (and thus `gate_up_proj`) exist. For example, a model with `decoder_sparse_step > 1` and an empty `mlp_only_layers` list will have dense MLP layers, but this condition will be false, incorrectly omitting `gate_up_proj` from the mapping.

A more robust approach is to check whether not all layers are sparse MoE layers. This is the case if `mlp_only_layers` is non-empty, if there are no experts, or if `decoder_sparse_step` is not 1. The suggested change below addresses both issues.
```diff
-# Only perform the following mapping when Qwen3MoeMLP exists
-if getattr(config, "mlp_only_layers", []):
-    self.packed_modules_mapping["gate_up_proj"] = (
-        [
-            "gate_proj",
-            "up_proj",
-        ],
-    )
+# Create a copy of the mapping to avoid modifying the class attribute.
+self.packed_modules_mapping = self.packed_modules_mapping.copy()
+# Conditionally add gate_up_proj if dense MLP layers exist. A model has
+# dense MLP layers if not all layers are sparse MoE layers.
+if (bool(getattr(config, "mlp_only_layers", [])) or
+        getattr(config, "num_experts", 0) == 0 or
+        getattr(config, "decoder_sparse_step", 1) != 1):
+    self.packed_modules_mapping["gate_up_proj"] = [
+        "gate_proj",
+        "up_proj",
+    ]
```
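To make the class-attribute pitfall concrete, here is a standalone sketch (generic class names, not the actual vLLM model classes) contrasting in-place mutation of a shared class-level dict with rebinding an instance-specific copy:

```python
class Model:
    # Shared class attribute, analogous to packed_modules_mapping.
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def __init__(self, add_gate_up: bool):
        if add_gate_up:
            # BUG: item assignment mutates the dict shared by every instance.
            self.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]


class FixedModel:
    packed_modules_mapping = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}

    def __init__(self, add_gate_up: bool):
        # Rebind an instance-specific copy before modifying it.
        self.packed_modules_mapping = dict(self.packed_modules_mapping)
        if add_gate_up:
            self.packed_modules_mapping["gate_up_proj"] = ["gate_proj", "up_proj"]


leaky = Model(add_gate_up=True)
clean = Model(add_gate_up=False)
# The second instance sees the first instance's mutation.
print("gate_up_proj" in clean.packed_modules_mapping)    # True (state leaked)

fixed_a = FixedModel(add_gate_up=True)
fixed_b = FixedModel(add_gate_up=False)
print("gate_up_proj" in fixed_b.packed_modules_mapping)  # False (isolated)
```

This is exactly why the suggested fix rebinds `self.packed_modules_mapping` to a copy before conditionally extending it.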
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
self.packed_modules_mapping["gate_up_proj"] = (
    [
        "gate_proj",
        "up_proj",
    ],
)
```
Assign `gate_up_proj` mapping as list, not tuple
The new conditional adds `gate_up_proj` using `self.packed_modules_mapping["gate_up_proj"] = (["gate_proj", "up_proj"],)`. Because of the parentheses and trailing comma, this stores a tuple whose only element is a list, while the rest of the quantization helpers expect `dict[str, list[str]]`. When the mapping is consumed (e.g., by `get_layer_partition_names` or LoRA utilities), the tuple is iterated and the list itself is passed to string operations such as `removesuffix`/`replace`, raising a `TypeError`. Any model with `mlp_only_layers` set will fail during packed-module handling. Assign the list directly without wrapping it in a tuple.
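The type mismatch is easy to reproduce in isolation. The sketch below uses a hypothetical consumer loop (a stand-in for the real partition-name helpers, not vLLM code) that applies a string operation to each shard name:

```python
# A mapping built with the buggy tuple-wrapping, versus the intended plain list.
mapping_buggy = {"gate_up_proj": (["gate_proj", "up_proj"],)}  # tuple containing a list
mapping_fixed = {"gate_up_proj": ["gate_proj", "up_proj"]}     # dict[str, list[str]]


def shard_names(mapping):
    """Mimic a consumer that expects each mapped value to be a list of
    string shard names (hypothetical stand-in for the real helpers)."""
    names = []
    for shards in mapping.values():
        for shard in shards:
            # A string operation fails if `shard` is actually a list.
            names.append(shard.replace("_proj", ""))
    return names


print(shard_names(mapping_fixed))  # ['gate', 'up']

try:
    shard_names(mapping_buggy)
except AttributeError as exc:
    # Iterating the one-element tuple yields the inner list, which has no .replace().
    print(f"buggy mapping fails: {exc}")
```

With the tuple-wrapped value, iteration yields the whole list as a single "shard", so the first string operation blows up; the plain-list assignment works as intended.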
```python
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
```
Avoid tuple-wrapping `gate_up_proj` mapping
Same issue as above: `self.packed_modules_mapping["gate_up_proj"]` is assigned `(["gate_proj", "up_proj"],)`, producing a tuple instead of the list that downstream quantization and LoRA helpers expect. Iterating this mapping yields the list itself and causes type errors when string concatenation is attempted, so models with `mlp_only_layers` enabled will crash when retrieving partition names or applying packed transforms. Assign a plain list here.
DarkLight1337 left a comment
Hmm, `packed_modules_mapping` is supposed to be a `ClassVar`. Would editing it after initialization cause problems?
Previously checked,

Can you update the interface definition then?

Done in bbf7ef7
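For background on the `ClassVar` question: `typing.ClassVar` is purely a static-typing annotation, so rebinding the attribute on an instance works fine at runtime and leaves the class-level dict untouched; however, a type checker will reject the instance assignment while the annotation is `ClassVar`, which is presumably why the interface definition had to be updated. A minimal sketch (generic class name, not the actual interface):

```python
from typing import ClassVar


class Base:
    # Class-level mapping annotated as a ClassVar, as in the old interface.
    packed_modules_mapping: ClassVar[dict[str, list[str]]] = {
        "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    }


inst = Base()
# Rebinding on the instance shadows the class attribute instead of mutating it.
# (mypy flags this line as long as the annotation is ClassVar; runtime allows it.)
inst.packed_modules_mapping = {
    **Base.packed_modules_mapping,
    "gate_up_proj": ["gate_proj", "up_proj"],
}

print("gate_up_proj" in Base.packed_modules_mapping)  # False: class value untouched
print("gate_up_proj" in inst.packed_modules_mapping)  # True: instance copy extended
```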
```python
# Only perform the following mapping when Qwen2MoeMLP exists
if getattr(config, "mlp_only_layers", []):
    self.packed_modules_mapping["gate_up_proj"] = (
        [
            "gate_proj",
            "up_proj",
        ],
    )
```
I think this condition doesn't really fit Qwen2MoE's case, because Qwen2MoE also has a shared expert that needs packing inside the sparse MoE block:
vllm/vllm/model_executor/models/qwen2_moe.py
Lines 145 to 156 in a25f2ad
The Codex review was correct. You have assigned a tuple. This PR causes …
I have fixed it in #26633, but I still can't seem to run that eval test. I get: … But I can't reproduce it outside of this test.
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` updated for a new model.