[Bugfix][ROCm] Fix WNA16 MoE quant config init and Qwen3-VL tie_word_embeddings#34630

Closed
laudney wants to merge 2 commits into vllm-project:main from mmonad:fix/rocm-moe-bugfixes
Conversation

Contributor

@laudney laudney commented Feb 16, 2026

Summary

Two small bug fixes found while testing on ROCm/RDNA4:

  • WNA16 MoE quant config not initialized before first apply(): The FusedMoEQuantConfig was not being set up before the first forward pass in the WNA16 quantization path, causing failures on first inference.
  • Qwen3MoeLLMForCausalLM tie_word_embeddings AttributeError: Some Qwen3-VL MoE checkpoint configs lack the tie_word_embeddings field entirely. Changed direct attribute access to getattr(..., False) for safety.

Both fixes are defensive and should not affect existing behavior on any platform.
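As a sketch of the first fix, the lazy-init guard can be seen in isolation (a toy stand-in class; the real classes are MoeWNA16Method and CompressedTensorsWNA16MoEMethod, and the config builder here is hypothetical):

```python
from types import SimpleNamespace

class MoeWNA16Method:
    """Toy stand-in for the real quant method class (names follow the PR)."""

    def __init__(self):
        # Normally populated by ensure_moe_quant_config_init(); may still be
        # None if apply() is reached first (e.g. a compiled forward pass).
        self.moe_quant_config = None

    def get_fused_moe_quant_config(self, layer):
        # Hypothetical builder; the real one derives this from layer state.
        return SimpleNamespace(use_int4_w4a16=True, layer=layer)

    def apply(self, layer):
        # The fix: lazily build the config if the normal init path was
        # skipped, instead of letting fused_experts() fall back to the
        # unquantized config.
        if self.moe_quant_config is None:
            self.moe_quant_config = self.get_fused_moe_quant_config(layer)
        return self.moe_quant_config

method = MoeWNA16Method()
cfg = method.apply(layer="experts.0")
print(cfg.use_int4_w4a16)  # True: int4 path selected, not the fallback
```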

Test plan

  • Verify WNA16 MoE models load and run inference without error
  • Verify Qwen3-VL MoE models with and without tie_word_embeddings in config
  • Existing CI should pass (no behavioral change for configs that have the field)

L.B.R. added 2 commits February 16, 2026 17:06
Both MoeWNA16Method and CompressedTensorsWNA16MoEMethod pass
self.moe_quant_config to fused_experts() without ensuring it has been
initialized. When it is still None, fused_experts() falls back to
FUSED_MOE_UNQUANTIZED_CONFIG (use_int4_w4a16=False), making the int4
packed weight dimension assertion fail (hidden_size 2048 != w1 1024).

Add lazy init guard in both apply() methods so the quant config is
built on first use if ensure_moe_quant_config_init() hasn't run yet.

Some Qwen3-VL MoE configs lack tie_word_embeddings, causing
AttributeError during model init. Use getattr with False default.
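The dimension mismatch cited in the first commit message (hidden_size 2048 != w1 1024) follows from int4 packing, which stores two 4-bit values per byte; a minimal illustration of the shape check, with all names hypothetical:

```python
hidden_size = 2048
# int4 packs two 4-bit weights per byte, so the stored w1 dim is halved.
packed_w1_dim = hidden_size // 2  # 1024

# With the unquantized fallback (use_int4_w4a16=False) the kernel expects
# the full, unpacked width, so the shape assertion trips: 2048 != 1024.
use_int4_w4a16 = False
expected_dim = packed_w1_dim if use_int4_w4a16 else hidden_size
print(expected_dim, packed_w1_dim)  # 2048 1024
```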
@mergify mergify bot added the qwen (Related to Qwen models), rocm (Related to AMD ROCm), and bug (Something isn't working) labels Feb 16, 2026
@github-project-automation github-project-automation bot moved this to Todo in AMD Feb 16, 2026
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces defensive bug fixes for ROCm/RDNA4 environments and Qwen3-VL MoE models. Specifically, it addresses an issue where the FusedMoEQuantConfig was not initialized before the first apply() call in the WNA16 quantization path, which could lead to incorrect kernel execution or failures during the first inference pass, especially when using torch.compile. Additionally, it adds safety to the tie_word_embeddings attribute access in Qwen3MoeLLMForCausalLM to prevent AttributeError when the field is missing from checkpoint configurations. These changes improve the robustness of the model executor without altering existing behavior for standard configurations.

Comment on lines +1985 to +1986
if self.moe_quant_config is None:
self.moe_quant_config = self.get_fused_moe_quant_config(layer)
Severity: high

The lazy initialization of moe_quant_config here is critical for correctness when the standard initialization sequence is bypassed, such as during the first compiled forward pass. Without this, fused_experts would default to an unquantized configuration, leading to incorrect results for WNA16 quantized layers.

Comment on lines +382 to +383
if self.moe_quant_config is None:
self.moe_quant_config = self.get_fused_moe_quant_config(layer)
Severity: high

Similar to the fix in compressed_tensors_moe.py, this lazy initialization ensures that the quantization configuration is available before the first kernel invocation. This is particularly important for backends that rely on fused_experts receiving a valid quant_config to select the appropriate optimized kernels.

prefix=maybe_prefix(prefix, "lm_head"),
)
-        if self.config.tie_word_embeddings:
+        if getattr(self.config, "tie_word_embeddings", False):
Severity: high

Using getattr with a default value of False is a safer approach for accessing tie_word_embeddings. This prevents potential AttributeError crashes when loading checkpoints that do not explicitly define this field in their configuration, which has been observed in some Qwen3-VL MoE variants.
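The getattr pattern from this change can be exercised on its own (SimpleNamespace stands in for a HuggingFace config object):

```python
from types import SimpleNamespace

def wants_tied_embeddings(config):
    # getattr with a False default tolerates configs missing the field,
    # where direct attribute access would raise AttributeError.
    return getattr(config, "tie_word_embeddings", False)

with_field = SimpleNamespace(tie_word_embeddings=True)
without_field = SimpleNamespace()  # e.g. some Qwen3-VL MoE checkpoints

print(wants_tied_embeddings(with_field))     # True
print(wants_tied_embeddings(without_field))  # False, no AttributeError
```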

@robertgshaw2-redhat
Collaborator

Do you have this PR in your branch?

I think that this should have solved the quant config issue

@laudney
Contributor Author

laudney commented Feb 16, 2026

Thanks for the pointer — PR #34371 does cover the WNA16 quant config init issue. The ensure_moe_quant_config_init() call in _moe_forward/_moe_forward_shared runs before forward_impl on all production paths, making the lazy-init guard in apply() redundant.

The other change here (defensive getattr for tie_word_embeddings) is speculative and inconsistent with the rest of the codebase — not worth keeping. Closing this PR.

@laudney laudney closed this Feb 16, 2026
@github-project-automation github-project-automation bot moved this from Todo to Done in AMD Feb 16, 2026
