Fix Qwen3OmniMoe Talker loading and config initialization#43091

Closed
Krish2002 wants to merge 10 commits into huggingface:main from Krish2002:fix-qwen3-omni-moe-loading

Conversation

@Krish2002

What does this PR do?

This PR fixes two issues that prevented the model
Qwen/Qwen3-Omni-30B-A3B-Instruct from loading correctly with
AutoModelForMultimodalLM.


Fix 1: AttributeError: Qwen3OmniMoeTalkerForConditionalGeneration has no attribute 'lm_head'

Issue

Qwen3OmniMoeTalkerForConditionalGeneration deletes lm_head in its __init__
method, but it inherits _tied_weights_keys from its parent class
(Qwen3MoeForCausalLM), which references lm_head.weight.

During model loading, mark_tied_weights_as_initialized() attempts to access
lm_head.weight, resulting in an AttributeError.
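The failure pattern can be reproduced with toy stand-in classes (these are NOT the real transformers classes, just a minimal sketch of the inheritance problem: the child deletes lm_head, but the inherited tied-weight key still points at it, so resolving the key raises AttributeError):

```python
class CausalLMParent:
    # Parent declares lm_head.weight as a tied weight, mirroring what
    # Qwen3MoeForCausalLM does.
    _tied_weights_keys = ["lm_head.weight"]

    def __init__(self):
        self.lm_head = object()  # placeholder for the output projection


class TalkerChild(CausalLMParent):
    def __init__(self):
        super().__init__()
        del self.lm_head  # the Talker model removes lm_head in __init__


def resolve_tied_weight_keys(model):
    # Sketch of what a loader-side helper such as
    # mark_tied_weights_as_initialized() must do: walk each dotted key.
    for key in model._tied_weights_keys:
        obj = model
        for attr in key.split("."):
            obj = getattr(obj, attr)  # fails here: no lm_head attribute


try:
    resolve_tied_weight_keys(TalkerChild())
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```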

Fix

Explicitly set _tied_weights_keys = {} for
Qwen3OmniMoeTalkerForConditionalGeneration, since this model does not use tied
weights.

This change is implemented in:

  • src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
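A hedged sketch of the fix, again with toy classes (the real change lives in modular_qwen3_omni_moe.py): the subclass overrides the inherited attribute with an empty mapping, so the loader has nothing to resolve.

```python
class CausalLMParent:
    # Stand-in for Qwen3MoeForCausalLM, which ties lm_head.weight.
    _tied_weights_keys = ["lm_head.weight"]


class TalkerChild(CausalLMParent):
    # Stand-in for Qwen3OmniMoeTalkerForConditionalGeneration: this model
    # deletes lm_head, so it must not inherit the parent's tied-weight keys.
    _tied_weights_keys = {}


# The class attribute shadows the parent's, so loading code that iterates
# _tied_weights_keys sees nothing to resolve.
assert CausalLMParent._tied_weights_keys == ["lm_head.weight"]
assert TalkerChild._tied_weights_keys == {}
```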

Fix 2: AttributeError: ... object has no attribute 'initializer_range'

Issue

Several composite config classes were missing the initializer_range attribute:

  • Qwen3OmniMoeTalkerConfig
  • Qwen3OmniMoeCode2WavConfig
  • Qwen3OmniMoeConfig

When _initialize_missing_keys() runs during model loading, it may call
_init_weights() for modules that were not loaded from the checkpoint.
_init_weights() reads self.config.initializer_range, and because the attribute
was missing, this raised an AttributeError.
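A toy sketch of this second failure (the names mirror the transformers pattern but are stand-ins): weight initialization reads config.initializer_range, which the composite configs did not define.

```python
class CompositeConfigWithoutRange:
    # Stand-in for a composite config missing initializer_range.
    pass


class ModuleSketch:
    def __init__(self, config):
        self.config = config

    def _init_weights(self):
        # A typical implementation does something like
        # module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        return self.config.initializer_range


try:
    ModuleSketch(CompositeConfigWithoutRange())._init_weights()
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```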

Fix

Add an initializer_range parameter (default: 0.02) and store it as an
attribute in the affected config classes.

This change is implemented in:

  • src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py
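A minimal sketch of the config-side fix, using a stand-in class (not the real Qwen3OmniMoeTalkerConfig): accept initializer_range in __init__ with a 0.02 default and store it as an attribute, so _init_weights can always read it.

```python
class TalkerConfigSketch:
    # Stand-in for the affected composite config classes.
    def __init__(self, initializer_range=0.02, **kwargs):
        # Stored unconditionally, so self.config.initializer_range is
        # always available to _init_weights().
        self.initializer_range = initializer_range


assert TalkerConfigSketch().initializer_range == 0.02
assert TalkerConfigSketch(initializer_range=0.01).initializer_range == 0.01
```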

Breaking changes

None.


Tests

Added a unit-level regression test in:

tests/models/qwen3_omni_moe/test_configuration_and_loading.py

The test verifies that:

  • Qwen3OmniMoeTalkerForConditionalGeneration has empty tied weight keys.
  • Qwen3OmniMoeTalkerConfig, Qwen3OmniMoeCode2WavConfig, and
    Qwen3OmniMoeConfig all define the initializer_range attribute.
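The checks above might look roughly like the following sketch, with stub classes standing in for the real models and configs imported in the test file:

```python
class TalkerModelStub:
    # Stand-in for Qwen3OmniMoeTalkerForConditionalGeneration after the fix.
    _tied_weights_keys = {}


class TalkerConfigStub:
    # Stand-in for the fixed composite config classes.
    def __init__(self, initializer_range=0.02):
        self.initializer_range = initializer_range


def test_talker_has_empty_tied_weight_keys():
    assert TalkerModelStub._tied_weights_keys == {}


def test_configs_define_initializer_range():
    assert hasattr(TalkerConfigStub(), "initializer_range")


test_talker_has_empty_tied_weight_keys()
test_configs_define_initializer_range()
```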

Additionally, verified locally that the model loads successfully using the
following reproduction script:

from transformers import AutoModelForMultimodalLM
import torch

model = AutoModelForMultimodalLM.from_pretrained(
    "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

@github-actions
Contributor

github-actions bot commented Jan 5, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

@Rocketknight1
Member

cc @zucchini-nlp

@zucchini-nlp
Member

resolved in #43084, closing as duplicate

