
fix: AttributeError for Qwen3_omni_moe #43593

Merged
zucchini-nlp merged 2 commits into huggingface:main from Vallabh-1504:fix-attribute-error
Feb 4, 2026

Conversation

@Vallabh-1504
Contributor

What does this PR do?

This PR fixes a crash when initializing Qwen3OmniMoeTalkerCodePredictorConfig due to a missing attribute reference.

Specifically, it:

  1. Removes the reference to the non-existent use_sliding_window attribute, which was causing an AttributeError.
  2. Adds the missing max_window_layers initialization (defaulting to 28). The existing layer_types logic relies on this attribute; without it, the sliding_window handling causes a secondary crash.
  3. Adds a new test case (test_code_predictor_config_init) to ensure the configuration initializes correctly.

Fixes #43531
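The crash pattern and the fix described above can be sketched with a toy config (illustrative classes, not the actual transformers code):

```python
# Toy sketch (illustrative classes, not the actual transformers code) of the
# crash pattern and the fix described in this PR.

class BrokenConfig:
    def __init__(self, sliding_window=None):
        # Bug: use_sliding_window is referenced but never assigned, so this
        # line raises AttributeError during __init__.
        self.sliding_window = sliding_window if self.use_sliding_window else None

class FixedConfig:
    def __init__(self, sliding_window=None, max_window_layers=28):
        # Fix: drop the guard and initialize max_window_layers (default 28),
        # which downstream layer_types logic can then rely on.
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers

try:
    BrokenConfig()
except AttributeError as exc:
    print("broken:", exc)

cfg = FixedConfig()
print(cfg.sliding_window, cfg.max_window_layers)  # None 28
```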

Before submitting

Who can review?

@Rocketknight1
Member

cc @vasqu since you commented on the issue!

Contributor

@vasqu vasqu left a comment


max_window_layers seems like an unnecessary attribute added + let's move the test, please no new files

rope_parameters: int | None = None,
attention_bias: bool | None = False,
sliding_window: int | None = None,
max_window_layers: int = 28,
Contributor


Seems irrelevant?

Contributor


The test should be still under test_modeling_xxx

@Vallabh-1504
Contributor Author

Vallabh-1504 commented Jan 29, 2026

Hi @vasqu, thanks for the review.
When I was running pytest, I got an error regarding max_window_layers.
This attribute is used in the layer_types list comprehension (around line 612).

    def __getattribute__(self, key):
        if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
            key = super().__getattribute__("attribute_map")[key]
>       return super().__getattribute__(key)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E       AttributeError: 'Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'max_window_layers'

src\transformers\configuration_utils.py:163: AttributeError

self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.sliding_window = sliding_window if self.use_sliding_window else None
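The redirection shown in the traceback can be illustrated with a stripped-down sketch (simplified from transformers/configuration_utils.py; ToyConfig and its attributes are made up for illustration):

```python
# Stripped-down sketch of the attribute_map redirection from the traceback
# above (simplified from transformers/configuration_utils.py; ToyConfig and
# its attributes are made up for illustration).

class ConfigBase:
    attribute_map: dict = {}

    def __getattribute__(self, key):
        # Aliased names are redirected through attribute_map before the
        # normal lookup, so a missing target attribute surfaces here.
        if key != "attribute_map" and key in super().__getattribute__("attribute_map"):
            key = super().__getattribute__("attribute_map")[key]
        return super().__getattribute__(key)

class ToyConfig(ConfigBase):
    attribute_map = {"hidden_dim": "hidden_size"}

    def __init__(self):
        self.hidden_size = 64

cfg = ToyConfig()
print(cfg.hidden_dim)  # alias resolves to hidden_size -> 64

# If an attribute was never assigned in __init__ (the bug in this PR), the
# final super().__getattribute__ call is what raises the AttributeError.
try:
    cfg.max_window_layers
except AttributeError as exc:
    print("missing:", exc)
```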
Contributor


Ok I checked what happened and it seems it was unintentionally added in #41541, but since I'm not super familiar with this model I'd rather wait for @zucchini-nlp to answer here

Imo we should just do self.sliding_window = sliding_window (use_sliding_window was never used at all and should be removed from the docstring) - max_window_layers should be removed alongside it (not reintroduced)

Contributor


Also the changes should be done in modular and then reapplied via python utils/modular_model_converter.py qwen3_omni_moe

Member


Indeed a bad copy, no need for use_sliding_window. The model always uses sliding layers together with full attention
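That mixed layout can be sketched as follows (modeled on the Qwen2-style layer_types pattern; the exact Qwen3-Omni-MoE expression may differ):

```python
# Hedged sketch of the mixed layout: layers below max_window_layers use full
# attention, later ones slide. Modeled on the Qwen2-style layer_types pattern;
# the exact Qwen3-Omni-MoE expression may differ.

def build_layer_types(num_hidden_layers, sliding_window, max_window_layers):
    return [
        "sliding_attention"
        if sliding_window is not None and i >= max_window_layers
        else "full_attention"
        for i in range(num_hidden_layers)
    ]

print(build_layer_types(4, sliding_window=1024, max_window_layers=2))
# -> ['full_attention', 'full_attention', 'sliding_attention', 'sliding_attention']
```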

**kwargs,
):
self.sliding_window = sliding_window
self.max_window_layers = max_window_layers
Member

@zucchini-nlp zucchini-nlp Jan 30, 2026


it is a recurring pattern in many qwen-multimodal models, and I think the authors incorrectly copied it when adding Qwen3-Omni-MoE. The official checkpoints have max_window_layers saved in the config, therefore we don't see errors at inference

IMO we need to keep it to match the model's default behavior and to match the docstring

@zucchini-nlp
Member

@Vallabh-1504 hey, any updates on the PR? Would be great to merge it before the next planned release, which I believe should be around this week

@zucchini-nlp zucchini-nlp added the "for patch" label (issues/labels that should be included in the next patch) Feb 3, 2026
@Vallabh-1504
Contributor Author

Hi @zucchini-nlp, I was working on it but hit a bit of a wall.

I was applying changes through modular_qwen3_omni_moe.py, since my earlier manual changes were the wrong approach. But for some reason, modular_model_converter.py keeps regenerating the use_sliding_window logic in the final config, even after I removed it from modular_qwen3_omni_moe.py.

I'm stuck in a loop where the generator keeps reintroducing the code I'm trying to remove.

I also don't know what to do with max_window_layers, as it is not initialized anywhere either.

Can you guide me through this? Am I missing a step, or is there some cache in the converter?

@zucchini-nlp
Member

Smth like this should work, we need to delete unused attributes explicitly and re-assign the "differing" attr. For max_window_layers, let's keep it without deleting

https://github.com/Vallabh-1504/transformers/blob/9041720191cdef9348f09f7c1695db606223d353/src/transformers/models/qwen3_omni_moe/modular_qwen3_omni_moe.py#L571-L572
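The suggested pattern can be sketched with toy classes (illustrative names, not the real modular file):

```python
# Toy sketch of the pattern suggested above (illustrative names, not the real
# modular file): the subclass __init__ calls the parent, re-assigns the
# "differing" attribute, and explicitly deletes the unused one.

class ParentConfig:
    def __init__(self, sliding_window=None, use_sliding_window=False, **kwargs):
        self.use_sliding_window = use_sliding_window
        self.sliding_window = sliding_window if use_sliding_window else None

class CodePredictorConfig(ParentConfig):
    def __init__(self, sliding_window=None, **kwargs):
        super().__init__(sliding_window=sliding_window, **kwargs)
        # Re-assign without the guard, then drop the unused attribute so the
        # generated config never exposes it.
        self.sliding_window = sliding_window
        del self.use_sliding_window

cfg = CodePredictorConfig(sliding_window=1024)
print(cfg.sliding_window)                  # 1024
print(hasattr(cfg, "use_sliding_window"))  # False
```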

@Vallabh-1504
Contributor Author

@zucchini-nlp, I have applied the changes as requested!

Member

@zucchini-nlp zucchini-nlp left a comment


One tiny comment and let's wait for @vasqu's review

self.num_code_groups = num_code_groups
self.vocab_size = vocab_size
self.max_position_embeddings = max_position_embeddings
self.hidden_size = hidden_size
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.sliding_window = sliding_window if self.use_sliding_window else None
self.sliding_window = sliding_window
self.max_window_layers = max_window_layers
Member


nit: duplicate

@@ -473,6 +475,7 @@ def __init__(
**kwargs,
):
self.sliding_window = sliding_window
self.max_window_layers = max_window_layers
Member


not needed, super assigns it already which is why we got duplicates in auto-generated code :)

Comment on lines +917 to +924
class TestQwen3OmniMoeCodePredictorConfig(unittest.TestCase):
def test_code_predictor_config_init(self):
"""
Test that Qwen3OmniMoeTalkerCodePredictorConfig initializes correctly
and accepts max_window_layers while removing use_sliding_window.
"""

config = Qwen3OmniMoeTalkerCodePredictorConfig(
Member


ideally we need a complete model test for the 'TalkerModel' model. But I realize that the model doesn't follow transformers standards, so we'd skip many tests from ModelTesterMixin anyway, or override a lot of them

I'm fine with deleting the current test in that case, @vasqu wdyt?

Contributor


Imo, we can add a small regression test under

class Qwen3OmniMoeThinkerForConditionalGenerationTester:
at least, we don't need a separate class for this

Just a tad weird because the naming is weird but if we don't test it we ought to repeat it in a refactor 😬
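A minimal version of such a regression test could look like the sketch below (ToyCodePredictorConfig stands in for the real Qwen3OmniMoeTalkerCodePredictorConfig; in the actual suite this would live under the existing tester class rather than a new file):

```python
import unittest

# Minimal regression-test sketch: ToyCodePredictorConfig stands in for the
# real Qwen3OmniMoeTalkerCodePredictorConfig, purely for illustration.

class ToyCodePredictorConfig:
    def __init__(self, sliding_window=None, max_window_layers=28):
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers

class CodePredictorConfigRegressionTest(unittest.TestCase):
    def test_init_does_not_raise(self):
        # Regression check: plain init must succeed and expose the defaults,
        # without the removed use_sliding_window attribute.
        cfg = ToyCodePredictorConfig()
        self.assertEqual(cfg.max_window_layers, 28)
        self.assertIsNone(cfg.sliding_window)
        self.assertFalse(hasattr(cfg, "use_sliding_window"))
```

Run with a standard test runner, e.g. python -m pytest or python -m unittest.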

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@vasqu vasqu left a comment


Don't have much to add onto @zucchini-nlp's comments, I'd just prefer to move the test under the general tester class


@github-actions
Contributor

github-actions bot commented Feb 4, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

@zucchini-nlp zucchini-nlp enabled auto-merge (squash) February 4, 2026 10:36
@zucchini-nlp zucchini-nlp merged commit 257a602 into huggingface:main Feb 4, 2026
19 checks passed

Labels

for patch (issues/labels that should be included in the next patch)

Linked issue: sliding_window issue with Qwen3-MoE models