fix: AttributeError for Qwen3_omni_moe (#43593)
Conversation
cc @vasqu since you commented on the issue!
vasqu
left a comment
max_window_layers seems like an unnecessary attribute added + let's move the test, please no new files
```python
rope_parameters: int | None = None,
attention_bias: bool | None = False,
sliding_window: int | None = None,
max_window_layers: int = 28,
```
The test should still be under test_modeling_xxx
hi, @vasqu, thanks for the review.
```python
self.intermediate_size = intermediate_size
self.num_hidden_layers = num_hidden_layers
self.num_attention_heads = num_attention_heads
self.sliding_window = sliding_window if self.use_sliding_window else None
```
Ok I checked what happened and it seems it was unintentionally added in #41541, but since I'm not super familiar with this model I'd rather wait for @zucchini-nlp to answer here
Imo we should just do self.sliding_window = sliding_window (use_sliding_window was never used at all and should be removed from the docstring) - max_window_layers should be removed alongside it (not reintroduced)
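A minimal standalone repro of the crash being discussed (toy classes, not the actual transformers config; the attribute names simply mirror the PR):

```python
# Toy repro: reading `self.use_sliding_window` before anything assigns it
# raises AttributeError, which is exactly the crash in the issue.
class BuggyConfig:
    def __init__(self, sliding_window=None):
        # `use_sliding_window` was never set on the instance -> AttributeError
        self.sliding_window = sliding_window if self.use_sliding_window else None


# The suggested fix: assign the value unconditionally.
class FixedConfig:
    def __init__(self, sliding_window=None):
        self.sliding_window = sliding_window


try:
    BuggyConfig(sliding_window=4096)
except AttributeError as e:
    print("crashes:", e)

print(FixedConfig(sliding_window=4096).sliding_window)  # 4096
```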
Also the changes should be done in modular and then reapplied via `python utils/modular_model_converter.py qwen3_omni_moe`
Indeed a bad copy, no need for use_sliding_window. Model always uses sliding layers together with full attention
```python
        **kwargs,
    ):
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers
```
it is a recurring pattern in many qwen-multimodal models and I think the authors incorrectly copied it when adding Qwen3-Omni-MoE. The official checkpoints have max_window_layers saved in the config, therefore we don't see errors at inference
IMO we need to keep it to match the model's default behavior and the docstring
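For context, a hedged sketch of the kind of `layer_types` derivation that depends on `max_window_layers` (the `i >= max_window_layers` convention is assumed here from other Qwen configs; this is an illustration, not the actual Qwen3-Omni-MoE code):

```python
# Hypothetical layer_types derivation in the style of Qwen configs: layers at
# index >= max_window_layers use sliding-window attention when a sliding_window
# is configured, the rest use full attention. If max_window_layers is missing
# from the config, this derivation is where the secondary crash would surface.
def derive_layer_types(num_hidden_layers, sliding_window, max_window_layers):
    return [
        "sliding_attention" if sliding_window is not None and i >= max_window_layers
        else "full_attention"
        for i in range(num_hidden_layers)
    ]


print(derive_layer_types(num_hidden_layers=4, sliding_window=4096, max_window_layers=2))
# ['full_attention', 'full_attention', 'sliding_attention', 'sliding_attention']
```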
@Vallabh-1504 hey, any updates on the PR? Would be great to merge it before the next planned release, which I believe should be around this week
hi @zucchini-nlp, I was working on it but hitting a bit of a wall. I was applying changes through the modular file, but I'm kinda stuck in a loop where the generator keeps re-adding the code I'm trying to remove. I also don't know what to do with the […]. Can you guide me through this? Am I missing a step, or is there some cache with the converter?
Smth like this should work, we need to delete unused attributes explicitly and re-assign the "differing" attr. For […]
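A sketch of that pattern with toy classes (the `del self.attr` idiom is how transformers modular files drop an inherited attribute; the names mirror the PR, but this is not the actual modular file):

```python
# Toy stand-in for the parent config that the modular class inherits from.
class ParentConfig:
    def __init__(self, use_sliding_window=False, sliding_window=None, max_window_layers=28):
        self.use_sliding_window = use_sliding_window
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers


# Modular-style override: delete the unused attribute explicitly and
# re-assign the attribute whose behavior differs from the parent.
class CodePredictorConfigSketch(ParentConfig):
    def __init__(self, sliding_window=None, max_window_layers=28):
        super().__init__(sliding_window=sliding_window, max_window_layers=max_window_layers)
        del self.use_sliding_window           # drop the unused attribute
        self.sliding_window = sliding_window  # re-assign the "differing" attr


cfg = CodePredictorConfigSketch(sliding_window=4096)
print(hasattr(cfg, "use_sliding_window"), cfg.sliding_window)  # False 4096
```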
Force-pushed 9041720 to d75ecdb
@zucchini-nlp, I have applied the changes as requested!
zucchini-nlp
left a comment
One tiny comment and let's wait for @vasqu's review
```diff
  self.num_code_groups = num_code_groups
  self.vocab_size = vocab_size
  self.max_position_embeddings = max_position_embeddings
  self.hidden_size = hidden_size
  self.intermediate_size = intermediate_size
  self.num_hidden_layers = num_hidden_layers
  self.num_attention_heads = num_attention_heads
- self.sliding_window = sliding_window if self.use_sliding_window else None
+ self.sliding_window = sliding_window
+ self.max_window_layers = max_window_layers
```
```diff
@@ -473,6 +475,7 @@ def __init__(
      **kwargs,
  ):
      self.sliding_window = sliding_window
+     self.max_window_layers = max_window_layers
```
not needed, super assigns it already which is why we got duplicates in auto-generated code :)
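The point can be shown with toy classes (not transformers code): when the parent `__init__` already assigns the attribute, repeating the assignment in the subclass is redundant, and in a modular file it leads to duplicated lines in the generated code.

```python
class ParentConfig:
    def __init__(self, sliding_window=None, max_window_layers=28, **kwargs):
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers


class ChildConfig(ParentConfig):
    def __init__(self, sliding_window=None, max_window_layers=28, **kwargs):
        super().__init__(sliding_window=sliding_window, max_window_layers=max_window_layers, **kwargs)
        # self.max_window_layers = max_window_layers  # redundant: super() already set it


cfg = ChildConfig(sliding_window=1024, max_window_layers=16)
print(cfg.sliding_window, cfg.max_window_layers)  # 1024 16
```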
```python
class TestQwen3OmniMoeCodePredictorConfig(unittest.TestCase):
    def test_code_predictor_config_init(self):
        """
        Test that Qwen3OmniMoeTalkerCodePredictorConfig initializes correctly
        and accepts max_window_layers while removing use_sliding_window.
        """

        config = Qwen3OmniMoeTalkerCodePredictorConfig(
```
ideally we need a complete model test for the 'TalkerModel' model. But I realize that the model doesn't follow transformers standards and we'll skip anyway many tests from ModelTesterMixin, or override a lot of them
I'm fine with deleting the current test in that case, @vasqu wdyt?
Imo, we can add a small regression test under the general tester class at least; we don't need a separate class for this. Just a tad weird because the naming is weird, but if we don't test it we ought to repeat it in a refactor 😬
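A sketch of such a regression test (a toy config stands in for `Qwen3OmniMoeTalkerCodePredictorConfig` so the snippet runs standalone; in the PR the real class would be imported and the method would live on the existing tester class):

```python
import unittest


# Hypothetical stand-in with the post-fix behavior: no use_sliding_window
# attribute, max_window_layers kept with its default of 28.
class ToyCodePredictorConfig:
    def __init__(self, sliding_window=None, max_window_layers=28, **kwargs):
        self.sliding_window = sliding_window
        self.max_window_layers = max_window_layers


class CodePredictorConfigRegressionTest(unittest.TestCase):
    def test_code_predictor_config_init(self):
        config = ToyCodePredictorConfig(sliding_window=4096)
        self.assertEqual(config.sliding_window, 4096)
        self.assertEqual(config.max_window_layers, 28)
        self.assertFalse(hasattr(config, "use_sliding_window"))


result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(CodePredictorConfigRegressionTest)
)
```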
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu
left a comment
Don't have much to add onto @zucchini-nlp's comments, I'd just prefer to move the test under the general tester class
[For maintainers] Suggested jobs to run (before merge): run-slow: qwen3_omni_moe
What does this PR do?
This PR fixes a crash when initializing Qwen3OmniMoeTalkerCodePredictorConfig due to a missing attribute reference. Specifically, it:

- Removes the reference to the undefined use_sliding_window attribute, which was causing an AttributeError.
- Keeps the max_window_layers initialization (defaulting to 28). The existing layer_types logic relies on this attribute, and without it, setting sliding_window causes a secondary crash.
- Adds a regression test (test_code_predictor_config_init) to ensure the configuration initializes correctly.

Fixes #43531
Before submitting

- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. (Issue: sliding_window issue with Qwen3-MoE models #43531)
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?