Fix few issues in Qwen_3_Omni_Moe #44848

Merged
ydshieh merged 13 commits into huggingface:main from Sai-Suraj-27:fix_qwen3_omni_config
Mar 30, 2026

@Sai-Suraj-27
Contributor

@Sai-Suraj-27 Sai-Suraj-27 commented Mar 19, 2026

What does this PR do?

Update Qwen3_Omni_Moe to fix these attribute errors in Qwen3OmniModelIntegrationTests:

image

Almost the same issue was initially fixed in #43084, but the config refactor in #41250 dropped initializer_range from Qwen3OmniMoeConfig.
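As a rough illustration of why a dropped config field surfaces as an attribute error, here is a minimal sketch. The classes and helpers (ToyConfigBroken, ToyConfigFixed, init_std, try_init) are hypothetical stand-ins, not the real Qwen3OmniMoeConfig, and the sketch assumes that weight initialization reads initializer_range directly from the config.

```python
from dataclasses import dataclass


@dataclass
class ToyConfigBroken:
    # Hypothetical stand-in for a config whose refactor dropped `initializer_range`.
    hidden_size: int = 8


@dataclass
class ToyConfigFixed:
    # Same toy config with the field restored; 0.02 is only an illustrative default.
    hidden_size: int = 8
    initializer_range: float = 0.02


def init_std(config) -> float:
    # Assumed pattern: weight init code reads the std straight from the config.
    return config.initializer_range


def try_init(config) -> str:
    # Report either the std that would be used or the error that was raised.
    try:
        return f"ok, std={init_std(config)}"
    except AttributeError as err:
        return f"AttributeError: {err}"


broken_result = try_init(ToyConfigBroken())
fixed_result = try_init(ToyConfigFixed())
print(broken_result)
print(fixed_result)
```

The broken toy config fails only when the init code touches the missing field, which matches the kind of failure that shows up at test time rather than at import time.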

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@Rocketknight1 @vasqu

@vasqu
Contributor

vasqu commented Mar 19, 2026

run-slow: qwen3_omni_moe

vasqu
vasqu previously approved these changes Mar 19, 2026
Contributor

@vasqu vasqu left a comment

Thanks, checking with CI 🫡

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@vasqu
Contributor

vasqu commented Mar 19, 2026

Ok it fixes one issue and reveals some other ones 😓 can you recheck or rather rename the PR since it does unblock partially at least

@vasqu vasqu dismissed their stale review March 19, 2026 08:39

Slow tests still fail

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 49a84fd9 workflow commit (merge commit)
PR f4a7c27a branch commit (from PR)
main 529504b2 base commit (on main)

Model CI Report

3 new failed tests from this PR 😭

  • qwen3_omni_moe:
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_batch (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_w_audio (❌ ⟹ ❌)

@Sai-Suraj-27
Contributor Author

Ok it fixes one issue and reveals some other ones 😓 can you recheck or rather rename the PR since it does unblock partially at least

Hey, @vasqu. I pushed a fix for this one; at least the multiple values for argument 'next_sequence_length' error should be gone now. Can you check and trigger run-slow? 👀

@vasqu
Contributor

vasqu commented Mar 19, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN d6da4c2c workflow commit (merge commit)
PR 68ca6b4a branch commit (from PR)
main be8d8a4c base commit (on main)

Model CI Report

3 new failed tests from this PR 😭

  • qwen3_omni_moe:
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_batch (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_w_audio (❌ ⟹ ❌)

@Sai-Suraj-27
Contributor Author

Hey, I pushed a potential fix for these. I think _no_split_modules should cover the full AudioEncoder and VisionEncoder so that device_map="auto" does not place child modules of Qwen3OmniMoeAudioEncoder on separate devices in a multi-GPU environment. This follows the approach in Qwen2_5Omni.
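To make the idea concrete, here is a toy placement sketch. It is not accelerate's actual algorithm: ToyModule, place, the class names, the sizes, and the device budgets are all hypothetical, and the only behaviour it models is that a class listed as "no split" is placed on one device as a whole block.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    # Hypothetical module description: own parameter size in arbitrary units.
    name: str
    cls: str
    size: int = 0
    children: list = field(default_factory=list)

    def total_size(self) -> int:
        return self.size + sum(c.total_size() for c in self.children)


def place(module, budgets, no_split, placement=None, state=None):
    """Greedy toy placement: fill device 0 first, then device 1, and so on."""
    if placement is None:
        placement, state = {}, {"device": 0, "used": 0}
    # A module is placed as one block if its class is "no split" or it is a leaf.
    atomic = module.cls in no_split or not module.children
    if atomic:
        need = module.total_size()
        # Move on to the next device when the whole block does not fit.
        while state["used"] + need > budgets[state["device"]]:
            state["device"] += 1
            state["used"] = 0
        state["used"] += need
        placement[module.name] = state["device"]
    else:
        for child in module.children:
            place(child, budgets, no_split, placement, state)
    return placement


encoder = ToyModule(
    "audio_encoder",
    "ToyAudioEncoder",
    children=[
        ToyModule("audio_encoder.conv", "ToyConv", size=3),
        ToyModule("audio_encoder.layer0", "ToyLayer", size=3),
        ToyModule("audio_encoder.layer1", "ToyLayer", size=3),
    ],
)
model = ToyModule("model", "ToyModel", children=[encoder])

split = place(model, budgets=[7, 20], no_split=set())
whole = place(model, budgets=[7, 20], no_split={"ToyAudioEncoder"})
print(split)
print(whole)
```

In the first call the encoder's children end up on two devices; in the second the encoder is kept together and moved to the device where it fits entirely.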

@vasqu
Contributor

vasqu commented Mar 25, 2026

run-slow: qwen3_omni_moe

@vasqu
Contributor

vasqu commented Mar 25, 2026

Sorry, I was off for a few days. Now back 🤗 @Sai-Suraj-27 checking run-slow

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 3bf2b531 workflow commit (merge commit)
PR a04a9b98 branch commit (from PR)
main 28af8184 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@vasqu
Contributor

vasqu commented Mar 25, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN c5f85c8b workflow commit (merge commit)
PR 28d3bd61 branch commit (from PR)
main 2f624917 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@vasqu
Contributor

vasqu commented Mar 25, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 3d85820e workflow commit (merge commit)
PR 47123f8d branch commit (from PR)
main c9faacd7 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@ydshieh
Collaborator

ydshieh commented Mar 26, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN 11fa0cf6 workflow commit (merge commit)
PR 22f647b1 branch commit (from PR)
main da37a4d9 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

@ydshieh
Collaborator

ydshieh commented Mar 27, 2026

It no longer crashes, but the tests now fail with:

FAILED tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test - AssertionError: "user[101 chars]on, here is a breakdown of what you're hearing and seeing:\n\n" != "user[101 chars]on, here is a breakdown of what you're hearing and seeing:-"
  user
  What's that sound and what kind of dog is this?
  assistant
- Based on the audio and visual information, here is a breakdown of what you're hearing and seeing:
?                                                                                                  ^
+ Based on the audio and visual information, here is a breakdown of what you're hearing and seeing:-?                                                                                                  ^
-
FAILED tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_batch - AssertionError: Lists differ: ["use[99 chars]ation provided:\n\nThe sound you hear is the d[191 chars]hed"] != ["use[99 chars]ation, here is a breakdown of what you're hear[187 chars]n\n"]

First differing element 0:
"user[98 chars]ation provided:\n\nThe sound you hear is the d[17 chars]ched"
"user[98 chars]ation, here is a breakdown of what you're hear[15 chars]\n\n"

Diff is 672 characters long. Set self.maxDiff to None to see it.
FAILED tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_w_audio - RuntimeError: Tensor on device meta is not on the expected device cuda:0!

@vasqu
Contributor

vasqu commented Mar 27, 2026

Yea, seems reasonable - the test didn't run at all before and crashed, this PR at least lets the integration tests produce output again @ydshieh

Can we change the title tho @Sai-Suraj-27? Also looks like the meta device one is not specific to the multi-gpu case (which we talked about before)

@Sai-Suraj-27
Contributor Author

Yea, seems reasonable - the test didn't run at all before and crashed, this PR at least lets the integration tests produce output again @ydshieh

Can we change the title tho @Sai-Suraj-27? Also looks like the meta device one is not specific to the multi-gpu case (which we talked about before)

Yes, but for the text expectation mismatch failures, should I try & update the expected text maybe?

@vasqu
Contributor

vasqu commented Mar 27, 2026

Nope, not for now - imo I would like to have a failure for now / xmark. Something somewhere changed and arbitrarily changing the values is not good

@Sai-Suraj-27 Sai-Suraj-27 changed the title from "Fix failing Qwen3OmniModelIntegrationTests" to "Fix few issues in Qwen_3_Omni_Moe" Mar 27, 2026
@vasqu
Contributor

vasqu commented Mar 27, 2026

Do you want to investigate the meta device issue? Otherwise, I would merge as is for now

@Sai-Suraj-27
Contributor Author

Do you want to investigate the meta device issue? Otherwise, I would merge as is for now

Sure, let me check that over the weekend.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN b57b7dab workflow commit (merge commit)
PR 22f647b1 branch commit (from PR)
main 7b00e3ba base commit (on main)

Model CI Report

3 new failed tests from this PR 😭

  • qwen3_omni_moe:
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_batch (❌ ⟹ ❌)
    tests/models/qwen3_omni_moe/test_modeling_qwen3_omni_moe.py::Qwen3OmniModelIntegrationTest::test_small_model_integration_test_w_audio (❌ ⟹ ❌)

@ydshieh
Collaborator

ydshieh commented Mar 30, 2026

run-slow: qwen3_omni_moe

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

@ydshieh
Collaborator

ydshieh commented Mar 30, 2026

@Sai-Suraj-27 To move fast, I pushed some commits that should work well (on our CI runner), including the fixes for meta device.

I will merge once @vasqu has a final look 🙏.

Thanks again for the work!

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/qwen3_omni_moe"]
quantizations: []

Contributor

@vasqu vasqu left a comment

Thanks, will also try to investigate a bit more later because something clearly goes wrong within the model

)
self.assertFalse(torch.isnan(output[1]).any().item())

@run_first
Contributor

Any reason we want this to run first?


if "inputs_embeds" in model_kwargs:
    return torch.ones((batch_size, 0), dtype=torch.long, device=self.device)
return torch.ones(
Contributor

Can you add a comment with a reference to this discussion?

Collaborator

ohoh yes, forgot before push

Contributor Author

@Sai-Suraj-27 Sai-Suraj-27 Mar 30, 2026

Hey, @ydshieh. Thanks for pushing the fix. I was able to run the test on an RTX PRO 6000, and it runs fine without the meta device issue. But on an A10 GPU, device_map="auto" offloads the talker module to CPU and, iiuc from the accelerate code, it keeps the parameters of CPU/disk-offloaded modules as meta tensors (which is why model.talker.device gives "meta" on the A10) and only loads the real weights onto the GPU just before forward.

Since the test runs fine on the big GPU but fails on the A10, I think this fix confirms that the issue is with how we use self.device in this method. So maybe we can add a comment here about this accelerate behaviour, pointing to the accelerate code.

Collaborator

Hi @Sai-Suraj-27

A comment has been added (a few lines below):

                # Use the device of the existing tensor to avoid any potential `meta` device issue.
                # See PR #44848. (Previously, it used `self.device`.)

I think it's enough with the reference to this PR.
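For readers following the thread, the pattern being discussed can be sketched without torch. ToyTensor and ToyOffloadedModule below are hypothetical stand-ins that only track a device string; the sketch assumes the "existing tensor" is something like inputs_embeds and that an offloaded module reports "meta" as its own device, as described above.

```python
from dataclasses import dataclass


@dataclass
class ToyTensor:
    # Hypothetical stand-in for a tensor; it only tracks shape and device.
    shape: tuple
    device: str


@dataclass
class ToyOffloadedModule:
    # Stand-in for a module whose weights are offloaded, so the device it
    # reports for itself is "meta" until the real weights are loaded.
    device: str = "meta"

    def placeholder_ids_old(self, inputs_embeds: ToyTensor) -> ToyTensor:
        # Previous pattern: trust the module's own device.
        return ToyTensor((inputs_embeds.shape[0], 0), self.device)

    def placeholder_ids_new(self, inputs_embeds: ToyTensor) -> ToyTensor:
        # Pattern from the comment above: follow the device of an existing tensor.
        return ToyTensor((inputs_embeds.shape[0], 0), inputs_embeds.device)


embeds = ToyTensor(shape=(2, 5, 16), device="cuda:0")
module = ToyOffloadedModule()
old_ids = module.placeholder_ids_old(embeds)
new_ids = module.placeholder_ids_new(embeds)
print(old_ids.device, new_ids.device)
```

With the old pattern the placeholder lands on the meta device and later mixes with real tensors; with the new one it always sits next to the input it accompanies.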

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44848&sha=0865d5

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context Commit Description
RUN e36b0474 workflow commit (merge commit)
PR 09d23fa4 branch commit (from PR)
main 02063e68 base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_omni_moe

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=44848&sha=f57a22

@ydshieh ydshieh merged commit 813c7c6 into huggingface:main Mar 30, 2026
27 of 29 checks passed
4 participants