Fix few issues in Qwen_3_Omni_Moe #44848
Merged: ydshieh merged 13 commits into huggingface:main from Sai-Suraj-27:fix_qwen3_omni_config on Mar 30, 2026.
Commits (13):
- f4a7c27 Fix Qwen3OmniMoeConfig has no attribute initializer_range (Sai-Suraj-27)
- bf06736 Merge branch 'main' of github.com:huggingface/transformers into fix_q… (Sai-Suraj-27)
- 68ca6b4 Fix passing of args (Sai-Suraj-27)
- d7a6fb3 Merge branch 'main' of github.com:huggingface/transformers into fix_q… (Sai-Suraj-27)
- a04a9b9 Fix no_split_modules (Sai-Suraj-27)
- 28d3bd6 Merge branch 'main' into fix_qwen3_omni_config (vasqu)
- 47123f8 Merge branch 'main' into fix_qwen3_omni_config (vasqu)
- 22f647b fix (ydshieh)
- 45fbffe fix and improve (ydshieh)
- 09d23fa format (ydshieh)
- 0865d56 fix modular (ydshieh)
- 3b4581a fix modular (ydshieh)
- f57a226 comment (ydshieh)
```diff
@@ -38,6 +38,7 @@
     require_flash_attn,
     require_torch,
     require_torch_accelerator,
+    run_first,
     slow,
     torch_device,
 )
@@ -677,7 +678,27 @@ def test_code_predictor_config_init(self):
 @require_torch
 class Qwen3OmniModelIntegrationTest(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        cls.model = None
+
+    @classmethod
+    def get_model(cls):
+        if cls.model is None:
+            cls.model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
+                "Qwen/Qwen3-Omni-30B-A3B-Instruct", dtype=torch.bfloat16, device_map="auto"
+            )
+        return cls.model
+
+    @classmethod
+    def tearDownClass(cls):
+        if hasattr(cls, "model"):
+            del cls.model
+        cleanup(torch_device, gc_collect=True)
+
     def setUp(self):
         cleanup(torch_device, gc_collect=True)
         self.processor = AutoProcessor.from_pretrained(
             "Qwen/Qwen3-Omni-30B-A3B-Instruct", min_pixels=28 * 28, max_pixels=56 * 56
         )
@@ -710,9 +731,7 @@ def tearDown(self):
     @slow
     def test_small_model_integration_test(self):
-        model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
-            "Qwen/Qwen3-Omni-30B-A3B-Instruct", dtype=torch.bfloat16, device_map="auto"
-        )
+        model = self.get_model()

         text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
         inputs = self.processor(
@@ -764,7 +783,7 @@ def test_small_model_integration_test(self):
         )

         EXPECTED_DECODED_TEXT = Expectations({
-            ("cuda", (8, 6)): "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information, here is a breakdown of what you're hearing and seeing:-",
+            ("cuda", (8, 6)): "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information, here is a breakdown of what you're hearing and seeing:\n\n",
             ("rocm", (9, 4)): "system\nYou are a helpful assistant.\nuser\nWhat's that sound and what kind of dog is this?\nassistant\nThe sound is glass shattering, and the dog is a Labrador Retriever.",
         }).get_expectation()  # fmt: skip
@@ -773,9 +792,7 @@ def test_small_model_integration_test(self):
     @slow
     def test_small_model_integration_test_batch(self):
-        model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
-            "Qwen/Qwen3-Omni-30B-A3B-Instruct", dtype=torch.bfloat16, device_map="auto"
-        )
+        model = self.get_model()
         text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
         inputs = self.processor(
             text=[text] * 2,
@@ -791,13 +808,9 @@ def test_small_model_integration_test_batch(self):
         EXPECTED_DECODED_TEXTS = Expectations(
             {
-                ("cuda", 7): [
-                    "system\nYou are a helpful assistant.\nuser\nWhat's that sound and what kind of dog is this?\nassistant\nThe sound is of glass shattering, and the dog in the picture is a Labrador Retriever",
-                    "system\nYou are a helpful assistant.\nuser\nWhat's that sound and what kind of dog is this?\nassistant\nThe sound is of glass shattering, and the dog in the picture is a Labrador Retriever",
-                ],
                 ("cuda", 8): [
-                    "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information, here is a breakdown of what you're hearing and seeing:\n\n",
-                    "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information, here is a breakdown of what you're hearing and seeing:\n\n"
+                    "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information provided:\n\nThe sound you hear is the distinct, high-pitched",
+                    "user\nWhat's that sound and what kind of dog is this?\nassistant\nBased on the audio and visual information provided:\n\nThe sound you hear is the distinct, high-pitched",
                 ],
                 ("rocm", (9, 4)): [
                     "system\nYou are a helpful assistant.\nuser\nWhat's that sound and what kind of dog is this?\nassistant\nThe sound is glass shattering, and the dog is a Labrador Retriever.",
@@ -811,9 +824,7 @@ def test_small_model_integration_test_batch(self):
     @slow
     def test_small_model_integration_test_multiturn(self):
-        model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
-            "Qwen/Qwen3-Omni-30B-A3B-Instruct", dtype=torch.bfloat16, device_map="auto"
-        )
+        model = self.get_model()

         messages = [
             self.messages[0],
@@ -857,9 +868,7 @@ def test_small_model_integration_test_multiturn(self):
     @slow
     def test_small_model_integration_test_w_audio(self):
-        model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
-            "Qwen/Qwen3-Omni-30B-A3B-Instruct", dtype=torch.bfloat16, device_map="auto"
-        )
+        model = self.get_model()
         audio_url = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav"

         messages = [
@@ -894,8 +903,7 @@ def test_small_model_integration_test_w_audio(self):
         EXPECTED_DECODED_TEXTS = Expectations(
             {
-                ("cuda", 7): "system\nYou are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.\nuser\n\nassistant\nWell, I can try. But it's not always that accurate. I might be able to make",
-                ("cuda", 8): "'system\nYou are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.\nuser\n\nassistant\nYes, I can analyze audio inputs to understand spoken content, and I can also make inferences about'",
+                ("cuda", 8): "system\nYou are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech.\nuser\n\nassistant\nYes, I can analyze audio inputs to understand spoken content, and I can also process and respond to",
             }
         )  # fmt: skip
         EXPECTED_DECODED_TEXT = EXPECTED_DECODED_TEXTS.get_expectation()
@@ -906,6 +914,10 @@ def test_small_model_integration_test_w_audio(self):
         )
         self.assertFalse(torch.isnan(output[1]).any().item())

+    # Run this test first because it needs to load the model with `flash_attention_2`. For other tests, we need to keep
+    # the loaded model (without FA) in `cls.model`. If this test is not run first, when loading the flash attention
+    # model here, there is already a previous loaded model `cls.model` and we will get GPU OOM.
+    @run_first
```
Contributor review comment on `@run_first`: Any reason we want this to run first?
```diff
     @slow
     @require_flash_attn
     @require_torch_accelerator
```
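The `setUpClass`/`get_model`/`tearDownClass` trio in the diff implements lazy, class-level model caching so the 30B checkpoint is loaded at most once for the whole test class. A minimal sketch of the same pattern with a toy stand-in (`DummyModel` is hypothetical; the real tests load the Hugging Face checkpoint and call `cleanup` on teardown):

```python
import unittest


class DummyModel:
    """Hypothetical stand-in for an expensive-to-load model."""

    load_count = 0

    def __init__(self):
        DummyModel.load_count += 1


class CachedModelTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Start with no model; it is loaded lazily on first use.
        cls.model = None

    @classmethod
    def get_model(cls):
        # Load once, then reuse across every test in the class.
        if cls.model is None:
            cls.model = DummyModel()
        return cls.model

    @classmethod
    def tearDownClass(cls):
        # Drop the cached model so the next test class starts clean.
        if hasattr(cls, "model"):
            del cls.model

    def test_first_use_loads(self):
        self.assertIsNotNone(self.get_model())

    def test_second_use_reuses(self):
        self.get_model()
        self.assertEqual(DummyModel.load_count, 1)


if __name__ == "__main__":
    # Run via an explicit runner (unittest.main() would sys.exit()).
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CachedModelTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

The payoff in the real diff is that each `test_small_model_integration_test*` body shrinks to `model = self.get_model()`, and the GPU holds only one copy of the checkpoint across the class.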
Review comment: Can you add a comment with a reference to here?

Reply: ohoh yes, forgot before push
Hey @ydshieh, thanks for pushing the fix. I was able to run the test on an RTX PRO 6000, and it runs fine without the meta-device issue. But in the case of an A10 GPU, `device_map="auto"` offloads the `talker` module to CPU and, if I understand the accelerate code correctly, accelerate keeps the parameters of cpu/disk-offloaded modules as meta tensors (which is why `model.talker.device` reports "meta" on the A10), only loading the real weights onto the GPU just before the forward pass. Since the test ran fine on the big GPU but fails on the A10, I can confirm with this fix that the issue is in how we use `self.device` in this method. So maybe we can add a comment about this accelerate behaviour here, pointing to the relevant accelerate code.
Hi @Sai-Suraj-27, a comment is added (a few lines below). I think it's enough with the reference to this PR.
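The meta-device behaviour described in this thread can be reproduced with plain PyTorch (assuming PyTorch >= 2.0, where `torch.device` works as a context manager): parameters created on the meta device carry no storage and report device `meta`, which mirrors how accelerate represents cpu/disk-offloaded modules before their real weights are streamed in. The `talker`/`thinker` names below are just stand-ins for the offloaded and resident submodules:

```python
import torch
import torch.nn as nn

# A submodule created on the meta device has parameters without storage,
# mimicking how accelerate keeps cpu/disk-offloaded modules as meta tensors.
with torch.device("meta"):
    talker = nn.Linear(4, 4)  # hypothetical stand-in for the offloaded `talker`

first_param = next(talker.parameters())
print(first_param.device)   # meta
print(first_param.is_meta)  # True

# A module actually materialized on an execution device reports that device.
thinker = nn.Linear(4, 4)   # default: CPU, with real storage
print(next(thinker.parameters()).device)  # cpu
```

This is why reading `model.talker.device` (or `self.device`) is misleading under offloading: the reliable source of placement information is the model's device map, not the parameter's apparent device.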