[BugFix] Fix load_weights error when loading HunyuanImage3.0 #1598
hsliuustc0106 merged 1 commit into vllm-project:main
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 01276c1bea
      custom_pos_emb = self.get_pos_emb(custom_pos_emb, position_ids)

    - inputs_embeds = self.model.wte(input_ids)
    + inputs_embeds = self.model.embed_tokens(input_ids)
Avoid calling embed_tokens on non-first PP ranks
forward_call now unconditionally does self.model.embed_tokens(input_ids), but HunyuanImage3Model.__init__ only creates embed_tokens on the first PP rank (or last when tied embeddings); other pipeline-parallel ranks get PPMissingLayer. With pipeline_parallel_size > 1 and default tie_word_embeddings=False, this change makes non-first ranks invoke a missing layer and fail during inference, whereas the previous self.model.wte path existed on every rank.
The current model does not support PP, so PP.is_first_rank is always true. There is no need to check for now.
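For reference, the guard the review comment asks for could look roughly like the sketch below. It is a toy illustration in plain Python, not the code in this PR: `MissingLayer` is a hypothetical stand-in for `PPMissingLayer`, and `ToyEmbedding`, `ToyModel`, and `get_inputs_embeds` are names invented here. The idea is only that the embedding is looked up on the rank that owns it, while other ranks consume the hidden states handed over by the previous rank.

```python
class MissingLayer:
    """Hypothetical stand-in for a placeholder layer on ranks that do not own a module."""

    def __call__(self, *args, **kwargs):
        raise RuntimeError("layer is not materialized on this pipeline rank")


class ToyEmbedding:
    """Minimal embedding: maps each token id to a fixed vector."""

    def __init__(self, table):
        self.table = table

    def __call__(self, input_ids):
        return [self.table[token_id] for token_id in input_ids]


class ToyModel:
    """Creates embed_tokens only on the first pipeline rank (assumption for this sketch)."""

    def __init__(self, is_first_rank):
        self.is_first_rank = is_first_rank
        if is_first_rank:
            self.embed_tokens = ToyEmbedding({0: [0.0, 0.0], 1: [1.0, 1.0]})
        else:
            self.embed_tokens = MissingLayer()


def get_inputs_embeds(model, input_ids, intermediate=None):
    """Embed on the first rank; otherwise pass through the previous rank's output."""
    if model.is_first_rank:
        return model.embed_tokens(input_ids)
    if intermediate is None:
        raise ValueError("non-first ranks need hidden states from the previous rank")
    return intermediate


first_rank_out = get_inputs_embeds(ToyModel(is_first_rank=True), [1, 0])
later_rank_out = get_inputs_embeds(
    ToyModel(is_first_rank=False), [1, 0], intermediate=[[0.5, 0.5], [0.25, 0.25]]
)
```

As long as the model runs without pipeline parallelism, only the first branch is ever taken, which matches the reply above.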
01276c1 to 883a04c
Move some submodule load weights code of HunyuanImage3Pipeline to AutoWeightsLoader:load_weights, fix weights not initialized error. Signed-off-by: Semmer2 <semmer@live.cn>
883a04c to 86bbf58
Move some of the submodule weight-loading code of HunyuanImage3Pipeline into AutoWeightsLoader:load_weights to fix a "weights not initialized" error.
Purpose
DiffusersPipelineLoader:load_weights added strict verification of weight-loading gaps, which reports an error when loading the HunyuanImage3.0 model. This PR moves some submodule loading code into AutoWeightsLoader:load_weights to fix the bug.
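To illustrate why a strict check can trip on a pipeline like this, here is a minimal sketch in plain Python. It is an assumption about the general mechanism, not the actual DiffusersPipelineLoader or AutoWeightsLoader code: `verify_all_loaded`, `load_weights`, and the parameter names are hypothetical. The point is that a loader which verifies the set of loaded names only sees weights that went through the reporting path, so a submodule loaded on the side looks uninitialized.

```python
def verify_all_loaded(expected_names, loaded_names):
    """Raise if any expected parameter was never reported as loaded."""
    missing = sorted(set(expected_names) - set(loaded_names))
    if missing:
        raise ValueError(f"weights not initialized: {missing}")


def load_weights(params, weights):
    """Copy matching weights into params and return the names that were loaded."""
    loaded = set()
    for name, value in weights:
        if name in params:
            params[name] = value
            loaded.add(name)
    return loaded


# Hypothetical parameter names, for illustration only.
checkpoint = [("model.embed_tokens.weight", 1.0), ("vae.decoder.weight", 2.0)]

# Case 1: the "vae" submodule is loaded out of band and never reported.
params = {"model.embed_tokens.weight": None, "vae.decoder.weight": None}
reported = load_weights(params, [w for w in checkpoint if w[0].startswith("model.")])
params["vae.decoder.weight"] = 2.0  # loaded, but invisible to the check
try:
    verify_all_loaded(params.keys(), reported)
    gap_error = None
except ValueError as exc:
    gap_error = str(exc)

# Case 2: every submodule goes through the same reporting loader.
params = {"model.embed_tokens.weight": None, "vae.decoder.weight": None}
reported_all = load_weights(params, checkpoint)
verify_all_loaded(params.keys(), reported_all)  # passes without raising
```

Routing the submodule through the reporting loader, as in the second case, is the shape of the fix described above.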
Test Plan
python examples/offline_inference/text_to_image/text_to_image.py --mode /data/HunyuanImage-3.0/ --prompt "A brown and white dog is running on the grass" --output output_image.png --num-inference-steps 50 --guidance-scale 5.0 --tensor-parallel-size 8 --seed 1234
Test Result