[BugFIX] enable Hunyuan image3 with stage selection among text_to_image/image_to_text #1826
Merged: Gaohan123 merged 7 commits into vllm-project:main from xuechendi:hunyuan_image3_fix_default_config on Mar 24, 2026
Commits
8ce425f fix AR path for xpu (xuechendi)
0b72f26 Enable a new config - mode - to decide stage selection (xuechendi)
30571a7 Merge branch 'main' into hunyuan_image3_fix_default_config (xuechendi)
16f50b8 Merge remote-tracking branch 'origin/main' into hunyuan_image3_fix_de… (xuechendi)
c300c6d update config to work with #1935 (xuechendi)
76c691e Merge branch 'main' into hunyuan_image3_fix_default_config (Gaohan123)
f1332e1 Merge branch 'main' into hunyuan_image3_fix_default_config (Gaohan123)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -2,6 +2,11 @@ | |
| # Stage 0: AR Model (vLLM implementation) | ||
|
|
||
| # The following config has been verified on 8x L40S-48G GPU. | ||
| modes: | ||
| - mode: text-to-image | ||
| stages: [1] | ||
| - mode: image-to-text | ||
| stages: [0] | ||
| stage_args: | ||
| - stage_id: 0 | ||
| stage_type: llm # Use llm stage type for AR stages | ||
|
|
@@ -37,6 +42,40 @@ stage_args: | |
| seed: 42 | ||
| detokenize: True | ||
| repetition_penalty: 1.1 | ||
| - stage_id: 1 | ||
| stage_type: diffusion | ||
| runtime: | ||
| process: true | ||
| devices: "0,1,2,3,4,5,6,7" | ||
|
Review comment (Contributor, Author): @Bounty-hunter, that is because the initial config uses all 8 cards. |
||
| max_batch_size: 1 | ||
| engine_args: | ||
| model_stage: diffusion | ||
| gpu_memory_utilization: 0.9 | ||
| enforce_eager: true | ||
| engine_output_type: image | ||
| distributed_executor_backend: "mp" | ||
| enable_prefix_caching: false | ||
| max_num_batched_tokens: 32768 | ||
| vae_use_slicing: false | ||
| vae_use_tiling: false | ||
| cache_backend: null | ||
| cache_config: null | ||
| enable_cache_dit_summary: false | ||
| parallel_config: | ||
| pipeline_parallel_size: 1 | ||
| data_parallel_size: 1 | ||
| tensor_parallel_size: 8 | ||
| enable_expert_parallel: false | ||
| sequence_parallel_size: 1 | ||
| ulysses_degree: 1 | ||
| ring_degree: 1 | ||
| cfg_parallel_size: 1 | ||
| vae_patch_parallel_size: 1 | ||
| use_hsdp: false | ||
| hsdp_shard_size: -1 | ||
| hsdp_replicate_size: 1 | ||
| final_output: true | ||
| final_output_type: image | ||
|
|
||
| # Top-level runtime config (concise): default windows and stage edges | ||
| runtime: | ||
|
|
||
83 changes: 83 additions & 0 deletions
vllm_omni/platforms/xpu/stage_configs/hunyuan_image_3_moe.yaml
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| # Stage config for running Hunyuan-Image3.0 with architecture of OmniLLM. | ||
| # Stage 0: AR Model (vLLM implementation) | ||
|
|
||
| # The following config has been verified on 8x Max 1550 GPU. | ||
| modes: | ||
| - mode: text-to-image | ||
| stages: [1] | ||
| - mode: image-to-text | ||
| stages: [0] | ||
| stage_args: | ||
| - stage_id: 0 | ||
| stage_type: llm # Use llm stage type to launch OmniLLM | ||
| runtime: | ||
| process: true # Run this stage in a separate process | ||
| devices: "0,1,2,3,4,5,6,7" # Visible devices for this stage | ||
| max_batch_size: 1 | ||
| engine_args: | ||
| model_stage: AR | ||
| model_arch: HunyuanImage3ForCausalMM | ||
| worker_cls: vllm_omni.platforms.xpu.worker.xpu_ar_worker.XPUARWorker | ||
| scheduler_cls: vllm_omni.core.sched.omni_ar_scheduler.OmniARScheduler | ||
| gpu_memory_utilization: 0.95 | ||
| enforce_eager: true # Now we only support eager mode | ||
| trust_remote_code: true | ||
| engine_output_type: latent | ||
| enable_prefix_caching: false | ||
| max_num_batched_tokens: 32784 | ||
| tensor_parallel_size: 8 | ||
| pipeline_parallel_size: 1 | ||
| enable_expert_parallel: true | ||
| quantization: "fp8" | ||
| is_comprehension: true | ||
| final_output: true | ||
| final_output_type: text | ||
| default_sampling_params: | ||
| temperature: 0.0 | ||
| top_p: 1.0 | ||
| top_k: -1 | ||
| max_tokens: 2048 | ||
| seed: 42 | ||
| detokenize: True | ||
| repetition_penalty: 1.1 | ||
| - stage_id: 1 | ||
| stage_type: diffusion | ||
| runtime: | ||
| process: true | ||
| devices: "0,1,2,3,4,5,6,7" | ||
| max_batch_size: 1 | ||
| engine_args: | ||
| model_stage: diffusion | ||
| gpu_memory_utilization: 0.9 | ||
| enforce_eager: true | ||
| engine_output_type: image | ||
| distributed_executor_backend: "mp" | ||
| enable_prefix_caching: false | ||
| vae_use_slicing: false | ||
| vae_use_tiling: false | ||
| cache_backend: null | ||
| cache_config: null | ||
| enable_cache_dit_summary: false | ||
| quantization: "fp8" | ||
| parallel_config: | ||
| pipeline_parallel_size: 1 | ||
| data_parallel_size: 1 | ||
| tensor_parallel_size: 8 | ||
| enable_expert_parallel: true | ||
| sequence_parallel_size: 1 | ||
| ulysses_degree: 1 | ||
| ring_degree: 1 | ||
| cfg_parallel_size: 1 | ||
| vae_patch_parallel_size: 1 | ||
| use_hsdp: false | ||
| hsdp_shard_size: -1 | ||
| hsdp_replicate_size: 1 | ||
| final_output: true | ||
| final_output_type: image | ||
|
|
||
| # Top-level runtime config (concise): default windows and stage edges | ||
| runtime: | ||
| enabled: true | ||
| defaults: | ||
| window_size: -1 # Simplified: trigger downstream only after full upstream completion | ||
| max_inflight: 1 # Simplified: process serially within each stage |
Why do we need this mapping here? @Semmer2 @lishunyang12 @nussejzz PTAL
If we load both stages and let the workload decide which stage to use, device memory utilization doubles.
This PR proposes a simple fix: use modes to decide whether users want text-to-image or image-to-text, so only the stages listed for that mode are loaded.
I am also considering a more aggressive fix that shares the same weights across stages. If that makes sense, I can open an RFC so we can discuss it.
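
To illustrate the mode-to-stage mapping described in this comment, here is a minimal Python sketch. It is not the vllm-omni implementation: `CONFIG` is a trimmed stand-in for the YAML in this PR (only `modes`, `stage_id`, and `stage_type` are kept), and `select_stages` is a hypothetical helper name. Loading every stage when no mode is given is also an assumption made for the example.

```python
from typing import Any, Optional

# Trimmed stand-in for the stage config in this PR. Only the keys needed
# to show stage selection are kept; the real file carries many more.
CONFIG: dict[str, Any] = {
    "modes": [
        {"mode": "text-to-image", "stages": [1]},
        {"mode": "image-to-text", "stages": [0]},
    ],
    "stage_args": [
        {"stage_id": 0, "stage_type": "llm"},
        {"stage_id": 1, "stage_type": "diffusion"},
    ],
}


def select_stages(config: dict[str, Any], mode: Optional[str] = None) -> list[dict[str, Any]]:
    """Return only the stage_args entries that the requested mode lists.

    Hypothetical helper: with mode=None every stage is returned, which is
    the case the comment above says doubles device memory use.
    """
    if mode is None:
        return list(config["stage_args"])

    for entry in config.get("modes", []):
        if entry["mode"] == mode:
            wanted = set(entry["stages"])
            return [s for s in config["stage_args"] if s["stage_id"] in wanted]

    known = [entry["mode"] for entry in config.get("modes", [])]
    raise ValueError(f"unknown mode {mode!r}; expected one of {known}")


if __name__ == "__main__":
    for stage in select_stages(CONFIG, "text-to-image"):
        print(stage["stage_id"], stage["stage_type"])
```

Because both stages in the configs above claim devices "0,1,2,3,4,5,6,7" with `gpu_memory_utilization` of 0.9 or higher, filtering before launch is what keeps the two stages from competing for the same cards.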