[Config] Add HunyuanImage3 deploy configs#3172
Conversation
force-pushed from d7490f3 to 5030294
@Fishermanykx It would be great if we can accelerate this PR so that we can get it merged before 0.20.0. Due to limited bandwidth and GPU resources, I will close #2989 and let this PR take over. cc @hsliuustc0106 @TaffyOfficial @Bounty-hunter
force-pushed from 7dd4530 to f5d0749
WIP, testing. I think it will be ready today
force-pushed from f7a1ac7 to e6fb220
Ready for review now, PTAL @hsliuustc0106 @gcanlin @Bounty-hunter @TaffyOfficial @lishunyang12 @xuechendi |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 764f1e3ac2
```python
if model_type and model_type in _PIPELINE_REGISTRY:
    return cls._create_from_registry(model_type, cli_overrides, deploy_config_path)
```
Honor deploy pipeline before model_type registry match
When --deploy-config points to a topology-specific file (for example hunyuan_image3_ar.yaml or hunyuan_image3_dit.yaml), this method still returns immediately on the auto-detected HF model_type and never gives the deploy file’s pipeline field a chance to select the intended registry entry. For HunyuanImage3 this means explicit AR-only/DiT-only deploy configs are ignored whenever the model reports hunyuan_image3, so users get the wrong stage topology at runtime.
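To make the suggestion concrete, here is a minimal sketch of checking the deploy file's `pipeline` field before falling back to the auto-detected model type. The function name `select_registry_key` and the registry contents are illustrative, not the actual implementation:

```python
# Illustrative registry; real entries are pipeline classes, not strings.
_PIPELINE_REGISTRY = {
    "hunyuan_image3": "ar+dit",
    "hunyuan_image3_ar": "ar-only",
    "hunyuan_image3_dit": "dit-only",
}

def select_registry_key(model_type, deploy_config=None):
    """Prefer an explicit `pipeline` key in the deploy config over model_type."""
    pipeline = (deploy_config or {}).get("pipeline")
    if pipeline in _PIPELINE_REGISTRY:
        return pipeline  # explicit deploy choice wins
    if model_type in _PIPELINE_REGISTRY:
        return model_type  # fall back to auto-detected HF model_type
    return None
```

With this ordering, `--deploy-config hunyuan_image3_dit.yaml` would select the DiT-only entry even though the model reports `hunyuan_image3`.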
```python
owns_tokenizer=False,
requires_multimodal_data=True,
model_arch=_HUNYUAN_IMAGE3_MODEL_ARCH,
engine_output_type="latent",
```
Remove latent output mode from AR-only text pipeline
The AR-only pipeline is declared as a text final-output stage, but engine_output_type is set to "latent". In HunyuanImage3, non-text engine_output_type switches the model out of comprehension behavior, changing token constraints and generation flow; this can degrade or break img2text/text2text outputs compared with the previous AR-only configs that relied on text/default output mode.
```yaml
distributed_executor_backend: mp
enable_prefix_caching: false
async_chunk: false
```
Could we remove these two arguments?
Only `async_chunk: false` is needed in single-stage deployment; the others are removed now.
```yaml
hf_overrides:
  rope_parameters:
    mrope_section: [0, 32, 32]
    rope_type: default
final_output: true
final_output_type: text
requires_multimodal_data: true
```
It's weird that these static fields still exist; they don't look like deploy parameters. Is it possible to remove them?
I couldn't find a way to remove these fields. `final_output`, `final_output_type`, and `requires_multimodal_data` are runtime metadata, and they are not exposed in the CLI, so currently we can only control these args in YAML.
I checked the other migrated models as well. These fields are consistently defined in pipeline.py via `StagePipelineConfig`, rather than in deploy YAML:
- `requires_multimodal_data`
- `final_output`
- `final_output_type`
- `model_stage`
- `engine_output_type`

For example, this is how qwen3_omni, qwen2_5_omni, qwen3_tts, glm_image, and others are structured.
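A rough sketch of keeping that structural metadata in code rather than deploy YAML; the field set follows this thread, but the real `StagePipelineConfig` in pipeline.py may differ, and the values below are placeholders for an AR-only text stage:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StagePipelineConfig:
    """Structural, pipeline-side metadata for one stage (illustrative)."""
    model_stage: str
    final_output: bool
    final_output_type: str
    requires_multimodal_data: bool
    engine_output_type: str

# AR-only text stage: final output is text, so the engine emits text too.
AR_ONLY_STAGE = StagePipelineConfig(
    model_stage="ar",
    final_output=True,
    final_output_type="text",
    requires_multimodal_data=True,
    engine_output_type="text",
)
```

Because these fields live in the pipeline definition, a deploy YAML cannot accidentally flip a structural property at deploy time.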
```diff
     response_address=response_address,
 )
-complete_diffusion_handshake(proc, handshake_address)
+complete_diffusion_handshake(proc, handshake_address, stage_init_timeout)
```
Any reason to add this parameter?
Deleted. These were added when I was trying to use one YAML to control different stages; I forgot to delete these lines.
```
`--stage-configs-path` are both omitted:

Get into the hunyuan_image3 folder:

| `--modality` | `mode` passed to Omni | Default deploy |
```
I don't see how I can use this modality field online.
It's only used in offline mode. Online controls modality via different request fields (for example, t2t in chat/completions, t2i in images/generations, etc.)
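As a small illustration of the online side (the route paths follow the OpenAI-style endpoints; the task keys mirror the offline `--modality` values, and this mapping is illustrative, not the actual routing code):

```python
# Online mode: the serving endpoint, not a CLI flag, determines the task.
ENDPOINT_TO_TASK = {
    "/v1/chat/completions": "t2t",    # text chat -> text-to-text
    "/v1/images/generations": "t2i",  # image generation -> text-to-image
}

def task_for_endpoint(path: str) -> str:
    """Resolve the task type from the request route (sketch only)."""
    try:
        return ENDPOINT_TO_TASK[path]
    except KeyError:
        raise ValueError(f"no task mapping for endpoint {path!r}") from None
```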
force-pushed from bb045f3 to 958f5b6
```python
    f"Please ensure the model has proper configuration files with 'model_type' field"
)

default_config_path = current_omni_platform.get_default_stage_config_path()
```
Why do we need to change the priority? You have removed the hunyuan YAML from stage_config, so `if os.path.exists(complete_config_path):` will return false.
```python
    return stage_configs


def _normalize_mode_stage_overrides(stage_overrides: Any) -> list[dict[str, Any]]:
```
`_normalize_mode_stage_overrides()` and `_apply_mode_stage_overrides()` introduce a new YAML-side capability: mode-specific post-resolution mutation of the stage config.
This is different from the original `--stage-overrides` mechanism, which was meant for CLI-side runtime overrides over YAML/default config.
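For reference, a hedged sketch of what such a normalization helper could do, assuming it accepts either a `{stage_id: overrides}` mapping or a list of per-stage dicts and returns a uniform list; the real helper may behave differently:

```python
from typing import Any

def normalize_mode_stage_overrides(stage_overrides: Any) -> list[dict[str, Any]]:
    """Normalize YAML stage overrides into a list of per-stage dicts (sketch)."""
    if stage_overrides is None:
        return []
    if isinstance(stage_overrides, dict):
        # {0: {...}, 1: {...}} -> [{"stage_id": 0, ...}, {"stage_id": 1, ...}]
        return [
            {"stage_id": int(sid), **(ov or {})}
            for sid, ov in sorted(stage_overrides.items())
        ]
    if isinstance(stage_overrides, list):
        return [dict(ov) for ov in stage_overrides]
    raise TypeError(f"unsupported stage_overrides type: {type(stage_overrides)!r}")
```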
force-pushed from e4b01b6 to 607e005
```yaml
stages: [0, 1]
stage_overrides:
  0:
    requires_multimodal_data: false
```
`requires_multimodal_data` appears to be treated as a pipeline-structural field rather than a deploy-time field. In `merge_pipeline_deploy()`, the runtime value is always taken from `StagePipelineConfig`:

```python
runtime["requires_multimodal_data"] = ps.requires_multimodal_data
```

So even if the same field is present in deploy YAML, the resolved runtime here is sourced from `ps`, not from the deploy config.
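A stripped-down illustration of that precedence; the real `merge_pipeline_deploy()` is more involved, and the shapes here are simplified to plain dicts:

```python
def merge_pipeline_deploy(pipeline_stage: dict, deploy_stage: dict) -> dict:
    """Merge deploy-time knobs, then force structural fields from the pipeline."""
    runtime = dict(deploy_stage)  # deploy-time knobs first
    # Structural field always wins from the pipeline definition,
    # so a deploy-YAML value for the same key is silently ignored.
    runtime["requires_multimodal_data"] = pipeline_stage["requires_multimodal_data"]
    return runtime

merged = merge_pipeline_deploy(
    {"requires_multimodal_data": True},
    {"requires_multimodal_data": False, "enable_prefix_caching": False},
)
# merged["requires_multimodal_data"] stays True: the deploy override is dropped
```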
```yaml
- mode: text-to-text
  stages: [0]
  stage_overrides:
    0:
      requires_multimodal_data: false
```
I think there are two issues here:
1. `--stage-overrides` is a CLI-side per-stage runtime override mechanism, which makes sense for deployment knobs. It does not seem like something that should be embedded into YAML semantics.
2. `modes` was already used in the legacy `stage_config_path` flow to distinguish different modes by selecting different stages. That feels inconsistent with how we represent structural or mode-level differences in other models, where we usually use separate pipeline variants instead, e.g. `qwen2_5_omni` vs `qwen2_5_omni_thinker_only`, `bagel` vs `bagel_think`, and `hunyuan_image3`/`_ar`/`_dit`.
force-pushed from 68b36eb to b22796c
force-pushed from 1c7bd85 to f23d3af
force-pushed from f23d3af to e16c4c6
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
This reverts commit dac00c4. Signed-off-by: KexiongYu <yukexiong1@huawei.com>
force-pushed from d6c42fb to 2b44288
Signed-off-by: KexiongYu <yukexiong1@huawei.com> Signed-off-by: Y. Fisher <yukexiong1@huawei.com>
Summary
This PR adds deploy-config support for HunyuanImage3 and registers the model in the pipeline registry.
It includes:
- `hunyuan_image3.yaml`: AR + DiT deploy config with platform overrides
- `hunyuan_image3_ar.yaml` and standalone `hunyuan_image3_dit.yaml` configs for single-stage usage
- `cli_explicit_keys` usage and diffusion handshake timeout propagation

Motivation
HunyuanImage3 previously relied on scattered stage configs, which made platform-specific deployment harder to reason about and made independent AR/DiT startup awkward. The new deploy configs consolidate the runtime knobs under
`vllm_omni/deploy/` while preserving a full AR + DiT path and adding single-stage configs where needed.

Design
HunyuanImage3 unified deploy config
This PR migrates HunyuanImage3 deployment settings into unified deploy YAMLs under
`vllm_omni/deploy/` and removes the duplicated legacy configs from `model_executor/stage_configs/` and platform-specific `stage_configs/`.

HunyuanImage3 currently has three practical deployment topologies, so this PR keeps three explicit deploy YAMLs:
- `hunyuan_image3.yaml`: full AR + DiT pipeline for `text-to-image` and `image-editing`.
- `hunyuan_image3_ar.yaml`: AR-only single-stage pipeline for `image-to-text` and `text-to-text`.
- `hunyuan_image3_dit.yaml`: DiT-only single-stage pipeline for standalone diffusion execution.

The reason for using three YAMLs instead of relying on CLI flags to start only stage 0 or stage 1 is that the current stage topology is strongly tied to the pipeline definition. In the full AR + DiT pipeline, stage ids and inter-stage dependencies assume both stages exist. Starting only
`stage_id=1` from the full pipeline is ambiguous because stage 1 expects upstream AR inputs and stage metadata from the two-stage topology. Similarly, using `stage_id=0` for AR-only text output needs different final-output semantics from stage 0 in the full AR + DiT pipeline.

By defining
`hunyuan_image3_ar.yaml` and `hunyuan_image3_dit.yaml` as separate single-stage deploy configs, each topology has a self-contained pipeline contract. This avoids overloading CLI stage selection with topology changes and keeps offline/serving behavior deterministic.
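The three topologies can be summarized as a small lookup; the stage names and mode strings below are illustrative, and the actual deploy YAMLs carry far more per-stage detail:

```python
# One self-contained topology contract per deploy file (illustrative).
DEPLOY_TOPOLOGIES = {
    "hunyuan_image3.yaml": {
        "stages": ["ar", "dit"],
        "modes": ["text-to-image", "image-editing"],
    },
    "hunyuan_image3_ar.yaml": {
        "stages": ["ar"],
        "modes": ["image-to-text", "text-to-text"],
    },
    "hunyuan_image3_dit.yaml": {
        "stages": ["dit"],
        "modes": ["diffusion-only"],
    },
}

def deploy_file_for(mode: str) -> str:
    """Pick the deploy YAML whose topology serves the requested mode."""
    for path, contract in DEPLOY_TOPOLOGIES.items():
        if mode in contract["modes"]:
            return path
    raise ValueError(f"no deploy config serves mode {mode!r}")
```

Selecting a whole file per topology, rather than filtering stages at the CLI, is exactly what keeps each pipeline contract self-contained.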
Stop token id calculation
This PR makes AR stop-token calculation follow the same task key that is used to build the offline HunyuanImage3 prompt. In
`end2end.py`, `--modality` first determines the base task type (`t2i`, `it2i`, `i2t`, `t2t`), while `--bot-task` only describes the assistant behavior (`think`, `recaption`, `vanilla`, or `auto`). The example combines them into a single prompt task key, such as `t2i_think` or `it2i_recaption`, validates it against the shared `_TASK_PRESETS`, and then passes that resolved task into `prompt_utils.resolve_stop_token_ids()`.

`prompt_utils` is therefore the single source of truth for both prompt construction and stop-token selection. Each task preset records the system prompt mode, bot task, and optional trigger tag. `resolve_stop_token_ids()` always includes the HunyuanImage3 EOS token, then derives any additional AR stop token from the resolved task preset: tasks with a trigger tag stop on that trigger token, while plain comprehension tasks only use EOS. This keeps stop-token behavior aligned with the exact prompt format assembled by `build_prompt_tokens()`, and avoids duplicating modality/bot-task branching logic in the offline example.

Only offline is supported now.
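The preset lookup can be sketched as follows. All token ids and preset contents below are placeholders, not HunyuanImage3's real values; only the names `_TASK_PRESETS` and `resolve_stop_token_ids` mirror the PR:

```python
EOS_TOKEN_ID = 127960                   # placeholder EOS id
TRIGGER_TOKEN_IDS = {"<img>": 127968}   # placeholder trigger-tag ids

# Placeholder presets keyed by the combined "<modality>_<bot_task>" key.
_TASK_PRESETS = {
    "t2i_think": {"trigger_tag": "<img>"},
    "it2i_recaption": {"trigger_tag": "<img>"},
    "i2t_vanilla": {"trigger_tag": None},  # plain comprehension: EOS only
}

def resolve_stop_token_ids(task: str) -> list[int]:
    """Derive AR stop tokens from the resolved task preset (sketch)."""
    preset = _TASK_PRESETS[task]   # KeyError doubles as preset validation
    stop_ids = [EOS_TOKEN_ID]      # EOS is always a stop token
    tag = preset["trigger_tag"]
    if tag is not None:
        stop_ids.append(TRIGGER_TOKEN_IDS[tag])
    return stop_ids
```

Generation tasks stop on the trigger token that hands off to the DiT stage, while comprehension tasks run to EOS.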
Future follow-ups
Some HunyuanImage3 deployment variants are intentionally left as future follow-ups:
- For variants that need different `requires_multimodal_data` / final-output metadata, we should either add a dedicated deploy YAML for that topology or carefully extend the mode override whitelist instead of allowing arbitrary overrides.

Validation
Tested on 4xAscend NPUs
offline AR single stage
offline DiT single stage
AR+DiT relies on #2949
AR+DiT on NPU relies on #3070 additionally
Test Result
offline AR
offline AR garbage output will be fixed by #3243
offline DiT
offline AR+DiT with kv reuse (shm connector)
Runtime validation on target GPU/XPU hardware is still needed.