[Config] Add HunyuanImage3 deploy configs #3172

Merged
hsliuustc0106 merged 40 commits into vllm-project:main from Fishermanykx:yukexiong/hunyuan_unified_deploy on May 11, 2026

Conversation


@Fishermanykx (Contributor) commented Apr 27, 2026

Summary

This PR adds deploy-config support for HunyuanImage3 and registers the model in the pipeline registry.

It includes:

  • a unified hunyuan_image3.yaml AR + DiT deploy config with platform overrides
  • split hunyuan_image3_ar.yaml and standalone hunyuan_image3_dit.yaml configs for single-stage usage
  • HunyuanImage3 pipeline registry entries for the full, AR-only, and DiT-only topologies
  • deploy YAML fallback handling so explicit deploy configs can select the intended pipeline
  • small rebase fixes for stale cli_explicit_keys usage and diffusion handshake timeout propagation

Motivation

HunyuanImage3 previously relied on scattered stage configs, which made platform-specific deployment harder to reason about and made independent AR/DiT startup awkward. The new deploy configs consolidate the runtime knobs under vllm_omni/deploy/ while preserving a full AR + DiT path and adding single-stage configs where needed.

Design

HunyuanImage3 unified deploy config

This PR migrates HunyuanImage3 deployment settings into unified deploy YAMLs under vllm_omni/deploy/ and removes the duplicated legacy configs from model_executor/stage_configs/ and platform-specific stage_configs/.

HunyuanImage3 currently has three practical deployment topologies, so this PR keeps three explicit deploy YAMLs:

  • hunyuan_image3.yaml: full AR + DiT pipeline for text-to-image and image-editing.
  • hunyuan_image3_ar.yaml: AR-only single-stage pipeline for image-to-text and text-to-text.
  • hunyuan_image3_dit.yaml: DiT-only single-stage pipeline for standalone diffusion execution.

The reason for using three YAMLs instead of relying on CLI flags to start only stage 0 or stage 1 is that the current stage topology is strongly tied to the pipeline definition. In the full AR + DiT pipeline, stage ids and inter-stage dependencies assume both stages exist. Starting only stage_id=1 from the full pipeline is ambiguous because stage 1 expects upstream AR inputs and stage metadata from the two-stage topology. Similarly, using stage_id=0 for AR-only text output needs different final-output semantics from stage 0 in the full AR + DiT pipeline.

By defining hunyuan_image3_ar.yaml and hunyuan_image3_dit.yaml as separate single-stage deploy configs, each topology has a self-contained pipeline contract:

  • AR-only uses stage 0 as the final text output stage.
  • DiT-only uses stage 0 as the final image output stage.
  • Full AR + DiT keeps stage 0 as an intermediate AR stage and stage 1 as the final image stage.

This avoids overloading CLI stage selection with topology changes and keeps offline/serving behavior deterministic.
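To make the split concrete, an AR-only deploy YAML might look roughly like the sketch below. The field names echo ones discussed in this PR (pipeline selection, final_output, final_output_type, requires_multimodal_data, async_chunk), but the exact layout and all values are illustrative assumptions, not the merged config:

```yaml
# Hypothetical sketch of hunyuan_image3_ar.yaml (layout and values are illustrative)
pipeline: hunyuan_image3_ar        # selects the AR-only registry entry
stages:
  - stage_id: 0                    # single stage: AR is the final stage
    final_output: true
    final_output_type: text        # AR-only emits text, not latents
    requires_multimodal_data: true # image inputs for image-to-text
    engine_args:
      enforce_eager: true
      async_chunk: false           # needed in single-stage deployment
```

Because each YAML carries its own complete pipeline contract, the stage 0 semantics above never collide with stage 0 of the full AR + DiT pipeline.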

Stop token id calculation

This PR makes AR stop-token calculation follow the same task key that is used to build the offline HunyuanImage3 prompt. In end2end.py, --modality first determines the base task type (t2i, it2i, i2t, t2t), while --bot-task only describes the assistant behavior (think, recaption, vanilla, or auto). The example combines them into a single prompt task key, such as t2i_think or it2i_recaption, validates it against the shared _TASK_PRESETS, and then passes that resolved task into prompt_utils.resolve_stop_token_ids().

prompt_utils is therefore the single source of truth for both prompt construction and stop-token selection. Each task preset records the system prompt mode, bot task, and optional trigger tag. resolve_stop_token_ids() always includes the HunyuanImage3 EOS token, then derives any additional AR stop token from the resolved task preset: tasks with a trigger tag stop on that trigger token, while plain comprehension tasks only use EOS. This keeps stop-token behavior aligned with the exact prompt format assembled by build_prompt_tokens(), and avoids duplicating modality/bot-task branching logic in the offline example.
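The flow described above can be sketched roughly as follows. _TASK_PRESETS and resolve_stop_token_ids() exist in prompt_utils per this PR, but the preset contents, token ids, and helper signatures here are illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch of task-key resolution and stop-token selection.
# All token ids and preset contents are made up for demonstration.
EOS_TOKEN_ID = 127960            # hypothetical HunyuanImage3 EOS id

# Each preset records the bot task and an optional trigger tag token.
_TASK_PRESETS = {
    "t2i_think":      {"bot_task": "think",     "trigger_token_id": 127962},
    "it2i_recaption": {"bot_task": "recaption", "trigger_token_id": 127963},
    "i2t_vanilla":    {"bot_task": "vanilla",   "trigger_token_id": None},
}

def resolve_task_key(modality: str, bot_task: str) -> str:
    """Combine --modality and --bot-task into a single prompt task key."""
    task_key = f"{modality}_{bot_task}"
    if task_key not in _TASK_PRESETS:
        raise ValueError(f"unknown task preset: {task_key}")
    return task_key

def resolve_stop_token_ids(task_key: str) -> list[int]:
    """EOS always stops generation; tasks with a trigger tag also stop on it."""
    preset = _TASK_PRESETS[task_key]
    stop_ids = [EOS_TOKEN_ID]
    if preset["trigger_token_id"] is not None:
        stop_ids.append(preset["trigger_token_id"])
    return stop_ids
```

Keeping one preset table for both prompt building and stop-token selection means a new task only has to be registered once.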

Only offline inference is supported for now.

Future follow-ups

Some HunyuanImage3 deployment variants are intentionally left as future follow-ups:

  • Quantization variants may need separate deploy YAMLs if they require different topology, device layout, worker classes, or platform-specific parallel configs. If quantization is only a simple per-stage flag, it can remain as a deploy override, but validated production recipes should be explicit.
  • AR-to-DiT KV reuse is not kept in this PR. When KV reuse is reintroduced, the deploy config should be extended with the required connector and KV transfer settings, such as sender/receiver connector config and any transfer criteria.
  • If more HunyuanImage3 serving modes need different engine behavior beyond requires_multimodal_data / final-output metadata, we should either add a dedicated deploy YAML for that topology or carefully extend the mode override whitelist instead of allowing arbitrary overrides.
  • Online serving docs and examples can be expanded after the deploy YAMLs are validated across CUDA/NPU/XPU.

Validation

Tested on 4× Ascend NPUs.
offline AR single stage

deploy_cfg="/home/fisher/vllm-worksapce/vllm-omni/vllm_omni/deploy/hunyuan_image3_ar.yaml"
image_path="/home/fisher/vllm-worksapce/vllm-omni/examples/offline_inference/hunyuan_image3/output_0_0.png"

python end2end.py --model $model \
                  --image-path $image_path \
                  --prompts "Describe the content of the picture." \
                  --deploy-config $deploy_cfg \
                  --log-stats \
                  --modality "img2text" \
                  --enforce-eager

offline DiT single stage

deploy_cfg="/home/fisher/vllm-worksapce/vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml"

python "${SCRIPT_DIR}/end2end.py" \
                  --model $model \
                  --prompts \
                   "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
                  --deploy-config $deploy_cfg \
                  --log-stats \
                  --modality "text2img" \
                  --guidance-scale 1.0 \
                  --seed 42

AR+DiT relies on #2949.
AR+DiT on NPU additionally relies on #3070.

Test Result

offline AR

[Output] Text:
The image features a woman sitting on an ornate, red velvet armchair. She is wearing a dark teal or blue sleeveless dress. The scene is characterized by dramatic, high-contrast lighting, with strong sunlight casting distinct, dappled shadows across her face, body, and the background wall. The background appears to be an interior with patterned wallpaper, partially obscured by the shadows. The overall mood is contemplative and artistic, emphasized by the interplay of light and shadow.

Offline AR garbage output will be fixed by #3243.

offline DiT

[generated image: output_0_0]

offline AR+DiT with kv reuse (shm connector)

INFO 05-11 17:45:15 [hunyuan_image3_transformer.py:2794] Handling AR KV reuse with positive_reuse_len=6606, negative_reuse_len=6209
INFO 05-11 17:45:15 [hunyuan_image3_transformer.py:2794] Handling AR KV reuse with positive_reuse_len=6606, negative_reuse_len=6209
INFO 05-11 17:45:16 [hunyuan_image3_transformer.py:2794] Handling AR KV reuse with positive_reuse_len=6606, negative_reuse_len=6209

[Output] Text:
The user wants to turn a photo of a cute golden retriever puppy into a festive New Year pet poster. The reference image shows a golden retriever puppy sticking out its tongue and smiling with its head tilted, against a wooden floor and blurred greenery. The original instruction is very specific, asking to add particular title text, change the background, adjust the composition, and apply a particular film style. This is a medium-complexity task, since it involves adding text, replacing the background, adjusting the composition, and transforming the overall style. First, I need to handle the text, placing "新年快乐汪" and "HAPPY NEW YEAR" at the top of the image in a rounded, cute font. Next, the background needs to switch from the outdoor wooden floor to an indoor doorway scene, which requires rebuilding the sense of space. For composition, the fisheye-lens effect mentioned in the original instruction means the edges of the frame will show obvious curved distortion, giving the poster a surreal, playful feel. For the puppy itself, while keeping its cute head-tilted, tongue-out expression, I need to add a red knitted hat and a red scarf to fit the New Year theme. Finally, the overall style should mimic Polaroid film, with film grain and a retro texture. When rewriting the instruction, I will integrate these scattered requirements into a logically clear description that specifies the position and style of the text, the exact background content, and the concrete form of the filter and composition, ensuring the generated image keeps the spirit of the original while having the completeness and festive feel of a poster.</think>Turn the golden retriever puppy in the reference image into a New Year-themed poster. Add the title "新年快乐汪" in a rounded, cute artistic font at the top of the image, with the smaller English text "HAPPY NEW YEAR" below it. Change the background from the outdoor wooden floor to an indoor doorway scene, showing a white door frame and indoor flooring. Apply a fisheye-lens effect to the whole image so the edges show natural curved distortion. Put a red knitted hat and a matching red scarf on the puppy while keeping its cute head-tilted, tongue-out expression. Finally, add a Polaroid-style white border to the whole image and give it the grain and retro tones of film photography, so the picture conveys a warm, textured festive atmosphere.</recaption>

edit

Runtime validation on target GPU/XPU hardware is still needed.


lishunyang12 commented Apr 28, 2026

@Fishermanykx It would be great if we could accelerate this PR so that we can get it merged before 0.20.0. Due to limited bandwidth and GPU resources, I will close #2989 and let this PR take over. cc @hsliuustc0106 @TaffyOfficial @Bounty-hunter.

@Fishermanykx (Contributor, Author) replied:

WIP, testing. I think it will be ready today

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch from f7a1ac7 to e6fb220 Compare April 29, 2026 06:43
@Fishermanykx Fishermanykx marked this pull request as ready for review April 29, 2026 07:49
@chatgpt-codex-connector (Bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 764f1e3ac2


Comment on lines 1016 to 1017
if model_type and model_type in _PIPELINE_REGISTRY:
return cls._create_from_registry(model_type, cli_overrides, deploy_config_path)

P1: Honor deploy pipeline before model_type registry match

When --deploy-config points to a topology-specific file (for example hunyuan_image3_ar.yaml or hunyuan_image3_dit.yaml), this method still returns immediately on the auto-detected HF model_type and never gives the deploy file’s pipeline field a chance to select the intended registry entry. For HunyuanImage3 this means explicit AR-only/DiT-only deploy configs are ignored whenever the model reports hunyuan_image3, so users get the wrong stage topology at runtime.
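The suggested fix amounts to consulting the deploy YAML's pipeline field before falling back to the auto-detected model_type. A minimal sketch under assumptions: the function name, registry shape, and the idea of passing the parsed deploy config as a dict are all hypothetical, not the actual method:

```python
# Hypothetical sketch: prefer the deploy config's explicit `pipeline`
# field over the HF-detected model_type when picking a registry entry.
_PIPELINE_REGISTRY = {
    "hunyuan_image3": "full AR+DiT pipeline",
    "hunyuan_image3_ar": "AR-only pipeline",
    "hunyuan_image3_dit": "DiT-only pipeline",
}

def select_pipeline(model_type, deploy_cfg):
    """deploy_cfg: the deploy YAML parsed into a dict, or None."""
    # 1) An explicit deploy config wins if it names a registered pipeline.
    if deploy_cfg:
        pipeline = deploy_cfg.get("pipeline")
        if pipeline and pipeline in _PIPELINE_REGISTRY:
            return pipeline
    # 2) Only then fall back to the auto-detected HF model_type.
    if model_type and model_type in _PIPELINE_REGISTRY:
        return model_type
    raise ValueError("no pipeline could be resolved")
```

With this ordering, hunyuan_image3_ar.yaml selects the AR-only entry even though the model itself reports hunyuan_image3.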


owns_tokenizer=False,
requires_multimodal_data=True,
model_arch=_HUNYUAN_IMAGE3_MODEL_ARCH,
engine_output_type="latent",

P1: Remove latent output mode from AR-only text pipeline

The AR-only pipeline is declared as a text final-output stage, but engine_output_type is set to "latent". In HunyuanImage3, non-text engine_output_type switches the model out of comprehension behavior, changing token constraints and generation flow; this can degrade or break img2text/text2text outputs compared with the previous AR-only configs that relied on text/default output mode.


distributed_executor_backend: mp
enable_prefix_caching: false
async_chunk: false

Collaborator:

Could we remove these two arguments?

Contributor Author:

Only async_chunk: false is needed in single-stage deployment; the others are removed now.

Comment thread vllm_omni/deploy/hunyuan_image3_ar.yaml Outdated
Comment on lines +27 to +33
hf_overrides:
rope_parameters:
mrope_section: [0, 32, 32]
rope_type: default
final_output: true
final_output_type: text
requires_multimodal_data: true
Collaborator:

It's weird that these static fields still exist; they don't look like deploy parameters. Is it possible to remove them?

Contributor Author:

I can't find a way to remove these fields. final_output, final_output_type, and requires_multimodal_data are runtime metadata and are not exposed via the CLI, so for now these args can only be controlled in the YAML.

Contributor:

I checked the other migrated models as well. These fields are consistently defined in pipeline.py via StagePipelineConfig, rather than in deploy YAML:

  • requires_multimodal_data
  • final_output
  • final_output_type
  • model_stage
  • engine_output_type

For example, this is how qwen3_omni, qwen2_5_omni, qwen3_tts, glm_image, and others are structured.

Comment thread vllm_omni/engine/async_omni_engine.py Outdated
response_address=response_address,
)
complete_diffusion_handshake(proc, handshake_address)
complete_diffusion_handshake(proc, handshake_address, stage_init_timeout)
Collaborator:

Any reason to add this parameter?

Contributor Author:

Deleted. These were added when I was trying to use one YAML to control different stages, and I forgot to delete these lines.

`--stage-configs-path` are both omitted:

Get into the hunyuan_image3 folder:
| `--modality` | `mode` passed to Omni | Default deploy |
Collaborator:

I don't see how I can use this modality field online.

Contributor Author:

It's only used in offline mode. Online controls modality via different request fields (for example, t2t in chat/completions and t2i in images/generations).

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch 2 times, most recently from bb045f3 to 958f5b6 Compare April 29, 2026 08:30
f"Please ensure the model has proper configuration files with 'model_type' field"
)

default_config_path = current_omni_platform.get_default_stage_config_path()
Contributor:

Why does the priority need to change? You have removed the hunyuan YAML from stage_config, so if os.path.exists(complete_config_path): will return False.

Comment thread vllm_omni/entrypoints/utils.py Outdated
return stage_configs


def _normalize_mode_stage_overrides(stage_overrides: Any) -> list[dict[str, Any]]:
Contributor:

_normalize_mode_stage_overrides() and _apply_mode_stage_overrides() introduce a new YAML-side capability: mode-specific post-resolution mutation of stage config.

This is different from the original --stage-overrides mechanism, which was meant for CLI-side runtime overrides over YAML/default config.
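For context, the mode-override mechanism being questioned roughly amounts to the sketch below: after the stage configs are resolved, per-stage overrides declared for the active mode are merged in. The function signature and the normalization details are assumptions for illustration; the actual helpers in this PR may differ:

```python
# Hypothetical sketch of YAML-side mode overrides: apply per-stage
# overrides declared for the active mode on top of resolved stage
# configs (e.g. text-to-text disabling multimodal inputs on stage 0).
def apply_mode_stage_overrides(stage_configs, mode_entries, mode):
    """stage_configs: list of per-stage dicts, indexed by stage id.
    mode_entries: the `modes` list from the deploy YAML."""
    for entry in mode_entries:
        if entry.get("mode") != mode:
            continue
        for stage_id, overrides in entry.get("stage_overrides", {}).items():
            # Mutate the resolved config for that stage after resolution.
            stage_configs[int(stage_id)].update(overrides)
    return stage_configs
```

The reviewer's point is that this happens in YAML after resolution, whereas the original --stage-overrides flag applied CLI-side overrides on top of the YAML/default config.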

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch from e4b01b6 to 607e005 Compare April 29, 2026 09:19
Comment thread vllm_omni/deploy/hunyuan_image3.yaml Outdated
stages: [0, 1]
stage_overrides:
0:
requires_multimodal_data: false
@xiaohajiayou (Contributor) commented Apr 29, 2026:

requires_multimodal_data appears to be treated as a pipeline-structural field rather than a deploy-time field. In merge_pipeline_deploy(), the runtime value is always taken from StagePipelineConfig:

runtime["requires_multimodal_data"] = ps.requires_multimodal_data

So even if the same field is present in deploy YAML, the resolved runtime here is sourced from ps, not from the deploy config.
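In other words, the merge step behaves roughly like the sketch below, so the deploy-YAML value for this field is discarded. Everything beyond the quoted runtime["requires_multimodal_data"] = ps.requires_multimodal_data line (the function signature, the plain-argument stand-in for ps) is an assumption for illustration:

```python
# Hypothetical sketch: merge_pipeline_deploy() treats some fields as
# pipeline-structural, always sourcing them from StagePipelineConfig
# (`ps`) and ignoring any value supplied in the deploy YAML.
def merge_pipeline_deploy(ps_requires_multimodal_data, deploy_stage_cfg):
    runtime = dict(deploy_stage_cfg)  # start from deploy-time knobs
    # Structural field: the pipeline definition wins over the deploy YAML.
    runtime["requires_multimodal_data"] = ps_requires_multimodal_data
    return runtime
```

This is why setting requires_multimodal_data in a deploy YAML stage_overrides block has no effect on the resolved runtime config.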

Contributor Author:

Removed.

Comment thread vllm_omni/deploy/hunyuan_image3_ar.yaml Outdated
Comment on lines +9 to +13
- mode: text-to-text
stages: [0]
stage_overrides:
0:
requires_multimodal_data: false
Contributor:

I think there are two issues here:

  1. --stage-overrides is a CLI-side per-stage runtime override mechanism, which makes sense for deployment knobs. It does not seem like something that should be embedded into YAML semantics.
  2. modes was already used in the legacy stage_config_path flow to distinguish different modes by selecting different stages. That feels inconsistent with how we represent structural or mode-level differences in other models, where we usually use separate pipeline variants instead, e.g. qwen2_5_omni vs qwen2_5_omni_thinker_only, bagel vs bagel_think, and hunyuan_image3 / _ar / _dit.

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch 2 times, most recently from 68b36eb to b22796c Compare May 6, 2026 08:59
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch 3 times, most recently from 1c7bd85 to f23d3af Compare May 7, 2026 03:32
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch from f23d3af to e16c4c6 Compare May 7, 2026 07:33
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
Signed-off-by: Y. Fisher <yukexiong1@huawei.com>
This reverts commit dac00c4.
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_unified_deploy branch from d6c42fb to 2b44288 Compare May 11, 2026 12:43
@Gaohan123 Gaohan123 added this to the v0.22.0 milestone May 11, 2026
@hsliuustc0106 (Collaborator) left a comment:

lgtm

@hsliuustc0106 hsliuustc0106 merged commit a33e2eb into vllm-project:main May 11, 2026
7 of 8 checks passed
clodaghwalsh17 pushed a commit to clodaghwalsh17/nm-vllm-omni-ent that referenced this pull request May 12, 2026
MaciejBalaNV pushed a commit to MaciejBalaNV/vllm-omni that referenced this pull request May 12, 2026

Labels

ready label to trigger buildkite CI

9 participants