[Config Refactor] HunyuanImage3 pipeline configs #2989
lishunyang12 wants to merge 4 commits into vllm-project:main
Conversation
No GPU to test. Awaiting

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

cc @Semmer2
hsliuustc0106 left a comment
Blocker scan
| Category | Result |
|---|---|
| Correctness | BLOCK |
| Reliability/Safety | PASS |
| Breaking Changes | BLOCK |
| Test Coverage | PASS |
| Documentation | BLOCK |
| stage_config.py wiring | PASS |
Blocking issues
1. i2t and t2t modes deleted without migration
PR description says "Five separate model_types" and explicitly lists:
hunyuan_image3_i2t — AR only (stage 0) — Replaces i2t.yaml
hunyuan_image3_t2t — AR only (stage 0) — Replaces t2t.yaml
But neither is registered in pipeline_registry.py, no pipeline definition exists in pipeline.py, and both yamls are simply deleted. The README and end2end.py also remove those modalities entirely.
This is a breaking change for existing users. Either:
- Add `hunyuan_image3_i2t` and `hunyuan_image3_t2t` pipeline definitions + deploy yamls, OR
- Update the PR description to explicitly state these modes are intentionally dropped (not "Replaced")
2. FP8 deploy yaml mentioned but not included
PR body says:
FP8 stays as a separate `deploy/hunyuan_image3_t2i_fp8.yaml` (quantization is not a platform delta).
No such file exists in this diff. Both moe_dit_2gpu_fp8.yaml (2x H200 FP8 DiT) and t2i_2gpu.yaml (2-GPU AR) are deleted without replacement. Users on 2-GPU or FP8 setups lose their configs.
3. PR description / implementation mismatch
The description states 5 model_types; only 3 are implemented. The description lists NPU/XPU overlay consolidation, but only the t2i deploy yaml gets platform sections — it2i gets none (was this intentional?).
Non-blocking notes
- `stage_config.py` changes look correct — `omni_kv_config` and `requires_multimodal_data` are new explicit wire-ups from `StagePipelineConfig` fields to engine_args/runtime, needed for the pipeline.py framework. No regression risk for existing yaml-based models.
- `hf_architectures` (renders as `***` in some tools) correctly used on T2I only for model-type fallback.
- Pipeline topology definitions are clean and well-documented.
- `dit_only_ci.yaml` correctly uses the new schema for the e2e test.
- XPU/NPU consolidation into `platforms:` sections is the right pattern.
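For reference, the consolidation pattern endorsed here — one deploy yaml with per-platform overlay sections — could look roughly like the following. This is a minimal sketch: the stage fields and device lists are illustrative assumptions, not the contents of the shipped `deploy/hunyuan_image3_t2i.yaml`.

```yaml
# Sketch only: illustrates the platforms-overlay layout, not the real file.
stages:
  - stage_id: 0
    devices: "0,1,2,3,4,5,6,7"   # CUDA baseline
    parallel_config:
      tensor_parallel_size: 8
platforms:
  npu:                           # replaces the old platforms/npu/stage_configs/ copy
    stages:
      - stage_id: 0
        devices: "0,1,2,3"       # hypothetical NPU sizing
  xpu:                           # replaces the old platforms/xpu/stage_configs/ copy
    stages:
      - stage_id: 0
        devices: "0,1"           # hypothetical XPU sizing
```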
alex-jw-brooks left a comment
Thanks, I think it looks good — some thoughts below. I think the text2text/img2text points from the earlier review are also important.
```yaml
enable_expert_parallel: false
vae_use_slicing: false
vae_use_tiling: false
cache_backend: null
```
I'm actually not sure this is the right default value for cache_backend; it might currently be "none" as a string (e.g., based on places like this).
Since this and some of the others are default values though, I think it would be best to remove them where possible, since it makes the configs noisier
Good catch — yeah, cache_backend: null / cache_config: null / enable_cache_dit_summary: false were all defaults. Removed in df65e00. Same for the matching it2i values.
```yaml
devices: "4,5,6,7"
parallel_config:
  tensor_parallel_size: 4
  enable_expert_parallel: false
```
Why is expert parallel disabled in this config, but enabled in hunyuan_image3_it2i.yaml?
Copy-paste asymmetry — no real reason. Aligned t2i stage 1 to enable_expert_parallel: true to match it2i in df65e00.
```yaml
- stage_id: 0
  max_num_seqs: 1
  gpu_memory_utilization: 0.95
  enforce_eager: true
```
Also curious about enforce_eager=True here
Stage 0 is the AR/MoE side — kept enforce_eager: true per the qwen3_omni_moe.yaml convention (cudagraph capture is unreliable across MoE expert routing during AR token-by-token generation). Flipped stage 1 (DiT) in df65e00 to fall through to the dataclass default False so cudagraph runs there.
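The intended stage split could be sketched as follows — an illustrative fragment only, not the shipped yaml:

```yaml
# Sketch of the eager/cudagraph split described above (not the real file).
stages:
  - stage_id: 0            # AR/MoE stage
    enforce_eager: true    # cudagraph capture is unreliable across MoE
                           # expert routing during AR token-by-token generation
  - stage_id: 1            # DiT stage
    # enforce_eager omitted -> dataclass default (false), so cudagraph runs
```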
```yaml
devices: "0,1,2,3,4,5,6,7"
parallel_config:
  tensor_parallel_size: 8
  enable_expert_parallel: true
```
It may be a good idea to add the NPU config here too, since there was one before. I only see an NPU section in the CI config
Done in df65e00 — ported the deleted platforms/npu/stage_configs/hunyuan_image3_t2i.yaml into a platforms.npu section under platforms.xpu.
Looks good. Extracting the common parts is the right approach, rather than "one strategy, one yaml file".
Re: the description says "users can pass --pipeline hunyuan_image3_ with a custom deploy yaml" — but i2t/t2t have no pipeline definition in pipeline.py after this PR; only dit_only does. A user bringing their own deploy yaml would still hit a registry lookup failure. If the intent is "BYO", please keep the pipeline.py topology entries for i2t/t2t (a small addition, no yaml cost) and drop only the deploy yamls.
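The failure mode described here can be shown in miniature. The dict contents and function name below are hypothetical illustrations, not the actual vllm_omni registry:

```python
# Miniature sketch of the lookup-before-yaml ordering described above.
# Registry contents and names are hypothetical, not vllm_omni's real code.
_OMNI_PIPELINES = {
    "hunyuan_image3_t2i": ["ar_stage", "dit_stage"],   # AR -> DiT
    "hunyuan_image3_it2i": ["ar_stage", "dit_stage"],
    "hunyuan_image3_dit_only": ["dit_stage"],
}

def resolve_pipeline(model_type):
    # The registry lookup happens before any user deploy yaml is read,
    # so a variant with no topology entry fails here no matter how
    # complete the user's yaml is.
    try:
        return _OMNI_PIPELINES[model_type]
    except KeyError:
        raise ValueError(
            f"unknown pipeline {model_type!r}; a deploy yaml alone "
            "cannot register a topology"
        )
```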
tests/e2e/.../stage_configs/dit_only_ci.yaml reintroduces the stage_configs/ directory name that #2383 explicitly deprecated. Suggest renaming to tests/e2e/.../deploy/hunyuan_image3_dit_only_ci.yaml to stay consistent with the new schema — otherwise follow-up 2c's cleanup will miss it.
One small design thought — feel free to ignore if this is already settled.

The official HunyuanImage-3 repo uses generate_image(prompt, bot_task="image") for T2I, which maps to the DIT_ONLY path rather than going through AR→DiT. I ran into this while setting up a GenEval CI on a fork and ended up registering DIT_ONLY as the default for hunyuan_image_3_moe so that vllm serve ... --omni would Just Work out of the box for T2I users.

The current PR keeps hunyuan_image3_t2i as AR→DiT and exposes hunyuan_image3_dit_only as a separate model_type, which is cleaner semantically but means users have to know to pass --pipeline hunyuan_image3_dit_only to match official behavior.

Not saying one is right and the other wrong — both have merit. Just thought it'd be worth a sentence in the PR description explaining the choice, so downstream users know which path matches the Tencent reference.
It may be necessary to use Omni.from_cli_args() here; otherwise, it won’t be possible to distinguish CLI arguments explicitly provided by the user.
> It may be necessary to use `Omni.from_cli_args()` here; otherwise, it won't be possible to distinguish CLI arguments explicitly provided by the user.
Hi @xiaohajiayou, I sent you an email requesting to add me on WeChat to facilitate further conversation.
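For background, the distinction being asked about — which flags the user actually typed versus which are argparse defaults — can be sketched with argparse's SUPPRESS sentinel. The helper names here are hypothetical illustrations, not the real Omni.from_cli_args implementation:

```python
import argparse

def build_parser():
    # argument_default=SUPPRESS: options the user does not pass are simply
    # absent from the namespace instead of appearing with default values.
    p = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    p.add_argument("--gpu-memory-utilization", type=float)
    p.add_argument("--max-num-seqs", type=int)
    return p

def merge_config(yaml_cfg, argv):
    # Only keys the user explicitly set survive the parse, so the merge
    # overrides YAML values exactly where the user spoke and nowhere else.
    cli_overrides = vars(build_parser().parse_args(argv))
    return {**yaml_cfg, **cli_overrides}
```

With this shape, `merge_config({"gpu_memory_utilization": 0.95, "max_num_seqs": 1}, ["--max-num-seqs", "4"])` keeps the YAML's 0.95 while overriding only max_num_seqs — whereas merging a normally parsed namespace would silently clobber the YAML value with the argparse default.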
Pushed df65e00 addressing the open review items:

- Blockers (cc @hsliuustc0106)
- Inline (cc @alex-jw-brooks)
- TaffyOfficial follow-ups
xiaohajiayou
@TaffyOfficial (re: i2t/t2t topology) — Done in df65e00. Added the pipeline.py topology entries for i2t/t2t.
Signed-off-by: lishunyang <lishunyang12@163.com>
…i + it2i

Image generation is the headline modality. AR-only (i2t/t2t) and DiT-only runs are niche; users can pass --pipeline hunyuan_image3_<variant> with a custom deploy yaml. FP8 toggles via --quantization fp8 (DiT-only path verified; IT2I AR + image FP8 hits an upstream vLLM kernel limitation — see vllm-project#2976). dit_only.yaml moved to tests/e2e/.../stage_configs/ as a CI-only fixture; the dit_only pipeline registration is kept so users can BYO deploy.

Signed-off-by: lishunyang <lishunyang12@163.com>
…s cleanup

Signed-off-by: lishunyang <lishunyang12@163.com>
Force-pushed from df65e00 to dbadc5c.
@TaffyOfficial (re: stage_configs rename) — Done. Renamed to tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml.
@TaffyOfficial (re: T2I path) — Good point, hadn't documented this. Updated the PR description with a "T2I path choice" subsection.
2. One coverage concern: the e2e text2img test now points to hunyuan_image3_dit_only_ci.yaml, so it no longer exercises the shipped default hunyuan_image3_t2i.yaml nor the AR→DiT KV-transfer path that this PR makes the default T2I route.
Summary
Continuation of RFC #2072. Migrates HunyuanImage-3.0 from the legacy `vllm_omni/model_executor/stage_configs/hunyuan_image3_*.yaml` files (7 yamls + 2 platform overlays) into the new `pipeline.py` (topology) + `vllm_omni/deploy/<model>.yaml` (deployment) split established by #2383.

Variant strategy
Five separate `model_type`s — one per task — registered in `_OMNI_PIPELINES`. Precedent: `qwen2_5_omni` + `qwen2_5_omni_thinker_only`.

- `hunyuan_image3_t2i` — `deploy/hunyuan_image3_t2i.yaml` (+ `_fp8.yaml`)
- `hunyuan_image3_it2i` — `deploy/hunyuan_image3_it2i.yaml`
- `hunyuan_image3_dit_only`
- `hunyuan_image3_i2t`
- `hunyuan_image3_t2t`

`dit_only`/`i2t`/`t2t` carry only the pipeline.py topology — no default deploy yaml — because hardware sizing for those modes depends on the use case. Users bringing their own deploy yaml just point `--pipeline hunyuan_image3_<variant>` at it.

T2I path choice
T2I is registered as AR→DiT (matching the official `bot_task="text"` flow that produces a textual prompt for the DiT). For users wanting Tencent's `bot_task="image"` semantics (skip the AR side entirely), use `--pipeline hunyuan_image3_dit_only` with their own deploy yaml. The two paths produce different image quality / latency trade-offs; AR→DiT is the default because it matches the headline modality demonstrated in the official repo.

Deploy yaml consolidation
- `platforms:` sections inside one `deploy/hunyuan_image3_<variant>.yaml` per task.
- NPU/XPU overlays live in the `platforms.npu`/`platforms.xpu` sections of the corresponding CUDA yaml — mirrors the `qwen3_omni_moe.yaml` structure.
- FP8 stays as a separate `deploy/hunyuan_image3_t2i_fp8.yaml` (quantization is not a platform delta).

Field ownership
Following the 2/N decisions:
- Pipeline-owned: `model_arch=HunyuanImage3ForCausalMM`, `execution_type`, `input_sources`, `final_output*`, `omni_kv_config` (KV transfer between AR↔DiT), `kv_transfer_criteria`, `custom_process_input_func` (`hunyuan_image3.ar2diffusion` on DiT stages), AR `stop_token_ids: [127957]` as `sampling_constraints` (model-intrinsic until #2887 item 2, "[Follow-up] Deploy/pipeline config follow-ups from #2383", lands).
- Deploy-owned: `gpu_memory_utilization`, `devices`, `tensor_parallel_size`, `max_num_seqs`, `default_sampling_params` (per-variant AR sampling differs: t2i=greedy, it2i=temp=0.6/top_p=0.95/top_k=1024), DiT `num_inference_steps=50`, `guidance_scale=2.5`, `hf_overrides.rope_parameters.mrope_section=[0,32,32]`. AR stages keep `enforce_eager: true` per qwen3_omni_moe convention; DiT stages omit the field so cudagraph runs by default.
- `worker_cls`/`scheduler_cls` are auto-derived from `StageExecutionType.LLM_AR` via `_resolve_execution_mode` — not copied.

Cleanup
Deletes:
- `vllm_omni/model_executor/stage_configs/hunyuan_image3_{t2i,t2i_2gpu,moe,moe_dit_2gpu_fp8,it2i,i2t,t2t}.yaml`
- `vllm_omni/platforms/{npu,xpu}/stage_configs/hunyuan_image3_t2i.yaml`

Updates:
- `vllm_omni/deploy/` (new yamls).
- `tests/e2e/offline_inference/stage_configs/hunyuan_image3_dit_only_ci.yaml` → `tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml` (renamed for consistency with the new schema).
- `examples/offline_inference/hunyuan_image3/end2end.py` switched to `Omni.from_cli_args(args, parser=parser, **overrides)` so argparse defaults don't silently clobber deploy YAML values (override precedence revisited in "[RFC] Sentinel-default precedence for stage engine args" #3035, post-0.20.0).

Coordination
Independent of #2977 (HunyuanImage3 has a real `config.json` at the repo root, so model-type detection works without `diffusers_class_name`).

Test plan
- `pre-commit run --files <changed-files>` passes
- `pytest tests/config/test_pipeline_registry.py -v`

cc @alex-jw-brooks @hsliuustc0106 @nussejzz @TaffyOfficial @xuechendi @xiaohajiayou
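As a closing illustration of the Field ownership split above, the deploy-owned half could look roughly like this fragment. Values are taken from the description; the overall file layout is an assumption, not the shipped yaml.

```yaml
# Illustrative fragment of deploy-owned fields only (layout is a sketch).
stages:
  - stage_id: 0                  # AR stage
    devices: "0,1,2,3,4,5,6,7"
    gpu_memory_utilization: 0.95
    max_num_seqs: 1
    enforce_eager: true          # qwen3_omni_moe convention for AR stages
    parallel_config:
      tensor_parallel_size: 8
    default_sampling_params:
      temperature: 0.0           # t2i = greedy
  - stage_id: 1                  # DiT stage; enforce_eager omitted -> cudagraph
    num_inference_steps: 50
    guidance_scale: 2.5
```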