
[Config Refactor] HunyuanImage3 pipeline configs#2989

Open
lishunyang12 wants to merge 4 commits into vllm-project:main from lishunyang12:config-refactor-4-hunyuan-image3

Conversation

lishunyang12 (Collaborator) commented Apr 21, 2026

Summary

Continuation of RFC #2072. Migrates HunyuanImage-3.0 from the legacy vllm_omni/model_executor/stage_configs/hunyuan_image3_*.yaml files (7 yamls + 2 platform overlays) into the new pipeline.py (topology) + vllm_omni/deploy/<model>.yaml (deployment) split established by #2383.

Variant strategy

Five separate model_types — one per task — registered in _OMNI_PIPELINES. Precedent: qwen2_5_omni + qwen2_5_omni_thinker_only.

| model_type | Topology | Default deploy yaml |
|---|---|---|
| hunyuan_image3_t2i | AR (stage 0) → DiT (stage 1) with KV transfer | deploy/hunyuan_image3_t2i.yaml (+ _fp8.yaml) |
| hunyuan_image3_it2i | AR (mm input, stage 0) → DiT (stage 1) | deploy/hunyuan_image3_it2i.yaml |
| hunyuan_image3_dit_only | DiT only (stage 0) | BYO — only CI yaml ships |
| hunyuan_image3_i2t | AR only (stage 0), image+text → text | BYO |
| hunyuan_image3_t2t | AR only (stage 0), text → text | BYO |

dit_only / i2t / t2t carry only the pipeline.py topology — no default deploy yaml — because hardware sizing for those modes depends on the use case. Users bringing their own deploy yaml just point --pipeline hunyuan_image3_<variant> at it.

T2I path choice

T2I is registered as AR→DiT (matching the official bot_task="text" flow that produces a textual prompt for the DiT). For users wanting Tencent's bot_task="image" semantics (skip the AR side entirely), use --pipeline hunyuan_image3_dit_only with their own deploy yaml. The two paths produce different image quality / latency trade-offs; AR→DiT is the default because it matches the headline modality demonstrated in the official repo.

Deploy yaml consolidation

  • Hardware-tier deltas (1×H20 vs 4×H20 vs 2×L40S etc.) collapse into platforms: sections inside one deploy/hunyuan_image3_<variant>.yaml per task.
  • NPU + XPU overlays moved into platforms.npu / platforms.xpu sections of the corresponding CUDA yaml — mirrors qwen3_omni_moe.yaml structure.
  • FP8 stays as a separate deploy/hunyuan_image3_t2i_fp8.yaml (quantization is not a platform delta).
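The consolidated layout described above might look roughly like this. This is an illustrative sketch only: the stage fields and the exact `platforms:` schema are assumptions, not the shipped yaml.

```yaml
pipeline: hunyuan_image3_t2i
stages:
  - stage_id: 0              # AR stage (CUDA baseline)
    devices: "0,1,2,3"
    tensor_parallel_size: 4
  - stage_id: 1              # DiT stage
    devices: "4,5,6,7"
    tensor_parallel_size: 4
platforms:
  npu:                       # ported from the deleted NPU overlay yaml
    stages:
      - stage_id: 0
        enforce_eager: true
  xpu:                       # former XPU overlay, now an inline delta
    stages:
      - stage_id: 1
        gpu_memory_utilization: 0.9
```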

Field ownership

Following the 2/N decisions:

  • Pipeline (topology): model_arch=HunyuanImage3ForCausalMM, execution_type, input_sources, final_output*, omni_kv_config (KV transfer between AR↔DiT), kv_transfer_criteria, custom_process_input_func (hunyuan_image3.ar2diffusion on DiT stages), AR stop_token_ids: [127957] as sampling_constraints (model-intrinsic until item 2 of #2887, "[Follow-up] Deploy/pipeline config follow-ups from #2383", lands).
  • Deploy: gpu_memory_utilization, devices, tensor_parallel_size, max_num_seqs, default_sampling_params (per-variant AR sampling differs: t2i=greedy, it2i=temp=0.6/top_p=0.95/top_k=1024), DiT num_inference_steps=50, guidance_scale=2.5, hf_overrides.rope_parameters.mrope_section=[0,32,32]. AR stages keep enforce_eager: true per qwen3_omni_moe convention; DiT stages omit the field so cudagraph runs by default.
  • worker_cls / scheduler_cls are auto-derived from StageExecutionType.LLM_AR via _resolve_execution_mode — not copied.

Cleanup

Deletes:

  • vllm_omni/model_executor/stage_configs/hunyuan_image3_{t2i,t2i_2gpu,moe,moe_dit_2gpu_fp8,it2i,i2t,t2t}.yaml
  • vllm_omni/platforms/{npu,xpu}/stage_configs/hunyuan_image3_t2i.yaml

Updates:

  • Examples and tests that reference the old yaml paths now point to vllm_omni/deploy/.
  • tests/e2e/offline_inference/stage_configs/hunyuan_image3_dit_only_ci.yaml → tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml (renamed for consistency with the new schema).
  • examples/offline_inference/hunyuan_image3/end2end.py switched to Omni.from_cli_args(args, parser=parser, **overrides) so argparse defaults don't silently clobber deploy YAML values (override precedence will be revisited post-0.20.0 in #3035, "[RFC] Sentinel-default precedence for stage engine args").
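The precedence problem behind that last bullet can be illustrated in plain argparse: a parsed default is indistinguishable from a user-supplied value unless you compare against `parser.get_default()`, so YAML values get silently overwritten. The helper below is a hypothetical sketch of what the fix has to do, not the real `Omni.from_cli_args` logic.

```python
import argparse

def merge_args_over_yaml(parser, args, yaml_cfg):
    """Let CLI values win only when they differ from the argparse default."""
    merged = dict(yaml_cfg)
    for key, value in vars(args).items():
        # Note: a user explicitly passing the default value is still
        # indistinguishable here; that residual ambiguity is what
        # sentinel defaults are meant to remove.
        if value != parser.get_default(key):
            merged[key] = value
    return merged

parser = argparse.ArgumentParser()
parser.add_argument("--num-inference-steps", type=int, default=30)
yaml_cfg = {"num_inference_steps": 50}  # value from the deploy yaml

# User passed nothing: the YAML value must survive, not the argparse default.
merged = merge_args_over_yaml(parser, parser.parse_args([]), yaml_cfg)
```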

Coordination

Independent of #2977 (HunyuanImage3 has a real config.json at the repo root, so model-type detection works without diffusers_class_name).

Test plan

  • pre-commit run --files <changed-files> passes
  • pytest tests/config/test_pipeline_registry.py -v
  • CI green
  • Manual e2e: t2i, it2i, dit_only on H20

cc @alex-jw-brooks @hsliuustc0106 @nussejzz @TaffyOfficial @xuechendi @xiaohajiayou

lishunyang12 changed the title from [Config Refactor 4/N] HunyuanImage3 pipeline configs to [Config Refactor] HunyuanImage3 pipeline configs on Apr 21, 2026
lishunyang12 (Author) commented:

No GPU to test. Awaiting

lishunyang12 marked this pull request as ready for review on April 21, 2026 at 13:53

lishunyang12 (Author) commented:

@alex-jw-brooks @xiaohajiayou PTAL

hsliuustc0106 (Collaborator) commented:

cc @kechengliu97

hsliuustc0106 (Collaborator) commented:

cc @Semmer2

hsliuustc0106 (Collaborator) left a comment:

Blocker scan

| Category | Result |
|---|---|
| Correctness | BLOCK |
| Reliability/Safety | PASS |
| Breaking Changes | BLOCK |
| Test Coverage | PASS |
| Documentation | BLOCK |
| stage_config.py wiring | PASS |

Blocking issues

1. i2t and t2t modes deleted without migration

PR description says "Five separate model_types" and explicitly lists:

hunyuan_image3_i2t — AR only (stage 0) — Replaces i2t.yaml
hunyuan_image3_t2t — AR only (stage 0) — Replaces t2t.yaml

But neither is registered in pipeline_registry.py, no pipeline definition exists in pipeline.py, and both yamls are simply deleted. The README and end2end.py also remove those modalities entirely.

This is a breaking change for existing users. Either:

  • Add hunyuan_image3_i2t and hunyuan_image3_t2t pipeline definitions + deploy yamls, OR
  • Update the PR description to explicitly state these modes are intentionally dropped (not "Replaced")

2. FP8 deploy yaml mentioned but not included

PR body says:

FP8 stays as a separate deploy/hunyuan_image3_t2i_fp8.yaml (quantization is not a platform delta).

No such file exists in this diff. Both moe_dit_2gpu_fp8.yaml (2x H200 FP8 DiT) and t2i_2gpu.yaml (2-GPU AR) are deleted without replacement. Users on 2-GPU or FP8 setups lose their configs.

3. PR description / implementation mismatch

The description states 5 model_types; only 3 are implemented. The description lists NPU/XPU overlay consolidation, but only the t2i deploy yaml gets platform sections — it2i gets none (was this intentional?).

Non-blocking notes

  • stage_config.py changes look correct — omni_kv_config and requires_multimodal_data are new explicit wire-ups from StagePipelineConfig fields to engine_args/runtime, needed for the pipeline.py framework. No regression risk for existing yaml-based models.
  • hf_architectures (renders as *** in some tools) correctly used on T2I only for model-type fallback.
  • Pipeline topology definitions are clean and well-documented.
  • dit_only_ci.yaml correctly uses the new schema for the e2e test.
  • XPU/NPU consolidation into platforms: sections is the right pattern.

alex-jw-brooks (Contributor) left a comment:

Thanks, I think it looks good. Some thoughts below; the text2text/img2text issues raised in the earlier review are also important.

enable_expert_parallel: false
vae_use_slicing: false
vae_use_tiling: false
cache_backend: null
alex-jw-brooks (Contributor) commented:

I'm actually not sure this is the right default value for cache_backend; it might currently be "none" as a string (e.g., based on places like this)

Since this and some of the others are default values, though, I think it would be best to remove them where possible; keeping them makes the configs noisier.

lishunyang12 (Author) replied:

Good catch — yeah, cache_backend: null / cache_config: null / enable_cache_dit_summary: false were all defaults. Removed in df65e00. Same for the matching it2i values.

devices: "4,5,6,7"
parallel_config:
tensor_parallel_size: 4
enable_expert_parallel: false
alex-jw-brooks (Contributor) commented:

Why is expert parallel disabled in this config, but enabled in hunyuan_image3_it2i.yaml?

lishunyang12 (Author) replied:

Copy-paste asymmetry — no real reason. Aligned t2i stage 1 to enable_expert_parallel: true to match it2i in df65e00.

- stage_id: 0
max_num_seqs: 1
gpu_memory_utilization: 0.95
enforce_eager: true
alex-jw-brooks (Contributor) commented:

Also curious about enforce_eager=True here

lishunyang12 (Author) replied:

Stage 0 is the AR/MoE side — kept enforce_eager: true per the qwen3_omni_moe.yaml convention (cudagraph capture is unreliable across MoE expert routing during AR token-by-token generation). Flipped stage 1 (DiT) in df65e00 to fall through to the dataclass default False so cudagraph runs there.
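Under this convention the resulting stage layout would look roughly like the following (illustrative only; surrounding fields elided):

```yaml
stages:
  - stage_id: 0          # AR/MoE stage: cudagraph capture is unreliable
    enforce_eager: true  # across MoE expert routing, so keep eager
  - stage_id: 1          # DiT stage: enforce_eager deliberately omitted,
                         # so the dataclass default (false) applies and
                         # cudagraph runs
```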

devices: "0,1,2,3,4,5,6,7"
parallel_config:
tensor_parallel_size: 8
enable_expert_parallel: true
alex-jw-brooks (Contributor) commented:

It may be a good idea to add the NPU config here too, since there was one before. I only see an NPU section in the CI config

lishunyang12 (Author) replied:

Done in df65e00 — ported the deleted platforms/npu/stage_configs/hunyuan_image3_t2i.yaml into a platforms.npu section alongside platforms.xpu.

kechengliu97 (Contributor) commented:

Looks good. Extracting the common parts is the right call, rather than keeping "one strategy, one yaml file".

TaffyOfficial (Contributor) commented:

Re: the description says "users can pass --pipeline hunyuan_image3_<variant> with a custom deploy yaml" — but i2t/t2t have no pipeline definition in pipeline.py after this PR; only dit_only does. A user bringing their own deploy yaml would still hit a registry lookup failure. If the intent is "BYO", please keep the pipeline.py topology entries for i2t/t2t (a small addition with no yaml cost) and drop only the deploy yamls.

TaffyOfficial (Contributor) commented:

tests/e2e/.../stage_configs/dit_only_ci.yaml reintroduces the stage_configs/ directory name that #2383 explicitly deprecated. Suggest renaming to tests/e2e/.../deploy/hunyuan_image3_dit_only_ci.yaml to stay consistent with the new schema — otherwise follow-up 2c's cleanup will miss it.

TaffyOfficial (Contributor) commented:

One small design thought — feel free to ignore if this is already settled. The official HunyuanImage-3 repo uses generate_image(prompt, bot_task="image") for T2I, which maps to the DIT_ONLY path rather than going through AR→DiT. I ran into this while setting up a GenEval CI on a fork and ended up registering DIT_ONLY as the default for hunyuan_image_3_moe so that vllm serve ... --omni would Just Work out of the box for T2I users.

The current PR keeps hunyuan_image3_t2i as AR→DiT and exposes hunyuan_image3_dit_only as a separate model_type, which is cleaner semantically but means users have to know to pass --pipeline hunyuan_image3_dit_only to match official behavior. Not saying one is right and the other wrong — both have merit. Just thought it'd be worth a sentence in the PR description explaining the choice, so downstream users know which path matches the Tencent reference.

lishunyang12 added this to the v0.20.0 milestone on Apr 22, 2026
xiaohajiayou (Contributor) commented:

It may be necessary to use Omni.from_cli_args() here; otherwise, it won’t be possible to distinguish CLI arguments explicitly provided by the user.

lishunyang12 (Author) replied:

Done in df65e00 — switched to Omni.from_cli_args(args, parser=parser, **overrides). Note: this distinction is being revisited under #3035 (sentinel-default precedence) post-0.20.0; for now matching today's convention.

lishunyang12 (Author) replied:

It may be necessary to use Omni.from_cli_args() here; otherwise, it won’t be possible to distinguish CLI arguments explicitly provided by the user.

Hi @xiaohajiayou, I sent you an email asking to add me on WeChat to facilitate further conversation.

lishunyang12 (Author) commented:

Pushed df65e00 addressing the open review items:

Blockers (cc @hsliuustc0106)

  • Added HUNYUAN_IMAGE3_I2T_PIPELINE and HUNYUAN_IMAGE3_T2T_PIPELINE (AR-only topologies) to pipeline.py and registered them. No default deploy yaml — BYO per @TaffyOfficial's suggestion (hardware sizing for I2T/T2T depends on use case).
  • Added vllm_omni/deploy/hunyuan_image3_t2i_fp8.yaml (2x H200 FP8) — this was promised in the PR body but missing from the diff.
  • Updated PR description to match the actual implementation (5 model_types, FP8 yaml, NPU section, T2I path note).

Inline (cc @alex-jw-brooks)

  • Removed cache_backend / cache_config / enable_cache_dit_summary defaults from t2i.yaml.
  • Aligned t2i stage 1 enable_expert_parallel to true (matching it2i — the asymmetry was copy-paste).
  • Dropped enforce_eager: true from DiT stages in both t2i and it2i (falls through to dataclass default False for cudagraph). Kept on AR stages per qwen3_omni_moe convention.
  • Added platforms.npu section back to t2i.yaml, ported from the deleted NPU overlay.

TaffyOfficial follow-ups

  • Renamed tests/e2e/offline_inference/stage_configs/hunyuan_image3_dit_only_ci.yaml → tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml. Updated the test reference.
  • T2I path design note: kept AR→DiT as hunyuan_image3_t2i (matches the bot_task="text" flow). Users wanting Tencent's bot_task="image" (DIT-only) can pass --pipeline hunyuan_image3_dit_only. Documented in the updated PR description.


lishunyang12 (Author) commented:

@TaffyOfficial (re: i2t/t2t topology) — Done in df65e00. Added HUNYUAN_IMAGE3_I2T_PIPELINE and HUNYUAN_IMAGE3_T2T_PIPELINE to pipeline.py and registered both in _OMNI_PIPELINES. BYO deploy yaml works as you described — no registry lookup failure.

Signed-off-by: lishunyang <lishunyang12@163.com>
…i + it2i

Image generation is the headline modality. AR-only (i2t/t2t) and DiT-only
runs are niche; users can pass --pipeline hunyuan_image3_<variant> with a
custom deploy yaml. FP8 toggles via --quantization fp8 (DiT-only path
verified; IT2I AR + image FP8 hits an upstream vLLM kernel limitation —
see vllm-project#2976).

dit_only.yaml moved to tests/e2e/.../stage_configs/ as a CI-only fixture;
the dit_only pipeline registration is kept so users can BYO deploy.

Signed-off-by: lishunyang <lishunyang12@163.com>
…s cleanup

Signed-off-by: lishunyang <lishunyang12@163.com>
lishunyang12 force-pushed the config-refactor-4-hunyuan-image3 branch from df65e00 to dbadc5c on April 22, 2026 at 16:04
lishunyang12 (Author) commented:

@TaffyOfficial (re: stage_configs rename) — Done. Renamed to tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml and updated test_hunyuanimage3_text2img.py:21 to match.

lishunyang12 (Author) commented:

@TaffyOfficial (re: T2I path) — Good point, hadn't documented this. Updated the PR description with a "T2I path choice" subsection: hunyuan_image3_t2i stays AR→DiT (matches bot_task="text"), and users wanting Tencent's bot_task="image" semantics use --pipeline hunyuan_image3_dit_only with their own deploy yaml. Thanks for flagging.

TaffyOfficial (Contributor) commented:

  1. The platforms.npu section in hunyuan_image3_t2i.yaml looks potentially inconsistent with the top-level pipeline choice.
    The deploy still targets pipeline: hunyuan_image3_t2i (AR → DiT), but the NPU comment says “DiT only — single-stage NPU deployment” and only overrides stage_id: 0.
    Given the current platform override behavior is stage-wise patching rather than replacing the full stage list, this seems likely to keep the other base stage(s) unless explicitly overridden.
    Could you clarify whether the comment is stale, or whether the NPU path is intended to be truly DiT-only? If it is meant to be DiT-only, it may be better to point it at hunyuan_image3_dit_only or make the intent explicit in the config layout.

2. One coverage concern: the e2e text2img test now points to hunyuan_image3_dit_only_ci.yaml, so it no longer exercises the shipped default hunyuan_image3_t2i.yaml nor the AR→DiT KV-transfer path that this PR makes the default T2I route.
Since the refactor’s main semantic choice is exactly that default path, it would be good to keep at least one test covering the shipped hunyuan_image3_t2i deploy (even if the CI-only DiT-only fixture stays for cost/runtime reasons).
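The stage-wise patching behavior described in point 1 can be sketched as a dict merge keyed on stage_id. This is a hypothetical model of the override logic, not the actual implementation: the point is that an overlay touching only stage 0 leaves stage 1 from the base topology in place.

```python
def apply_platform_overlay(base_stages, overlay_stages):
    """Patch stages by stage_id; never drop stages absent from the overlay."""
    by_id = {s["stage_id"]: dict(s) for s in base_stages}
    for patch in overlay_stages:
        by_id.setdefault(patch["stage_id"], {}).update(patch)
    return [by_id[k] for k in sorted(by_id)]

base = [
    {"stage_id": 0, "role": "ar"},
    {"stage_id": 1, "role": "dit"},
]
npu_overlay = [{"stage_id": 0, "enforce_eager": True}]
merged = apply_platform_overlay(base, npu_overlay)
# The DiT stage (stage_id 1) survives even though the NPU section never
# mentions it: the overlay patches the stage list, it does not replace it.
```

If the NPU path really is meant to be DiT-only, this merge semantics means the platforms.npu comment is misleading as written.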
