[Config Refactor] HunyuanImage3 pipeline configs #2989
lishunyang12 wants to merge 4 commits into vllm-project:main
Conversation
No GPU to test. Awaiting

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

cc @Semmer2
hsliuustc0106 left a comment
Blocker scan
| Category | Result |
|---|---|
| Correctness | BLOCK |
| Reliability/Safety | PASS |
| Breaking Changes | BLOCK |
| Test Coverage | PASS |
| Documentation | BLOCK |
| stage_config.py wiring | PASS |
Blocking issues
1. i2t and t2t modes deleted without migration
PR description says "Five separate model_types" and explicitly lists:
hunyuan_image3_i2t — AR only (stage 0) — Replaces i2t.yaml
hunyuan_image3_t2t — AR only (stage 0) — Replaces t2t.yaml
But neither is registered in pipeline_registry.py, no pipeline definition exists in pipeline.py, and both yamls are simply deleted. The README and end2end.py also remove those modalities entirely.
This is a breaking change for existing users. Either:
- Add `hunyuan_image3_i2t` and `hunyuan_image3_t2t` pipeline definitions + deploy yamls, OR
- Update the PR description to explicitly state these modes are intentionally dropped (not "Replaced")
2. FP8 deploy yaml mentioned but not included
PR body says:
FP8 stays as a separate `deploy/hunyuan_image3_t2i_fp8.yaml` (quantization is not a platform delta).
No such file exists in this diff. Both moe_dit_2gpu_fp8.yaml (2x H200 FP8 DiT) and t2i_2gpu.yaml (2-GPU AR) are deleted without replacement. Users on 2-GPU or FP8 setups lose their configs.
3. PR description / implementation mismatch
The description states 5 model_types; only 3 are implemented. The description lists NPU/XPU overlay consolidation, but only the t2i deploy yaml gets platform sections — it2i gets none (was this intentional?).
Non-blocking notes
- `stage_config.py` changes look correct — `omni_kv_config` and `requires_multimodal_data` are new explicit wire-ups from `StagePipelineConfig` fields to engine_args/runtime, needed for the pipeline.py framework. No regression risk for existing yaml-based models.
- `hf_architectures` (renders as `***` in some tools) correctly used on T2I only for model-type fallback.
- Pipeline topology definitions are clean and well-documented.
- `dit_only_ci.yaml` correctly uses the new schema for the e2e test.
- XPU/NPU consolidation into `platforms:` sections is the right pattern.
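For reference, the consolidation pattern endorsed here — one deploy yaml with per-platform overlay sections — could look roughly like the following. This is a minimal sketch: the stage fields and device lists are illustrative assumptions, not the contents of the shipped `deploy/hunyuan_image3_t2i.yaml`.

```yaml
# Sketch only: illustrates the platforms-overlay layout, not the real file.
stages:
  - stage_id: 0
    devices: "0,1,2,3,4,5,6,7"   # CUDA baseline
    parallel_config:
      tensor_parallel_size: 8
platforms:
  npu:                           # replaces the old platforms/npu/stage_configs/ copy
    stages:
      - stage_id: 0
        devices: "0,1,2,3"       # hypothetical NPU sizing
  xpu:                           # replaces the old platforms/xpu/stage_configs/ copy
    stages:
      - stage_id: 0
        devices: "0,1"           # hypothetical XPU sizing
```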
alex-jw-brooks left a comment
Thanks, I think it looks good — some thoughts below. I think the text2text/img2text points from the earlier review are also important.
```yaml
enable_expert_parallel: false
vae_use_slicing: false
vae_use_tiling: false
cache_backend: null
```
I'm actually not sure this is the right default value for cache_backend; it might currently be "none" as a string (e.g., based on places like this).
Since this and some of the others are default values though, I think it would be best to remove them where possible, since it makes the configs noisier
Good catch — yeah, cache_backend: null / cache_config: null / enable_cache_dit_summary: false were all defaults. Removed in df65e00. Same for the matching it2i values.
```yaml
devices: "4,5,6,7"
parallel_config:
  tensor_parallel_size: 4
  enable_expert_parallel: false
```
Why is expert parallel disabled in this config, but enabled in hunyuan_image3_it2i.yaml?
Copy-paste asymmetry — no real reason. Aligned t2i stage 1 to enable_expert_parallel: true to match it2i in df65e00.
```yaml
- stage_id: 0
  max_num_seqs: 1
  gpu_memory_utilization: 0.95
  enforce_eager: true
```
Also curious about enforce_eager=True here
Stage 0 is the AR/MoE side — kept enforce_eager: true per the qwen3_omni_moe.yaml convention (cudagraph capture is unreliable across MoE expert routing during AR token-by-token generation). Flipped stage 1 (DiT) in df65e00 to fall through to the dataclass default False so cudagraph runs there.
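The intended stage split could be sketched as follows — an illustrative fragment only, not the shipped yaml:

```yaml
# Sketch of the eager/cudagraph split described above (not the real file).
stages:
  - stage_id: 0            # AR/MoE stage
    enforce_eager: true    # cudagraph capture is unreliable across MoE
                           # expert routing during AR token-by-token generation
  - stage_id: 1            # DiT stage
    # enforce_eager omitted -> dataclass default (false), so cudagraph runs
```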
```yaml
devices: "0,1,2,3,4,5,6,7"
parallel_config:
  tensor_parallel_size: 8
  enable_expert_parallel: true
```
It may be a good idea to add the NPU config here too, since there was one before. I only see an NPU section in the CI config
Done in df65e00 — ported the deleted platforms/npu/stage_configs/hunyuan_image3_t2i.yaml into a platforms.npu section under platforms.xpu.
Looks good. Extracting the common parts is the right approach, rather than "one strategy, one yaml file".
Re: the description says "users can pass --pipeline hunyuan_image3_ with a custom deploy yaml" — but i2t/t2t have no pipeline definition in pipeline.py after this PR; only dit_only does. A user bringing their own deploy yaml would still hit a registry lookup failure. If the intent is "BYO", please keep the pipeline.py topology entries for i2t/t2t (a small addition, no yaml cost) and drop only the deploy yamls.
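The failure mode described here can be shown in miniature. The dict contents and function name below are hypothetical illustrations, not the actual vllm_omni registry:

```python
# Miniature sketch of the lookup-before-yaml ordering described above.
# Registry contents and names are hypothetical, not vllm_omni's real code.
_OMNI_PIPELINES = {
    "hunyuan_image3_t2i": ["ar_stage", "dit_stage"],   # AR -> DiT
    "hunyuan_image3_it2i": ["ar_stage", "dit_stage"],
    "hunyuan_image3_dit_only": ["dit_stage"],
}

def resolve_pipeline(model_type):
    # The registry lookup happens before any user deploy yaml is read,
    # so a variant with no topology entry fails here no matter how
    # complete the user's yaml is.
    try:
        return _OMNI_PIPELINES[model_type]
    except KeyError:
        raise ValueError(
            f"unknown pipeline {model_type!r}; a deploy yaml alone "
            "cannot register a topology"
        )
```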
tests/e2e/.../stage_configs/dit_only_ci.yaml reintroduces the stage_configs/ directory name that #2383 explicitly deprecated. Suggest renaming to tests/e2e/.../deploy/hunyuan_image3_dit_only_ci.yaml to stay consistent with the new schema — otherwise follow-up 2c's cleanup will miss it.
One small design thought — feel free to ignore if this is already settled.

The official HunyuanImage-3 repo uses generate_image(prompt, bot_task="image") for T2I, which maps to the DIT_ONLY path rather than going through AR→DiT. I ran into this while setting up a GenEval CI on a fork and ended up registering DIT_ONLY as the default for hunyuan_image_3_moe so that vllm serve ... --omni would Just Work out of the box for T2I users.

The current PR keeps hunyuan_image3_t2i as AR→DiT and exposes hunyuan_image3_dit_only as a separate model_type, which is cleaner semantically but means users have to know to pass --pipeline hunyuan_image3_dit_only to match official behavior.

Not saying one is right and the other wrong — both have merit. Just thought it'd be worth a sentence in the PR description explaining the choice, so downstream users know which path matches the Tencent reference.
It may be necessary to use Omni.from_cli_args() here; otherwise, it won’t be possible to distinguish CLI arguments explicitly provided by the user.
> It may be necessary to use `Omni.from_cli_args()` here; otherwise, it won't be possible to distinguish CLI arguments explicitly provided by the user.
Hi @xiaohajiayou, I sent you an email requesting to add me on WeChat to facilitate further conversation.
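For background, the distinction being asked about — which flags the user actually typed versus which are argparse defaults — can be sketched with argparse's SUPPRESS sentinel. The helper names here are hypothetical illustrations, not the real Omni.from_cli_args implementation:

```python
import argparse

def build_parser():
    # argument_default=SUPPRESS: options the user does not pass are simply
    # absent from the namespace instead of appearing with default values.
    p = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    p.add_argument("--gpu-memory-utilization", type=float)
    p.add_argument("--max-num-seqs", type=int)
    return p

def merge_config(yaml_cfg, argv):
    # Only keys the user explicitly set survive the parse, so the merge
    # overrides YAML values exactly where the user spoke and nowhere else.
    cli_overrides = vars(build_parser().parse_args(argv))
    return {**yaml_cfg, **cli_overrides}
```

With this shape, `merge_config({"gpu_memory_utilization": 0.95, "max_num_seqs": 1}, ["--max-num-seqs", "4"])` keeps the YAML's 0.95 while overriding only max_num_seqs — whereas merging a normally parsed namespace would silently clobber the YAML value with the argparse default.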
Pushed df65e00 addressing the open review items:

- Blockers (cc @hsliuustc0106)
- Inline (cc @alex-jw-brooks)
- TaffyOfficial follow-ups
xiaohajiayou
@TaffyOfficial (re: i2t/t2t topology) — Done in df65e00. Added the pipeline.py topology entries for i2t/t2t.
Signed-off-by: lishunyang <lishunyang12@163.com>
…i + it2i

Image generation is the headline modality. AR-only (i2t/t2t) and DiT-only runs are niche; users can pass --pipeline hunyuan_image3_<variant> with a custom deploy yaml. FP8 toggles via --quantization fp8 (DiT-only path verified; IT2I AR + image FP8 hits an upstream vLLM kernel limitation — see vllm-project#2976). dit_only.yaml moved to tests/e2e/.../stage_configs/ as a CI-only fixture; the dit_only pipeline registration is kept so users can BYO deploy.

Signed-off-by: lishunyang <lishunyang12@163.com>
…s cleanup

Signed-off-by: lishunyang <lishunyang12@163.com>
Force-pushed from df65e00 to dbadc5c.
@TaffyOfficial (re: stage_configs rename) — Done. Renamed to tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml.
@TaffyOfficial (re: T2I path) — Good point, hadn't documented this. Updated the PR description with a "T2I path choice" subsection.
2. One coverage concern: the e2e text2img test now points to hunyuan_image3_dit_only_ci.yaml, so it no longer exercises the shipped default hunyuan_image3_t2i.yaml nor the AR→DiT KV-transfer path that this PR makes the default T2I route.
Summary
Continuation of RFC #2072. Migrates HunyuanImage-3.0 from the legacy `vllm_omni/model_executor/stage_configs/hunyuan_image3_*.yaml` files (7 yamls + 2 platform overlays) into the new `pipeline.py` (topology) + `vllm_omni/deploy/<model>.yaml` (deployment) split established by #2383.

Variant strategy
Five separate `model_type`s — one per task — registered in `_OMNI_PIPELINES`. Precedent: `qwen2_5_omni` + `qwen2_5_omni_thinker_only`.

- `hunyuan_image3_t2i` — `deploy/hunyuan_image3_t2i.yaml` (+ `_fp8.yaml`)
- `hunyuan_image3_it2i` — `deploy/hunyuan_image3_it2i.yaml`
- `hunyuan_image3_dit_only`
- `hunyuan_image3_i2t`
- `hunyuan_image3_t2t`

`dit_only`/`i2t`/`t2t` carry only the pipeline.py topology — no default deploy yaml — because hardware sizing for those modes depends on the use case. Users bringing their own deploy yaml just point `--pipeline hunyuan_image3_<variant>` at it.

T2I path choice
T2I is registered as AR→DiT (matching the official `bot_task="text"` flow that produces a textual prompt for the DiT). For users wanting Tencent's `bot_task="image"` semantics (skip the AR side entirely), use `--pipeline hunyuan_image3_dit_only` with their own deploy yaml. The two paths produce different image quality / latency trade-offs; AR→DiT is the default because it matches the headline modality demonstrated in the official repo.

Deploy yaml consolidation
- `platforms:` sections inside one `deploy/hunyuan_image3_<variant>.yaml` per task.
- NPU/XPU overlays live in the `platforms.npu`/`platforms.xpu` sections of the corresponding CUDA yaml — mirrors the `qwen3_omni_moe.yaml` structure.
- FP8 stays as a separate `deploy/hunyuan_image3_t2i_fp8.yaml` (quantization is not a platform delta).

Field ownership
Following the 2/N decisions:
- Pipeline-owned: `model_arch=HunyuanImage3ForCausalMM`, `execution_type`, `input_sources`, `final_output*`, `omni_kv_config` (KV transfer between AR↔DiT), `kv_transfer_criteria`, `custom_process_input_func` (`hunyuan_image3.ar2diffusion` on DiT stages), AR `stop_token_ids: [127957]` as `sampling_constraints` (model-intrinsic until #2887 item 2, "[Follow-up] Deploy/pipeline config follow-ups from #2383", lands).
- Deploy-owned: `gpu_memory_utilization`, `devices`, `tensor_parallel_size`, `max_num_seqs`, `default_sampling_params` (per-variant AR sampling differs: t2i=greedy, it2i=temp=0.6/top_p=0.95/top_k=1024), DiT `num_inference_steps=50`, `guidance_scale=2.5`, `hf_overrides.rope_parameters.mrope_section=[0,32,32]`. AR stages keep `enforce_eager: true` per qwen3_omni_moe convention; DiT stages omit the field so cudagraph runs by default.
- `worker_cls`/`scheduler_cls` are auto-derived from `StageExecutionType.LLM_AR` via `_resolve_execution_mode` — not copied.

Cleanup
Deletes:
- `vllm_omni/model_executor/stage_configs/hunyuan_image3_{t2i,t2i_2gpu,moe,moe_dit_2gpu_fp8,it2i,i2t,t2t}.yaml`
- `vllm_omni/platforms/{npu,xpu}/stage_configs/hunyuan_image3_t2i.yaml`

Updates:
- `vllm_omni/deploy/` (new yamls).
- `tests/e2e/offline_inference/stage_configs/hunyuan_image3_dit_only_ci.yaml` → `tests/e2e/offline_inference/deploy/hunyuan_image3_dit_only_ci.yaml` (renamed for consistency with the new schema).
- `examples/offline_inference/hunyuan_image3/end2end.py` switched to `Omni.from_cli_args(args, parser=parser, **overrides)` so argparse defaults don't silently clobber deploy YAML values (override precedence revisited in "[RFC] Sentinel-default precedence for stage engine args" #3035, post-0.20.0).

Coordination
Independent of #2977 (HunyuanImage3 has a real `config.json` at the repo root, so model-type detection works without `diffusers_class_name`).

Test plan
- `pre-commit run --files <changed-files>` passes
- `pytest tests/config/test_pipeline_registry.py -v`

cc @alex-jw-brooks @hsliuustc0106 @nussejzz @TaffyOfficial @xuechendi @xiaohajiayou
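As a closing illustration of the Field ownership split above, the deploy-owned half could look roughly like this fragment. Values are taken from the description; the overall file layout is an assumption, not the shipped yaml.

```yaml
# Illustrative fragment of deploy-owned fields only (layout is a sketch).
stages:
  - stage_id: 0                  # AR stage
    devices: "0,1,2,3,4,5,6,7"
    gpu_memory_utilization: 0.95
    max_num_seqs: 1
    enforce_eager: true          # qwen3_omni_moe convention for AR stages
    parallel_config:
      tensor_parallel_size: 8
    default_sampling_params:
      temperature: 0.0           # t2i = greedy
  - stage_id: 1                  # DiT stage; enforce_eager omitted -> cudagraph
    num_inference_steps: 50
    guidance_scale: 2.5
```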