-
Notifications
You must be signed in to change notification settings - Fork 1k
[Config] Add HunyuanImage3 deploy configs #3172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
hsliuustc0106
merged 40 commits into
vllm-project:main
from
Fishermanykx:yukexiong/hunyuan_unified_deploy
May 11, 2026
Merged
Changes from all commits
Commits
Show all changes
40 commits
Select commit
Hold shift + click to select a range
550f938
[Config] Add HunyuanImage3 deploy configs
Fishermanykx 6fddd0e
Add request-level HunyuanImage3 bot task controls
Fishermanykx f032d5f
Apply ruff format for HunyuanImage3 files
Fishermanykx 851baf6
Refine HunyuanImage3 prompt task composition
Fishermanykx d6ed92f
Unify online HunyuanImage3 bot task handling
Fishermanykx a10219d
Revert "Unify online HunyuanImage3 bot task handling"
Fishermanykx 441145c
Consolidate HunyuanImage3 bot task resolution
Fishermanykx 5d88d16
Remove legacy HunyuanImage3 bot task helpers
Fishermanykx 7d70ae5
Remove online HunyuanImage3 bot task changes
Fishermanykx 09a0259
Hardcode HunyuanImage3 offline control token ids
Fishermanykx 2cc6ad7
Hardcode HunyuanImage3 offline control token ids
Fishermanykx 12a77da
Refactor prompt_utils.py
Fishermanykx 2612670
adjust end2end according to prompt utils
Fishermanykx 1dab1f0
Fix HunyuanImage3 i2t think stop tokens
Fishermanykx 5c3eda0
Revert "Fix HunyuanImage3 i2t think stop tokens"
Fishermanykx 8d2970b
Fix HunyuanImage3 i2t think stop token
Fishermanykx 85881e8
Align HunyuanImage3 prompt utils tests
Fishermanykx a72f457
Remove unsupported HunyuanImage3 comprehension think tasks
Fishermanykx 596148b
update
Fishermanykx 29e9f94
update
Fishermanykx 1ccedc6
Update HunyuanImage3 stop token handling
Fishermanykx a63b9ff
Fix HunyuanImage3 pre-commit formatting
Fishermanykx 21e16af
Add HunyuanImage3 KV reuse deploy config
Fishermanykx 6ae5389
Address HunyuanImage3 deploy path review
Fishermanykx 02a8378
Limit HunyuanImage3 images per prompt
Fishermanykx 476a7f0
Revert "Limit HunyuanImage3 images per prompt"
Fishermanykx 8f594ee
Fix HunyuanImage3 stop token mapping
Fishermanykx 5c03b7c
Enable model sampler for NPU AR runner
Fishermanykx 32ea60f
Update HunyuanImage3 KV reuse deploy config
Fishermanykx c7643df
Fix HunyuanImage3 stop token unit test
Fishermanykx 553bd8b
Update HunyuanImage3 deploy config
Fishermanykx badd206
Fix HunyuanImage3 stop token test ids
Fishermanykx 3975e50
Print HunyuanImage3 AR generated text
Fishermanykx 015b34f
Preserve HunyuanImage3 AR tag output
Fishermanykx 4f6b573
Fix HunyuanImage3 NPU AR output flow
Fishermanykx 4807452
Fix NPU AR sampler history fallback
Fishermanykx a0dd770
Revert NPU AR sampler history fallback
Fishermanykx 64a65c7
Revert NPU AR model sampler override
Fishermanykx 6d9b2f9
Adjust HunyuanImage3 NPU stage 0 batching
Fishermanykx 2b44288
Remove legacy HunyuanImage3 stage config
Fishermanykx File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,172 +1,156 @@ | ||
| # HunyuanImage-3.0-Instruct | ||
|
|
||
| ## Set up | ||
|
|
||
| Please refer to the [stage configuration documentation](https://docs.vllm.ai/projects/vllm-omni/en/latest/configuration/stage_configs/) to configure memory allocation appropriately for your hardware setup. | ||
|
|
||
| ## Run examples | ||
|
|
||
| **Note**: These examples work with the default configuration on **8x NVIDIA L40S (48GB)**. For different GPU setups, modify the stage configuration to adjust device allocation and memory utilization. | ||
|
|
||
| Get into the hunyuan_image3 folder: | ||
| This example runs HunyuanImage-3.0-Instruct offline with the unified deploy | ||
| YAMLs under `vllm_omni/deploy/`. | ||
|
|
||
| ## Deploy Configs | ||
|
|
||
| | File | Topology | Default use | | ||
| | :--- | :--- | :--- | | ||
| | `vllm_omni/deploy/hunyuan_image3.yaml` | AR + DiT | Default for `text2img` and `img2img`. | | ||
| | `vllm_omni/deploy/hunyuan_image3_ar.yaml` | AR only | Default for `img2text` and `text2text`. | | ||
| | `vllm_omni/deploy/hunyuan_image3_dit.yaml` | DiT only | Standalone diffusion stage. Pass it explicitly with `--deploy-config`. | | ||
|
|
||
| The example chooses a deploy config automatically when `--deploy-config` and | ||
| `--stage-configs-path` are both omitted: | ||
|
|
||
| | `--modality` | `mode` passed to Omni | Default deploy | | ||
| | :--- | :--- | :--- | | ||
| | `text2img` | `text-to-image` | `hunyuan_image3.yaml` | | ||
| | `img2img` | `image-editing` | `hunyuan_image3.yaml` | | ||
| | `img2text` | `image-to-text` | `hunyuan_image3_ar.yaml` | | ||
| | `text2text` | `text-to-text` | `hunyuan_image3_ar.yaml` | | ||
|
|
||
| `--modality` is an offline example convenience flag. It maps to the internal | ||
| `mode` argument passed to `Omni(...)` by this script. HunyuanImage3 uses | ||
| separate deploy YAMLs for AR + DiT, AR-only, and DiT-only topologies, so the | ||
| stage topology is selected by the deploy file rather than by YAML mode | ||
| overrides. | ||
|
|
||
| Online serving does not expose a `--modality` flag or accept `mode` as an API | ||
| request field. Choose the deploy topology when starting the server with | ||
| `--deploy-config`, then use the OpenAI-compatible endpoint and request shape for | ||
| the scenario. The `modalities` request field is used by the chat completions | ||
| path; the image endpoints infer the image task from the endpoint and payload. | ||
|
|
||
| | Online scenario | Server deploy | Request | | ||
| | :--- | :--- | :--- | | ||
| | Text to image | `--deploy-config vllm_omni/deploy/hunyuan_image3.yaml` | `POST /v1/images/generations`, or `POST /v1/chat/completions` with `"modalities": ["image"]`. | | ||
| | Image editing | `--deploy-config vllm_omni/deploy/hunyuan_image3.yaml` | `POST /v1/images/edits`. | | ||
| | Image/text to text | `--deploy-config vllm_omni/deploy/hunyuan_image3_ar.yaml` | `POST /v1/chat/completions` for text output, for example with `"modalities": ["text"]`. | | ||
| | DiT-only image generation | `--deploy-config vllm_omni/deploy/hunyuan_image3_dit.yaml` | `POST /v1/images/generations`. | | ||
|
|
||
| ## Run Examples | ||
|
|
||
| Text to image, using the default AR + DiT deploy: | ||
|
|
||
| ```bash | ||
| cd examples/offline_inference/hunyuan_image3 | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --prompts "A cute cat sitting on a windowsill watching the sunset" | ||
| ``` | ||
|
|
||
| ### Modality Control | ||
|
|
||
| HunyuanImage-3.0-Instruct supports multiple modality modes. You can control the mode using the `--modality` argument: | ||
|
|
||
| #### Text to Image (text2img) | ||
|
|
||
| - **Pipeline**: Text → AR (CoT + latent tokens) → DiT (denoise) → VAE Decode → Image | ||
| - **Stages Used**: Stage 0 (AR) + Stage 1 (DiT) | ||
| - **KV Transfer**: AR sends KV cache to DiT for conditioned generation | ||
| - **Default Config**: `hunyuan_image3_t2i.yaml` | ||
| Image editing, using the default AR + DiT deploy: | ||
|
|
||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --prompts "A cute cat sitting on a windowsill watching the sunset" | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality img2img \ | ||
| --image-path /path/to/image.png \ | ||
| --prompts "Make the petals neon pink" | ||
| ``` | ||
|
|
||
| **With VAE tiling (required on A100 GPUs):** | ||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --prompts "A cute cat sitting on a windowsill watching the sunset" \ | ||
| --vae-use-tiling | ||
| ``` | ||
|
|
||
| #### Image to Image (img2img) | ||
|
|
||
| - **Pipeline**: Image + Text → AR (CoT + recaption + latent) → DiT → Edited Image | ||
| - **Stages Used**: Stage 0 (AR) + Stage 1 (DiT) | ||
| - **KV Transfer**: AR sends KV cache to DiT | ||
| - **Default Config**: `hunyuan_image3_it2i.yaml` | ||
| Image to text, using the AR-only deploy: | ||
|
|
||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality img2img \ | ||
| --image-path /path/to/image.png \ | ||
| --prompts "Make the petals neon pink" | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality img2text \ | ||
| --image-path /path/to/image.jpg \ | ||
| --prompts "Describe the content of the picture." | ||
| ``` | ||
|
|
||
| #### Image to Text (img2text) | ||
|
|
||
| - **Pipeline**: Image + Question → AR → Text description | ||
| - **Stages Used**: Stage 0 (AR) only | ||
| - **Default Config**: `hunyuan_image3_i2t.yaml` | ||
| Text to text, using the AR-only deploy: | ||
|
|
||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality img2text \ | ||
| --image-path /path/to/image.jpg \ | ||
| --prompts "Describe the content of the picture." | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2text \ | ||
| --prompts "What is the capital of France?" | ||
| ``` | ||
|
|
||
| #### Text to Text (text2text) | ||
|
|
||
| - **Pipeline**: Text → AR → Text | ||
| - **Stages Used**: Stage 0 (AR) only | ||
| - **Default Config**: `hunyuan_image3_t2t.yaml` | ||
| Standalone DiT, using the DiT-only deploy explicitly: | ||
|
|
||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2text \ | ||
| --prompts "What is the capital of France?" | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --deploy-config vllm_omni/deploy/hunyuan_image3_dit.yaml \ | ||
| --prompts "A cinematic portrait of an astronaut in a greenhouse" | ||
| ``` | ||
|
|
||
| ### Inference Steps & Guidance | ||
|
|
||
| Control generation quality for image modalities: | ||
| Override the default full AR + DiT deploy explicitly: | ||
|
|
||
| ```bash | ||
| python end2end.py --modality text2img \ | ||
| --steps 50 \ | ||
| --guidance-scale 5.0 \ | ||
| --height 1024 --width 1024 \ | ||
| --prompts "A photo-realistic sunset over the ocean" | ||
| python examples/offline_inference/hunyuan_image3/end2end.py \ | ||
| --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --deploy-config vllm_omni/deploy/hunyuan_image3.yaml \ | ||
| --prompts "A cute cat" | ||
| ``` | ||
|
|
||
| ### Key Arguments | ||
|
|
||
| #### 📌 Command Line Arguments (end2end.py) | ||
|
|
||
| | Argument | Type | Default | Description | | ||
| | :--------------------- | :----- | :----------------------------------- | :----------------------------------------------------------- | | ||
| | `--model` | string | `tencent/HunyuanImage-3.0-Instruct` | Model path or name | | ||
| | `--modality` | choice | `text2img` | Modality: `text2img`, `img2img`, `img2text`, `text2text` | | ||
| | `--prompts` | list | `None` | Input text prompts | | ||
| | `--image-path` | string | `None` | Input image path (for `img2img`/`img2text`) | | ||
| | `--output` | string | `.` | Output directory for saved images | | ||
| | `--steps` | int | `50` | Number of inference steps | | ||
| | `--guidance-scale` | float | `5.0` | Classifier-free guidance scale | | ||
| | `--seed` | int | `42` | Random seed | | ||
| | `--height` | int | `1024` | Output image height | | ||
| | `--width` | int | `1024` | Output image width | | ||
| | `--bot-task` | string | auto | Override prompt task (e.g. `it2i_think`, `t2i_recaption`) | | ||
| | `--sys-type` | string | auto | Override system prompt type (e.g. `en_unified`, `en_vanilla`) | | ||
| | `--stage-configs-path` | string | auto | Custom stage config YAML path | | ||
| | `--enforce-eager` | flag | `False` | Disable torch.compile | | ||
| | `--init-timeout` | int | `300` | Initialization timeout (seconds) | | ||
| | `--vae-use-tiling` | flag | `False` | Enable VAE tiling for memory optimization (required to avoid OOM on A100) | | ||
|
|
||
| ------ | ||
|
|
||
| #### ⚙️ Stage Configurations | ||
|
|
||
| | Config YAML | Modality | Stages | GPUs | Description | | ||
| | :---------------------------------- | :-------- | :----- | :----- | :------------------------------------ | | ||
| | `hunyuan_image3_t2i.yaml` | text2img | 2 | 8 | T2I with AR→DiT, 4 GPU each | | ||
| | `hunyuan_image3_it2i.yaml` | img2img | 2 | 8 | IT2I with AR→DiT, 4 GPU each | | ||
| | `hunyuan_image3_i2t.yaml` | img2text | 1 | 4 | I2T (AR only) | | ||
| | `hunyuan_image3_t2t.yaml` | text2text | 1 | 4 | T2T (AR only) | | ||
| | `hunyuan_image3_t2i_2gpu.yaml` | text2img | 2 | 2 | T2I for 2-GPU setups | | ||
| | `hunyuan_image3_moe.yaml` | text2img | 2 | 8 | T2I with MoE AR→DiT KV reuse | | ||
| | `hunyuan_image3_moe_dit_2gpu_fp8.yaml` | text2img | 2 | 2 | T2I with FP8 quantization | | ||
|
|
||
| ------ | ||
|
|
||
| ## Using MoE Config | ||
|
|
||
| The `hunyuan_image3_moe.yaml` config enables AR→DiT KV cache reuse with 8 GPUs (4 for AR + 4 for DiT). | ||
| ## Key Arguments | ||
|
|
||
| ```bash | ||
| python end2end.py --model tencent/HunyuanImage-3.0-Instruct \ | ||
| --modality text2img \ | ||
| --stage-configs-path hunyuan_image3_moe.yaml \ | ||
| --prompts "A cute cat" | ||
| ``` | ||
| | Argument | Description | | ||
| | :--- | :--- | | ||
| | `--deploy-config` | Preferred config path for unified deploy YAMLs. | | ||
| | `--stage-configs-path` | Legacy stage config path, kept only for compatibility. Prefer `--deploy-config`. | | ||
| | `--modality` | Offline-only convenience flag. One of `text2img`, `img2img`, `img2text`, `text2text`. It selects prompt formatting, internal `mode`, and default deploy config for this script. Online serving uses `--deploy-config` plus the endpoint and, for chat completions, request `modalities` instead. | | ||
| | `--steps` | Number of diffusion inference steps for image generation. | | ||
| | `--guidance-scale` | Classifier-free guidance scale for image generation. | | ||
| | `--height`, `--width` | Output image size for `text2img`. | | ||
| | `--bot-task` | Prompt behavior. `auto` selects the default from `--modality`; `think` adds `<think>`; `recaption` adds `<recaption>`; `vanilla` uses the text-to-image pretrain template. | | ||
| | `--sys-type` | Override the system prompt type, for example `en_unified` or `en_vanilla`. | | ||
| | `--vae-use-tiling` | Enable VAE tiling for memory reduction. | | ||
|
|
||
| ------ | ||
| ## Notes | ||
|
|
||
| - `hunyuan_image3_ar.yaml` is a 4-card AR-only text/comprehension deploy. It sets `engine_output_type: text`, `final_output_type: text`, and text sampling defaults. | ||
| - `hunyuan_image3_dit.yaml` is a single-stage DiT deploy with `stage_id: 0`; it does not require stage 1 or a running AR stage. | ||
| - The old HunyuanImage3 YAMLs under `model_executor/stage_configs/` and `platforms/*/stage_configs/` have been folded into the deploy YAMLs. | ||
| - This PR does not keep the HunyuanImage3 AR-to-DiT KV reuse wiring. The deploy YAMLs describe the topology and platform settings only. | ||
|
|
||
| ## Prompt Format | ||
|
|
||
| HunyuanImage-3.0-Instruct uses an instruct chat template: | ||
|
|
||
| ``` | ||
| <|startoftext|>{system_prompt}\n\nUser: {<img>?}{user_prompt}\n\nAssistant: {trigger_tag?} | ||
| ```text | ||
| <|startoftext|>{system_prompt} | ||
|
|
||
| User: {<img>?}{user_prompt} | ||
|
|
||
| Assistant: {trigger_tag?} | ||
| ``` | ||
|
|
||
| - `<img>`: Placeholder for each input image (single token; expanded by the multimodal pipeline) | ||
| - Trigger tags: `<think>` (CoT), `<recaption>` (recaptioning) — placed AFTER `Assistant: ` | ||
| - System prompt: Auto-selected based on task | ||
| - `t2i_vanilla` is the only task that uses the bare pretrain template (no chat structure) | ||
| - `<img>`: Placeholder for each input image (single token; expanded by the multimodal pipeline). | ||
| - Trigger tags: `<think>` for CoT and `<recaption>` for recaptioning, placed after `Assistant: `. | ||
| - System prompt: Auto-selected based on task. | ||
| - `t2i_vanilla` is the only task that uses the bare pretrain template without chat structure. | ||
| - The example composes the internal prompt task from `--modality` and `--bot-task` | ||
| before calling `prompt_utils`; for example, `img2text + think` becomes | ||
| `i2t_think` for prompt and stop-token lookup. | ||
|
|
||
| The shared `vllm_omni.diffusion.models.hunyuan_image3.prompt_utils.build_prompt_tokens()` | ||
| helper handles segment-by-segment tokenization (matches HF `apply_chat_template` byte-for-byte). | ||
|
|
||
| ------ | ||
| helper handles segment-by-segment tokenization and matches HF `apply_chat_template`. | ||
|
|
||
| ## FAQ | ||
|
|
||
| - **OOM errors**: Decrease `gpu_memory_utilization` in the YAML stage config, use a smaller `max_num_batched_tokens`, or enable VAE tiling with `--vae-use-tiling` (required on A100 GPUs). | ||
| - **OOM errors**: Decrease `gpu_memory_utilization` in the deploy YAML, use a smaller `max_num_batched_tokens`, or enable VAE tiling with `--vae-use-tiling`. | ||
| - **Custom image sizes**: Use `--height` and `--width` flags (multiples of 16 recommended). | ||
|
|
||
| | Stage | VRAM (approx) | | ||
| | :---------------- | :------------------- | | ||
| | Stage 0 (AR) | ~15 GiB + KV Cache | | ||
| | Stage 1 (DiT) | ~30 GiB | | ||
| | Total (8-GPU) | ~45 GiB + KV Cache | | ||
| | Stage | VRAM (approx) | | ||
| | :--- | :--- | | ||
| | Stage 0 (AR) | ~15 GiB + KV Cache | | ||
| | Stage 1 (DiT) | ~30 GiB | | ||
| | Total (8-GPU) | ~45 GiB + KV Cache | | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how I can use this modality field online.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only used in offline mode. Online controls modality via different request fields (for example, t2t in chat/completions, t2i in images/generations, etc.)