Feat/Add HunyuanImage-3.0-Instruct AR part support #2713
hsliuustc0106 merged 6 commits into vllm-project:main
Conversation
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
WIP PR marked with 【wip】 prefix. Preliminary comments only:
https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/test_guide/#l3-level--l4-level
Full review when WIP status removed and gates pass.
Force-pushed from bbe22a2 to 98f58dc
Force-pushed from a2b8592 to 08b4274
…, stage configs

Add custom sampler with logits processors for AR stage transitions. Ports official _StageTransitionLogitsProcessor and _ConditionalSliceVocabLogitsProcessor into sample() with prefer_model_sampler=True, enabling sampling-based decoding (temperature=0.6, top_k=1024, top_p=0.95) with correct think→recaption→ratio stage transitions.

- hunyuan_image3.py: custom sample() with stage transition, ratio restriction, comprehension token blocking, ratio EOS forcing
- patch.py: extend is_mm_prefix_lm for bidirectional attention on image tokens (hunyuan_image_3_moe model type). Use __dict__ access for cached_property compat with vllm 0.19.0+ pydantic dataclasses
- Stage configs: hunyuan_image3_i2t.yaml (single LLM, TP4), hunyuan_image3_it2i.yaml (2-stage AR→DiT), hunyuan_image3_t2t.yaml
- stage_input_processors/hunyuan_image3.py: ar2diffusion() bridge
- Delete hunyuan_image3_moe.yaml (replaced by split per-task configs)
- Update test_hunyuanimage3_text2img.py to use hunyuan_image3_t2i.yaml

Signed-off-by: TaffyOfficial <2324465096@qq.com>
Force-pushed from bbbae43 to 7efb959
- build_prompt: add instruct template (\n\nUser: ...\n\nAssistant: )
- hunyuan_image3.py: unblock <answer>/</answer> tokens so model can follow its natural generation pattern
- i2t/t2t YAML: temperature=0.0 for greedy decoding, add </answer> (128026) to stop_token_ids

Verified on 4xH800: input_ids match official baseline exactly (6364 tokens), greedy output is self-consistent within same process.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: TaffyOfficial <2324465096@qq.com>
Force-pushed from 7efb959 to 2b7e4ab
@nussejzz PTAL
hsliuustc0106 left a comment
Missing hunyuan_image3_t2i.yaml -- the test change references a file that doesn't exist in this PR.
```diff
  LOCAL_CLIP_PATH = "openai/clip-vit-base-patch32"
  REPO_ROOT = Path(__file__).resolve().parents[3]
- STAGE_CONFIG_PATH = REPO_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / "hunyuan_image3_moe.yaml"
+ STAGE_CONFIG_PATH = REPO_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / "hunyuan_image3_t2i.yaml"
```
hunyuan_image3_t2i.yaml doesn't exist in this PR. The deleted hunyuan_image3_moe.yaml is replaced by i2t/it2i/t2t configs, but there's no t2i config for pure text-to-image. This test will fail with FileNotFoundError.
This file has already been merged into the library along with #2712.
```python
logits[req_idx].fill_(min_score)
logits[req_idx, max_id] = 0

def _clear_transition_state(self, req_idx: int) -> None:
```
_clear_transition_state is defined but never called. With max_num_seqs=1 this is harmless (req_idx=0 gets reused), but it will leak entries if batching is ever enabled. Can you hook it into the request-finish path?
I have now deleted the transition state and made the phase-transition logic stateless, so there is no need to clean up per-request state when a request completes. The next forced token is derived from the decoded token history at each step.
```python
]

self._sampler: Sampler | None = None
self._eos_token_id: int = 127957  # <|endoftext|>
```
Hardcoded EOS token ID. Should this come from tokenizer.eos_token_id or the HF config? If the tokenizer changes this will silently break.
I switched this to tokenizer.eos_token_id.
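For reference, a minimal sketch of the change, assuming the tokenizer is loaded from the published checkpoint (exact wiring in the merged code may differ):

```python
# Hypothetical sketch: derive EOS from the tokenizer instead of hardcoding 127957.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "tencent/HunyuanImage-3.0-Instruct", trust_remote_code=True
)
eos_token_id = tok.eos_token_id  # tracks the checkpoint's tokenizer config
assert eos_token_id is not None, "tokenizer must define an EOS token"
```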
```python
_orig_cp = _OriginalModelConfig.__dict__.get("is_mm_prefix_lm")
if _orig_cp is not _patched_cp:
    # Our assignment above should have replaced it, but just in case
    pass
```
Dead code -- _patched_cp was already assigned above, so this branch is never taken. Remove it.
…zer eos_token_id, hook _clear_transition_state
Signed-off-by: TaffyOfficial <2324465096@qq.com>
… devices, harden patch.py comments
Signed-off-by: TaffyOfficial <2324465096@qq.com>
Force-pushed from 49002a7 to e1d7bab
@hsliuustc0106 updated now.
Nice work, when will this PR be merged?
```python
images = mm_data.get("images")
if images:
    pil_image = images[0] if isinstance(images, list) else images
if pil_image is not None:
```
Can we use else here directly?
This follows the same pattern as glm_image.py (L249-256). The multimodal data may arrive as "image" (single PIL Image) or "images" (list), depending on how the input was constructed. The fallback handles both formats. We can't use a simple else here because pil_image may come from either source, and the final guard covers both paths.
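A minimal sketch of that fallback, with a hypothetical helper name (the merged code follows glm_image.py rather than this exact function):

```python
from typing import Any
from PIL import Image

def extract_single_image(mm_data: dict[str, Any]) -> Image.Image | None:
    """Accept either mm_data["image"] (single PIL image) or mm_data["images"] (list)."""
    pil_image = mm_data.get("image")        # single-image form
    if pil_image is None:
        images = mm_data.get("images")      # list form
        if images:
            pil_image = images[0] if isinstance(images, list) else images
    # The final guard covers both sources, which is why a plain `else` doesn't fit.
    return pil_image
```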
…ne_args in it2i.yaml
Signed-off-by: TaffyOfficial <2324465096@qq.com>
hsliuustc0106 left a comment
Missing tests — regression tests for the core AR sampler logic (stage transitions, ratio restriction, comprehension blocking) should ship with this PR, not a follow-up.
```python
for req_idx in range(logits.shape[0]):
    decoded_tokens: list[int] = (
        sampling_metadata.output_token_ids[req_idx] if req_idx < len(sampling_metadata.output_token_ids) else []
```
sample() loops per-request over logits.shape[0] with in-place mutation. Fine with max_num_seqs: 1 (which all YAML configs use), but the method signature implies batch support it doesn't correctly handle. Add an assertion or document the constraint.
Added assert logits.shape[0] == 1 at the top of sample(). All stage configs enforce max_num_seqs: 1; this makes the constraint explicit and fails loudly if violated.
```python
    or history has diverged from the expected forced sequence.
    """
    for i in range(len(decoded_tokens) - 1, -1, -1):
        trigger = decoded_tokens[i]
```
_get_forced_token scans all decoded tokens backwards every step — O(n²) across decode steps. Acceptable now given short generation lengths, but track for optimization if sequence lengths grow.
Acknowledged. Generation lengths for this model are bounded (~900 tokens for T2I AR, ~2048 for I2T), so the scan cost is negligible in practice. Added a note in the docstring. If sequence lengths grow significantly we can cache the last trigger position.
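For illustration, a simplified sketch of the stateless backward scan under discussion; the trigger map and token IDs are placeholders, not the model's real vocabulary:

```python
# Hypothetical trigger -> forced-next-token map for the fixed AR stage sequence.
_FORCED_AFTER: dict[int, int] = {
    100: 101,  # e.g. </think>     -> <recaption>
    102: 103,  # e.g. </recaption> -> <boi>
}

def get_forced_token(decoded_tokens: list[int]) -> int | None:
    """Scan history backwards for the most recent stage trigger.

    O(len(decoded_tokens)) per step; generation for this model is short
    (~900 tokens for T2I AR, ~2048 for I2T), so the repeated scan is cheap.
    """
    for i in range(len(decoded_tokens) - 1, -1, -1):
        trigger = decoded_tokens[i]
        if trigger in _FORCED_AFTER:
            forced = _FORCED_AFTER[trigger]
            if forced in decoded_tokens[i + 1:]:
                return None  # transition already happened; nothing to force
            return forced
    return None  # no trigger yet; leave the logits untouched
```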
```python
    return True
model_type = getattr(self.hf_config, "model_type", "")
return model_type in _OMNI_MM_PREFIX_LM_MODELS
```
If this __set_name__ dance fails silently (e.g., vLLM changes the descriptor), the model falls back to unpatched is_mm_prefix_lm — bidirectional attention breaks with no error. Add a sanity check at import time that the patch is actually active.
Added an import-time assertion that verifies the patched cached_property is actually installed on ModelConfig. If vLLM changes the descriptor, this will fail at import rather than silently falling back.
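A rough sketch of what such an import-time check can look like, assuming the patch keeps a module-level reference to the replacement descriptor (all names here are illustrative, not the merged patch.py):

```python
from functools import cached_property
from vllm.config import ModelConfig

def _is_mm_prefix_lm(self) -> bool:
    # Simplified body; the real patch also preserves the original allowlist check.
    return getattr(self.hf_config, "model_type", "") == "hunyuan_image_3_moe"

_patched_cp = cached_property(_is_mm_prefix_lm)
_patched_cp.__set_name__(ModelConfig, "is_mm_prefix_lm")
setattr(ModelConfig, "is_mm_prefix_lm", _patched_cp)

# Import-time sanity check: fail loudly if the descriptor was not installed,
# instead of silently falling back to causal attention for image tokens.
assert ModelConfig.__dict__.get("is_mm_prefix_lm") is _patched_cp, (
    "is_mm_prefix_lm patch is not active on vllm ModelConfig"
)
```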
Added tests/diffusion/models/hunyuan_image3/test_hunyuan_image3_sampler.py with regression tests for all core sampler paths.
… unit tests
Signed-off-by: TaffyOfficial <2324465096@qq.com>
Force-pushed from 823d247 to fffba50
@hsliuustc0106 updated now.
Hi, can you give an example of hunyuan-image3-instruct IT2I inference with vllm_omni? Thank you!
The AR-to-DiT connection hasn't been established yet. We need to wait for #2590 to be merged before the IT2I pipeline can actually proceed.
Signed-off-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Purpose
HunyuanImage-3.0-Instruct is a multimodal model by Tencent that supports image understanding (I2T), image editing (IT2I), and text-to-image generation (T2I). The initial model registration was added in #759, but only included the basic model files and a single T2I diffusion config. The AR side was missing critical runtime logic — sampling-based decoding produced empty outputs, image tokens used the wrong attention type, and there were no stage configs for I2T/IT2I/T2T pipelines.
This PR fills those gaps to make the AR side actually functional.
Depends on #2712 (rename fix).
Custom sampler with logits processors
The official HunyuanImage-3.0 AR model generates tokens in a fixed internal sequence:
<think> → </think> → <recaption> → </recaption> → <boi> → size token → ratio token. This is enforced by two logits processors inside the official generate_image():

- _StageTransitionLogitsProcessor: checks which phase the model is in and forces the next special token at phase boundaries (e.g. after </think>, force <recaption>)
- _ConditionalSliceVocabLogitsProcessor: after a size token is emitted, masks the vocabulary to only allow ratio tokens

Without these processors, sampling-based decoding (temperature=0.6, top_k=1024, top_p=0.95) breaks immediately — the model samples
</answer> or <|endoftext|> as the first token and outputs an empty string. Greedy decoding happened to work by luck but doesn't match official behavior.

This PR ports both processors into
HunyuanImage3ForConditionalGeneration.sample() with prefer_model_sampler=True, so vLLM-Omni's standard sampling pipeline calls our custom sampler before applying temperature/top_k/top_p. Also adds ratio-token EOS forcing: once a ratio token is selected, the next token is forced to EOS, matching the official behavior where all ratio tokens are final_stop_tokens.
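To make the two rules concrete, here is a heavily simplified, hypothetical sketch of what they do to the logits of a single request inside a custom sample(); the token IDs and ratio range are placeholders, not the real vocabulary:

```python
import torch

END_THINK, RECAPTION, SIZE_TOKEN = 100, 101, 200      # placeholder special-token IDs
RATIO_TOKEN_IDS = list(range(300, 332))                # placeholder ratio-token range
STAGE_TRANSITIONS = {END_THINK: RECAPTION}             # trigger -> forced next token

def apply_stage_rules(logits: torch.Tensor, decoded: list[int]) -> torch.Tensor:
    """Mutate 1-D logits so the fixed stage sequence is respected."""
    neg_inf = torch.finfo(logits.dtype).min
    if decoded and decoded[-1] in STAGE_TRANSITIONS:
        # Rule 1 (_StageTransitionLogitsProcessor): force the next special token.
        forced = STAGE_TRANSITIONS[decoded[-1]]
        logits.fill_(neg_inf)
        logits[forced] = 0
    elif decoded and decoded[-1] == SIZE_TOKEN:
        # Rule 2 (_ConditionalSliceVocabLogitsProcessor): allow only ratio tokens.
        keep = torch.tensor(RATIO_TOKEN_IDS, device=logits.device)
        mask = torch.full_like(logits, neg_inf)
        mask[keep] = logits[keep]
        logits.copy_(mask)
    return logits
```

Temperature/top_k/top_p are applied afterwards, which is what prefer_model_sampler=True arranges in the standard pipeline.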
Bidirectional attention for image tokens
HunyuanImage-3.0 uses bidirectional (non-causal) attention for image token positions — text tokens remain causal. This is controlled by
cond_token_attn_type: "joint_full" in the HF config. vLLM-Omni routes this through ModelConfig.is_mm_prefix_lm, but hunyuan_image_3_moe was not in the allowlist. This PR patches is_mm_prefix_lm to include it. Without this fix, image tokens only attend to preceding tokens, which degrades image understanding quality.

The patch also fixes a
cached_property.__get__ crash in vllm 0.19.0+ pydantic dataclasses by using __dict__ access instead of attribute access.
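A small sketch of the difference (using the upstream class only for illustration):

```python
from vllm.config import ModelConfig

# Reading the raw descriptor from the class __dict__ bypasses the descriptor
# protocol entirely, so cached_property.__get__ is never invoked:
orig_descriptor = ModelConfig.__dict__.get("is_mm_prefix_lm")

# Plain attribute access goes through the descriptor protocol and, per the
# description above, can crash on vllm 0.19.0+ pydantic dataclasses:
# orig_descriptor = ModelConfig.is_mm_prefix_lm
```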
Stage configs
- hunyuan_image3_i2t.yaml: Image-to-Text — single LLM stage, TP4, gpu_memory_utilization=0.95
- hunyuan_image3_t2t.yaml: Text-to-Text — single LLM stage for pure text generation

For IT2I, a test script has been added, but full end-to-end validation is not yet possible. The current DiT side does not fully consume or validate the AR-produced content path yet, so this PR only lands the bridging logic and test scaffolding for follow-up integration.
AR→DiT bridge for IT2I
stage_input_processors/hunyuan_image3.py provides ar2diffusion(): it takes the AR stage's output (latent token IDs + prompt text), decodes the token IDs back to continuous latent vectors via the AR model's embedding table, concatenates them with text embeddings from the tokenizer, and packages everything into the format the DiT stage expects.
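A schematic sketch of that bridging step (names and the return format are illustrative; the real processor follows vLLM-Omni's stage-input-processor interface):

```python
import torch

def ar2diffusion_sketch(
    latent_token_ids: torch.Tensor,       # AR-emitted latent token IDs, shape [n]
    prompt_token_ids: torch.Tensor,       # tokenizer output for the prompt text, shape [t]
    embedding_table: torch.nn.Embedding,  # the AR model's input embedding table
) -> dict:
    """Illustrative bridge: AR stage output -> conditioning tensors for the DiT stage."""
    # 1. Decode latent token IDs back to continuous vectors via the embedding table.
    latent_vectors = embedding_table(latent_token_ids)           # [n, d_model]
    # 2. Embed the prompt tokens the same way.
    text_embeds = embedding_table(prompt_token_ids)              # [t, d_model]
    # 3. Concatenate text and latent streams into one conditioning sequence.
    condition = torch.cat([text_embeds, latent_vectors], dim=0)  # [t + n, d_model]
    return {"condition": condition, "num_text_tokens": text_embeds.shape[0]}
```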
I2T output alignment with official HF model
Verified vLLM-Omni I2T output against the official
tencent/HunyuanImage-3.0-Instruct HF model (greedy decoding, 4×H800, bf16, SDPA):
- prompt_utils.py: aligned build_prompt with the official instruct template (\n\nUser: ... \n\nAssistant: format, trigger_tag after Assistant); see the sketch after this list
- hunyuan_image3.py: removed <answer>/</answer> from blocked_token_ids in comprehension mode — the model naturally generates these, and blocking them breaks output
- i2t.yaml/t2t.yaml: temperature 0.6 → 0.0 (greedy), added 128026 (</answer>) to stop_token_ids
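As referenced in the first bullet, a minimal reconstruction of that template from the description above (the default trigger tag is an assumption, not necessarily what build_prompt uses):

```python
def build_instruct_prompt(user_text: str, trigger_tag: str = "<think>") -> str:
    """Wrap user input in the instruct chat template, placing the trigger tag
    immediately after the Assistant turn marker."""
    return f"\n\nUser: {user_text}\n\nAssistant: {trigger_tag}"

# build_instruct_prompt("Describe this image.")
# -> "\n\nUser: Describe this image.\n\nAssistant: <think>"
```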
GPU: 4 × H800 (80GB), TP4
Note: test files are in the follow-up PR. Tests were run against the full stack on the GPU server.
Accuracy Verification
The official HunyuanImage-3.0-Instruct model itself is non-deterministic across processes under bf16 multi-GPU inference. We established this baseline first to set realistic expectations for vLLM-Omni alignment.
Official HF model cross-process baseline
Same code, same
device_map, two separate runs (each calling model.generate with the same device_map): the greedy outputs diverge. Root cause: bf16 NCCL all-reduce floating-point accumulation order is non-deterministic across processes; greedy argmax amplifies tiny numerical differences into token-level divergence.
vLLM-Omni vs official HF
I2T (Image-to-Text):
T2I AR (Text-to-Image, AR stage only):
Alignment criteria