
[HunyuanImage3] Align system_prompt support with official implementation#2270

Merged
gcanlin merged 1 commit into vllm-project:main from Semmer2:feat/hunyuan-system-prompt-align
Apr 7, 2026

Conversation

@skf-1999
Contributor

@skf-1999 skf-1999 commented Mar 27, 2026


Purpose

Add system prompt support for HunyuanImage to align with the official implementation. This PR introduces the command-line arguments `--use-system-prompt` and `--system-prompt`, letting users select a system prompt preset or supply a custom prompt when generating images.

Test Plan

New Arguments

| Argument | Description | Options | Default |
| --- | --- | --- | --- |
| `--use-system-prompt` | System prompt preset type | `None`, `dynamic`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `en_unified`, `custom` | `en_unified` |
| `--system-prompt` | Custom system prompt text (used when `--use-system-prompt=custom`) | Any string | `None` |
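As a rough illustration of how a preset argument like this typically resolves (a hypothetical sketch based only on the option names above; the preset texts, the `dynamic` mapping, and the function shape are assumptions, not this PR's actual code):

```python
# Hypothetical sketch: preset texts and the "dynamic" mapping are placeholders.
PRESETS = {
    "en_vanilla": "You are a text-to-image model. Draw the user's prompt.",
    "en_recaption": "Rewrite the user's prompt, then draw it.",
    "en_think_recaption": "Think step by step, rewrite the prompt, then draw it.",
    "en_unified": "Unified instruction covering rewriting and drawing.",
}

def get_system_prompt(use_system_prompt, bot_task="image", system_prompt=None):
    if use_system_prompt is None or use_system_prompt == "None":
        return None                  # generate without any system prompt
    if use_system_prompt == "custom":
        return system_prompt         # text supplied via --system-prompt
    if use_system_prompt == "dynamic":
        # Assumed: "dynamic" picks a preset from the task; image -> en_vanilla.
        return PRESETS["en_vanilla"] if bot_task == "image" else PRESETS["en_think_recaption"]
    if use_system_prompt in PRESETS:
        return PRESETS[use_system_prompt]
    raise ValueError(f"Unsupported system prompt type: {use_system_prompt}")
```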

Usage Examples

Offline Inference:

```bash
python3 vllm-omni/examples/offline_inference/text_to_image/text_to_image.py \
  --model /data/HunyuanImage-3.0/ \
  --prompt "a cute cat" \
  --output dog.png \
  --num-inference-steps 50 \
  --guidance-scale 5.0 \
  --cfg-scale 4.0 \
  --seed 1234 \
  --tensor-parallel-size 8 \
  --use-system-prompt "en_unified"
```

Online API (curl):

```bash
curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A brown and white dog is running on the grass",
    "use_system_prompt": "en_unified",
    "num_inference_steps": 50,
    "n": 4,
    "size": "1024x1024",
    "seed": 1234
  }' | jq -r '.data[0].b64_json' | base64 -d > online_test.png
```

Test Script Location:

Requires two model weights: `tencent/HunyuanImage-3.0` and `openai/clip-vit-base-patch32`.

```bash
pytest tests/e2e/offline_inference/test_hunyuanimage3_text2img.py -v -s
```
CLIP Embedding Image Similarity Comparison

Test Configuration:

  • Prompt: "A brown and white dog is running on the grass"
  • Seed: 1234
  • Metric: CLIP cosine similarity between the baseline (official source) and this PR's implementation
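The metric itself is just the cosine between two CLIP image embeddings. A minimal sketch of the comparison step, assuming the feature vectors have already been extracted (plain Python stands in for the tensor math):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors; in the real test these
    # would be CLIP image features of the baseline and PR-generated images.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical embeddings score 1.0, so scores above 0.99 indicate near-identical generations.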

Test Result:

| System Prompt | CLIP Score (Baseline vs. PR) |
| --- | --- |
| None (seed=1234) | 0.994987 |
| en_recaption | 0.997418 |
| en_think_recaption | 0.999098 |
| en_vanilla | 0.998555 |
| en_unified | 0.999665 |
| dynamic (same as en_vanilla when bot_task=image) | 0.998555 |

Note: dynamic produces identical results to en_vanilla when bot_task = image.

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please paste the results comparison before and after, or the e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user-facing, please update the release notes draft.


@skf-1999 skf-1999 requested a review from hsliuustc0106 as a code owner March 27, 2026 10:04

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dba8d27d9


Comment thread: tests/test_system_prompt.py (outdated)

```python
    cosine_sim = torch.dot(EXPECTED_EMBEDDING, features).item()
    return cosine_sim

print(compare_semantic(SYSTEM_EN_UNIFIED, IMAGE_PATH))  # example for en_unified
```

P1: Avoid running benchmark code at test import time

Because this file is named test_system_prompt.py, pytest will execute module-level statements during collection, and this top-level call runs compare_semantic immediately against a hard-coded /data/your_system_prompt_en_unified.png. In normal CI/local environments that path is absent, so collection fails before any test runs, effectively breaking the offline inference test suite instead of providing a real test.
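One way to keep such a manual benchmark out of pytest collection (a sketch of the pattern, not this PR's eventual fix; `compare_semantic` is stubbed here):

```python
import os

IMAGE_PATH = "/data/your_system_prompt_en_unified.png"  # hard-coded path from the review

def compare_semantic(system_prompt: str, image_path: str) -> float:
    # Stub standing in for the real CLIP-based comparison in the test file.
    raise NotImplementedError

# Keep module level side-effect free: pytest imports the file during
# collection, so the benchmark only runs on direct invocation, and only
# when the reference image is actually present.
if __name__ == "__main__" and os.path.exists(IMAGE_PATH):
    print(compare_semantic("en_unified system prompt", IMAGE_PATH))
```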


```python
        return system_prompt
    else:
        # Unsupported type: raise NotImplementedError
        raise NotImplementedError(f"Unsupported system prompt type: {sys_type}")
```

P2: Raise a validation error for unknown system prompt types

Unsupported use_system_prompt values currently raise NotImplementedError, but the API request model accepts any string and /v1/images/generations only maps ValueError to a 400 response. That means a simple client typo (for example, en-unified) is surfaced as a 500 internal server error rather than a user-facing validation error, which is misleading and harder to debug.
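A sketch of that suggestion (hypothetical function name; the valid values are taken from the PR's argument table): validate up front and raise `ValueError`, which the endpoint already maps to a 400 response:

```python
VALID_TYPES = ["None", "dynamic", "en_vanilla", "en_recaption",
               "en_think_recaption", "en_unified", "custom"]

def validate_use_system_prompt(value):
    # ValueError is translated to a 400 response by /v1/images/generations;
    # NotImplementedError would surface as an opaque 500.
    if value is not None and value not in VALID_TYPES:
        raise ValueError(
            f"Invalid use_system_prompt: {value!r}. Must be one of: {VALID_TYPES}")
    return value
```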


@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch from 1dba8d2 to 486f3cb Compare March 27, 2026 11:40
Collaborator

@hsliuustc0106 hsliuustc0106 left a comment


🏗️ Architecture Review

Thank you for this well-structured feature PR! The design is clean and aligns well with the existing codebase. However, there are a few blocking issues that need to be addressed before merging.


🔴 Blocking Issues

1. Pre-commit Check Failed ❌

The pre-commit CI check failed due to code formatting issues:

  • `__all__` formatting in `system_prompt.py`
  • Field definition formatting in `images.py`

Fix:
```bash
pre-commit run --all-files
git commit --amend
```

2. Missing Automated Tests ⚠️

The file `tests/e2e/offline_inference/test_system_prompt.py` is a manual testing tool with hardcoded paths (`/data/your_system_prompt_en_unified.png`), not an automated test suite.

Required:

  • Add unit tests for different system prompt types (`None`, `dynamic`, `en_vanilla`, etc.)
  • Add API integration tests for the `/v1/images/generations` endpoint
  • Reference existing tests in `tests/e2e/` for patterns

Example:
```python
# tests/test_system_prompt.py

def test_system_prompt_types():
    """Test different system prompt types."""
    for prompt_type in ["None", "dynamic", "en_vanilla", "en_recaption",
                        "en_think_recaption", "en_unified", "custom"]:
        result = get_system_prompt(prompt_type, "image")
        assert result is not None or prompt_type == "None"

def test_api_integration():
    """Test API endpoint with system prompt."""
    # Test with the vllm test client
```

3. Documentation Incomplete 📚

The PR description checklist shows documentation updates are unchecked:

  • Model support table not updated
  • Usage documentation missing
  • API documentation not updated

Required:

  • Update `docs/models/supported_models.md` with system prompt feature
  • Add usage examples in `docs/`
  • Update API documentation

⚠️ Non-Blocking Suggestions

1. Add Parameter Validation (Correctness)

Consider adding validation for `use_system_prompt` parameter:

```python
# In images.py
from pydantic import field_validator

@field_validator('use_system_prompt')
def validate_system_prompt_type(cls, v):
    valid_types = ["None", "dynamic", "en_vanilla", "en_recaption",
                   "en_think_recaption", "en_unified", "custom"]
    if v is not None and v not in valid_types:
        raise ValueError(f"Invalid use_system_prompt. Must be one of: {valid_types}")
    return v
```

2. Improve Error Handling (Reliability)

```python
# In system_prompt.py
import logging

logger = logging.getLogger(__name__)

def get_system_prompt(sys_type, bot_task, system_prompt=None):
    try:
        # Existing logic
        ...
    except Exception as e:
        logger.warning(f"Failed to get system prompt: {e}")
        return None  # Graceful degradation
```


✅ What Works Well

  1. Modular Design: `system_prompt.py` is well-encapsulated with single responsibility
  2. Parameter Passing: Clean use of `extra_args` following existing patterns
  3. Flexibility: Multiple modes (vanilla, recaption, think_recaption, unified) provide good flexibility
  4. Official Alignment: Aligns with HunyuanImage official implementation

📊 Summary

BLOCKER scan:

  • ✅ Correctness: PASS
  • ✅ Reliability/Safety: PASS
  • ✅ Breaking Changes: PASS
  • ❌ Test Coverage: Missing automated tests
  • ❌ Documentation: Incomplete
  • ✅ Security: PASS

OVERALL: 2 BLOCKERS FOUND

VERDICT: REQUEST_CHANGES


Once the blocking issues are resolved, this PR will be ready for approval. The architecture and design are solid! 🏗️

Architecture Score: 3/5 - Good design, but needs test and doc improvements

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch 3 times, most recently from e451c05 to ef884bd Compare March 28, 2026 08:37
```python
gen_params = OmniDiffusionSamplingParams(num_outputs_per_prompt=request.n)

extra_args = {}
if request.use_system_prompt is not None:
```
Contributor

Better to put it into the prompt?

Contributor Author

Better not: system prompts bypass CFG dropping while user prompts participate in it, so behavior definitions should stay in the system prompt.
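The idea can be illustrated with a toy prompt-assembly function (purely illustrative; the real model composes and drops prompts at the embedding level, not by string concatenation):

```python
def build_cfg_inputs(system_prompt: str, user_prompt: str):
    # The conditional branch sees system + user prompt; the unconditional
    # branch drops only the user prompt. The system prompt therefore steers
    # both passes and is never subject to CFG dropping.
    cond = f"{system_prompt}\n{user_prompt}"
    uncond = f"{system_prompt}\n"
    return cond, uncond
```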

Comment thread: tests/test_system_prompt.py (outdated)

```python
], dtype=torch.float32)

LOCAL_CLIP_PATH = "openai/clip-vit-base-patch32"
IMAGE_PATH = "/data/your_system_prompt_en_unified_picture"
```
Contributor

Why not add this to CI so it runs automatically?

Contributor Author

Very useful suggestion. I’ll adjust the code to make it automated.

```python
    if bot_task == "think":
        return t2i_system_prompts["en_think_recaption"][0]
    # Recaption task: use recaption prompt
    elif bot_task == "recaption":
```
Contributor

Do we need to support think & recaption in the DiT stage? In my opinion, think & recaption need to be completed in the AR stage.

Contributor Author

Currently, only `image` is supported; other `bot_task` values are not.

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch 4 times, most recently from e224d19 to 5909b49 Compare March 31, 2026 07:20
@skf-1999 skf-1999 requested a review from hsliuustc0106 March 31, 2026 07:55
```python
# Below are the CLIP embedding tensors from the official HunyuanImage model
# (seed=1234, prompt: "A brown and white dog is running on the grass").
# SEED_1234 denotes the output without a system prompt; the remaining entries
# correspond to outputs generated with different system prompts.

SEED_1234 = torch.tensor(
```
Collaborator

@yenuo26 @congw729 Can we upload these tensors to a remote repo and download them when the test runs, to avoid adding so many lines of code?

Collaborator

Maybe we can later plan a dedicated directory to store these files.

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch from 0d10619 to f869766 Compare March 31, 2026 11:38
```python
if request.system_prompt is not None:
    extra_args["system_prompt"] = request.system_prompt
if extra_args:
    gen_params.extra_args = extra_args
```
Collaborator

@gcanlin gcanlin Apr 1, 2026

This change introduces a new field in the generation API. Until now, we haven't collected all non-standard parameters into `extra_args`. If we plan to do that, should we also collect other non-standard parameters like `layers`, which is only for Qwen-Image-Layered? @SamitHuang @wtomin
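If the project did consolidate non-standard fields, the collection step might look roughly like this (hypothetical sketch; the field list and the `SimpleNamespace` request stand-in are assumptions, not existing code):

```python
from types import SimpleNamespace

# Hypothetical list of model-specific request fields.
NON_STANDARD_FIELDS = ("use_system_prompt", "system_prompt", "layers")

def collect_extra_args(request) -> dict:
    # Gather every non-standard field that the client actually set.
    return {
        field: getattr(request, field, None)
        for field in NON_STANDARD_FIELDS
        if getattr(request, field, None) is not None
    }
```

With this shape, `gen_params.extra_args` would be populated in one place regardless of which model consumes which field.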

```diff
@@ -1,3 +1,4 @@
+# ruff: noqa: E501
```
Collaborator

Why do we add this line?

Contributor Author

I'll fix that right away.

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch 2 times, most recently from 04c8e38 to d783f60 Compare April 1, 2026 08:19
Comment thread: docs/design/feature/expert_parallel.md (outdated)

```bash
  --guidance_scale 5.0 \
  --tensor-parallel-size 8 \
  --seed 1234 \
  --enable_expert_parallel
```
Collaborator

We should unify to `--enable-expert-parallel`.

```
lora_name: LoRA name (optional, defaults to path stem)
lora_scale: LoRA scale factor (default: 1.0)
lora_int_id: LoRA integer ID (optional, derived from path if not provided)
use_system_prompt: System prompt for generation. Use predefined types: 'en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None'; Or provide custom text string directly. Recommended en_unified.
```
Collaborator

It's better to document the usage of these fields in the Hunyuan-Image doc.


```python
# vllm-omni extensions for diffusion control
negative_prompt: str | None = Field(default=None, description="Text describing what to avoid in the image")
system_prompt: str | None = Field(
```
Collaborator

We may need a validator for this new field.

```python
@field_validator('use_system_prompt')
def validate_system_prompt_type(cls, v):
    valid_types = ["None", "dynamic", "en_vanilla", "en_recaption",
                   "en_think_recaption", "en_unified", "custom"]
    if v is not None and v not in valid_types:
        raise ValueError(f"Invalid use_system_prompt type: {v}")
    return v
```

Contributor Author

Thank you for the suggestion, will revise promptly.

Collaborator

@gcanlin gcanlin left a comment

Sorry for the late review. Just a few small suggestions to fix; everything else looks good to me. Thanks!

```python
system_prompt = extra_args.get("system_prompt")
if use_system_prompt is not None:
    system_prompt = get_system_prompt(use_system_prompt, "image", system_prompt)
system_prompt = system_prompt.strip() if system_prompt is not None else ""
```
Collaborator

Suggested change:

```diff
-system_prompt = system_prompt.strip() if system_prompt is not None else ""
+system_prompt = system_prompt.strip() if system_prompt is not None else None
```

Contributor Author

We should keep this part the same as the official Hunyuan implementation: in HunyuanImage-3.0/hunyuan_image_3/modeling_hunyuan_image_3.py it is `system_prompt = system_prompt.strip() if system_prompt is not None else ""`.
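For reference, the behavior being preserved (restated from the quoted official line, with an assumed function name for illustration):

```python
def normalize_system_prompt(system_prompt):
    # Matches official HunyuanImage-3.0: a missing system prompt becomes "",
    # not None, so downstream string handling never has to branch on None.
    return system_prompt.strip() if system_prompt is not None else ""
```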

```python
PROMPT = "A brown and white dog is running on the grass"
MODEL_NAME = "tencent/HunyuanImage-3.0"
LOCAL_CLIP_PATH = "openai/clip-vit-base-patch32"
REPO_ROOT = Path(__file__).resolve().parents[1]
```
Collaborator

Suggested change:

```diff
-REPO_ROOT = Path(__file__).resolve().parents[1]
+REPO_ROOT = Path(__file__).resolve().parents[3]
```

@hsliuustc0106
Collaborator

Good catch! This is a limitation of the current implementation.

Current scope: This PR focuses on image generation (bot_task="image") to align with the basic official functionality.

Future work: Support for "think" and "recaption" modes will be added in a follow-up PR.

Does the official HunyuanImage-3.0 implementation use these modes (think/recaption) dynamically? If so, we can prioritize adding this feature.

```python
REPO_ROOT = Path(__file__).resolve().parents[1]
STAGE_CONFIG_PATH = REPO_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / "hunyuan_image_3_moe.yaml"

pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion, pytest.mark.cuda, pytest.mark.gpu]
```
Collaborator

@gcanlin gcanlin Apr 2, 2026

@yenuo26 Could you help check whether this mark is appropriate? It seems that `pytest.mark.cuda` and `pytest.mark.gpu` are redundant. Should it be like:

Suggested change:

```diff
-pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion, pytest.mark.cuda, pytest.mark.gpu]
+pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion]
```

Collaborator

@lishunyang12 lishunyang12 left a comment

Left a couple of comments. The system prompt module itself looks fine.

Comment thread: docs/design/feature/expert_parallel.md (outdated)

```diff
@@ -0,0 +1,221 @@
+# Expert Parallel
```
Collaborator

This file is unrelated to the system prompt feature. Please split it into a separate PR — mixing unrelated changes makes review and bisect harder.

```python
type=str,
default=None,
help=(
    "System prompt for generation. Use predefined types: 'en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None'; Or provide custom text string directly. Recommended en_unified. "
```
Collaborator

Nit: --use-system-prompt accepts any string but only a handful of values are valid. Consider using choices=[...] so argparse catches typos.

Suggested change:

```python
parser.add_argument(
    "--use-system-prompt",
    type=str,
    default=None,
    choices=["None", "dynamic", "en_vanilla", "en_recaption",
             "en_think_recaption", "en_unified", "custom"],
    help="System prompt preset for generation. Recommended: en_unified.",
)
```

```diff
@@ -992,10 +993,15 @@
```
Collaborator

This line is a bit convoluted — hasattr guard + getattr default do the same thing. Simpler:

Suggested change:

```diff
@@ -992,10 +993,15 @@ def forward(
+extra_args = getattr(getattr(req, "sampling_params", None), "extra_args", {}) or {}
```
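For illustration, the one-liner covers all three missing cases (a sketch; `SimpleNamespace` stands in for the real request object):

```python
from types import SimpleNamespace

def get_extra_args(req):
    # Collapses three cases to {}: no sampling_params attribute,
    # sampling_params without extra_args, and extra_args set to None.
    return getattr(getattr(req, "sampling_params", None), "extra_args", {}) or {}
```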

Contributor Author

Thanks for the suggestion, addressed.

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch 2 times, most recently from 5f9178a to bae05ea Compare April 3, 2026 07:43
@skf-1999
Contributor Author

skf-1999 commented Apr 3, 2026

> Good catch! This is a limitation of the current implementation.
>
> Current scope: This PR focuses on image generation (bot_task="image") to align with the basic official functionality.
>
> Future work: Support for "think" and "recaption" modes will be added in a follow-up PR.
>
> Does the official HunyuanImage-3.0 implementation use these modes (think/recaption) dynamically? If so, we can prioritize adding this feature.

According to the official documentation, these modes are supported, but I haven't fully verified this with the official implementation yet (may require HunyuanImage-3.0-Instruct weights).

@gcanlin gcanlin added the ready label to trigger buildkite CI label Apr 3, 2026
@gcanlin gcanlin dismissed hsliuustc0106’s stale review April 3, 2026 07:58

All issues hsliuustc0106 mentioned before have been fixed.

@gcanlin gcanlin enabled auto-merge (squash) April 3, 2026 07:58
@gcanlin
Collaborator

gcanlin commented Apr 3, 2026

@hsliuustc0106 I dismissed your "request changes" review because all the issues you mentioned have been fixed. Let's wait for CI to pass and merge this PR ASAP.

Signed-off-by: skf1999 <13234016272@163.com>
auto-merge was automatically disabled April 3, 2026 08:09

Head branch was pushed to by a user without write access

@skf-1999 skf-1999 force-pushed the feat/hunyuan-system-prompt-align branch from bae05ea to eed5d64 Compare April 3, 2026 08:09
@congw729
Collaborator

congw729 commented Apr 3, 2026

@gcanlin @hsliuustc0106 All CI passed. This PR is okay for merge.

@gcanlin gcanlin merged commit 9584dd6 into vllm-project:main Apr 7, 2026
8 checks passed
vraiti pushed a commit to vraiti/vllm-omni that referenced this pull request Apr 9, 2026
bob-021206 pushed a commit to jasonlee-1024/vllm-omni that referenced this pull request Apr 21, 2026
…ion (vllm-project#2270)

Signed-off-by: skf1999 <13234016272@163.com>
Signed-off-by: bob-021206 <binyan_github@163.com>

Labels

ready label to trigger buildkite CI
