[HunyuanImage3] Align system_prompt support with official implementation #2270
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1dba8d27d9
```python
    cosine_sim = torch.dot(EXPECTED_EMBEDDING, features).item()
    return cosine_sim


print(compare_semantic(SYSTEM_EN_UNIFIED, IMAGE_PATH))  # example for en_unified
```
Avoid running benchmark code at test import time
Because this file is named `test_system_prompt.py`, pytest will execute module-level statements during collection, and this top-level call runs `compare_semantic` immediately against a hard-coded `/data/your_system_prompt_en_unified.png`. In normal CI/local environments that path is absent, so collection fails before any test runs, effectively breaking the offline inference test suite instead of providing a real test.
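One way to address this, sketched here under the assumption that the file stays a pytest module: gate collection on the fixture's presence instead of calling `compare_semantic` at import time (`IMAGE_PATH` is the hard-coded path from the diff above).

```python
from pathlib import Path

import pytest

# Hard-coded fixture path from the test file under review.
IMAGE_PATH = Path("/data/your_system_prompt_en_unified.png")

# Skip the whole module when the fixture image is absent (e.g. in CI),
# instead of executing benchmark code at import/collection time.
pytestmark = pytest.mark.skipif(
    not IMAGE_PATH.exists(), reason=f"fixture image not found: {IMAGE_PATH}"
)
```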
```python
        return system_prompt
    # Unsupported type: raise NotImplementedError
    else:
        raise NotImplementedError(f"Unsupported system prompt type: {sys_type}")
```
Raise a validation error for unknown system prompt types
Unsupported `use_system_prompt` values currently raise `NotImplementedError`, but the API request model accepts any string and `/v1/images/generations` only maps `ValueError` to a 400 response. That means a simple client typo (for example, `en-unified`) is surfaced as a 500 internal server error rather than a user-facing validation error, which is misleading and harder to debug.
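A minimal sketch of the suggested fix (preset names are taken from this PR's help text; the function name here is hypothetical): validate eagerly and raise `ValueError`, which the endpoint already maps to a 400.

```python
VALID_SYSTEM_PROMPT_TYPES = {
    "None", "dynamic", "en_vanilla", "en_recaption",
    "en_think_recaption", "en_unified", "custom",
}

def check_system_prompt_type(sys_type: str) -> str:
    # ValueError surfaces as a 400 validation error instead of the
    # 500 that NotImplementedError would produce.
    if sys_type not in VALID_SYSTEM_PROMPT_TYPES:
        raise ValueError(
            f"Unsupported system prompt type: {sys_type!r}; "
            f"expected one of {sorted(VALID_SYSTEM_PROMPT_TYPES)}"
        )
    return sys_type
```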
Force-pushed from 1dba8d2 to 486f3cb.
hsliuustc0106
left a comment
There was a problem hiding this comment.
🏗️ Architecture Review
Thank you for this well-structured feature PR! The design is clean and aligns well with the existing codebase. However, there are a few blocking issues that need to be addressed before merging.
🔴 Blocking Issues
1. Pre-commit Check Failed ❌
The pre-commit CI check failed due to code formatting issues:
- `all` formatting in `system_prompt.py`
- Field definition formatting in `images.py`
Fix:
```bash
pre-commit run --all-files
git commit --amend
```
2. Missing Automated Tests ⚠️
The file `tests/e2e/offline_inference/test_system_prompt.py` is a manual testing tool with hardcoded paths (`/data/your_system_prompt_en_unified.png`), not an automated test suite.
Required:
- Add unit tests for different system prompt types (`None`, `dynamic`, `en_vanilla`, etc.)
- Add API integration tests for the `/v1/images/generations` endpoint
- Reference existing tests in `tests/e2e/` for patterns
Example:
```python
# tests/test_system_prompt.py
def test_system_prompt_types():
    """Test different system prompt types."""
    for prompt_type in ["None", "dynamic", "en_vanilla", "en_recaption",
                        "en_think_recaption", "en_unified", "custom"]:
        result = get_system_prompt(prompt_type, "image")
        assert result is not None or prompt_type == "None"


def test_api_integration():
    """Test API endpoint with system prompt."""
    # Test with the vllm test client
    ...
```
3. Documentation Incomplete 📚
The PR description checklist shows documentation updates are unchecked:
- Model support table not updated
- Usage documentation missing
- API documentation not updated
Required:
- Update `docs/models/supported_models.md` with system prompt feature
- Add usage examples in `docs/`
- Update API documentation
⚠️ Non-Blocking Suggestions
1. Add Parameter Validation (Correctness)
Consider adding validation for `use_system_prompt` parameter:
```python
# In images.py
from pydantic import field_validator

@field_validator("use_system_prompt")
def validate_system_prompt_type(cls, v):
    valid_types = ["None", "dynamic", "en_vanilla", "en_recaption",
                   "en_think_recaption", "en_unified", "custom"]
    if v is not None and v not in valid_types:
        raise ValueError(f"Invalid use_system_prompt. Must be one of: {valid_types}")
    return v
```
2. Improve Error Handling (Reliability)
```python
# In system_prompt.py
import logging

logger = logging.getLogger(__name__)

def get_system_prompt(sys_type, bot_task, system_prompt=None):
    try:
        # Existing logic
        ...
    except Exception as e:
        logger.warning(f"Failed to get system prompt: {e}")
        return None  # Graceful degradation
```
✅ What Works Well
- Modular Design: `system_prompt.py` is well-encapsulated with single responsibility
- Parameter Passing: Clean use of `extra_args` following existing patterns
- Flexibility: Multiple modes (vanilla, recaption, think_recaption, unified) provide good flexibility
- Official Alignment: Aligns with HunyuanImage official implementation
📊 Summary
BLOCKER scan:
- ✅ Correctness: PASS
- ✅ Reliability/Safety: PASS
- ✅ Breaking Changes: PASS
- ❌ Test Coverage: Missing automated tests
- ❌ Documentation: Incomplete
- ✅ Security: PASS
OVERALL: 2 BLOCKERS FOUND
VERDICT: REQUEST_CHANGES
Once the blocking issues are resolved, this PR will be ready for approval. The architecture and design are solid! 🏗️
Architecture Score: 3/5 - Good design, but needs test and doc improvements
Force-pushed from e451c05 to ef884bd.
```python
gen_params = OmniDiffusionSamplingParams(num_outputs_per_prompt=request.n)

extra_args = {}
if request.use_system_prompt is not None:
```
Better to put it into the prompt?
Better not: system prompts bypass CFG dropping while user prompts participate in it, so behavior definitions should stay in the system prompt.
```python
], dtype=torch.float32)

LOCAL_CLIP_PATH = "openai/clip-vit-base-patch32"
IMAGE_PATH = "/data/your_system_prompt_en_unified_picture"
```
Why not add this to CI so it runs automatically?
Very useful suggestion. I’ll adjust the code to make it automated.
```python
if bot_task == "think":
    return t2i_system_prompts["en_think_recaption"][0]
# Recaption task: use recaption prompt
elif bot_task == "recaption":
```
Do we need to support think & recaption in the DiT stage? In my opinion, think & recaption need to be completed in the AR stage.
Currently, only `image` is supported; other `bot_task` values are not.
Force-pushed from e224d19 to 5909b49.
```python
# Below are the CLIP embedding tensors from the official HunyuanImage model
# (seed=1234, prompt: "A brown and white dog is running on the grass").
# SEED_1234 denotes the output without system prompt, while the remaining
# entries correspond to outputs generated with different system prompts.

SEED_1234 = torch.tensor(
```
Maybe we can later plan a dedicated directory to store these files.
Force-pushed from 0d10619 to f869766.
```python
if request.system_prompt is not None:
    extra_args["system_prompt"] = request.system_prompt
if extra_args:
    gen_params.extra_args = extra_args
```
This change introduces a new field in the generation API. Until now, we haven't collected non-standard parameters into `extra_args`. If we plan to do that, should we also collect other non-standard parameters like `layers`, which is only for Qwen-Image-Layered? @SamitHuang @wtomin
```diff
@@ -1,3 +1,4 @@
+# ruff: noqa: E501
```
I'll fix that right away.
Force-pushed from 04c8e38 to d783f60.
```shell
    --guidance_scale 5.0 \
    --tensor-parallel-size 8 \
    --seed 1234 \
    --enable_expert_parallel
```
We should unify to `--enable-expert-parallel`.
```python
lora_name: LoRA name (optional, defaults to path stem)
lora_scale: LoRA scale factor (default: 1.0)
lora_int_id: LoRA integer ID (optional, derived from path if not provided)
use_system_prompt: System prompt for generation. Use predefined types: 'en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None'; or provide a custom text string directly. Recommended: en_unified.
```
It's better to document the usage of these fields in the Hunyuan-Image doc.
```python
# vllm-omni extensions for diffusion control
negative_prompt: str | None = Field(default=None, description="Text describing what to avoid in the image")
system_prompt: str | None = Field(
```
We may need a validator for this new field:
```python
@field_validator("use_system_prompt")
def validate_system_prompt_type(cls, v):
    valid_types = ["None", "dynamic", "en_vanilla", "en_recaption",
                   "en_think_recaption", "en_unified", "custom"]
    if v is not None and v not in valid_types:
        raise ValueError(f"Invalid use_system_prompt type: {v}")
    return v
```
Thank you for the suggestion, will revise promptly.
gcanlin
left a comment
Sorry for the late review. Just these small suggestions to address; everything else looks good to me. Thanks!
```python
system_prompt = extra_args.get("system_prompt")
if use_system_prompt is not None:
    system_prompt = get_system_prompt(use_system_prompt, "image", system_prompt)
system_prompt = system_prompt.strip() if system_prompt is not None else ""
```
Suggested change:
```diff
-system_prompt = system_prompt.strip() if system_prompt is not None else ""
+system_prompt = system_prompt.strip() if system_prompt is not None else None
```
This part should stay the same as official Hunyuan: in `HunyuanImage-3.0/hunyuan_image_3/modeling_hunyuan_image_3.py`, the line is `system_prompt = system_prompt.strip() if system_prompt is not None else ""`.
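A tiny sketch of that agreed-upon behavior (the helper name here is hypothetical; only the one-liner comes from the official source):

```python
def normalize_system_prompt(system_prompt):
    # Match the official HunyuanImage-3.0 behavior: whitespace-stripped
    # text when a prompt is given, empty string (not None) otherwise.
    return system_prompt.strip() if system_prompt is not None else ""
```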
```python
PROMPT = "A brown and white dog is running on the grass"
MODEL_NAME = "tencent/HunyuanImage-3.0"
LOCAL_CLIP_PATH = "openai/clip-vit-base-patch32"
REPO_ROOT = Path(__file__).resolve().parents[1]
```
Suggested change:
```diff
-REPO_ROOT = Path(__file__).resolve().parents[1]
+REPO_ROOT = Path(__file__).resolve().parents[3]
```
Good catch! This is a limitation of the current implementation. Current scope: this PR focuses on image generation (`bot_task="image"`) to align with the basic official functionality. Future work: support for "think" and "recaption" modes will be added in a follow-up PR. Does the official HunyuanImage-3.0 implementation use these modes (think/recaption) dynamically? If so, we can prioritize adding this feature.
```python
REPO_ROOT = Path(__file__).resolve().parents[1]
STAGE_CONFIG_PATH = REPO_ROOT / "vllm_omni" / "model_executor" / "stage_configs" / "hunyuan_image_3_moe.yaml"

pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion, pytest.mark.cuda, pytest.mark.gpu]
```
@yenuo26 Could you help check whether this mark is appropriate? `pytest.mark.cuda` and `pytest.mark.gpu` seem redundant. Should it be:
```diff
-pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion, pytest.mark.cuda, pytest.mark.gpu]
+pytestmark = [pytest.mark.advanced_model, pytest.mark.diffusion]
```
lishunyang12
left a comment
Left a couple of comments; the system prompt module itself looks fine.
```diff
@@ -0,0 +1,221 @@
+# Expert Parallel
```
This file is unrelated to the system prompt feature. Please split it into a separate PR; mixing unrelated changes makes review and bisecting harder.
```python
type=str,
default=None,
help=(
    "System prompt for generation. Use predefined types: 'en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None'; or provide a custom text string directly. Recommended: en_unified. "
```
Nit: `--use-system-prompt` accepts any string but only a handful of values are valid. Consider using `choices=[...]` so argparse catches typos:
```python
parser.add_argument(
    "--use-system-prompt",
    type=str,
    default=None,
    choices=["None", "dynamic", "en_vanilla", "en_recaption",
             "en_think_recaption", "en_unified", "custom"],
    help="System prompt preset for generation. Recommended: en_unified.",
)
```
This line is a bit convoluted: the `hasattr` guard and the `getattr` default do the same thing. Simpler:
```python
extra_args = getattr(getattr(req, "sampling_params", None), "extra_args", {}) or {}
```
Thanks for the suggestion, addressed.
Force-pushed from 5f9178a to bae05ea.
According to the official documentation, these modes are supported, but I haven't fully verified this with the official implementation yet (it may require HunyuanImage-3.0-Instruct weights).
All issues hsliuustc0106 mentioned before have been fixed.
@hsliuustc0106 I dismissed your
Signed-off-by: skf1999 <13234016272@163.com>
Head branch was pushed to by a user without write access
Force-pushed from bae05ea to eed5d64.
@gcanlin @hsliuustc0106 All CI passed. This PR is okay to merge.
Purpose
Add system prompt support for HunyuanImage to align with the official implementation. This PR introduces the command-line arguments `--use-system-prompt` and `--system-prompt`, which let users specify a system prompt preset or a custom prompt when generating images.
Test Plan
New Arguments

| Argument | Accepted values | Default |
| --- | --- | --- |
| `--use-system-prompt` | `None`, `dynamic`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `en_unified`, `custom` | `en_unified` |
| `--system-prompt` | custom prompt text | `None` (used when `--use-system-prompt=custom`) |

Usage Examples
Offline Inference:
Online API (curl):
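The original curl example was not captured here. As an illustrative sketch (the endpoint path and the `use_system_prompt`/`system_prompt` fields come from this PR's diffs; the other values are hypothetical), the request body sent to `/v1/images/generations` would look like:

```python
import json

# Request payload exercising the new system-prompt fields added by this PR.
payload = {
    "model": "tencent/HunyuanImage-3.0",
    "prompt": "A brown and white dog is running on the grass",
    "n": 1,
    "use_system_prompt": "en_unified",  # preset name, or "custom"
    "system_prompt": None,              # only consulted when use_system_prompt="custom"
}
print(json.dumps(payload, indent=2))
```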
Test Script Location:
Requires two model weights: "tencent/HunyuanImage-3.0" and "openai/clip-vit-base-patch32".
pytest tests/e2e/offline_inference/test_hunyuanimage3_text2img.py -v -s
CLIP Embedding Image Similarity Comparison
Test Configuration:
Prompt: "A brown and white dog is running on the grass"
Seed: 1234
Metric: CLIP Cosine Similarity between Baseline (Official Source) and PR Implementation
Test Result:
- baseline, no system prompt (seed=1234)
- `en_recaption`
- `en_think_recaption`
- `en_vanilla`
- `en_unified`
- `dynamic` (same as `en_vanilla` when `bot_task=image`)
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model. Please run `mkdocs serve` to sync the documentation editions to `./docs`.