[Feature] HunyuanImage-3.0 IT2I (image editing) support#3107
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00c4135ed4
```python
return rotated.flatten(-2, -1).to(x.dtype)


def apply_rope_to_qk(
```
Restore Wan rotary class export
This change removes RotaryEmbeddingWan from rope.py, but Wan 2.2 still imports and instantiates that symbol in vllm_omni/diffusion/models/wan2_2/wan2_2_transformer.py. In environments where Wan is loaded, the module import now fails before inference starts, so Wan pipelines become unusable. Please keep RotaryEmbeddingWan exported here (or migrate downstream imports in the same change).
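A minimal compatibility sketch of the fix (module layout assumed from this comment, not the PR's actual code): keep the name importable from rope.py so the Wan import keeps resolving.

```python
import torch

# rope.py -- hedged sketch: whether the class body stays here or moves,
# the name must remain importable, because wan2_2_transformer.py does
# `from ...rope import RotaryEmbeddingWan` at module import time.
class RotaryEmbeddingWan(torch.nn.Module):
    ...  # existing implementation, unchanged

__all__ = ["RotaryEmbeddingWan", "apply_rope_to_qk"]
```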
| "ltx2", | ||
| "pipeline_ltx2_image2video", | ||
| "LTX2I2VDMD2Pipeline", | ||
| ), |
Re-register LTX2.3 pipelines in diffusion registry
The commit drops LTX23Pipeline and LTX23ImageToVideoPipeline from _DIFFUSION_MODELS (and their post-process entries), so any config using those model_class_name values now fails model initialization with a registry lookup error. Since the LTX-2.3 pipeline implementations are still present in the repo, this is a functional regression in supported model loading.
```python
def init_device(self) -> None:
    """Initialize the device and distributed environment."""
    torch.backends.cudnn.enabled = False
```
Why hardcode torch.backends.cudnn.enabled = False here?
Indeed, there is quite a bit of redundant code at present.
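If the toggle is only needed for a specific op, a scoped form would avoid disabling cuDNN process-wide. A sketch, not the PR's code; `fn` is a hypothetical callable:

```python
import torch

def run_without_cudnn(fn, *args, **kwargs):
    # Scope the cuDNN toggle to the call that needs it instead of flipping
    # the global flag for the whole worker in init_device().
    with torch.backends.cudnn.flags(enabled=False):
        return fn(*args, **kwargs)
```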
```python
# Detect KV-reuse stage configs. Note: the AR prompt layout is now the
# same Instruct template in both paths (see build_prompt); the flag is
# only used for informational purposes.
_kv_reuse_configs = {"hunyuan_image3_it2i_kv_reuse.yaml"}
```
this PR seems unrelated to kv reuse
The code will be adjusted accordingly.
We need an accuracy test like test_hunyuanimage3_text2img.py?
```python
prompt_images = mm_data.get("images")
if prompt_images is not None:
    diffusion_input["pil_image"] = prompt_images
    diffusion_input["multi_modal_data"] = {"image": prompt_images}
```
Why are these two duplicate fields needed?
Yes. We will remove the duplicate field.
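For reference, the consolidated form from the follow-up cleanup commit keeps only vLLM's standard multi-modal schema (variable names as in the diff above):

```python
# ar2diffusion bridge: forward the conditioning image only through
# multi_modal_data; the pipeline pre-process already reads it from there,
# so the duplicate pil_image field is dropped.
prompt_images = mm_data.get("images")
if prompt_images is not None:
    diffusion_input["multi_modal_data"] = {"image": prompt_images}
```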
```python
generated_token_ids = output.cumulative_token_ids
generated_text = getattr(output, "text", "") or ""

if not generated_text and generated_token_ids:
```
Can we just detokenize in the AR stage by setting detokenize: true in the yaml?
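For context, a sketch of the fallback this suggestion removes (per the later cleanup commit, the AutoTokenizer path was only a workaround for detokenize=false; `model_name` is hypothetical):

```python
from transformers import AutoTokenizer

# Workaround dropped once the AR stage runs with detokenize: true and the
# engine fills output.text itself.
if not generated_text and generated_token_ids:
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    generated_text = tokenizer.decode(generated_token_ids, skip_special_tokens=False)
```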
HunyuanImage3TokenizerFast.apply_general_template uses `Assistant:` as the bot role prefix in the instruct sequence_template (verified by decoding HF prepare_model_inputs output with system_prompt=en_unified + image + bot_task=think: token 72803 = "Assistant"). Switch build_prompt() to use the full word so the AR prefill aligns with the official HF tokenization.

Also unify T2T to the same en_unified + `Assistant:` template (the PR vllm-project#3107 reference implementation does the same; the previous T2T-specific branch was a workaround for an earlier prompt-format experiment).

Note: BPE merging across the user_prompt/Assistant boundary still produces 1 merged token (e.g. "。\n\n" -> single id) where HF apply_chat_template keeps them separate. Full byte-identical alignment requires passing pre-tokenized prompt_token_ids; that path is supported by vllm-omni (OmniTokensPrompt) but not yet plumbed through build_prompt().

Signed-off-by: TaffyOfficial <2324465096@qq.com>
Gaohan123 left a comment:
Please add an e2e function L4 test. Thanks
We will align the precision as soon as possible, streamline the code, and add tests.
```python
    raise TypeError(f"Unsupported image input type: {type(image)}")


def _resize_and_crop_center(image: PILImage.Image, target_width: int, target_height: int) -> PILImage.Image:
```
The AR stage also has the same function, HunyuanImage3Processor::process_image. Can we inherit from it or use another approach to reuse that implementation, so their behavior remains consistent?
Maybe this can be addressed in the next PR
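For reference, a minimal sketch of the shared cover-then-center-crop behavior both sides need (resample filter and rounding are assumptions; the merged code copies HunyuanImage3Processor._resize_and_crop exactly):

```python
from PIL import Image

def resize_and_crop_center(image: Image.Image, target_w: int, target_h: int) -> Image.Image:
    # Scale so the image fully covers the target box, then crop the center.
    scale = max(target_w / image.width, target_h / image.height)
    new_w, new_h = round(image.width * scale), round(image.height * scale)
    image = image.resize((new_w, new_h), Image.LANCZOS)
    left, top = (new_w - target_w) // 2, (new_h - target_h) // 2
    return image.crop((left, top, left + target_w, top + target_h))
```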
Since AR and DiT call different functions to construct the input template or input token ids (for example, prepare_model_inputs for DiT), maybe we should add unit tests to verify that their behavior remains consistent.
- it2i.yaml: switch the AR stage to detokenize=true so ar_generated_text comes from the engine; drop the AutoTokenizer fallback in ar2diffusion that was a workaround for detokenize=false.
- ar2diffusion: send the conditioning image only via multi_modal_data (matching vLLM's standard schema); the pipeline pre-process already reads it from there. Removes the duplicate pil_image field.
- pipeline_hunyuan_image3._resize_and_crop_center: replace with the exact algorithm used by HunyuanImage3Processor._resize_and_crop on the AR side so AR and DiT preprocess condition images identically.

Signed-off-by: zuiho-kai <wu15922848573@outlook.com>
Address PR vllm-project#3107 review feedback from Bounty-hunter (2026-05-04): "AR and DiT call different functions to construct the input template (prepare_model_inputs for DiT). Maybe we should add unit tests to verify that their behavior remains consistent."

Two CPU-only structural guards added to test_prompt_utils.py:

1. test_dit_pipeline_reads_sequence_template_from_generation_config
   AST-verifies that HunyuanImage3Pipeline.prepare_model_inputs still pulls sequence_template from generation_config (not hardcoded) and forwards it into the tokenizer wrapper. Catches a regression where someone hardcodes 'pretrain' or removes the lookup -- the historic shape of the AR/DiT template-drift bug.
2. test_dit_tokenizer_wrapper_supports_instruct_branch
   Asserts the DiT-side TokenizerWrapper still recognizes sequence_template='instruct' (the AR-side instruct template anchors live in the model's chat-template definition, not in the wrapper module itself, so we route on the dispatch keyword instead of the literal anchor strings).

Both tests are pure-AST/string scans and require no GPU, model weights, or HF cache, so they run in the same core_model+cpu lane as the rest of test_prompt_utils.py.

Signed-off-by: zuiho-kai <wu15922848573@outlook.com>
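A condensed sketch of guard 1 (module path and lookup shape are assumptions; the real test also checks the forwarding into the tokenizer wrapper):

```python
import ast
import inspect

def test_dit_pipeline_reads_sequence_template_from_generation_config():
    # Fail if "sequence_template" is no longer looked up dynamically,
    # i.e. someone hardcoded 'pretrain' or removed the lookup.
    from vllm_omni.diffusion.models.hunyuan_image3 import pipeline_hunyuan_image3

    tree = ast.parse(inspect.getsource(pipeline_hunyuan_image3))
    reads_template = any(
        (isinstance(n, ast.Attribute) and n.attr == "sequence_template")
        or (isinstance(n, ast.Constant) and n.value == "sequence_template")
        for n in ast.walk(tree)
    )
    assert reads_template, "prepare_model_inputs no longer reads sequence_template"
```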
```python
# Siglip2VisionModel; the top-level module is now itself the vision
# tower. Older transformers expose both, so dropping `.vision_model`
# is forward-compatible.
self.vision_model = Siglip2VisionModel(vision_config)
```
Can we just import Siglip2VisionTransformer from vllm_omni.model_executor.models.hunyuan_image3.siglip2?
We could, but I tested both paths end-to-end and they're numerically equivalent, so the choice is mostly stylistic. A couple of practical notes:

Numerical equivalence (verified): with identical weights and identical inputs (cartoon test image, dtype=bf16), the cond features come out identical on both paths. Both also produce 1119/1152 channels with cross-token std < 0.01; that's the model's natural post-LayerNorm behavior, not a wrapper bug.
Trade-offs of the local import:

- ➕ Self-contained, no dependency on transformers internals; matches what the HF snapshot bundles, byte-for-byte.
- ➖ The local file uses `_prepare_4d_attention_mask` from `transformers.modeling_attn_mask_utils`, which is deprecated and slated for removal in transformers 5.10 (the DEPRECATION warning fires today). The transformers impl already migrated to `create_bidirectional_mask`.
- ➖ Different forward signature (`attention_mask=` vs `pixel_attention_mask=`, requires explicit `return_dict=True`, takes a `dict` instead of `Siglip2VisionConfig`), so call sites need adapting (`vit_kwargs` is currently keyed `pixel_attention_mask` per transformers 5.x).
- ➖ We'd carry two SigLIP2 implementations in-tree (one is already imported by `model_executor.models.hunyuan_image3.hunyuan_image3` for the AR side).
Recommendation: keep transformers.Siglip2VisionModel here. It's numerically equivalent, maintained upstream, and avoids the deprecated mask-utils path. If we want to drop the transformers dependency entirely, we should do it consistently across both AR and DiT paths in a separate refactor PR.

(Also note: I was investigating a separate painterly-style drift and suspected this site; the same test confirmed the SigLIP2 wrapper isn't the cause, since the cond features come out identical either way.)
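The parity check was along these lines (a sketch; the keyword differences are the ones listed above, the tolerance is an assumption):

```python
import torch

def siglip2_paths_match(hf_tower, local_tower, pixels, mask, atol=1e-3) -> bool:
    # Run transformers' Siglip2VisionModel and the in-tree transformer on
    # identical bf16 inputs and compare output features.
    with torch.inference_mode():
        out_hf = hf_tower(pixel_values=pixels,
                          pixel_attention_mask=mask).last_hidden_state
        out_local = local_tower(pixel_values=pixels,
                                attention_mask=mask,
                                return_dict=True).last_hidden_state
    return torch.allclose(out_hf.float(), out_local.float(), atol=atol)
```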
Adds image-to-image editing capability for tencent/HunyuanImage-3.0-Instruct,
using the same two-stage AR -> DiT pipeline as the existing T2I path with
the AR stage receiving an additional condition image alongside the user
prompt.
Highlights:
* Pipeline & runtime
- vllm_omni/diffusion/models/hunyuan_image3/pipeline_hunyuan_image3.py:
cond image VAE-encode, ViT-encode, and scatter the resulting features
into the DiT prefill via instantiate_vae_image_tokens /
instantiate_vit_image_tokens (matches HF reference modeling layout).
- vllm_omni/model_executor/stage_input_processors/hunyuan_image3.py:
ar2diffusion bridge forwards condition image + system_prompt + user
prompt from AR stage to DiT stage.
- vllm_omni/model_executor/stage_configs/hunyuan_image3_it2i.yaml:
8-GPU IT2I stage config (4 AR + 4 DiT).
- examples/offline_inference/hunyuan_image3/end2end.py + README.md:
img2img modality entry; prompt_dict uses vllm-standard `prompt` key
so the offline path receives the raw user prompt at the DiT stage
(DiT pipeline reads `p.get("prompt")` only).
* DiT MoE accuracy fixes (stale 0.18-era code surfaced as bugs after
the 0.20 rebase). Both addressed by aligning with the upstream PR
vllm-project#3373 by @dengyunyang, who independently surfaced
the same accuracy gap.
- vllm_omni/diffusion/models/hunyuan_image3/hunyuan_fused_moe.py:
HunyuanFusedMoEDefault used to register a forward pre-hook that
called `self.quant_method.process_weights_after_loading(self)` on
first forward, to compensate for the 0.18-era standard model loader
not invoking it on FusedMoE layers. vLLM 0.20's standard loader
(`model_executor/model_loader/base_loader.py`) now invokes
`process_weights_after_loading` model-wide on init, so the hook
fires a second time on first forward, double-applying non-idempotent
in-place transforms (`UnquantizedFusedMoEMethod._maybe_pad_weight`
re-pads w13/w2 in place; `_setup_kernel` re-registers the moe_kernel
oracle on already-padded weights). Corrupted w13/w2 layout + wrong
kernel oracle config produces a small per-token, per-layer expert-
dispatch bias that accumulates across the 32 DiT MoE layers into a
"painterly / oil texture" attractor on the generated image. The
unquantized FusedMoE method has no
`_already_called_process_weights_after_loading` guard (only the FP8
quant method does), so non-quantized HunyuanImage3 reliably trips
this. The hook is deliberately not registered (see the sketch after this list).
- vllm_omni/diffusion/models/hunyuan_image3/hunyuan_image3_transformer.py
(HunYuanSparseMoeBlock):
Drop external `shared_experts` merge + `maybe_all_reduce_tensor_model_parallel`
in forward, and drop `reduce_results=False` on the FusedMoE init.
Since vLLM 0.20, when `shared_experts` is passed to FusedMoE, the
`shared_mlp` output is merged inside FusedMoE.forward and the TP
all-reduce is done internally; the wrapper code that did both of
these externally was a 0.18-era workaround that became a double
op after 0.20. Net effect of double-reduce + double shared_mlp add
was a small numerical bias on top of the painterly drift; removing
the wrapper restores HF-reference parity.
Verified on 4xL20X TP=2/2 (vllm 0.20.0 + torch 2.11.0+cu130): same
cartoon-block input + cute orange cat prompt yields a clean flat-
cartoon output, visually matching HF generate_image() reference.
* Tests
- tests/diffusion/models/hunyuan_image3/test_hunyuan_image3_it2i_ar_format.py:
unit-level - AR prefill input_ids byte-equal HF chat template,
image-tensor byte-equal AR-side processor.
- tests/e2e/accuracy/test_hunyuan_image3_it2i.py:
full-pipeline e2e - vllm-omni AR -> DiT vs HF generate_image() at
PSNR >= 40 dB on the same (condition_image, prompt, seed) tuple.
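For illustration, a sketch of the retired pre-hook pattern from the MoE fix above (class and method names from the description; the hook body is reconstructed, not quoted, and the import path is an assumption):

```python
from vllm.model_executor.layers.fused_moe import FusedMoE  # path assumed

class HunyuanFusedMoEDefault(FusedMoE):
    def __init__(self, *, prefix: str = "", **kwargs):
        super().__init__(prefix=prefix, **kwargs)
        self._prefix = prefix
        # Pre-0.20 workaround: the standard loader skipped FusedMoE layers,
        # so weight post-processing was deferred to the first forward:
        #
        #   def _once(module, args):
        #       module.quant_method.process_weights_after_loading(module)
        #       handle.remove()
        #   handle = self.register_forward_pre_hook(_once)
        #
        # On vLLM 0.20 the loader already calls this model-wide, so firing
        # it again re-pads w13/w2 in place and re-registers the kernel
        # oracle -- hence the hook is deliberately not registered anymore.
```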
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: skf <54565339+skf-1999@users.noreply.github.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: TaffyOfficial <2324465096@qq.com>
```python
pytestmark = [pytest.mark.full_model, pytest.mark.diffusion]


MODEL_NAME = "tencent/HunyuanImage-3.0-Instruct"
```
Will open another PR later, and add accuracy metrics and functional test cases to CI together.
```python
def __init__(self, *, prefix: str = "", **kwargs: Any) -> None:
    super().__init__(prefix=prefix, **kwargs)
    self._prefix = prefix
    # NOTE: prior to vLLM 0.20, this class registered a forward pre-hook
```
remove useless comments
```python
else:
    self.shared_mlp = None

# Since vLLM 0.20, FusedMoE merges `shared_experts` output and runs
```
remove useless comments
Modifications completed
```diff
 height = original_prompt.get("height", 1024)
 width = original_prompt.get("width", 1024)
-text_prompt = original_prompt.get("prompt", "")
+text_prompt = original_prompt.get("user_prompt") or original_prompt.get("prompt", "")
```
I can't see the user_prompt field initialized in end2end.py.
We will delete the unnecessary changes.
Address PR vllm-project#3107 review (Bounty-hunter / Gaohan123) requesting AR-output-format and DiT-output-accuracy regression tests. Layout mirrors PR vllm-project#2949's split (CPU unit test under tests/diffusion/..., GPU accuracy test under tests/e2e/accuracy/...).

CPU unit test tests/diffusion/models/hunyuan_image3/test_hunyuan_image3_it2i_ar_format.py

- test_ar_prefill_tokens_match_hf_apply_chat_template_for_it2i: asserts build_prompt_tokens (the AR-side prefill builder) is token-id-identical to HF tokenizer.apply_chat_template for the same (system, user_prompt, image) triple. Catches drift between the AR's input distribution and the model's training distribution -- the same failure mode PR vllm-project#3243 fixed for T2I.
- test_dit_condition_image_preprocessing_byte_matches_ar_processor: asserts the diffusion-side _resize_and_crop_center produces byte-identical pixels to the AR-side HunyuanImage3Processor._resize_and_crop on the canonical resize targets. Direct response to Bounty-hunter's PR vllm-project#3107 review.

Both tests gate on tencent/HunyuanImage-3.0-Instruct being in the local HF cache (no GPU or model weights required at runtime, just the tokenizer config + image processor).

GPU accuracy test tests/e2e/accuracy/test_hunyuan_image3_it2i.py

- test_hunyuan_image3_it2i_matches_hf_reference_psnr_40: drives vllm-omni's offline IT2I path through Omni and runs the official HF reference via AutoModelForCausalLM.generate_image, compared via the shared assert_similarity helper at PSNR>=40 dB and SSIM>=0.92. Marked full_model + skipif<8 GPUs; the threshold follows PR vllm-project#2949's review discussion (40 dB gives slack for TP=2 NCCL drift while still catching prompt/image-preprocessing bugs).

Signed-off-by: zuiho-kai <wu15922848573@outlook.com>
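The PSNR half of that gate reduces to the standard formula; a self-contained sketch (the shared assert_similarity helper also covers the SSIM side):

```python
import numpy as np

def psnr_db(ref: np.ndarray, out: np.ndarray) -> float:
    # Peak signal-to-noise ratio between two uint8 RGB images of equal shape.
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0**2 / mse)

# e.g. assert psnr_db(hf_image, omni_image) >= 40.0
```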
…output alignment
The previous CPU-side test in test_hunyuan_image3_it2i_ar_format.py
called the official tokenizer's apply_chat_template to render the AR
prefill prompt and compared its token id sequence to vllm-omni's
build_prompt_tokens output. Two problems:
- it tested the *input* prompt only, not the AR's *generated output*
(which is what 'AR output matches official' actually demands);
- HunyuanImage3TokenizerFast.from_pretrained(snap) returns a
byte-fallback (char-level) tokenizer in a vacuum, which is not the
encoding the vllm-omni production path uses (AutoTokenizer with the
BPE merges from tokenizer.json) -- so the comparison was apples vs
char-bytes and could never pass on PyPI 0.20.x.
Replaced with a real GPU-required e2e test in
tests/e2e/accuracy/test_hunyuan_image3_it2i_ar_output.py that:
- drives the HF reference via AutoModelForCausalLM.from_pretrained +
model.prepare_model_inputs + model.generate(do_sample=False) (the
pattern already in scripts/bench/bench_ar_hf.py);
- drives vllm-omni AR via the i2t stage YAML with the prompt fed as
prompt_token_ids = HF prefill (the alignment path documented in
workflow-starter/memory/hf/hf_omni_alignment_method.md);
- asserts prefill input_ids byte-equality and the first 8 of 16
greedy AR-generated tokens match between HF and omni.
Skips cleanly when the snapshot is missing the two manual modeling
patches (RoPE broadcast / 2D attention_mask SDPA fall-through) that
the project's HF baseline runbook requires.
The CPU-only DiT/HF condition-image preprocessing byte-equality check
in test_hunyuan_image3_it2i_ar_format.py is preserved -- it passes
locally and guards Bounty-hunter's PR vllm-project#3107 review item directly.
Signed-off-by: zuiho-kai <wu15922848573@outlook.com>
Signed-off-by: zuiho <2324465096@qq.com>
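The alignment assertion in that test reduces to the following (helper and variable names are hypothetical; `hf_inputs` comes from model.prepare_model_inputs):

```python
def assert_ar_aligned(hf_inputs, hf_generated, omni_prefill_ids, omni_generated):
    # Feed the AR stage with prompt_token_ids = HF prefill so both sides see
    # byte-identical inputs, then compare the first greedy decode steps.
    hf_ids = hf_inputs["input_ids"][0].tolist()
    assert omni_prefill_ids == hf_ids              # prefill byte-equality
    assert omni_generated[:8] == hf_generated[:8]  # first 8 of 16 greedy tokens
```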
@Bounty-hunter @Gaohan123 @hsliuustc0106 need review
LGTM now
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: skf1999 <13234016272@163.com>
```python
token_ids = build_prompt_tokens(p, tokenizer, task=task, sys_type=args.sys_type)

prompt_dict: dict = {"prompt_token_ids": token_ids, "prompt": p}
preset_sys_type, _, _ = _TASK_PRESETS[task]
```
I suggest we list the tasks explicitly in end2end.py rather than enumerating a private mapping.
Suggestion adopted.
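The adopted form looks roughly like this (task ids and argparse wiring are assumptions):

```python
import argparse

# end2end.py -- list the supported tasks explicitly instead of reaching
# into the private _TASK_PRESETS mapping.
SUPPORTED_TASKS = ("t2i", "it2i", "t2t")  # hypothetical task ids

parser = argparse.ArgumentParser()
parser.add_argument("--task", choices=SUPPORTED_TASKS, default="t2i")
```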
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Signed-off-by: skf1999 <13234016272@163.com>
…rebase (#3395)

Signed-off-by: dengyunyang <584797741@qq.com>
Default IT2I (`hunyuan_image3_it2i.yaml`) and the AR+DiT T2I config (`hunyuan_image3_t2i_2gpu.yaml`) left `skip_special_tokens` at the vLLM default (True), so the AR engine stripped the trailing `<img_size_BASE><img_ratio_Y>` markers from `output.text` before the `ar2diffusion` bridge could read them. With the previous ar2diffusion fix, that meant the bridge fell back to the prompt-carried height/width, i.e. `pil_images[0].size` from the OpenAI edits path, which collapsed to a square whenever the first reference image was square.

The KV-reuse config (`hunyuan_image3_it2i_kv_reuse.yaml`) already sets this flag (added in vllm-project#3346 because the KV-reuse machinery needs the exact AR token stream), but the original IT2I yaml from vllm-project#3107 did not need it at the time and was never updated when ar2diffusion grew the ratio-token consumer. This aligns both configs with `_kv_reuse.yaml`.

The AR token-id fallback in ar2diffusion still works for users who keep the default, but having the text path live by default is cheaper (no tokenizer load) and avoids the model-name/path ambiguity the token-id fallback hits when the model is loaded from a local directory rather than the HF hub identifier.

Signed-off-by: TaffyOfficial <wu15922848573@outlook.com>
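With the flag set, the bridge's text path can read the trailing markers directly; a sketch of such a consumer (the marker grammar is inferred from the names above and is an assumption):

```python
import re

# Pull the trailing size/ratio markers that skip_special_tokens: false now
# leaves intact in output.text; fall back to token ids if absent.
_MARKERS = re.compile(r"<img_size_(?P<size>\w+)><img_ratio_(?P<ratio>\w+)>\s*$")

def parse_size_ratio(ar_text: str):
    m = _MARKERS.search(ar_text)
    return (m.group("size"), m.group("ratio")) if m else None
```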
…#3107)

Signed-off-by: TaffyOfficial <2324465096@qq.com>
Signed-off-by: zuiho <2324465096@qq.com>
Signed-off-by: skf1999 <13234016272@163.com>
Co-authored-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
…project#3107 rebase (vllm-project#3395)

Signed-off-by: dengyunyang <584797741@qq.com>
Co-authored-by: dengyunyang <584797741@qq.com>
Co-authored-by: TaffyOfficial <2324465096@qq.com>
Co-authored-by: John Liu BUAA <liukecheng97@gmail.com>
Purpose
Test Plan
Test Result
PSNR:

bot_task=think AR results:
Official
The user wants to turn this cute golden retriever puppy photo into a New Year pet poster full of holiday atmosphere. The reference image shows a puppy sitting on a wooden floor with white dandelions in the background, happily sticking its tongue out at the camera. The original instruction is very specific: add particular title text, change the puppy's accessories, adjust the composition perspective, and apply a specific film-photography style. This is a medium-complexity task because it involves text addition, object replacement, a viewpoint change, and an overall style filter layered on top. First, I need to handle the text, placing "新年快乐汪" (Happy New Year, woof) and "HAPPY NEW YEAR" at the top of the frame in a rounded, cute font. Next, for the puppy's accessories, the original pink collar needs to be replaced with a bright red knitted scarf, plus a matching red wool hat, which greatly strengthens the New Year theme. Compositionally, the fisheye lens mentioned in the original instruction means the frame converges toward the puppy's head, producing an exaggerated close-up effect, with the background dandelions and floorboards bending under the perspective distortion. Finally, to achieve the Polaroid-paper and film-photography look, I need to add a white photo-paper border around the whole image and introduce fine film grain and a retro color tone, so the picture looks like a developed old photograph. Through these steps, an ordinary pet photo is transformed into a highly engaging holiday poster.
The PR
The user wants to turn this cute golden retriever puppy photo into a New Year pet poster full of holiday atmosphere. In the reference image the puppy sits on a wooden floor against a blurred background of white flowers, with a natural, fresh overall palette. To realize the "New Year pet poster" goal, I need to completely replace the outdoor natural background with an indoor or courtyard scene with traditional Chinese New Year characteristics. This involves a comprehensive color adjustment, from natural wood tones and greens to festive reds and golds. For visual elements, I should add classic symbols such as red lanterns, spring couplets, 福 (fortune) characters, and firecrackers, and place some dynamic fireworks or bokeh effects around the puppy to heighten the festive bustle. The puppy itself remains as the subject, but to blend into the new environment its lighting should be adjusted to a warm tone influenced by the red ambient light. The poster also needs prominent text titles such as "新年快乐" (Happy New Year) and "宠物贺岁" (Pets Celebrate the New Year) in an artistic typeface with a designed feel. Finally, to add a sense of ceremony, a red border with traditional patterns can be added around the frame. Through these concrete visual transformations, an ordinary pet photo becomes a thematically clear, richly detailed holiday promotional poster.
E2E Output picture
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.