
[Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage #2762

Merged
gcanlin merged 3 commits into vllm-project:main from Fishermanykx:yukexiong/hunyuan_guidance_scale_le1 on Apr 15, 2026

Fishermanykx (Contributor) commented Apr 14, 2026

Purpose

This PR changes HunyuanImage3's default behavior of always generating the negative (unconditional) branch, so that generation can run in single-branch mode when guidance is not enabled.
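For context on why the negative branch is redundant at guidance_scale = 1.0, here is the standard classifier-free guidance (CFG) update. This is the textbook formulation, assumed rather than quoted from the HunyuanImage3 pipeline; $\epsilon_c$, $\epsilon_u$ and $s$ denote the conditional prediction, the unconditional prediction and guidance_scale.

```latex
\hat{\epsilon} = \epsilon_u + s\,(\epsilon_c - \epsilon_u),
\qquad
\hat{\epsilon}\big|_{s=1} = \epsilon_u + (\epsilon_c - \epsilon_u) = \epsilon_c
```

At $s = 1$ the unconditional prediction cancels out, so computing it only doubles the work. For $s < 1$ the same formula would extrapolate toward $\epsilon_u$; this PR instead treats every guidance_scale <= 1.0 as unguided and runs only the conditional branch.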

What This PR Changes

  1. Guidance behavior

    • Allows guidance_scale <= 1.0 without forcing it to 1.0 + epsilon.
    • This enables true non-CFG behavior for low-guidance requests.
  2. CFG factor control

    • Changes cfg_factor for gen_image from a fixed value (2) to a value gated on guidance_scale.
    • CFG duplication is now enabled only when guidance is actually greater than 1.
  3. Tensor layout robustness

    • Replaces view(...) with reshape(...) in the HunyuanImage3 attention output path to avoid runtime errors when tensors are non-contiguous.
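A minimal, framework-free sketch of what the gating changes at inference time. Everything here is hypothetical (`guided_prediction` and the `predict` callback are illustrative names, not vllm-omni APIs), and the combine step assumes the standard CFG formula; the point is only that when guidance is not above 1 the model sees a batch of one instead of two.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def guided_prediction(
    predict: Callable[[Sequence[str]], List[Vector]],
    prompt: str,
    negative_prompt: str,
    guidance_scale: float,
) -> Vector:
    """Hypothetical helper: one denoising step's model call plus the CFG combine."""
    if guidance_scale > 1.0:
        # cfg_factor == 2: conditional and unconditional rows share one batch of size 2.
        cond, uncond = predict([prompt, negative_prompt])
        # Standard CFG combination (assumed, not taken from the pipeline code).
        return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    # cfg_factor == 1: single-branch mode, only the conditional row is computed.
    (cond,) = predict([prompt])
    return cond
```

For example, with a stub `predict` that records its batch size, guidance_scale=1.0 or 0.5 issues one call on a batch of 1, while guidance_scale=5.0 issues one call on a batch of 2.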

Test Plan

Tested on 4x Ascend NPUs with vllm-omni v0.18.0.post1.

Online serving:

vllm serve "/data/HunyuanImage-3.0/" --omni --port "8091" \
    --tensor_parallel_size 4  \
    --log-stats \
    --stage-configs-path "vllm_omni/platforms/npu/stage_configs/hunyuan_image3_moe_dit.yaml"

Client:

curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": 
        "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.",
    "num_inference_steps": 50,
    "guidance_scale": "1.0",
    "n": 1,
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
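For scripted testing, the curl call can be reproduced with the Python standard library. A sketch, assuming the server started above is reachable on localhost:8091 and responds with `data[0].b64_json` as the `jq` filter expects; `build_payload`, `decode_first_image`, and `generate` are hypothetical helper names. Two deliberate differences from the curl command: guidance_scale is sent as a JSON number rather than the string "1.0", and `json.dumps` escapes any newlines in the prompt, which a hand-written JSON string literal has to do itself.

```python
import base64
import json
import urllib.request

ENDPOINT = "http://localhost:8091/v1/images/generations"


def build_payload(prompt: str, guidance_scale: float = 1.0, seed: int = 42) -> dict:
    # Same fields as the curl request above; guidance_scale is a float here.
    return {
        "prompt": prompt,
        "num_inference_steps": 50,
        "guidance_scale": float(guidance_scale),
        "n": 1,
        "size": "1024x1024",
        "seed": seed,
    }


def decode_first_image(response: dict) -> bytes:
    # Mirrors `jq -r '.data[0].b64_json' | base64 -d`.
    return base64.b64decode(response["data"][0]["b64_json"])


def generate(prompt: str, out_path: str = "output.png", guidance_scale: float = 1.0) -> None:
    body = json.dumps(build_payload(prompt, guidance_scale)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    with open(out_path, "wb") as f:
        f.write(decode_first_image(data))
```

With the server running, `generate(prompt)` writes `output.png` just like the shell pipeline.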

Baseline without vllm-omni:

torchrun --master_port=10086 --nproc_per_node 4 run_image_gen.py --reproduce --model-id "$model" --verbose 0 --image-size 1024x1024 --diff-infer-steps 50 --prompt "$prompt" 2>&1 | tee "./logs/$(date +%Y%m%d_%H%M%S).log"

Test Result

guidance_scale    E2E latency
5.0               27.256 s
1.0               15.895 s
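A quick check of what the two latencies imply. The ratio is about 1.71x rather than 2x; a plausible reason (not measured in this PR) is that only the DiT denoising work is halved while fixed costs such as prompt processing and VAE decoding are not.

```python
# E2E latencies from the table above, in seconds.
latency_cfg = 27.256     # guidance_scale = 5.0, two branches
latency_single = 15.895  # guidance_scale = 1.0, single branch

speedup = latency_cfg / latency_single
saved = latency_cfg - latency_single
saved_pct = 100 * saved / latency_cfg
print(f"speedup {speedup:.2f}x, saved {saved:.3f}s ({saved_pct:.1f}%)")
# speedup 1.71x, saved 11.361s (41.7%)
```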

Output with guidance_scale = 1.0, with vllm-omni:
[image: output]

Output with guidance_scale = 1.0, without vllm-omni:
[image: poc]

Fishermanykx changed the title [WIP] [Fix] HunyuanImage3 guidance_scale<=1 and cfg-factor gating to [WIP] HunyuanImage3 allow guidance_scale<=1 Apr 14, 2026
Fishermanykx changed the title [WIP] HunyuanImage3 allow guidance_scale<=1 to [WIP] [Feature] HunyuanImage3 allow guidance_scale<=1 Apr 14, 2026
Fishermanykx changed the title [WIP] [Feature] HunyuanImage3 allow guidance_scale<=1 to [WIP] [Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage Apr 14, 2026
Fishermanykx force-pushed the yukexiong/hunyuan_guidance_scale_le1 branch 2 times, most recently from b0b20e2 to 9108826, April 14, 2026 06:53
Fishermanykx marked this pull request as ready for review April 14, 2026 06:59
chatgpt-codex-connector commented:

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

Fishermanykx changed the title [WIP] [Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage to [Feature] HunyuanImage3 allow guidance_scale<=1 in DiT stage Apr 14, 2026
Fishermanykx (Contributor, Author) commented:

@gcanlin @Bounty-hunter PTAL

Fishermanykx force-pushed the yukexiong/hunyuan_guidance_scale_le1 branch 3 times, most recently from a8ab210 to 178f23c, April 15, 2026 01:37

gcanlin (Collaborator) commented Apr 15, 2026

Why did we think before that hunyuan-image didn't support guidance_scale <= 1?

Fishermanykx (Contributor, Author) commented Apr 15, 2026

> Why did we think before that hunyuan-image didn't support guidance_scale <= 1?

At line 1013 of vllm_omni/diffusion/models/hunyuan_image_3/pipeline_hunyuan_image_3.py, any guidance_scale <= 1 was overwritten with 1.0 + np.finfo(float).eps.
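Why that clamp mattered: 1.0 + np.finfo(float).eps is the smallest double greater than 1.0, so a later `guidance_scale > 1.0` check would always pass. A sketch of old versus new handling; the function names are hypothetical, and `sys.float_info.epsilon` (the same value as `np.finfo(float).eps`) keeps it dependency-free.

```python
import sys

EPS = sys.float_info.epsilon  # 2**-52, equal to np.finfo(float).eps


def old_effective_scale(guidance_scale: float) -> float:
    # Pre-PR behavior described above: values <= 1 were bumped just above 1.
    if guidance_scale <= 1:
        guidance_scale = 1.0 + EPS
    return guidance_scale


def new_effective_scale(guidance_scale: float) -> float:
    # Post-PR: the requested value is passed through unchanged.
    return guidance_scale


for s in (0.5, 1.0, 5.0):
    # Columns: requested scale, "guidance active" before, "guidance active" after.
    print(s, old_effective_scale(s) > 1.0, new_effective_scale(s) > 1.0)
```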

@@ -544,7 +543,7 @@ def prepare_model_inputs(
 generator = [torch.Generator(self.device).manual_seed(seed) for seed in seeds]

 # 3. apply chat template
-cfg_factor = {"gen_text": 1, "gen_image": 2}
+cfg_factor = {"gen_text": 1, "gen_image": 1 + int(guidance_scale > 1.0)}
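The new expression yields an image factor of 1 or 2 depending on whether guidance is strictly above 1.0. A sketch comparing the two dict expressions from the hunk over a few scales; wrapping them in functions is only for illustration.

```python
def cfg_factor_old(guidance_scale: float) -> dict:
    # Before: image generation always doubled the batch (cond + uncond).
    return {"gen_text": 1, "gen_image": 2}


def cfg_factor_new(guidance_scale: float) -> dict:
    # After: doubling only when guidance is actually enabled.
    return {"gen_text": 1, "gen_image": 1 + int(guidance_scale > 1.0)}


for s in (0.0, 1.0, 1.5, 5.0):
    print(s, cfg_factor_old(s)["gen_image"], cfg_factor_new(s)["gen_image"])
```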
Collaborator (review comment on this change)

Contributor Author replied:
I think it is

gcanlin (Collaborator) left a comment:

LGTM

gcanlin added the ready label to trigger buildkite CI label Apr 15, 2026
Fishermanykx force-pushed the yukexiong/hunyuan_guidance_scale_le1 branch from 178f23c to 90f3077, April 15, 2026 03:35
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
Fishermanykx force-pushed the yukexiong/hunyuan_guidance_scale_le1 branch from 90f3077 to d2091f9, April 15, 2026 06:22
gcanlin merged commit 50ae1de into vllm-project:main Apr 15, 2026
8 checks passed
y123456y78 pushed a commit to y123456y78/vllm-omni that referenced this pull request Apr 15, 2026
lvliang-intel pushed a commit to lvliang-intel/vllm-omni that referenced this pull request Apr 20, 2026

Labels

ready label to trigger buildkite CI


2 participants