
[Feat] Enable VAE parallel in HunyuanImage3 #3091

Open
Fishermanykx wants to merge 2 commits into vllm-project:main from Fishermanykx:yukexiong/hunyuan_vae_opt


@Fishermanykx
Contributor

@Fishermanykx Fishermanykx commented Apr 24, 2026

Summary

Enable VAE parallel support in HunyuanImage3.

Current changes:

  • add a distributed Hunyuan VAE wrapper at vllm_omni/diffusion/distributed/autoencoders/autoencoder_kl_hunyuan.py
  • wire HunyuanImage3Pipeline to use the distributed autoencoder wrapper
  • remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py
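
For readers unfamiliar with the idea, patch-parallel VAE decoding can be illustrated with a toy sketch. This is not the code in autoencoder_kl_hunyuan.py: the names `split_rows`, `toy_decode`, and `patch_parallel_decode` are hypothetical, the ranks are simulated with a plain loop instead of real devices, and the stand-in decoder is pointwise. A real convolutional VAE would likely need overlapping patch borders, because its receptive field crosses patch boundaries.

```python
from typing import Callable, List

Grid = List[List[float]]  # toy 2-D "latent" or decoded "image"


def split_rows(latent: Grid, world_size: int) -> List[Grid]:
    """Split the latent into `world_size` contiguous row bands, one per rank."""
    base, extra = divmod(len(latent), world_size)
    bands, start = [], 0
    for rank in range(world_size):
        # The first `extra` ranks take one additional row each.
        size = base + (1 if rank < extra else 0)
        bands.append(latent[start:start + size])
        start += size
    return bands


def toy_decode(latent: Grid, scale: int = 2) -> Grid:
    """Stand-in decoder (assumption): nearest-neighbour upsampling by `scale`."""
    out: Grid = []
    for row in latent:
        wide = [value for value in row for _ in range(scale)]
        out.extend(list(wide) for _ in range(scale))
    return out


def patch_parallel_decode(
    latent: Grid,
    world_size: int,
    decode: Callable[[Grid], Grid] = toy_decode,
) -> Grid:
    """Decode each band independently, then concatenate the results in rank order."""
    bands = split_rows(latent, world_size)
    # In a distributed run, each rank would decode only its own band.
    decoded = [decode(band) for band in bands]
    # Stand-in for gathering every rank's output and stitching it together.
    return [row for band in decoded for row in band]
```

With a pointwise decoder like this one, the stitched result matches decoding the whole latent at once, which is the property a unit test for a real wrapper would also want to check (within a tolerance).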

The unified deploy YAML is in #3172.

Validation

  • static checks only so far (py_compile, diff checks)
  • runtime validation is still pending

Test Plan

Tested on 4xAscend NPU

server

vllm serve $model --omni --port "8031" \
    --log-stats \
    --stage-configs-path "vllm_omni/platforms/npu/stage_configs/hunyuan_image3_t2i.yaml" 

vae_patch_parallel_size is set to 4

client

curl -X POST http://localhost:8031/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": 
    "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style.",
    "num_inference_steps": 2,
    "guidance_scale": "1.0",
    "n": 1,
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
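
The same request can be sketched in Python using only the standard library. The endpoint, payload fields, and the `data[0].b64_json` response path are taken from the curl command above; the helper names `build_request`, `extract_png`, and `generate` are hypothetical, and the sketch assumes the server is reachable on localhost:8031.

```python
import base64
import json
import urllib.request

URL = "http://localhost:8031/v1/images/generations"


def build_request(prompt: str, url: str = URL) -> urllib.request.Request:
    """Build the POST request with the same payload fields as the curl call."""
    payload = {
        "prompt": prompt,
        "num_inference_steps": 2,
        "guidance_scale": "1.0",  # kept as a string, mirroring the curl payload
        "n": 1,
        "size": "1024x1024",
        "seed": 42,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_png(response_body: bytes) -> bytes:
    """Equivalent of `jq -r '.data[0].b64_json' | base64 -d`."""
    return base64.b64decode(json.loads(response_body)["data"][0]["b64_json"])


def generate(prompt: str, out_path: str = "output.png") -> None:
    """Send the request and write the decoded image to `out_path`."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        png_bytes = extract_png(response.read())
    with open(out_path, "wb") as handle:
        handle.write(png_bytes)
```

Calling `generate("<prompt text>")` would then write output.png, as the curl pipeline does.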

Test Result

output

[image: output]

VAE decode time: 625.7 ms -> 355 ms
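
To put the two timings in perspective, the speedup and parallel efficiency can be worked out as follows. This is a small sketch: `speedup_stats` is a hypothetical helper, and the device count of 4 assumes the 355 ms figure was measured with vae_patch_parallel_size set to 4, as the test plan suggests.

```python
def speedup_stats(before_ms: float, after_ms: float, num_devices: int):
    """Return (speedup, fractional time reduction, parallel efficiency)."""
    speedup = before_ms / after_ms
    reduction = (before_ms - after_ms) / before_ms
    # Efficiency of 1.0 would mean perfect linear scaling across devices.
    efficiency = speedup / num_devices
    return speedup, reduction, efficiency


speedup, reduction, efficiency = speedup_stats(625.7, 355.0, 4)
print(f"speedup: {speedup:.2f}x")          # speedup: 1.76x
print(f"time saved: {reduction:.1%}")      # time saved: 43.3%
print(f"efficiency: {efficiency:.2f}")     # efficiency: 0.44
```

An efficiency well below 1.0 is expected here, since splitting, gathering, and any non-parallel parts of the decode path add overhead.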

Without VAE parallel:
[image: wo-vae]

With VAE parallel:
[image: w vae]

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from 421d557 to c69899e Compare April 24, 2026 03:36
@Fishermanykx Fishermanykx changed the title [WIP][Feat.] Enable VAE parallel in HunyuanImage3 [Feat.] Enable VAE parallel in HunyuanImage3 Apr 24, 2026
@Fishermanykx Fishermanykx marked this pull request as ready for review April 24, 2026 03:36
@Fishermanykx
Contributor Author

Fishermanykx commented Apr 24, 2026

PTAL @gcanlin @Semmer2

@Fishermanykx Fishermanykx changed the title [Feat.] Enable VAE parallel in HunyuanImage3 [Feat] Enable VAE parallel in HunyuanImage3 Apr 24, 2026
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 2 times, most recently from ee9b0b3 to a4502c4 Compare April 24, 2026 07:23
@hsliuustc0106
Collaborator

Does it work on GPU as well?
Does it affect accuracy?

Contributor

@Bounty-hunter Bounty-hunter left a comment


LGTM

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 378289a to 2eacaf2 Compare May 11, 2026 12:03
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch 5 times, most recently from ae99ecc to c6f0e06 Compare May 14, 2026 02:42
@BLANKETusers

Test Plan

Tested on 2xH200 GPU

VAE (with --vae-use-tiling)

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_vae \
  --vae-use-tiling

No VAE (without --vae-use-tiling)

python vllm-omni/examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm-omni/vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic medium shot captures a single Asian woman seated on a chair within a dimly lit room, creating an intimate and theatrical atmosphere. The composition is focused on the subject, rendered with rich colors and intricate textures that evoke a nostalgic and moody feeling.\n\nThe primary subject is a young Asian woman with a thoughtful and expressive countenance, her gaze directed slightly away from the camera. She is seated in a relaxed yet elegant posture on an ornate, vintage armchair. The chair is upholstered in a deep red velvet, its fabric showing detailed, intricate textures and slight signs of wear. She wears a simple, elegant dress in a dark teal hue, the material catching the light in a way that reveals its fine-woven texture. Her skin has a soft, matte quality, and the light delicately models the contours of her face and arms.\n\nThe surrounding room is characterized by its vintage decor, which contributes to the historic and evocative mood. In the immediate background, partially blurred due to a shallow depth of field consistent with a f/2.8 aperture, the wall is covered with wallpaper featuring a subtle, damask pattern. The overall color palette is a carefully balanced interplay of deep teal and rich red hues, creating a visually compelling and cohesive environment. The entire scene is detailed, from the fibers of the upholstery to the subtle patterns on the wall.\n\nThe lighting is highly dramatic and artistic, defined by high contrast and pronounced shadow play. A single key light source, positioned off-camera, projects gobo lighting patterns onto the scene, casting intricate shapes of light and shadow across the woman and the back wall. These dramatic shadows create a strong scense of depth and a theatrical quality. While some shadows are deep and defined, others remain soft, gently wrapping around the subject and preventing the loss of detail in darker areas. 
The soft focus on the background enhances the intimate feeling, drawing all attention to the expressive subject. The overall image presents a cinematic, photorealistic photography style." \
  --output ./output/output_offline_vae

Test Result

VAE

[image: output_0_0]

No VAE

[image: output_0_0]

CLIP Score

99.85/100
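
The comment does not say how this score was computed. One common reading is the cosine similarity between the CLIP image embeddings of the two outputs, scaled to 100. Under that assumption, the scoring step alone can be sketched as below. `similarity_score` is a hypothetical helper, and the embedding vectors would have to come from a CLIP model, which is not shown here.

```python
import math
from typing import Sequence


def similarity_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors, scaled to a 0-100 range
    for non-negative similarities (assumed scoring convention)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 100.0 * dot / (norm_a * norm_b)
```

Identical embeddings score 100 (up to floating-point rounding), so a value such as 99.85 would indicate that the two images are nearly indistinguishable in the embedding space.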

@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from c6f0e06 to 8c4b866 Compare May 14, 2026 09:13
@Gaohan123 Gaohan123 added this to the v0.22.0 milestone May 14, 2026
Collaborator

@Gaohan123 Gaohan123 left a comment


Here are some suggestions:

  1. Please add a simple unit test (UT) for it.
  2. I didn't see any NPU-related changes, which is inconsistent with your PR description.

@Fishermanykx
Contributor Author

  • remove the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py
  1. Done.
  2. Removing the NPU fused MoE init hook in vllm_omni/platforms/npu/models/hunyuan_fused_moe.py was done in pull 2979, which had not been merged when this PR was proposed. After I rebased my code, that change is no longer part of this PR.

@Fishermanykx Fishermanykx requested a review from yenuo26 as a code owner May 15, 2026 07:02
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from f338f4d to 014b54b Compare May 15, 2026 07:05
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
Signed-off-by: KexiongYu <yukexiong1@huawei.com>
@Fishermanykx Fishermanykx force-pushed the yukexiong/hunyuan_vae_opt branch from 014b54b to 272bc98 Compare May 15, 2026 08:46