[Model] Extend NPU support for HunyuanImage3 Diffusion Model by ElleElleWu · Pull Request #1689 · vllm-project/vllm-omni

ElleElleWu · 2026-03-05T12:12:40Z

Co-authored-by: skf1999 13234016272@163.com
Co-authored-by: Just-it 1161406585@qq.com
Co-authored-by: Semmer2 semmer@live.cn

Purpose

Support HunyuanImage as a DiT model on both GPU and NPU.

Test Result

1. Test Environment

GPU

CUDA         Version: 12.9
torch        Version: 2.9.1
vllm         Version: 0.16.0
vllm-omni    Version: 0.16.0

NPU

torch             Version: 2.9.0
torch_npu         Version: 2.9.0
vllm              Version: 0.16.0
vllm-ascend       Version: 0.16.0
vllm-omni         Version: 0.16.0

vllm-ascend: As there is no official release available for 0.16.0 yet, we have pinned the dependency to commit c7fd7a2 ([Doc][Misc] Fix msprobe_guide.md documentation issues (#6965)).

2. Offline inference

- CMD

/usr/local/python3.11.14/bin/python -u vllm-omni/examples/offline_inference/text_to_image/text_to_image.py \
    --model /mnt/share/HunyuanImage-3.0/ \
    --prompt "A brown and white dog is running on the grass" \
    --output output_image.png \
    --num-inference-steps 50 \
    --tensor-parallel-size 4 \
    --seed 1234 \
    --enable-expert-parallel 2>&1

- Execution Result Output

[Stage-0] INFO 03-05 12:03:34 [diffusion_engine.py:80] Generation completed successfully.
[Stage-0] INFO 03-05 12:03:34 [diffusion_engine.py:98] Post-processing completed in 0.0000 seconds


Processed prompts: 100%|██████████| 1/1 [00:28<00:00, 28.10s/img, est. speed stage-0 img/s: 0.00, avg e2e_lat: 0.0ms]

Adding requests:   0%|          | 0/1 [00:28<?, ?it/s]
Total generation time: 28.1028 seconds (28102.85 ms)
INFO 03-05 12:03:34 [text_to_image.py:407] Outputs: [OmniRequestOutput(request_id='', finished=True, stage_id=0, final_output_type='image', request_output=[OmniRequestOutput(request_id='0_0dc5fda8-537c-4420-887f-46e0ede5a511', finished=True, stage_id=None, final_output_type='image', request_output=None, images=[1 PIL Images], prompt={'prompt': 'A brown and white dog is running on the grass', 'negative_prompt': None, 'additional_information': {'global_request_id': ['0_0dc5fda8-537c-4420-887f-46e0ede5a511']}}, latents=None, metrics={'image_num': 1, 'resolution': 640, 'postprocess_time_ms': 0.002384185791015625}, multimodal_output={})], images=[], prompt=None, latents=None, metrics={}, multimodal_output={})]
Saved generated image to output_image.png
[Stage-0] INFO 03-05 12:03:34 [omni_stage.py:870] Received shutdown signal

3. Online Inference

- command

vllm serve "/data/HunyuanImage-3.0/" --omni --port "8091" --tensor_parallel_size 8  --enable-expert-parallel

- Online Request

curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A brown and white dog is running on the grass",
    "num_inference_steps": 50,
    "n": 4,
    "size": "1024x1024",
    "seed": 123
  }' | jq -r '.data[0].b64_json' | base64 -d > dragon.png
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2098k  100 2098k  100   152  82732      5  0:00:30  0:00:25  0:00:05  560k
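
The same request can be issued from Python. The following is a minimal standard-library sketch, not part of the PR: it assumes the server listens on port 8091 (the port passed to `vllm serve` above), and the helper names `build_payload`, `decode_images`, and `generate` are hypothetical. Unlike the `jq` pipeline, which keeps only `data[0]`, it decodes every entry in `data`.

```python
import base64
import json
import urllib.request


def build_payload(prompt, steps=50, n=4, size="1024x1024", seed=123):
    """Build the JSON body used in the curl request above."""
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        "n": n,
        "size": size,
        "seed": seed,
    }


def decode_images(response_body):
    """Return raw image bytes for every entry in the response's `data` list."""
    return [base64.b64decode(item["b64_json"]) for item in response_body["data"]]


def generate(prompt, base_url="http://localhost:8091"):
    """POST the request and return decoded image bytes (needs a running server).

    The default base_url is an assumption taken from the serve command.
    """
    request = urllib.request.Request(
        f"{base_url}/v1/images/generations",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return decode_images(json.load(response))
```

Against a running server, `generate("A brown and white dog is running on the grass")` would return one `bytes` object per image in the response, each of which can be written to its own file.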

- Execution Result Output

(APIServer pid=1177212) INFO 03-05 12:06:56 [api_server.py:1038] Generating 4 image(s) 1024x1024
(APIServer pid=1177212) INFO 03-05 12:06:56 [async_omni.py:345] [AsyncOrchestrator] Entering scheduling loop: stages=1, final_stage=0
[Stage-0] INFO 03-05 12:06:56 [manager.py:592] Deactivating all adapters: 0 layers
[Stage-0] INFO 03-05 12:06:56 [manager.py:592] Deactivating all adapters: 0 layers
[Stage-0] INFO 03-05 12:06:56 [manager.py:592] Deactivating all adapters: 0 layers
[Stage-0] INFO 03-05 12:06:56 [manager.py:592] Deactivating all adapters: 0 layers
[Stage-0] WARNING 03-05 12:06:56 [kv_transfer_manager.py:421] No connector available for receiving KV cache
[Stage-0] WARNING 03-05 12:06:56 [kv_transfer_manager.py:421] No connector available for receiving KV cache
[Stage-0] WARNING 03-05 12:06:56 [kv_transfer_manager.py:421] No connector available for receiving KV cache
[Stage-0] WARNING 03-05 12:06:56 [kv_transfer_manager.py:421] No connector available for receiving KV cache
  2%|███▏                                                                                                                                                         | 1/50 [00:00<00:25,  1.96it/s][rank0]:[W305 12:06:57.969502890 compiler_depend.ts:4658] Warning: The current allgather operator has a defect in handling different tensor shape,         the work event forces a wait operation, and the allgather wait on the python side would be fake (function operator())
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:24<00:00,  2.03it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:24<00:00,  2.03it/s]

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:24<00:00,  2.03it/s]
[Stage-0] INFO 03-05 12:07:21 [diffusion_engine.py:80] Generation completed successfully.
[Stage-0] INFO 03-05 12:07:21 [diffusion_engine.py:98] Post-processing completed in 0.0000 seconds
(APIServer pid=1177212) INFO 03-05 12:07:21 [api_server.py:1058] Successfully generated 1 image(s)
(APIServer pid=1177212) INFO:     127.0.0.1:52494 - "POST /v1/images/generations HTTP/1.1" 200 OK

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b619c2d1f7

hsliuustc0106 · 2026-03-05T14:23:47Z

Can you edit the title of this PR? We already support DiT inference for HYImage3, so this should be an enhancement.

gcanlin

Thanks for the contribution! BTW, the online test looks like it forgot to enable EP. It would be better to cover it.

Feel free to ping me again if you're blocked on the hardware dispatch design. I will try to help when I have more bandwidth.

hsliuustc0106

Review

Rating: 7/10 | Verdict: ⚠️ Changes Requested

Summary

Solid implementation adding HunyuanImage3 support with GPU and NPU compatibility. However, missing critical documentation and tests. Code quality issues in MoE implementation need addressing.

Issues

1. Missing documentation: No update to supported_models.md or example configs for the new model.
2. No unit tests: 263 lines of new code without any tests. At minimum, need tests for:
   - is_moe property in OmniDiffusionConfig
   - Expert parallel initialization error cases
   - FusedMoE wrapper behavior
3. Memory requirements undocumented: HunyuanImage3 requires significant VRAM (40GB per skill docs), but PR doesn't specify minimum requirements or recommended configurations.
4. Stage config missing: No YAML config file for stage setup, which is required for new model support per PR checklist.

Highlights

✅ Comprehensive GPU and NPU implementation
✅ Excellent test coverage in PR description (offline + online)
✅ Proper expert parallelism support for MoE layers
✅ Good error handling for non-MoE models with EP enabled
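
As a rough illustration of the "expert parallel initialization error cases" test requested above, the sketch below shows one way such a check and its unit test could look. It is an assumption, not the PR's code: `validate_expert_parallel` and its parameters are hypothetical names, and the real validation may live elsewhere and raise a different exception.

```python
def validate_expert_parallel(enable_expert_parallel: bool, is_moe: bool) -> None:
    """Reject expert parallelism for models without MoE layers (illustrative rule)."""
    if enable_expert_parallel and not is_moe:
        raise ValueError(
            "expert parallelism was requested, but the model has no MoE layers"
        )


def test_ep_rejected_for_dense_model():
    """EP on a non-MoE model should fail fast with a clear error."""
    try:
        validate_expert_parallel(enable_expert_parallel=True, is_moe=False)
    except ValueError:
        return
    raise AssertionError("expected ValueError for EP on a non-MoE model")


def test_ep_allowed_for_moe_model():
    """EP on an MoE model, and no EP on a dense model, should both pass."""
    validate_expert_parallel(enable_expert_parallel=True, is_moe=True)
    validate_expert_parallel(enable_expert_parallel=False, is_moe=False)
```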

Recommendation

Address documentation and test gaps before merge. Code implementation is solid but needs supporting artifacts.

Reviewed by OpenClaw with vllm-omni-skills 🦐

ElleElleWu · 2026-03-06T01:29:58Z

> Can you edit the title of this PR? We already support DiT inference for HYImage3, so this should be an enhancement.

Done, please check.

gcanlin · 2026-03-12T11:24:53Z

@xuechendi @hsliuustc0106 I pushed a micro-refactor for clearer hardware dispatch in the HunYuanFusedMoE layer. PTAL.

Let's wait for @ElleElleWu to test again. Thanks!

gcanlin

I only modified the dispatch; the rest of the logic looks good to me. Thanks for contributing!

ElleElleWu · 2026-03-12T11:32:42Z

> @xuechendi @hsliuustc0106 I pushed a micro-refactor for clearer hardware dispatch in the HunYuanFusedMoE layer. PTAL.
>
> Let's wait for @ElleElleWu to test again. Thanks!

Thanks for the update! I've finished testing, and the results are consistent with the previous version. Looks good to me.

hsliuustc0106

Review Summary

CI Gate: ✅ All gates passed (DCO, pre-commit, build, mergeable)

Highlights

Clean Platform Dispatch Pattern — Factory pattern with get_diffusion_model_impl_qualname() hook avoids hardcoded GPU/NPU branches in model code.
Comprehensive Test Coverage — New unit tests for HunyuanFusedMoE platform dispatch and is_moe property.
Documentation Updated — supported_models.md includes HunyuanImage3.
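
To make the dispatch idea concrete, here is a minimal sketch of resolving an implementation class from a dotted qualified name. Only the hook name `get_diffusion_model_impl_qualname` comes from this review; its signature, the `ToyPlatform` class, the lookup key, and the use of a standard-library class as a stand-in implementation are all assumptions made so the example runs on its own.

```python
import importlib


def resolve_qualname(qualname: str):
    """Import `package.module.ClassName` and return the class object."""
    module_name, _, attr = qualname.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {qualname!r}")
    return getattr(importlib.import_module(module_name), attr)


class ToyPlatform:
    """Hypothetical platform; the real hook's signature may differ."""

    def get_diffusion_model_impl_qualname(self, op_name: str) -> str:
        # A real platform would return its own device-specific class path here.
        table = {"fused_moe": "collections.OrderedDict"}
        return table[op_name]


# Model code asks the platform for a name instead of branching on GPU/NPU.
impl_cls = resolve_qualname(
    ToyPlatform().get_diffusion_model_impl_qualname("fused_moe")
)
```

With this shape, supporting another device means adding a platform class that returns a different path; the model code that calls `resolve_qualname` stays unchanged.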

Suggestions

1. `is_moe` Threshold Change: `> 1` → `> 0`

# Before
return num_experts > 1

# After
return num_experts > 0

Question: Should num_experts = 1 really be considered MoE? Typically MoE requires multiple experts for the routing mechanism to make sense.

If this is intentional (e.g., single-expert models use the same infrastructure), a brief comment explaining the rationale would help future readers.
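
The practical difference between the two thresholds shows up only for `num_experts == 1`. The sketch below uses a hypothetical `ToyDiffusionConfig` as a stand-in for the real config class and models only the `num_experts` field.

```python
from dataclasses import dataclass


@dataclass
class ToyDiffusionConfig:
    """Hypothetical stand-in for the real config; only `num_experts` is modeled."""

    num_experts: int = 0

    @property
    def is_moe_before(self) -> bool:
        # Old rule: at least two experts are needed to count as MoE.
        return self.num_experts > 1

    @property
    def is_moe_after(self) -> bool:
        # New rule: any positive expert count counts as MoE.
        return self.num_experts > 0


for n in (0, 1, 2):
    cfg = ToyDiffusionConfig(num_experts=n)
    print(n, cfg.is_moe_before, cfg.is_moe_after)
# 0 False False
# 1 False True
# 2 True True
```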

2. Global State Cleanup in `__del__`

def __del__(self):
    if vllm_ascend_parallel_state._MC2:
        vllm_ascend_parallel_state._MC2.destroy()
    vllm_ascend_parallel_state._MC2 = None

Potential Issue: If multiple AscendHunyuanFusedMoE instances exist, the first one to be garbage-collected will destroy the shared _MC2 group, potentially causing crashes in remaining instances.

Suggestion: Consider reference counting or ownership model for the MC2 group lifecycle.
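
One possible shape for the reference-counting suggestion is sketched below. It is illustrative only: `SharedGroupHandle` and its `create`/`destroy` callbacks are hypothetical and do not correspond to any vllm-ascend API. The shared resource is torn down only when the last user releases it.

```python
import threading


class SharedGroupHandle:
    """Reference-counted owner of one shared resource (hypothetical helper)."""

    def __init__(self, create, destroy):
        self._create = create      # callable that builds the shared group
        self._destroy = destroy    # callable that tears the group down
        self._lock = threading.Lock()
        self._refs = 0
        self._group = None

    def acquire(self):
        """Create the group on first use and return the shared instance."""
        with self._lock:
            if self._refs == 0:
                self._group = self._create()
            self._refs += 1
            return self._group

    def release(self):
        """Drop one reference; destroy the group when none remain."""
        with self._lock:
            if self._refs == 0:
                return  # unbalanced release: nothing left to do
            self._refs -= 1
            if self._refs == 0:
                self._destroy(self._group)
                self._group = None
```

Each layer instance would call `acquire()` when it is constructed and `release()` from an explicit shutdown path, so garbage-collecting one instance cannot pull the group out from under the others.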

Summary

| Aspect | Rating |
|---|---|
| Architecture | 8/10 |
| Code Quality | 8/10 |
| Testing | 9/10 |
| Documentation | 8/10 |

Verdict: ✅ Ready to merge after Buildkite CI passes. The suggestions above are non-blocking quality-of-life improvements.

Thanks for the NPU support contribution! 🚀

hsliuustc0106

lgtm

…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: KexiongYu <yukexiong1@huawei.com>

…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>

…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com>

ElleElleWu requested a review from hsliuustc0106 as a code owner March 5, 2026 12:12

chatgpt-codex-connector Bot reviewed Mar 5, 2026

Comment thread vllm_omni/diffusion/distributed/parallel_state.py Outdated

Comment thread vllm_omni/diffusion/models/hunyuan_image_3/hunyuan_fused_moe.py Outdated

gcanlin reviewed Mar 5, 2026

Comment thread vllm_omni/diffusion/data.py Outdated

gcanlin reviewed Mar 5, 2026

Comment thread vllm_omni/diffusion/models/hunyuan_image_3/hunyuan_fused_moe.py Outdated

gcanlin reviewed Mar 5, 2026

hsliuustc0106 requested changes Mar 6, 2026

ElleElleWu changed the title ~~[Model] Support HunyuanImage3 Diffusion Model in for GPU and NPU~~ [Model] Futher Support HunyuanImage3 Diffusion Model in NPU Mar 6, 2026

ElleElleWu changed the title ~~[Model] Futher Support HunyuanImage3 Diffusion Model in NPU~~ [Model] Extend NPU support for HunyuanImage3 Diffusion Model Mar 6, 2026

ElleElleWu force-pushed the HunyuanImage3_npu_0.16.0_and_ep branch 6 times, most recently from e44311f to 373cf76 Compare March 9, 2026 03:02

ElleElleWu requested a review from hsliuustc0106 March 9, 2026 03:05

xuechendi reviewed Mar 11, 2026

Comment thread vllm_omni/diffusion/models/hunyuan_image_3/hunyuan_fused_moe.py Outdated

ElleElleWu force-pushed the HunyuanImage3_npu_0.16.0_and_ep branch from 373cf76 to 6603fc7 Compare March 12, 2026 01:42

ElleElleWu requested a review from gcanlin March 12, 2026 02:06

ElleElleWu and others added 5 commits March 12, 2026 17:30

[Model] Support HunyuanImage3 Diffusion Model for GPU and NPU

c72fefe

Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Signed-off-by: ElleElleWu <1608928702@qq.com>

fix is_moe type and threshold, add UT for is_moe, Hunyuan_fused_moe

bac1937

Signed-off-by: ElleElleWu <1608928702@qq.com>

fix hunyuan_fused_moe for xpu and other devices

6603fc7

Signed-off-by: ElleElleWu <1608928702@qq.com>

Refactor hardware dispathc

79d77c9

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

rename

022e407

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin added the ready label to trigger buildkite CI label Mar 12, 2026

gcanlin approved these changes Mar 12, 2026

gcanlin enabled auto-merge (squash) March 12, 2026 11:41

Merge branch 'main' into HunyuanImage3_npu_0.16.0_and_ep

c526392

hsliuustc0106 reviewed Mar 12, 2026

hsliuustc0106 self-requested a review March 12, 2026 12:49

hsliuustc0106 approved these changes Mar 12, 2026

gcanlin merged commit 28dd1a6 into vllm-project:main Mar 12, 2026
7 checks passed
