[Model] Extend NPU support for HunyuanImage3 Diffusion Model#1689
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b619c2d1f7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
can you edit the title of this PR? we have already supported DiT inference for HYImage3. This should be an enhancement. |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Review
Rating: 7/10 | Verdict:
Summary
Solid implementation adding HunyuanImage3 support with GPU and NPU compatibility. However, missing critical documentation and tests. Code quality issues in MoE implementation need addressing.
Issues
-
Missing documentation: No update to
supported_models.mdor example configs for the new model. -
No unit tests: 263 lines of new code without any tests. At minimum, need tests for:
is_moeproperty in OmniDiffusionConfig- Expert parallel initialization error cases
- FusedMoE wrapper behavior
-
Memory requirements undocumented: HunyuanImage3 requires significant VRAM (40GB per skill docs), but PR doesn't specify minimum requirements or recommended configurations.
-
Stage config missing: No YAML config file for stage setup, which is required for new model support per PR checklist.
Highlights
- ✅ Comprehensive GPU and NPU implementation
- ✅ Excellent test coverage in PR description (offline + online)
- ✅ Proper expert parallelism support for MoE layers
- ✅ Good error handling for non-MoE models with EP enabled
Recommendation
Address documentation and test gaps before merge. Code implementation is solid but needs supporting artifacts.
Reviewed by OpenClaw with vllm-omni-skills 🦐
solved, please check |
e44311f to
373cf76
Compare
373cf76 to
6603fc7
Compare
Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Signed-off-by: ElleElleWu <1608928702@qq.com>
Signed-off-by: ElleElleWu <1608928702@qq.com>
Signed-off-by: ElleElleWu <1608928702@qq.com>
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
@xuechendi @hsliuustc0106 I push a micro-refactor for clearer hardware dispatch in And wait for @ElleElleWu test again. Thanks! |
gcanlin
left a comment
There was a problem hiding this comment.
Just modified the dispatch. And other logic is good to me. Thanks for contributing!
Thanks for the update! I've finished testing, and the results are consistent with the previous version. Looks good to me. |
hsliuustc0106
left a comment
There was a problem hiding this comment.
Review Summary
CI Gate: ✅ All gates passed (DCO, pre-commit, build, mergeable)
Highlights
-
Clean Platform Dispatch Pattern — Factory pattern with
get_diffusion_model_impl_qualname()hook avoids hardcoded GPU/NPU branches in model code. -
Comprehensive Test Coverage — New unit tests for
HunyuanFusedMoEplatform dispatch andis_moeproperty. -
Documentation Updated —
supported_models.mdincludes HunyuanImage3.
Suggestions
1. is_moe Threshold Change: > 1 → > 0
# Before
return num_experts > 1
# After
return num_experts > 0Question: Should num_experts = 1 really be considered MoE? Typically MoE requires multiple experts for the routing mechanism to make sense.
If this is intentional (e.g., single-expert models use the same infrastructure), a brief comment explaining the rationale would help future readers.
2. Global State Cleanup in __del__
def __del__(self):
if vllm_ascend_parallel_state._MC2:
vllm_ascend_parallel_state._MC2.destroy()
vllm_ascend_parallel_state._MC2 = NonePotential Issue: If multiple AscendHunyuanFusedMoE instances exist, the first one to be garbage-collected will destroy the shared _MC2 group, potentially causing crashes in remaining instances.
Suggestion: Consider reference counting or ownership model for the MC2 group lifecycle.
Summary
| Aspect | Rating |
|---|---|
| Architecture | 8/10 |
| Code Quality | 8/10 |
| Testing | 9/10 |
| Documentation | 8/10 |
Verdict: ✅ Ready to merge after Buildkite CI passes. The suggestions above are non-blocking quality-of-life improvements.
Thanks for the NPU support contribution! 🚀
…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: KexiongYu <yukexiong1@huawei.com>
…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com> Signed-off-by: yiliu30 <yi4.liu@intel.com>
…oject#1689) Signed-off-by: ElleElleWu <1608928702@qq.com> Signed-off-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: skf1999 <13234016272@163.com> Co-authored-by: Just-it <1161406585@qq.com> Co-authored-by: Semmer2 <semmer@live.cn> Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: skf1999 13234016272@163.com
Co-authored-by: Just-it 1161406585@qq.com
Co-authored-by: Semmer2 semmer@live.cn
Purpose
Support HunyuanImage as a DiT model in both GPU and NPU.
Test Result
1. Test Environment
GPU
NPU
vllm-ascend: As there is no official release available for 0.16.0 yet, we have pinned the dependency to commit c7fd7a2 ([Doc][Misc] Fix msprobe_guide.md documentation issues (#6965)).
2. Offline inference
- CMD
- Execution Result Output
3. Online Inference
- command
- Online Request
- Execution Result Output
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)