[NPU] Upgrade to v0.14.0#820

Merged
hsliuustc0106 merged 15 commits into vllm-project:main from gcanlin:npu-graph
Jan 27, 2026

Conversation

@gcanlin
Collaborator

@gcanlin gcanlin commented Jan 16, 2026


Purpose

This PR:

  • Upgrades to v0.14.0, aligning with vLLM v0.14.0 and vLLM-Ascend v0.14.0rc1.
  • Aligns with the GPU model runner to support the MTP ACL graph.
  • Removes the temporary patch for vLLM-Ascend.

Test Plan

NPU:

  • Qwen3-Omni
    • eager
    • ACL graph
  • Qwen2.5-Omni
    • eager
    • ACL graph (accuracy problem will be fixed in a follow-up PR)
  • Qwen-Image
  • Qwen3-TTS

GPU:

  • CI

Test Result

| Model | enforce_eager | Time | Example |
|---|---|---|---|
| Qwen3-Omni | true | 81s | use_audio |
| Qwen3-Omni | false | 70s | use_audio |
| Qwen2.5-Omni | true | 180s | use_mixed_modalities |
| Qwen2.5-Omni | false | accuracy problem (will fix in a follow-up PR) | use_mixed_modalities |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@hsliuustc0106
Collaborator

Please paste the performance comparison showing the gain.

@hsliuustc0106
Collaborator

Also, could you help optimize the metrics logging? It's currently very hard to understand.

@gcanlin
Collaborator Author

gcanlin commented Jan 16, 2026

@hsliuustc0106 I updated the benchmark result using #780 to compare before and after. But it's weird that enabling the ACL graph causes a performance regression... It needs more investigation, so I will revert to eager mode by default. This PR also fixes the main branch breaking on NPU, so it's better to get it merged ASAP. Thanks!

Also, could you help optimize the metrics logging? It's currently very hard to understand.

I think once the benchmark is done, the metrics logging seems unnecessary.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin changed the title [NPU] Enable ACL graph for Qwen3-Omni [NPU] Support ACL graph for Qwen3-Omni Jan 16, 2026
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni [NPU] Support ACL graph for Qwen3-Omni MTP Jan 19, 2026
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni MTP [NPU] Support ACL graph for Qwen3-Omni Talker MTP Jan 19, 2026
@david6666666 david6666666 added this to the v0.14.0rc1 milestone Jan 20, 2026
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 20, 2026
@david6666666 david6666666 modified the milestones: v0.14.0rc1, v0.14.0 Jan 23, 2026
@hsliuustc0106
Collaborator

Could you fix this PR and test again? We expect it to be merged by next week.

@gcanlin
Collaborator Author

gcanlin commented Jan 23, 2026

Could you fix this PR and test again? We expect it to be merged by next week.

After vLLM-Ascend v0.14.0rc1 is released, I will also upgrade NPUModelRunner in this PR to align with v0.14.0rc1.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni Talker MTP [NPU] Upgrade to v0.14.0 Jan 26, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Collaborator Author

gcanlin commented Jan 26, 2026

@hsliuustc0106 @david6666666 Could you please take a look? This PR is ready. Docs and other things will be done in #671. Thanks!

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
self._shared_expert = shared_expert
self._shared_expert_gate = shared_expert_gate

@property
Collaborator


Why do we need the modification in the model file?

Collaborator Author


We add those properties so the wrapper presents the same interface as a normal Qwen MLP. vLLM-Ascend's SharedFusedMoE validation splits the shared expert into gate_up_proj → act_fn → down_proj and expects these attributes to exist. The wrapper only holds the real MLP, so we forward them to keep validation and split execution working on NPU. It doesn't affect GPU behaviour. When we upgrade to vLLM v0.15.0 or later, we will remove this wrapper hack, because upstream has implemented the shared expert in Qwen3-MoE and we can reuse it directly.
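The forwarding pattern described above can be sketched as follows. This is a minimal illustrative example, not the actual vllm-omni code: the class names and the string stand-ins for layers are hypothetical; only the attribute names (gate_up_proj, act_fn, down_proj) come from the discussion.

```python
class Mlp:
    """Stand-in for the real Qwen MLP that owns the split-execution layers."""
    def __init__(self):
        self.gate_up_proj = "gate_up_proj-layer"
        self.act_fn = "act_fn"
        self.down_proj = "down_proj-layer"


class SharedExpertWrapper:
    """Holds the real MLP and forwards the attributes that an external
    validation step (here: vLLM-Ascend's SharedFusedMoE split) expects
    to find directly on the shared expert."""
    def __init__(self, shared_expert):
        self._shared_expert = shared_expert

    @property
    def gate_up_proj(self):
        # Forward to the wrapped MLP so validation sees the real layer.
        return self._shared_expert.gate_up_proj

    @property
    def act_fn(self):
        return self._shared_expert.act_fn

    @property
    def down_proj(self):
        return self._shared_expert.down_proj


wrapper = SharedExpertWrapper(Mlp())
```

With read-only properties like these, the wrapper looks like a normal MLP to code that probes those attributes, while the real module stays the single source of truth.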

@hsliuustc0106 hsliuustc0106 merged commit cadd772 into vllm-project:main Jan 27, 2026
7 checks passed
nussejzz pushed a commit to nussejzz/vllm-omni that referenced this pull request Jan 27, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: jzz <e1583181@u.nus.edu>

Labels

ready label to trigger buildkite CI

4 participants