[NPU] Upgrade to v0.14.0#820

Merged
hsliuustc0106 merged 15 commits into vllm-project:main from gcanlin:npu-graph
Jan 27, 2026

Conversation

@gcanlin
Collaborator

@gcanlin gcanlin commented Jan 16, 2026


Purpose

This PR:

  • Upgrades to v0.14.0, aligning with vLLM v0.14.0 and vLLM-Ascend v0.14.0rc1.
  • Aligns with the GPU model runner to support the MTP ACL graph.
  • Removes the temporary patch for vLLM-Ascend.

Test Plan

NPU:

  • Qwen3-Omni
    • eager
    • ACL graph
  • Qwen2.5-Omni
    • eager
    • ACL graph (accuracy problem will be fixed in a follow-up PR)
  • Qwen-Image
  • Qwen3-TTS

GPU:

  • CI

Test Result

| Model | enforce_eager | Time | Example |
|---|---|---|---|
| Qwen3-Omni | true | 81s | use_audio |
| Qwen3-Omni | false | 70s | use_audio |
| Qwen2.5-Omni | true | 180s | use_mixed_modalities |
| Qwen2.5-Omni | false | accuracy problem (will fix in a follow-up PR) | use_mixed_modalities |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing a test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.


Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@hsliuustc0106
Collaborator

Please paste the performance comparison showing the gain.

@hsliuustc0106
Collaborator

Also, could you help optimize the metrics logging? It's currently very hard to understand.

@gcanlin
Collaborator Author

gcanlin commented Jan 16, 2026

@hsliuustc0106 I updated the benchmark result using #780 to compare before and after. But it's weird that enabling the ACL graph causes a performance regression... It needs more investigation, so I will revert to eager mode by default. This PR also fixes the main branch breaking on NPU, so it's better to get it merged ASAP. Thanks!

Also, could you help optimize the metrics logging? It's currently very hard to understand.

I think once the benchmark is done, the metrics logging seems unnecessary.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin changed the title [NPU] Enable ACL graph for Qwen3-Omni [NPU] Support ACL graph for Qwen3-Omni Jan 16, 2026
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni [NPU] Support ACL graph for Qwen3-Omni MTP Jan 19, 2026
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni MTP [NPU] Support ACL graph for Qwen3-Omni Talker MTP Jan 19, 2026
@david6666666 david6666666 added this to the v0.14.0rc1 milestone Jan 20, 2026
@hsliuustc0106 hsliuustc0106 added the ready label to trigger buildkite CI label Jan 20, 2026
@david6666666 david6666666 modified the milestones: v0.14.0rc1, v0.14.0 Jan 23, 2026
@hsliuustc0106
Collaborator

Could you fix this PR and test again? We expect it to be merged by next week.

@gcanlin
Collaborator Author

gcanlin commented Jan 23, 2026

Could you fix this PR and test again? We expect it to be merged by next week.

After vLLM-Ascend v0.14.0rc1 is released, I will also upgrade NPUModelRunner in this PR to align with v0.14.0rc1.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin gcanlin changed the title [NPU] Support ACL graph for Qwen3-Omni Talker MTP [NPU] Upgrade to v0.14.0 Jan 26, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
@gcanlin
Collaborator Author

gcanlin commented Jan 26, 2026

@hsliuustc0106 @david6666666 Could you please take a look? This PR is ready. Docs and other things will be done in #671. Thanks!

Signed-off-by: gcanlin <canlinguosdu@gmail.com>
self._shared_expert = shared_expert
self._shared_expert_gate = shared_expert_gate

@property
Collaborator


Why do we need the modification in the model file?

Collaborator Author


We add those properties so the wrapper presents the same interface as a normal Qwen MLP. vLLM-Ascend's SharedFusedMoE validation splits the shared expert into gate_up_proj → act_fn → down_proj and expects these attributes to exist. The wrapper only holds the real MLP, so we forward them to keep validation and split execution working on NPU. It doesn't affect GPU behaviour. When we upgrade to vLLM v0.15.0 or later, we will remove this wrapper hack, because upstream has implemented the shared expert in Qwen3-MoE and we can reuse it directly.
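The forwarding pattern described above can be sketched as follows. This is a minimal illustrative example, not the actual vllm-omni code: the class names and the string stand-ins for layers are hypothetical; only the attribute names (gate_up_proj, act_fn, down_proj) come from the discussion.

```python
class Mlp:
    """Stand-in for the real Qwen MLP that owns the split-execution layers."""
    def __init__(self):
        self.gate_up_proj = "gate_up_proj-layer"
        self.act_fn = "act_fn"
        self.down_proj = "down_proj-layer"


class SharedExpertWrapper:
    """Holds the real MLP and forwards the attributes that an external
    validation step (here: vLLM-Ascend's SharedFusedMoE split) expects
    to find directly on the shared expert."""
    def __init__(self, shared_expert):
        self._shared_expert = shared_expert

    @property
    def gate_up_proj(self):
        # Forward to the wrapped MLP so validation sees the real layer.
        return self._shared_expert.gate_up_proj

    @property
    def act_fn(self):
        return self._shared_expert.act_fn

    @property
    def down_proj(self):
        return self._shared_expert.down_proj


wrapper = SharedExpertWrapper(Mlp())
```

With read-only properties like these, the wrapper looks like a normal MLP to code that probes those attributes, while the real module stays the single source of truth.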

@hsliuustc0106 hsliuustc0106 merged commit cadd772 into vllm-project:main Jan 27, 2026
7 checks passed
nussejzz pushed a commit to nussejzz/vllm-omni that referenced this pull request Jan 27, 2026
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
Signed-off-by: jzz <e1583181@u.nus.edu>

Labels

ready label to trigger buildkite CI

4 participants