
Support v0.10.1 #2584

Merged

wangxiyuan merged 1 commit into vllm-project:main from Yikun:ver-main
Aug 28, 2025

Conversation

@Yikun (Member) commented Aug 28, 2025

What this PR does / why we need it?

This patch also supports v0.10.1

Does this PR introduce any user-facing change?

No

How was this patch tested?

- CI passed
- test 0.10.1: vllm-project#2583
- vLLM version: v0.10.1.1
- vLLM main: vllm-project/vllm@321938e

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
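
For context, supporting a new vLLM release in a plugin like vllm-ascend usually comes down to a small runtime gate on the installed vLLM version. A minimal sketch, assuming a hypothetical `vllm_version_is` helper (the actual patch may differ):

```python
# A minimal sketch, not the actual diff: gate code paths on the installed
# vLLM release. `vllm_version_is` is a hypothetical stand-in for such a
# helper in vllm_ascend.
from importlib.metadata import PackageNotFoundError, version


def vllm_version_is(target: str) -> bool:
    """Return True if the installed vLLM release matches `target`."""
    try:
        return version("vllm") == target
    except PackageNotFoundError:
        return False


if vllm_version_is("0.10.1") or vllm_version_is("0.10.1.1"):
    print("following the v0.10.1 interfaces")
else:
    print("falling back to the pre-0.10.1 code path")
```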
@github-actions (Contributor)

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

codecov Bot commented Aug 28, 2025

Codecov Report

❌ Patch coverage is 68.57143% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.55%. Comparing base (2bfbf9b) to head (a3b01f4).
⚠️ Report is 652 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| :--- | :--- | :--- |
| vllm_ascend/worker/model_runner_v1.py | 9.09% | 10 Missing ⚠️ |
| vllm_ascend/models/qwen3_moe.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2584   +/-   ##
=======================================
  Coverage   72.55%   72.55%           
=======================================
  Files         146      146           
  Lines       21710    21710           
=======================================
  Hits        15752    15752           
  Misses       5958     5958           
| Flag | Coverage Δ |
| :--- | :--- |
| unittests | 72.55% <68.57%> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Comment thread on tests/ut/core/test_scheduler.py
Yikun marked this pull request as ready for review on August 28, 2025 at 05:54
@Yikun (Member, Author) commented Aug 28, 2025

This just adds version handling, so we can ignore the codecov/patch failure.

@wangxiyuan wangxiyuan merged commit 175f6bc into vllm-project:main Aug 28, 2025
24 of 25 checks passed
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025
### What this PR does / why we need it?
This patch also supports v0.10.1

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- CI passed
- test 0.10.1: vllm-project#2583
- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@321938e

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
weijinqian0 pushed a commit to weijinqian0/vllm-ascend that referenced this pull request Aug 29, 2025, with the same commit message as above, additionally signed off by weijinqian_v1 <weijinqian@huawei.com>.
anon189Ty added a commit to anon189Ty/vllm-ascend that referenced this pull request Aug 29, 2025
Signed-off-by: anon189Ty <Stari_Falcon@outlook.com>

add cann version judgment

update ut

correct spelling errors

Update ut

Support v0.10.1 (vllm-project#2584)

(Same commit message as above.)

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

[Fix] Fix DP-related padding logic (vllm-project#2582)

The determination of attention state, padding, and other forward
metadata has been moved to an earlier stage within the input preparation
process. This change enables us to utilize a single all-reduce
operation, maximizing synchronization efficiency as early as possible.

The logic for synchronizing metadata—such as the number of tokens,
prefill status, and DBO status—across data parallel (DP) ranks has now
been unified and simplified.

For performance, the all-reduce operation has been switched from the
`gloo` backend to the `npu` backend, which yields a reduction of several
milliseconds per step (**approximately 10% performance gain for TPOT!**).

Additionally, the multi-DP server hang issue has been resolved, ensuring
no more hangs occur when `num_requests < dp_size`. At last, a relief.

Finally, the miscalculated memory usage issue has been addressed by
removing the unnecessary `DummyCommImpl`, allowing the system to use the
real communication method when determining available memory.
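
A minimal sketch of the unified metadata sync described above, with all names hypothetical (this is not the PR's actual code): each DP rank packs its token count, prefill flag, and DBO flag into one tensor so a single MAX all-reduce yields the global view.

```python
# Sketch only: one all-reduce synchronizes all forward metadata across
# data-parallel ranks instead of several separate host-side exchanges.
import torch
import torch.distributed as dist


def sync_dp_metadata(num_tokens: int, with_prefill: bool, enable_dbo: bool,
                     dp_group) -> tuple[int, bool, bool]:
    # Pack [num_tokens, prefill flag, DBO flag] into one int32 tensor. On
    # Ascend this tensor would live on the NPU so the reduction runs over
    # the device backend rather than gloo, avoiding a host round-trip.
    packed = torch.tensor([num_tokens, int(with_prefill), int(enable_dbo)],
                          dtype=torch.int32)
    dist.all_reduce(packed, op=dist.ReduceOp.MAX, group=dp_group)
    max_tokens, any_prefill, any_dbo = packed.tolist()
    # Padding every rank up to max_tokens keeps the DP ranks in lockstep
    # even when num_requests < dp_size, the case that used to hang.
    return max_tokens, bool(any_prefill), bool(any_dbo)
```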

User-facing change: none.

Maybe we should add a test case for the multi-DP online server?
@MengqingCao

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@c5d004a

---------

Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com>

[CI] Add e2e ci test for A3 (vllm-project#2573)

Add e2e ci test for A3

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@11a7faf

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

[Feat]: Add custom lmhead tensor model parallel (vllm-project#2309)

This PR introduces lm_head tensor model parallelism to reduce memory
consumption and improve TPOT performance. It supports both eager mode and
graph mode.

In a DeepSeek R1 W8A8 PD-disaggregated decode instance using pure DP, with
lmhead_tensor_parallel_size = 8, we see a 1 ms TPOT improvement and save
1.48 GB of NPU memory per rank.

Performance data (screenshot): https://github.com/user-attachments/assets/3c5ef0d3-a7c7-46fd-9797-4de728eb0cb0

This PR introduces one new config in `additional_config`.

| Name | Effect | Required | Type | Constraints |
| :--- | :--- | :--- | :--- | :--- |
| lmhead_tensor_parallel_size | Split the lm_head matrix along the column dimension (vocab_size) into lmhead_tensor_parallel_size pieces | No | int | Default is None; once set, the feature is enabled. vocab_size must be divisible by this value. |

example

`--additional_config={"lmhead_tensor_parallel_size": 8}`
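
A single-process sketch of the idea, with illustrative sizes and a stand-in `rank` variable (not the PR's implementation): the lm_head weight is sliced along the vocab dimension, so each rank stores and multiplies only its shard.

```python
# Illustrative sketch of column-parallel lm_head: each of the `lmhead_tp`
# ranks holds a vocab_size // lmhead_tp slice of the weight and computes
# partial logits for its slice of the vocabulary.
import torch

hidden_size, vocab_size, lmhead_tp, rank = 1024, 32000, 8, 0
assert vocab_size % lmhead_tp == 0, "vocab_size must be divisible"

shard = vocab_size // lmhead_tp
full_weight = torch.randn(vocab_size, hidden_size)
# Each rank keeps only its slice, cutting lm_head memory by 1/lmhead_tp.
local_weight = full_weight[rank * shard:(rank + 1) * shard]

hidden = torch.randn(4, hidden_size)        # [num_tokens, hidden_size]
local_logits = hidden @ local_weight.t()    # [num_tokens, shard]
# In the real model the shards would be gathered across the lmhead TP
# group before sampling; here we just check the shapes line up.
assert local_logits.shape == (4, shard)
```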

- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@de533ab

---------

Signed-off-by: zzhx1 <zzh_201018@outlook.com>
Co-authored-by: zhangzihang <zzh_201018@outlook.com>

Fix import bug

Remove whitespace
anon189Ty pushed a commit to anon189Ty/vllm-ascend that referenced this pull request Aug 29, 2025, with the same commit message as above, additionally signed off by anon189Ty <Stari_Falcon@outlook.com>.
wenba0 pushed a commit to wenba0/vllm-ascend that referenced this pull request Sep 5, 2025, with the same commit message as above, additionally signed off by lijiaojiao <lijiaojiao990304@163.com>.
wangxiaoteng888 pushed a commit to LCAIZJ/vllm-ascend that referenced this pull request Sep 25, 2025, with the same commit message as above.
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Sep 26, 2025, with the same commit message as above.
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025, with the same commit message as above.
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 9, 2025, with the same commit message as above.