[CI] Add e2e ci test for A3 by zhangxinyuehfad · Pull Request #2573 · vllm-project/vllm-ascend

zhangxinyuehfad · 2025-08-27T08:18:31Z

What this PR does / why we need it?

Add e2e ci test for A3

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.10.1.1
vLLM main: vllm-project/vllm@11a7faf

gemini-code-assist · 2025-08-27T08:18:37Z

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

github-actions · 2025-08-27T08:34:50Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

- A PR should do only one thing; smaller PRs enable faster reviews.
- Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
- Write the commit message by filling in the PR description, to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.

MengqingCao · 2025-08-27T12:48:33Z

+      image: m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11
+      env:
+        DEBIAN_FRONTEND: noninteractive
+        COMPILE_CUSTOM_KERNELS: 1


This is already set to 1 by default.
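A minimal sketch of how the container block might look if the redundant variable were dropped, assuming the fragment under review sits under a GitHub Actions `container:` key. The job name `e2e-a3`, the `runs-on` label (borrowed from the runner list discussed later in this thread), and the single placeholder step are illustrative assumptions and are not taken from the PR.

```yaml
# Hypothetical workflow fragment; only `image` and DEBIAN_FRONTEND come from the reviewed diff.
jobs:
  e2e-a3:                            # hypothetical job name
    runs-on: linux-aarch64-a3-1      # assumption: one of the A3 runner labels
    container:
      image: m.daocloud.io/quay.io/ascend/cann:8.2.rc1-a3-ubuntu22.04-py3.11
      env:
        DEBIAN_FRONTEND: noninteractive
        # COMPILE_CUSTOM_KERNELS omitted: per the review comment it already defaults to 1
    steps:
      - name: Placeholder step       # hypothetical; the real test steps are not shown in this thread
        run: echo "run e2e tests here"
```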

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

MengqingCao

LGTM

MengqingCao · 2025-08-29T01:20:47Z

    - linux-aarch64-310p-4
    - ubuntu-24.04-arm
+    - linux-aarch64-a3-1
+    - linux-aarch64-a3-2


I think these two runners could also be added to the e2e test, but we can do that in the next PR. WDYT? @wangxiyuan @Yikun

MengqingCao · 2025-08-29T01:33:24Z

Let's merge this first to unblock CI for more ST cases


zhangxinyuehfad force-pushed the zxy_a3_yaml branch 2 times, most recently from e732f9e to c6018e2, on August 27, 2025 09:46

MengqingCao reviewed Aug 27, 2025


[CI] Add e2e ci test for A3

b5969b7

Signed-off-by: hfadzxy <starmoon_zhang@163.com>

zhangxinyuehfad force-pushed the zxy_a3_yaml branch from c6018e2 to b5969b7 on August 28, 2025 06:36

MengqingCao approved these changes Aug 29, 2025


MengqingCao reviewed Aug 29, 2025


MengqingCao merged commit e7ad4a6 into vllm-project:main Aug 29, 2025
14 checks passed
