[Scheduler] Add AscendScheduler. by whx-sjtu · Pull Request #543 · vllm-project/vllm-ascend

whx-sjtu · 2025-04-16T08:08:08Z

This PR adds AscendScheduler to the vLLM v1 engine.
The scheduler currently supports a v0-style prefill-first scheduling strategy.
More scheduling methods will be supported by this scheduler in the future.
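
For readers unfamiliar with the term, the following is a minimal sketch of what a prefill-first policy generally looks like. It is not the AscendScheduler code from this PR: the `Request` record, the `schedule_step` function, and the `token_budget` and `max_running` parameters are all hypothetical names chosen for illustration. The sketch assumes whole prompts are admitted without chunking and that a step contains either prefills or decodes, never both.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; not the vLLM Request class."""
    request_id: str
    num_prompt_tokens: int


def schedule_step(waiting, running, token_budget, max_running):
    """Return (request_id, num_tokens) pairs scheduled for one step.

    Prefill-first: if any waiting request can be admitted, the step
    contains only prefills; otherwise each running request decodes
    one token.
    """
    scheduled = []
    # Admit whole prompts in FIFO order while the budget allows.
    while waiting and len(running) < max_running:
        req = waiting[0]
        if req.num_prompt_tokens > token_budget:
            break  # head of the queue does not fit in this step
        waiting.popleft()
        token_budget -= req.num_prompt_tokens
        running.append(req)
        scheduled.append((req.request_id, req.num_prompt_tokens))
    if scheduled:
        return scheduled
    # No prefill was possible: decode one token per running request.
    return [(r.request_id, 1) for r in running[:token_budget]]
```

With a `deque` of waiting requests and an empty `running` list, repeated calls first drain the waiting queue with prefill-only steps and then fall back to decode-only steps. A real scheduler would also need to handle KV-cache allocation, preemption, and request completion, all of which this sketch omits.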

Signed-off-by: hw_whx <wanghexiang7@huawei.com>

wangxiyuan · 2025-04-17T11:28:48Z

@@ -0,0 +1,396 @@
+#


Move this file to the singlecard folder, or update https://github.com/vllm-project/vllm-ascend/blob/main/.github/workflows/vllm_ascend_test.yaml#L124 to enable this test.

wangxiyuan · 2025-04-17T11:29:38Z

@@ -0,0 +1,73 @@
+#
+# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
+# This file is a part of the vllm-ascend project.


remove this line

wangxiyuan · 2025-04-17T11:29:59Z

+#
+# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
+# This file is a part of the vllm-ascend project.
+# Adapted from vllm-project/vllm/blob/main/tests/models/utils.py


move to L18

wangxiyuan · 2025-04-17T11:31:16Z

This feature has been reviewed via 0.7.3-dev, so I'll merge it now. Please fix the nits in a follow-up PR. Thanks.


### What this PR does / why we need it?
DeepSeek V3 currently uses vanilla chunked prefill in the MLA part, which is computationally inefficient but necessary for chunked prefill. Since PR #543 brings the v0 scheduler into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside the MLA backend for a further performance boost. Some redundant computation inside the RoPE has also been removed. This PR should bring some performance gain for DeepSeek eager-mode inference.

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>


github-actions Bot added module:tests module:core labels Apr 16, 2025

whx-sjtu force-pushed the add_ascend_scheduler branch 5 times, most recently from 5aaf864 to 462b4d7 Compare April 17, 2025 09:11

hw_whx added 2 commits April 17, 2025 17:11

feat: add AscendScheduler

36b4c73

Signed-off-by: hw_whx <wanghexiang7@huawei.com>

fix ci problems

59f028f

Signed-off-by: hw_whx <wanghexiang7@huawei.com>

whx-sjtu force-pushed the add_ascend_scheduler branch from 462b4d7 to 59f028f Compare April 17, 2025 09:11

wangxiyuan approved these changes Apr 17, 2025


wangxiyuan merged commit 20dff4d into vllm-project:main Apr 17, 2025
15 checks passed

ganyi1996ppo mentioned this pull request Apr 21, 2025

[Perf] Deepseekv3 performance optimization for eager mode #598

Merged

wangxiyuan mentioned this pull request Jan 26, 2026

[Community] Nominate whx-sjtu as maintainer #6268

Merged

wangxiyuan merged 2 commits into vllm-project:main from whx-sjtu:add_ascend_scheduler
