Skip to content

[Scheduler] Add AscendScheduler.#543

Merged
wangxiyuan merged 2 commits intovllm-project:mainfrom
whx-sjtu:add_ascend_scheduler
Apr 17, 2025
Merged

[Scheduler] Add AscendScheduler.#543
wangxiyuan merged 2 commits intovllm-project:mainfrom
whx-sjtu:add_ascend_scheduler

Conversation

@whx-sjtu
Copy link
Copy Markdown
Collaborator

This PR adds AscendScheduler to vllm v1 engine.
This scheduler currently supports v0-style prefill-first scheduling strategy.
In the future more schedule methods will be supported by this scheduler.

@whx-sjtu whx-sjtu force-pushed the add_ascend_scheduler branch 5 times, most recently from 5aaf864 to 462b4d7 Compare April 17, 2025 09:11
hw_whx added 2 commits April 17, 2025 17:11
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
@whx-sjtu whx-sjtu force-pushed the add_ascend_scheduler branch from 462b4d7 to 59f028f Compare April 17, 2025 09:11
@@ -0,0 +1,396 @@
#
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,73 @@
#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line

#
# Copyright (c) 2025 Huawei Technologies Co., Ltd. All Rights Reserved.
# This file is a part of the vllm-ascend project.
# Adapted from vllm-project/vllm/blob/main/tests/models/utils.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to L18

@wangxiyuan
Copy link
Copy Markdown
Collaborator

This feature has been reviewd via 0.7.3-dev, i'll merge it now. Please fix the nit with a follow-up PR. Thanks.

@wangxiyuan wangxiyuan merged commit 20dff4d into vllm-project:main Apr 17, 2025
15 checks passed
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025
This PR adds AscendScheduler to vllm v1 engine.
This scheduler currently supports v0-style prefill-first scheduling
strategy.
In the future more schedule methods will be supported by this scheduler.

---------

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
wangxiyuan pushed a commit that referenced this pull request Apr 29, 2025
### What this PR does / why we need it?
Deepseek v3 now adopt vanilla chunked prefill on MLA part which is
ineffcient for computing but necessary for chunked prefill. Since PR
#543 bring v0 scheduler
into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside
the mla backend for more performance boost. Also there are some
redundant computation inside the rope, which is also removed. This PR
should bring some performance gain for deepseek eager mode inference.

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
chopper0126 pushed a commit to chopper0126/vllm-ascend that referenced this pull request Oct 16, 2025
…ct#598)

### What this PR does / why we need it?
Deepseek v3 now adopt vanilla chunked prefill on MLA part which is
ineffcient for computing but necessary for chunked prefill. Since PR
vllm-project#543 bring v0 scheduler
into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside
the mla backend for more performance boost. Also there are some
redundant computation inside the rope, which is also removed. This PR
should bring some performance gain for deepseek eager mode inference.

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
This PR adds AscendScheduler to vllm v1 engine.
This scheduler currently supports v0-style prefill-first scheduling
strategy.
In the future more schedule methods will be supported by this scheduler.

---------

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
Angazenn pushed a commit to Angazenn/vllm-ascend that referenced this pull request Oct 21, 2025
…ct#598)

### What this PR does / why we need it?
Deepseek v3 now adopt vanilla chunked prefill on MLA part which is
ineffcient for computing but necessary for chunked prefill. Since PR
vllm-project#543 bring v0 scheduler
into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside
the mla backend for more performance boost. Also there are some
redundant computation inside the rope, which is also removed. This PR
should bring some performance gain for deepseek eager mode inference.

---------

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants