[main] add pd transfer for ascend scheduler #2753
wangxiyuan merged 24 commits into vllm-project:main
Conversation
Signed-off-by: CaranLic <740821011@qq.com>
Code Review
This pull request introduces a 'pd transfer' feature for the Ascend scheduler, aiming to optimize performance by separating prefill and decode phases. The changes involve adding an enable_pd_transfer configuration, implementing a two-phase state machine ('prefill', 'decode') in the AscendScheduler, and adding unit tests to validate the new functionality. My review has identified a few areas for improvement. The phase transition logic currently only allows a one-way switch from 'prefill' to 'decode', which may be suboptimal for dynamic workloads. Additionally, the new unit test could be made more robust by avoiding manual manipulation of the scheduler's internal state. Finally, there's a minor point on configuration access that could be improved to make the code more robust against configuration errors.
```python
def test_scheduler_with_pd_transfer(self):
    scheduler = self.create_scheduler()
    scheduler.phase = "prefill"
```
Manually setting the internal state scheduler.phase makes this test brittle and less representative of real usage. If the initialization logic in AscendScheduler.__init__ changes, this test would not catch the regression. A better approach is to initialize the scheduler with enable_pd_transfer=True in its configuration, which would correctly set the initial phase.
To achieve this, you could modify the create_scheduler helper method to accept configuration overrides. For example:
```python
def create_scheduler(
        self,
        mock_compute_encoder_budget,
        scheduler_config_override: Optional[Dict[str, Any]] = None):
    # ... existing setup ...
    scheduler_config = SchedulerConfig(
        # ...
    )
    if scheduler_config_override:
        for key, value in scheduler_config_override.items():
            setattr(scheduler_config, key, value)
    # ... rest of the function ...
```

Then, the test can be updated to:

```python
scheduler = self.create_scheduler(
    scheduler_config_override={"enable_pd_transfer": True})
self.assertEqual(scheduler.phase, "prefill")
```

This change would make the test more robust and would also verify the scheduler's initialization logic.
```python
enable_pd_transfer = getattr(self.scheduler_config,
                             'enable_pd_transfer',
                             False)
```
Using getattr with a default value for enable_pd_transfer can mask configuration issues. The AscendSchedulerConfig is expected to always have the enable_pd_transfer attribute. If it's missing for some reason (e.g., a typo in the attribute name in the config class), getattr will silently fall back to False, disabling the feature without any warning. This can make debugging difficult.
It's more robust to access the attribute directly. This will ensure that any configuration problem results in a clear AttributeError, which is a fail-fast approach.
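To see how the silent fallback can hide a typo, here is a minimal stand-alone sketch; `SchedulerCfg` is a made-up stand-in for `AscendSchedulerConfig`, not real project code:

```python
class SchedulerCfg:
    """Hypothetical stand-in for AscendSchedulerConfig with a typo'd flag."""
    # Typo: 'transfar' instead of 'transfer', so the real flag never exists.
    enable_pd_transfar = True


cfg = SchedulerCfg()

# getattr silently falls back to False: the feature is disabled, no warning.
masked = getattr(cfg, "enable_pd_transfer", False)

# Direct attribute access fails fast, surfacing the misconfiguration.
try:
    _ = cfg.enable_pd_transfer
    failed_fast = False
except AttributeError:
    failed_fast = True
```

With the typo present, `masked` is `False` and nothing complains, while the direct access raises `AttributeError` immediately.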
```python
enable_pd_transfer = self.scheduler_config.enable_pd_transfer
```

```python
if not self.waiting and not self.running:
    self.phase = "decode"
```
The phase transition from 'prefill' to 'decode' is currently one-way. Once the scheduler enters the 'decode' phase, it never returns to 'prefill'. If new requests arrive while the system is in the 'decode' phase, they will be prefilled and then immediately start decoding, which might not be the most efficient approach for the Ascend hardware this feature is targeting, as it breaks the strict batching of prefill operations.
To improve performance for dynamic workloads, consider adding logic to allow the scheduler to switch back to the 'prefill' phase. For instance, you could add a check at the beginning of the schedule method:
```python
if self.phase == "decode" and not self.running and self.waiting:
    self.phase = "prefill"
```

This would ensure that if the decoding queue is empty and new requests are waiting, the scheduler can switch back to the more efficient batch prefill mode.
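The two-phase flow plus the proposed fallback can be sketched as a toy state machine. `PdTransferScheduler` and its one-request-per-call decode step are illustrative stand-ins, not the real `AscendScheduler`:

```python
class PdTransferScheduler:
    """Toy model of the pd-transfer phases with the decode-to-prefill fallback."""

    def __init__(self):
        self.phase = "prefill"
        self.waiting = []   # requests awaiting prefill
        self.running = []   # requests in the decode loop

    def schedule(self):
        # Proposed fix: fall back to prefill once decode has drained
        # and new requests are queued.
        if self.phase == "decode" and not self.running and self.waiting:
            self.phase = "prefill"
        if self.phase == "prefill":
            # Prefill the whole waiting queue as one batch, then hand
            # the batch over to decode.
            self.running.extend(self.waiting)
            self.waiting.clear()
            if self.running:
                self.phase = "decode"
        elif self.running:
            # Decode: each call finishes one request (a stand-in for
            # stepping tokens until completion).
            self.running.pop()
        return self.phase
```

In this toy model, a fresh batch is prefilled as a unit, decoded to empty, and any requests that arrived in the meantime trigger a new batched prefill instead of being interleaved.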
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: CaranLic <740821011@qq.com>
Enabling this feature will result in differences in the definition of
Codecov Report: ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2753      +/-   ##
==========================================
+ Coverage   74.76%   74.90%   +0.13%
==========================================
  Files         150      152       +2
  Lines       20891    21008     +117
==========================================
+ Hits        15620    15736     +116
- Misses       5271     5272       +1
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
```python
self.min_p = self.min_p_device[:0]
```

```python
MinPLogitsProcessor.__init__ = min_p_logits_processor_init_func
```
Do not patch vLLM; you should create an Ascend build_logitsprocs in npu_input_batch instead.
Changed from patching the MinPLogitsProcessor init function to redefining build_logitsprocs; code in 30571f1.
Signed-off-by: CaranLic <740821011@qq.com>
### What this PR does / why we need it?
For offline scenarios, adjust the scheduling process to prioritize the
prefill phase of all requests, then process the decode phase of all
requests.
### How was this patch tested?
```
max_num_seqs=24,
additional_config={
"ascend_scheduler_config":{
"enabled": True,
"enable_pd_transfer": True,
"decode_max_num_seqs": 24,
"enable_chunked_prefill": False
}
},
```
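The fragment above can also be built programmatically; a stdlib-only sketch, where the `ascend_pd_transfer_config` helper is hypothetical and only the key names come from the config shown above:

```python
def ascend_pd_transfer_config(decode_max_num_seqs: int = 24) -> dict:
    """Build the additional_config fragment used in the test setup above.

    Key names mirror the ascend_scheduler_config block shown in this PR;
    the helper itself is illustrative, not part of vllm-ascend.
    """
    return {
        "ascend_scheduler_config": {
            "enabled": True,
            "enable_pd_transfer": True,
            "decode_max_num_seqs": decode_max_num_seqs,
            "enable_chunked_prefill": False,
        }
    }


config = ascend_pd_transfer_config(decode_max_num_seqs=24)
```

The resulting dict can then be passed as the `additional_config` value when constructing the engine.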
| input | output | num prompts | max_num_seqs | dp | tp | scheduler | tps |
| ------ | ------ | ----------- | ------------ | -- | -- | ----------- | -------------- |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | v1 | 234.06 |
| dapo-math-17K | 2K | 384 | 24 | 2 | 1 | pd transfer | 239.59 (+2.4%) |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | v1 | 222.85 |
| dapo-math-17K | 2K | 384 | 24 | 4 | 1 | pd transfer | 225.81 (+1.3%) |
- vLLM version: v0.10.1.1
- vLLM main:
vllm-project/vllm@6fb2788
---------
Signed-off-by: CaranLic <740821011@qq.com>
Signed-off-by: offline0806 <z00858301@china.huawei.com>
Signed-off-by: nsdie <yeyifan@huawei.com>