[Bugfix][PD] fix non-working disaggregated prefill #2374

wangxiyuan merged 4 commits into vllm-project:main
Conversation
Code Review
This pull request fixes a bug that caused disaggregated prefill to hang in vllm-ascend. The change introduces version-specific logic to handle differences in the ModelRunnerOutput API across vLLM versions. My review found a potential issue with the version-checking mechanism: it relies on an exact version match, which is brittle and could lead to crashes on other vLLM versions.
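The brittleness the review points out can be avoided with an ordered comparison instead of exact string equality. The sketch below is purely illustrative — the helper names and the `0.10.0` threshold are assumptions, not code from this PR:

```python
def _parse_version(v: str) -> tuple:
    """Keep only the numeric dotted prefix, e.g. "0.10.1rc1" -> (0, 10, 1)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at a pre-release suffix like "rc1"
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def vllm_is_at_least(current: str, minimum: str) -> bool:
    """Ordered comparison instead of `current == minimum`, so patch
    releases (e.g. 0.10.1) still take the intended code path."""
    return _parse_version(current) >= _parse_version(minimum)
```

With this, a gate such as `vllm_is_at_least(vllm.__version__, "0.10.0")` keeps selecting the newer ModelRunnerOutput code path on later patch releases, whereas an exact match would silently fall through.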
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Signed-off-by: CaveNightingale <cavenightingale@foxmail.com>
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##           main   #2374   +/-  ##
=======================================
  Coverage      ?   76.18%
=======================================
  Files         ?      120
  Lines         ?    13532
  Branches      ?        0
=======================================
  Hits          ?    10310
  Misses        ?     3222
  Partials      ?        0
```
Quick merge to recover PD function. The CI failure is not related to this PR.
### What this PR does / why we need it?
Mainline vLLM fixed its disaggregated prefill in vllm-project/vllm#22598, but it is still not working in vllm-ascend. Concretely, on Ascend devices decoder instances crash before vLLM's fix and hang after it. This patch makes disaggregated prefill work.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Qwen3-0.6B 1P1D tp=1 dp=1

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@0fe8508

Signed-off-by: CaveNightingale <cavenightingale@foxmail.com>