[BugFix] Fix npu-cpu offloading interface change bug.#5290
[BugFix] Fix npu-cpu offloading interface change bug.#5290wangxiyuan merged 4 commits intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request updates the NPU offloading mechanism to align with a recent interface change in OffloadingSpec, specifically by passing attn_backends through the get_handlers method. The change correctly adapts the method signature and removes the now-obsolete logic for determining attention backends. However, my review identified a significant issue where the attn_backends parameter, while correctly passed to CpuNpuOffloadingHandler, is not utilized within that handler. This indicates an incomplete implementation that could lead to incorrect behavior, as backend-specific logic for KV cache handling might be missing. This should be addressed to ensure the correctness and completeness of the feature.
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
e70d2f6 to
e765267
Compare
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
e765267 to
80fdfc3
Compare
) ### What this PR does / why we need it? Last month the interface of `OffloadingSpec` has changed(vllm-project/vllm#27743). This PR fixes this bug and adds e2e test for cpu offloading. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? CI passed with new added test. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: whx-sjtu <2952154980@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
) ### What this PR does / why we need it? Last month the interface of `OffloadingSpec` has changed(vllm-project/vllm#27743). This PR fixes this bug and adds e2e test for cpu offloading. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? CI passed with new added test. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: whx-sjtu <2952154980@qq.com>
) ### What this PR does / why we need it? Last month the interface of `OffloadingSpec` has changed(vllm-project/vllm#27743). This PR fixes this bug and adds e2e test for cpu offloading. ### Does this PR introduce _any_ user-facing change? None ### How was this patch tested? CI passed with new added test. - vLLM version: release/v0.13.0 - vLLM main: vllm-project/vllm@ad32e3e --------- Signed-off-by: whx-sjtu <2952154980@qq.com> Signed-off-by: zrj026 <zhangrunjiang026@gmail.com>
What this PR does / why we need it?
Last month the interface of
OffloadingSpechas changed(vllm-project/vllm#27743). This PR fixes this bug and adds e2e test for cpu offloading.Does this PR introduce any user-facing change?
None
How was this patch tested?
CI passed with new added test.