Add sleep mode feature for Ascend NPU by antonlisq · Pull Request #513 · vllm-project/vllm-ascend

antonlisq · 2025-04-14T02:37:33Z

What this PR does / why we need it?

This PR adds a sleep mode feature to vllm-ascend. When the engine sleeps, it mainly does two things:

offload model weights
discard kv cache

RLHF tools (such as https://github.com/volcengine/verl and https://github.com/OpenRLHF/OpenRLHF) have a strong need for sleep mode to accelerate the training process.

This PR may resolve #375 and #320.

Does this PR introduce any user-facing change?

No existing user interfaces are changed.
Users get two new methods to use: sleep() and wake_up().
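
As a rough illustration of how the two new methods might be paired, here is a minimal sketch of a context manager that puts the engine to sleep and always wakes it up afterwards, even if the work in between raises. The helper name `asleep` is hypothetical and not part of this PR; it only assumes an object that exposes `sleep(level=...)` and `wake_up()`.

```python
from contextlib import contextmanager


@contextmanager
def asleep(llm, level=1):
    """Keep `llm` asleep for the duration of the `with` block.

    `llm` is assumed to be any object with `sleep(level=...)` and
    `wake_up()` methods, such as an engine created with
    enable_sleep_mode=True. This helper is illustrative only.
    """
    llm.sleep(level=level)
    try:
        yield llm
    finally:
        # Always wake the engine up, even if the block raised.
        llm.wake_up()
```

An RLHF loop could, for example, run its training step inside `with asleep(llm): ...` so that the freed NPU memory is available while the inference engine is idle.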

How was this patch tested?

This PR was tested with Qwen/Qwen2.5-0.5B-Instruct.

Initially, the free NPU memory is M1.

After llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True) is executed, the free NPU memory is M2, with M2 < M1.

Then we call llm.sleep(level=1), after which the free NPU memory is M3.

We observe M3 > M2, and M3 is very close to M1.

In addition, the output tokens are the same before sleep and after wake-up, using SamplingParams(temperature=0, max_tokens=10) and the same input tokens.
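
The two checks described above can be sketched roughly as follows. This is not the test code from the PR: `MemoryReadings`, `sleep_released_memory`, `outputs_match_across_sleep`, and the `rel_tol` threshold for "very close" are all hypothetical names and values chosen for illustration. The sketch assumes `llm.generate()` returns request outputs whose first completion carries `token_ids`, and it leaves the actual measurement of free NPU memory to the caller.

```python
from dataclasses import dataclass


@dataclass
class MemoryReadings:
    """Free NPU memory (in bytes) at the three points of the test."""
    m1: int  # before the engine is created
    m2: int  # after LLM(..., enable_sleep_mode=True) is created
    m3: int  # after llm.sleep(level=1)


def sleep_released_memory(r, rel_tol=0.05):
    """Return True if M2 < M1, M3 > M2, and M3 is within rel_tol of M1.

    rel_tol is an assumed threshold; the PR does not quantify "very close".
    """
    if not (r.m2 < r.m1 and r.m3 > r.m2):
        return False
    return abs(r.m1 - r.m3) <= rel_tol * r.m1


def outputs_match_across_sleep(llm, prompts, sampling_params, level=1):
    """Generate, sleep, wake up, generate again, and compare token ids."""
    def token_ids():
        results = llm.generate(prompts, sampling_params)
        return [list(res.outputs[0].token_ids) for res in results]

    before = token_ids()
    llm.sleep(level=level)
    llm.wake_up()
    after = token_ids()
    return before == after
```

With greedy decoding (temperature=0), as in the test above, the comparison is expected to be deterministic, which is why identical token ids are a reasonable signal that the weights were restored correctly.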

This PR uses the CMake procedure from #371; thanks a lot.
Related: vllm-project/vllm#16562

Signed-off-by: Shuqiao Li <celestialli@outlook.com>

wangxiyuan · 2025-04-18T05:10:55Z

LGTM. The sleep mode feature was mainly reviewed on the 0.7.3 branch. Let's merge this quickly first.

github-actions bot added the module:tests, module:core, and documentation labels and removed the documentation label on Apr 14, 2025

sleep mode

c6c4f87

Signed-off-by: Shuqiao Li <celestialli@outlook.com>

antonlisq changed the title from "[WIP] Add sleep mode feature for Ascend NPU" to "Add sleep mode feature for Ascend NPU" on Apr 18, 2025

wangxiyuan approved these changes Apr 18, 2025

wangxiyuan merged commit 84563fc into vllm-project:main Apr 18, 2025
15 checks passed

antonlisq deleted the sleepmode branch April 21, 2025 09:01
