Add sleep mode feature for Ascend NPU #416

Merged
wangxiyuan merged 1 commit into vllm-project:v0.7.3-dev from antonlisq:sleepmode
Apr 7, 2025

antonlisq commented Mar 28, 2025

What this PR does / why we need it?

This PR adds a sleep mode feature to vllm-ascend. When the engine sleeps, it mainly does two things:

  • offload model weights
  • discard kv cache
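
The two steps above can be pictured with a toy model. The sketch below is not the vllm-ascend implementation: ToyEngine and its fields are hypothetical names, and plain dicts stand in for NPU memory, host memory, and the KV cache.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEngine:
    """Toy stand-in for an inference engine (hypothetical, for illustration only)."""
    device_weights: dict = field(default_factory=dict)  # pretend NPU memory
    host_weights: dict = field(default_factory=dict)    # pretend CPU memory
    kv_cache: dict = field(default_factory=dict)        # pretend KV cache
    asleep: bool = False

    def sleep(self):
        # Step 1: offload model weights from the device to the host.
        self.host_weights.update(self.device_weights)
        self.device_weights.clear()
        # Step 2: discard the KV cache; it is not saved anywhere.
        self.kv_cache.clear()
        self.asleep = True

    def wake_up(self):
        # Bring the weights back to the device. The KV cache starts empty again.
        self.device_weights.update(self.host_weights)
        self.host_weights.clear()
        self.asleep = False
```

The point of the toy is the asymmetry: weights survive a sleep and wake-up cycle because they are moved, while cached attention state is simply dropped and must be recomputed.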

RLHF tools (such as https://github.com/volcengine/verl and https://github.com/OpenRLHF/OpenRLHF) have a strong need for sleep mode to accelerate the training process.

This PR may solve #375 and #320.

Does this PR introduce any user-facing change?

No existing user interfaces are changed.
Users get two new methods: sleep() and wake_up().
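
One way a caller might pair the two new methods is a small context manager, so that wake_up() always runs even if the work done while sleeping fails. This is only a sketch: the sleeping helper is a hypothetical name, and it assumes nothing beyond an object that exposes sleep(level=...) and wake_up() as described in this PR.

```python
from contextlib import contextmanager


@contextmanager
def sleeping(llm, level=1):
    """Put `llm` to sleep for the duration of the block, then wake it up.

    `llm` is assumed to be any object with sleep(level=...) and wake_up(),
    for example an LLM created with enable_sleep_mode=True.
    """
    llm.sleep(level=level)
    try:
        yield llm
    finally:
        # Always wake the engine, even if the body raised.
        llm.wake_up()


# Usage sketch (train_step is a hypothetical placeholder):
#
#     with sleeping(llm):
#         train_step()
```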

How was this patch tested?

This PR is tested with Qwen/Qwen2.5-0.5B-Instruct.

Before the model is loaded, the free NPU memory is M1.

After llm = LLM("Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True) is executed, the free NPU memory is M2, with M2 < M1.

After calling llm.sleep(level=1), the free NPU memory is M3.

We observe that M3 > M2 and that M3 is very close to M1.

In addition, the output tokens are the same before sleep and after wake-up, given the same input tokens and SamplingParams(temperature=0, max_tokens=10).
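
The checks described above can be written down as a small predicate. This is a sketch rather than the test code used for this PR: sleep_checks_pass is a hypothetical helper, and the 5% tolerance for "very close" is an assumption, since the PR does not quantify it.

```python
def sleep_checks_pass(m1, m2, m3, tokens_before, tokens_after, rel_tol=0.05):
    """Return True if the measurements match the behaviour described in the PR.

    m1: free NPU memory before loading the model
    m2: free NPU memory after loading the model
    m3: free NPU memory after llm.sleep(level=1)
    rel_tol: assumed tolerance for "M3 is very close to M1" (not from the PR)
    """
    loading_used_memory = m2 < m1
    sleep_freed_memory = m3 > m2
    close_to_baseline = abs(m1 - m3) <= rel_tol * m1
    same_output = list(tokens_before) == list(tokens_after)
    return (loading_used_memory and sleep_freed_memory
            and close_to_baseline and same_output)
```

The three memory values must be measured by the caller; the token lists are the outputs generated before sleep and after wake-up for the same prompt.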

This PR uses the CMake procedure from #371, thanks a lot.

@Yikun Yikun mentioned this pull request Apr 1, 2025
40 tasks
@antonlisq changed the title from "[WIP] Add sleep mode feature for Ascend NPU" to "Add sleep mode feature for Ascend NPU" on Apr 7, 2025
Signed-off-by: Shuqiao Li <celestialli@outlook.com>
@wangxiyuan wangxiyuan merged commit 2b765dc into vllm-project:v0.7.3-dev Apr 7, 2025
13 checks passed
@antonlisq antonlisq deleted the sleepmode branch April 11, 2025 01:29
Switchsyj commented

Hello, do I need to build the Ascend_C package from source in order to use this sleep() feature? It looks like this library is required:

lib_name = find_loaded_library("vllm_ascend_C")

And I encountered this error while importing vllm_ascend.vllm_ascend_C:

ModuleNotFoundError: No module named 'vllm_ascend.vllm_ascend_C'

wangxiyuan pushed a commit that referenced this pull request Apr 21, 2025
### What this PR does / why we need it?
Make compilation adapt to more user environments.
Improve the timing of the patch so that it works under more conditions.


### Does this PR introduce _any_ user-facing change?
No


### How was this patch tested?
Tested the same as #416 


Signed-off-by: Shuqiao Li <celestialli@outlook.com>

antonlisq commented Apr 23, 2025

> Hello, do I need to build Ascend_C package from source in order to utilize this sleep() feature, as I find that this library might be necessary:
>
> lib_name = find_loaded_library("vllm_ascend_C")
>
> And I encountered this error while importing vllm_ascend.vllm_ascend_C:
>
> ModuleNotFoundError: No module named 'vllm_ascend.vllm_ascend_C'

Yes. As mentioned in the installation doc, if you are building from v0.7.3-dev and intend to use the sleep mode feature, you should export COMPILE_CUSTOM_KERNELS=1 manually so that the needed package is compiled.
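
Before calling sleep(), it may help to check whether the compiled extension is present at all. The sketch below uses only the Python standard library; module_available is a hypothetical helper name, and the module path is the one from the error message above.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located without importing the module itself.

    For a dotted name, find_spec imports the parent package first, so a
    missing parent raises ModuleNotFoundError; that is treated as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False
```

For example, if module_available("vllm_ascend.vllm_ascend_C") returns False, the package was most likely built without COMPILE_CUSTOM_KERNELS=1 and needs to be rebuilt from source with that variable exported.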
