
feat: separately wakeup vllm to reduce peak memory in refitting #190

Closed
zpqiu wants to merge 1 commit into main from alexq/vllm-separately-wakeup

Conversation

Contributor

@zpqiu zpqiu commented Apr 15, 2025

During experiments, I ran into OOM (out-of-memory) errors when waking up vLLM. I also noticed that the latest vLLM release, 0.8.3, updated the wake-up API to accept a `tags` parameter.

What does this PR do?


As shown in the figure below, we can first load only the weights, then update the parameters, and only then load the KV cache, which reduces peak memory usage during the refit_policy_generation phase. The same logic has already been implemented in veRL.

[Figure: wakeup]
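The memory argument can be made concrete with a back-of-the-envelope model (the GiB sizes below are illustrative assumptions, not measurements from this PR): waking everything at once means the weights, the KV cache, and the temporary refit buffer are all resident together, while the staged order keeps the KV cache unallocated until the refit buffer is gone.

```python
# Back-of-the-envelope peak-memory comparison of the two wake-up orders.
# All sizes (GiB) are illustrative assumptions, not measured values.
weights = 16.0     # policy weights held by the inference engine
kv_cache = 24.0    # KV cache allocation
refit_buf = 16.0   # temporary buffer holding the new weights during refit

# Wake everything, then refit: all three allocations coexist at the peak.
peak_joint = weights + kv_cache + refit_buf

# Staged: wake weights -> refit (buffer then freed) -> wake KV cache.
# The KV cache and the refit buffer never coexist.
peak_staged = max(weights + refit_buf, weights + kv_cache)

print(peak_joint, peak_staged)  # 56.0 40.0
```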

Issues

List issues that this PR closes:

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 
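A possible sketch of how the staged wake-up could be driven, assuming vLLM >= 0.8.3 where `wake_up` accepts `tags=["weights"]` / `tags=["kv_cache"]`. The `FakeEngine` stub and the `load_weights_fn` callback are hypothetical stand-ins so the sketch runs without a GPU:

```python
def staged_wake_up(engine, load_weights_fn):
    """Wake a sleeping engine in two stages to cap peak memory."""
    engine.wake_up(tags=["weights"])   # stage 1: restore weight buffers only
    load_weights_fn(engine)            # stage 2: refit while the KV cache is still released
    engine.wake_up(tags=["kv_cache"])  # stage 3: re-allocate the KV cache last


class FakeEngine:
    """Hypothetical stand-in that records calls in place of a real vLLM LLM."""
    def __init__(self):
        self.calls = []

    def wake_up(self, tags=None):
        self.calls.append(("wake_up", tuple(tags or ())))


engine = FakeEngine()
staged_wake_up(engine, lambda e: e.calls.append(("refit", ())))
print(engine.calls)
# [('wake_up', ('weights',)), ('refit', ()), ('wake_up', ('kv_cache',))]
```

With a real engine created via `LLM(model=..., enable_sleep_mode=True)` and put to sleep with `llm.sleep()`, `staged_wake_up(llm, refit_fn)` would replace a single `llm.wake_up()` call.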

Before your PR is "Ready for review"

Pre checks:

  • [x] Make sure you have read and followed the Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

@zpqiu zpqiu linked an issue Apr 15, 2025 that may be closed by this pull request
@zpqiu zpqiu changed the title from "feat: separately wakeup vllm to reduce refitting peak memory" to "feat: separately wakeup vllm to reduce peak memory in refitting" on Apr 15, 2025
@parthchadha
Copy link
Copy Markdown
Contributor

@zpqiu thanks for the suggestion! We have included these changes in #176.



Development

Successfully merging this pull request may close these issues.

Separately wakeup vllm to reduce peak memory in refitting

2 participants