fix: gpu memory report in vllm sleep mode #1101

Closed
hiyouga wants to merge 2 commits into main from yaowei/perf_log

Conversation

@hiyouga (Collaborator) commented Apr 15, 2025

What does this PR do?

When vLLM's sleep mode is enabled, GPU memory statistics can no longer be obtained from the PyTorch API, because vLLM manages its memory space through cuMem rather than the PyTorch caching allocator. Instead, we measure the memory freed by vLLM's offloading and use it to estimate GPU usage correctly.

see vllm-project/vllm#11743 (comment)
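
As a rough illustration of the idea (not the actual diff in this PR), the estimate can be derived from driver-level free memory, which cuMem allocations do affect, instead of the PyTorch allocator counters, which they bypass. The `sleep_fn` callable below is a hypothetical stand-in for whatever triggers vLLM's offload step:

```python
import torch

def estimate_freed_memory(sleep_fn):
    """Estimate GPU memory released by vLLM's sleep mode.

    torch.cuda.memory_allocated() only tracks the PyTorch caching
    allocator and misses cuMem-managed allocations, so we query the
    driver-level free memory before and after offloading instead.
    """
    torch.cuda.synchronize()
    free_before, total = torch.cuda.mem_get_info()

    sleep_fn()  # hypothetical, e.g. lambda: llm.sleep(level=1)

    torch.cuda.synchronize()
    free_after, _ = torch.cuda.mem_get_info()

    freed = free_after - free_before  # bytes vLLM returned to the driver
    used_now = total - free_after     # actual GPU usage after offload
    return freed, used_now
```

The difference in driver-level free memory is then subtracted from the reported usage, so the log reflects what the GPU actually holds while the engine is asleep.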

Before: [screenshot: GPU memory report, overstated while vLLM is asleep]

After: [screenshot: GPU memory report with the freed memory accounted for]

Who can review?

@vermouth1992 @tongyx361 @BearBiscuit05


@hiyouga (Collaborator, Author) commented Apr 16, 2025

closed in favor of #1118

@hiyouga hiyouga closed this Apr 16, 2025
@hiyouga hiyouga deleted the yaowei/perf_log branch April 16, 2025 14:15

2 participants