Conversation

@mattnappo (Contributor) commented Sep 24, 2025

Motivation

This PR addresses confusion around enable_memory_saver mode that has been raised in several issues.

Although the initial use case for torch_memory_saver was RL, offloading weights from GPU to CPU memory is useful for other workflows as well, such as memory snapshot/restore. This PR adds a flag, enable_weights_cpu_backup, that enables offloading model weights from GPU to CPU memory so that the weights can later be restored without reloading them from disk.

Modifications

  • Add flag enable_weights_cpu_backup to ServerArgs to enable offloading model weights from GPU to CPU memory.
  • Update the model runner to set enable_weights_cpu_backup during model loading.
  • Bump torch-memory-saver from 0.0.8 to 0.0.9rc1.
  • Other small unrelated formatting changes (such as removing unused imports)

Benchmark and Profiling

  • Verified accuracy when using enable_weights_cpu_backup in enable_memory_saver mode.
  • Verified memory usage when using enable_weights_cpu_backup in enable_memory_saver mode.
  • Added tests for enable_weights_cpu_backup in enable_memory_saver mode.
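The backup/restore behavior described above can be sketched conceptually. This is not sglang's actual implementation; the `WeightStore` class and its `release`/`resume` methods are illustrative stand-ins, with plain Python dicts standing in for GPU tensors and pinned host memory:

```python
# Conceptual sketch of the weights-CPU-backup pattern behind
# enable_weights_cpu_backup: copy weights to host memory when GPU memory
# is released, so resume can restore them without reloading from disk.
# All names here are illustrative, not sglang's API.

class WeightStore:
    def __init__(self, weights, enable_weights_cpu_backup=False):
        self.gpu_weights = dict(weights)   # stand-in for GPU tensors
        self.cpu_backup = None             # stand-in for host (CPU) memory
        self.enable_weights_cpu_backup = enable_weights_cpu_backup

    def release(self):
        # With backup enabled, copy weights to host before freeing GPU memory.
        if self.enable_weights_cpu_backup:
            self.cpu_backup = dict(self.gpu_weights)
        self.gpu_weights = None            # simulate freeing GPU memory

    def resume(self):
        # Restore from the CPU backup instead of reloading from disk.
        if self.cpu_backup is None:
            raise RuntimeError("no CPU backup; weights must be reloaded")
        self.gpu_weights = dict(self.cpu_backup)

store = WeightStore({"layer0.weight": [1.0, 2.0]},
                    enable_weights_cpu_backup=True)
store.release()
assert store.gpu_weights is None                           # GPU memory freed
store.resume()
assert store.gpu_weights == {"layer0.weight": [1.0, 2.0]}  # weights restored
```

Without the flag, `release` would drop the weights entirely and `resume` would have to reload them from disk, which is what the flag avoids.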

mattnappo and others added commits on September 25, 2025:

  • Bump torch_memory_saver version
  • Only enable CPU backup for model weights
  • Add flag
  • Update test_release_memory_occupation.py
@JustinTong0323 (Collaborator) commented Sep 30, 2025

Please resolve conflicts. This PR is quite straightforward; we should check compatibility with the new version of torch_memory_saver.

@hnyls2002 hnyls2002 merged commit 8c57490 into sgl-project:main Oct 3, 2025
63 of 66 checks passed
0xtoward pushed a commit to 0xtoward/sglang that referenced this pull request Oct 5, 2025
ch-tiger1 pushed a commit to ch-tiger1/sglang that referenced this pull request Oct 9, 2025
lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request Oct 29, 2025
7 participants