
[mcore] add offload param and opt function for megatron #1162

Merged
ETOgaosion merged 26 commits into verl-project:main from BearBiscuit05:xya/mcore/off
Apr 26, 2025
Conversation

@BearBiscuit05
Collaborator

@BearBiscuit05 BearBiscuit05 commented Apr 19, 2025

Motivation

This PR adds offload support in Megatron. Parameters, gradients, and optimizer states can be offloaded to the CPU when they are not needed. I have verified the feasibility of the feature with the memory snapshot tool; accuracy testing is still in progress.
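A minimal sketch of what parameter offloading can look like in PyTorch (the helper names are mine, not verl's actual implementation): while a module is idle, move each parameter's storage to pinned host memory; before the next forward/backward pass, copy it back to the GPU.

```python
import torch


def offload_params_to_cpu(module: torch.nn.Module, pin_memory: bool = True) -> None:
    """Move a module's parameters (and any gradients) to host memory.

    Hypothetical sketch. Pinned (page-locked) host memory lets the
    later host-to-device copy run asynchronously on a CUDA stream.
    """
    for param in module.parameters():
        cpu_data = param.data.to("cpu")
        if pin_memory and torch.cuda.is_available():
            cpu_data = cpu_data.pin_memory()
        param.data = cpu_data
        if param.grad is not None:
            param.grad = param.grad.to("cpu")


def load_params_to_device(module: torch.nn.Module, device: str = "cuda") -> None:
    """Copy previously offloaded parameters back onto the device."""
    for param in module.parameters():
        param.data = param.data.to(device, non_blocking=True)
        if param.grad is not None:
            param.grad = param.grad.to(device, non_blocking=True)
```

Optimizer-state offload follows the same pattern, iterating over `optimizer.state` tensors instead of `module.parameters()`.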

TODO

  • Accuracy testing

@BearBiscuit05 BearBiscuit05 changed the title [mcore] add offload param and opt function for magetron [WIP] add offload param and opt function for magetron Apr 23, 2025
@BearBiscuit05 BearBiscuit05 changed the title [WIP] add offload param and opt function for magetron [mcore] add offload param and opt function for magetron Apr 23, 2025
@BearBiscuit05 BearBiscuit05 requested a review from ccclyu April 24, 2025 01:44
@ccclyu
Collaborator

ccclyu commented Apr 24, 2025

Thanks for the great work. I will finish testing on several larger models such as Qwen-72B and Mixtral by the end of this week. cc: @ann-qin-lu

@BearBiscuit05
Collaborator Author

Thanks for the great work. I will finish testing on several larger models such as Qwen-72B and Mixtral by the end of this week. cc: @ann-qin-lu

Thanks, but please note that there is currently an issue with the logger's GPU memory printout, mainly because memory released by vLLM cannot be detected by torch. If you want to fix the issue, see PR #1118.
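The mismatch described here can be made visible by querying memory from two viewpoints (the function name below is mine, a sketch): `torch.cuda.memory_allocated()` only counts memory managed by torch's caching allocator, while `torch.cuda.mem_get_info()` asks the CUDA driver directly, so it also reflects memory held or freed by an external allocator such as vLLM's.

```python
import torch


def device_memory_report(device: int = 0) -> dict:
    """Report GPU memory from the allocator view and the driver view.

    Hypothetical helper. A large gap between `device_total - device_free`
    and `torch_allocated` indicates memory that torch's allocator
    statistics (and loggers built on them) cannot see.
    """
    if not torch.cuda.is_available():
        # No GPU: return zeros so callers can still log something sane.
        return {"torch_allocated": 0, "device_free": 0, "device_total": 0}
    free, total = torch.cuda.mem_get_info(device)
    return {
        "torch_allocated": torch.cuda.memory_allocated(device),
        "device_free": free,
        "device_total": total,
    }
```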

Collaborator

@ETOgaosion ETOgaosion left a comment


There is urgent demand for this function in larger-model training, and the testing results show that it is functional, so let's merge first. If a small amount of memory still cannot be offloaded, patches can follow along the way.

Collaborator

@ccclyu ccclyu left a comment


LGTM. We can merge first to unblock feature development.

@ETOgaosion
Collaborator

ETOgaosion commented Apr 26, 2025

LGTM. We can merge first to unblock feature development.

Thanks for approving. Any suggestions for future refactorization are appreciated.

@ETOgaosion ETOgaosion merged commit cc8fca5 into verl-project:main Apr 26, 2025
19 checks passed
@ETOgaosion
Collaborator

@BearBiscuit05 Please remember to add some documentation when changing configurations~

@BearBiscuit05 BearBiscuit05 deleted the xya/mcore/off branch April 27, 2025 01:39
ScottCTD pushed a commit to ScottCTD/verl that referenced this pull request May 5, 2025
…#1162)

## Motivation
This is a PR that supports offload in Megatron. Currently, parameters,
gradients, and optimizers can be offloaded to the CPU when not needed. I
have successfully tested the feasibility of the function using the
memory snap tool. Further accuracy testing is still in progress.

## TODO
- [x] Accuracy testing
chenjiaoAngel added a commit to chenjiaoAngel/verl that referenced this pull request Nov 14, 2025
TimurTaepov pushed a commit to giorgossideris/verl that referenced this pull request Dec 20, 2025
vyomakesh0728 added a commit to vyomakesh0728/verl that referenced this pull request Jan 22, 2026
