[mcore] add offload param and opt function for Megatron #1162
ETOgaosion merged 26 commits into verl-project:main
Thanks for the great work. We will finish testing on several larger models, such as Qwen72b and Mixtral, by the end of this week. cc: @ann-qin-lu
Thanks. Please note that the logger currently misreports GPU memory, mainly because torch cannot detect memory released by vLLM. If you want to fix this, see PR #1118.
ccclyu left a comment:
LGTM. We can merge this first to unblock feature development.
Thanks for approving. Any suggestions for future refactoring are appreciated.
@BearBiscuit05 Please remember to add documentation when changing configurations.
…#1162) ## Motivation This is a PR that supports offload in Megatron. Currently, parameters, gradients, and optimizers can be offloaded to the CPU when not needed. I have successfully tested the feasibility of the function using the memory snap tool. Further accuracy testing is still in progress. ## TODO - [x] Accuracy testing
Motivation
This PR adds offload support for Megatron. Parameters, gradients, and optimizer states can now be offloaded to the CPU when they are not needed. I have verified that the feature works using the memory snapshot tool. Accuracy testing is still in progress.
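For readers unfamiliar with the idea, the following is a minimal sketch of offloading in plain PyTorch, not the code from this PR. The helper names `move_module` and `move_optimizer_state` are hypothetical, and the sketch assumes ordinary `nn.Module` parameters and a standard `torch.optim` optimizer; Megatron's own gradient buffers and distributed optimizer need additional handling that is not shown here.

```python
import torch
from torch import nn


def move_module(module: nn.Module, device: torch.device) -> None:
    """Move parameters and any existing gradients of `module` to `device` in place.

    Hypothetical helper for illustration; not the function added by the PR.
    """
    for p in module.parameters():
        # Rebinding .data keeps the same Parameter object, so the optimizer's
        # references to it stay valid.
        p.data = p.data.to(device)
        if p.grad is not None:
            p.grad = p.grad.to(device)


def move_optimizer_state(optimizer: torch.optim.Optimizer, device: torch.device) -> None:
    """Move per-parameter optimizer state tensors (e.g. Adam moments) to `device`.

    Scalar (0-dim) tensors such as step counters are left where they are.
    """
    for state in optimizer.state.values():
        for key, value in state.items():
            if torch.is_tensor(value) and value.dim() > 0:
                state[key] = value.to(device)
```

In a training loop, one would call both helpers with `torch.device("cpu")` before a phase that does not need the model (for example, rollout generation), and call them again with the GPU device before the next forward and backward pass.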
TODO
- [x] Accuracy testing