CPU utilization, CPU memory for ZeRO-Offload #2652

Answered by tjruwase
taehyunzzz asked this question in Q&A

@taehyunzzz, could you please share some logs of what you observed? With ZeRO-Offload and 4 ranks/processes, the expectation is that each process keeps only 1/4 of the fp32 optimizer state (including master weights) in CPU memory, rather than a full copy. Thus, each of the 4 CPUAdam instances should perform only 1/4 of the optimizer step computation.
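
To make the 1/4 expectation concrete, here is a rough back-of-the-envelope sketch in Python. It is not DeepSpeed code: the helper name `offload_cpu_bytes_per_rank` is hypothetical, and the 12 bytes per parameter figure is an assumption (Adam with fp32 master weights, momentum, and variance at 4 bytes each). It ignores gradients, pinned buffers, and any other process overhead, so real CPU memory usage will be higher.

```python
def offload_cpu_bytes_per_rank(num_params: int, world_size: int,
                               bytes_per_param: int = 12) -> float:
    """Estimate the fp32 optimizer state held in CPU memory by one rank.

    Assumption (not taken from DeepSpeed itself): Adam keeps fp32 master
    weights, momentum, and variance, i.e. 3 * 4 = 12 bytes per parameter,
    and the state is split evenly across ranks.
    """
    return num_params * bytes_per_param / world_size


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical 1B-parameter model
    for world_size in (1, 4):
        per_rank = offload_cpu_bytes_per_rank(num_params, world_size)
        print(f"world_size={world_size}: {per_rank / 1e9:.1f} GB per rank")
```

Under these assumptions, if each process reports CPU memory close to the single-rank figure rather than about a quarter of it, that would suggest the optimizer state is not being partitioned as expected, which is why logs would help.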

You can also refer to the paper for more high-level discussion, if you have not already. Thanks!

Answer selected by taehyunzzz