Hey, I have a doubt about the gradient accumulation parameter. When I increase it from 4 to 8, I get an OOM error, which doesn't make much sense to me. Why am I getting this error? Are you storing the gradients individually, or summing them as new ones come in? Or am I missing something else?
Hi @Vattikondadheeraj, thanks for creating the issue. This is a bit surprising to me. We call `.backward()` after every iteration, so there should not be a memory increase when you raise the gradient accumulation parameter. Can you share repro commands so that we can debug?
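For reference, here is a minimal sketch of the pattern being described, using a toy model and dummy data (the names `model`, `loader`, and `accum_steps` are illustrative, not from this repo). The key point is that `.backward()` sums new gradients into each parameter's existing `.grad` tensor in place, so peak memory should not grow with the number of accumulation steps:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# Dummy data standing in for a real dataloader.
loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(8)]
accum_steps = 4

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y)
    # .backward() frees the activation graph for this micro-batch and
    # accumulates (sums) the new gradients into the existing .grad tensors
    # in place; gradients from each step are not stored separately.
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Under this pattern, going from `accum_steps = 4` to `8` should not change peak memory, which is why an OOM here is unexpected.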