
Query on Gradient accumulation #2134

Open
Vattikondadheeraj opened this issue Dec 9, 2024 · 1 comment

Labels: discussion (Start a discussion)

Comments

@Vattikondadheeraj:

Hey, I have a doubt about the gradient accumulation parameter. When I increase it from 4 to 8, I get an OOM error, which doesn't make much sense to me. Why am I getting this error? Are you storing the gradients individually, or summing them as new ones arrive? Or am I missing something else?

@pytorch pytorch deleted a comment Dec 9, 2024
@ebsmothers (Contributor):

Hi @Vattikondadheeraj, thanks for creating the issue. This is a bit surprising to me. We call .backward() after every iteration, so there should not be a memory increase when you increase the gradient accumulation parameter. Can you share repro commands so that we can debug?
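For context on why memory should stay flat: in PyTorch, each call to .backward() adds the new gradients into the existing param.grad tensors in place, so gradient accumulation reuses one gradient buffer per parameter rather than storing one copy per micro-batch. Below is a minimal pure-Python sketch of that accumulation pattern (a hypothetical illustration, not torchtune's actual code), using plain lists in place of tensors:

```python
# Hypothetical sketch: gradient accumulation sums each micro-batch's
# gradients into a single running buffer, so peak memory does not grow
# with the number of accumulation steps.

def accumulate(micro_batch_grads):
    """Sum per-micro-batch gradients into one running buffer."""
    total = None
    for g in micro_batch_grads:
        if total is None:
            total = list(g)           # one buffer, allocated once
        else:
            for i, v in enumerate(g):
                total[i] += v         # in-place sum: no per-step copies kept
    return total

# Four "micro-batch" gradients for a 3-parameter model
grads = [[1.0, 2.0, 3.0]] * 4
print(accumulate(grads))  # [4.0, 8.0, 12.0]
```

If memory really were scaling with the accumulation parameter, it would suggest per-step gradients (or activations) are being retained somewhere instead of being summed and freed, which is why a repro command would help pin this down.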

@joecummings joecummings added the discussion Start a discussion label Dec 10, 2024