Restore the previous autograd setting #49

Open
wants to merge 1 commit into main
Conversation

martenlienen

Indiscriminately enabling autograd at the end of the loop also enables it when the user
had explicitly disabled it before. This is a common occurrence when the loss is computed
over a validation set.

In my application this stopped training with an out-of-memory error: running the validation pass with autograd enabled quickly exhausts memory, while the training loop is fine (sequential data, with validation on full sequences but training on subsequences).
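
For context, here is a minimal sketch of the pattern at issue (function names and structure are illustrative, not verbatim geomloss code): the loop disables autograd for its inner iterations and then re-enables it unconditionally, whereas the fix restores whatever setting the caller had.

```python
import torch

# Buggy pattern (illustrative): autograd is turned off for the cheap inner
# iterations and then unconditionally turned back on, clobbering whatever
# mode the caller had set (e.g. a surrounding `with torch.no_grad():`).
def run_loop_buggy(step, n_iters):
    torch.autograd.set_grad_enabled(False)
    for _ in range(n_iters):
        step()
    torch.autograd.set_grad_enabled(True)  # wrong if the caller disabled grad

# Pattern this PR proposes: remember the caller's setting and restore it.
def run_loop_fixed(step, n_iters):
    previous = torch.is_grad_enabled()
    torch.autograd.set_grad_enabled(False)
    for _ in range(n_iters):
        step()
    torch.autograd.set_grad_enabled(previous)  # restore the user's setting
```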

@caotians1

I just hit this bug: it caused validation to hang when used with DistributedDataParallel. DistributedDataParallel checks torch.is_grad_enabled() to decide whether to perform synchronization before the forward pass. Geomloss turns grad_enabled back on after the first iteration, causing the processes to hang waiting for synchronization.
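
Until the fix is merged, one possible user-side workaround (a sketch; the helper name is mine, not part of geomloss or PyTorch) is to snapshot the global grad mode around the loss call and restore it afterwards, so that the check of torch.is_grad_enabled() on the next DDP forward still reflects what the user set:

```python
import torch

def call_with_grad_mode_preserved(fn, *args, **kwargs):
    # Snapshot the global autograd mode, call the function, and restore the
    # mode afterwards, even if the callee flipped it or raised an exception.
    previous = torch.is_grad_enabled()
    try:
        return fn(*args, **kwargs)
    finally:
        torch.set_grad_enabled(previous)
```

For example, `val_loss = call_with_grad_mode_preserved(loss_fn, x, y)` keeps grad disabled for the rest of a `torch.no_grad()` validation loop.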

jeanfeydy added a commit that referenced this pull request Jun 18, 2022