[WIP] Switch gradient checkpointing default to use_reentrant=False (PyTorch recommended)
#43203
base: main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```python
@unittest.skip
def test_training_gradient_checkpointing(self):
    pass

@unittest.skip(
    reason="This architecture seem to not compute gradients properly when using GC, check: https://github.com/huggingface/transformers/pull/27124"
)
def test_training_gradient_checkpointing_use_reentrant(self):
    pass

@unittest.skip(
    reason="This architecture seem to not compute gradients properly when using GC, check: https://github.com/huggingface/transformers/pull/27124"
)
def test_training_gradient_checkpointing_use_reentrant_false(self):
    pass
```
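For context, the tests being skipped here roughly assert that every trainable parameter receives a gradient when gradient checkpointing is enabled. A minimal sketch of that idea, using a hypothetical toy module rather than a real transformers architecture:

```python
import torch
from torch.utils.checkpoint import checkpoint

class ToyModel(torch.nn.Module):
    """Stand-in for a real architecture; names and shapes are made up."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 16)

    def forward(self, x):
        # Recompute this layer's activations during backward instead of
        # storing them, mirroring what gradient checkpointing does.
        return checkpoint(self.layer, x, use_reentrant=False)

model = ToyModel()
loss = model(torch.randn(2, 16)).sum()
loss.backward()

# The real tests assert every trainable parameter received a gradient.
for name, param in model.named_parameters():
    assert param.grad is not None, f"{name} has no gradient"
```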
It seems a large number of these ignored tests actually pass. I checked them all.
[For maintainers] Suggested jobs to run (before merge): run-slow: align, altclip, aria, autoformer, aya_vision, beit, big_bird, blip, blip_2, canine, chinese_clip, clap, clip, clipseg, colpali, deit
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43203&sha=435655
Summary
This PR changes our gradient checkpointing default from `use_reentrant=True` to `use_reentrant=False`.

Two years ago we explicitly set `use_reentrant=True` in #28538 because PyTorch started warning that the default would change in the future, recommending that users choose a value explicitly. At the time, defaulting to `True` was the safest choice to preserve the behavior of earlier releases.
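For context, this is how users opt in to the non-reentrant implementation explicitly today; a minimal sketch (the checkpoint name `gpt2` is only an example):

```python
from transformers import AutoModelForCausalLM, TrainingArguments

# Any model that supports gradient checkpointing works the same way.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Opt in to the non-reentrant implementation explicitly:
model.gradient_checkpointing_enable(
    gradient_checkpointing_kwargs={"use_reentrant": False}
)

# The same knob is exposed through TrainingArguments:
args = TrainingArguments(
    output_dir="out",
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
```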
PyTorch now recommends the non-reentrant variant (`use_reentrant=False`, see https://docs.pytorch.org/docs/stable/checkpoint.html) and is moving toward making it the default. Aligning with this upstream recommendation gives us several benefits.

Note: training and checkpointing behavior remains functionally equivalent in typical use cases; the main difference is how activations are recomputed during backward (non-reentrant uses a safer mechanism).
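To illustrate what changes at the PyTorch level, here is a minimal sketch of the non-reentrant variant (the toy block and shapes are made up): activations inside the block are discarded after the forward pass and recomputed on demand during backward.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Toy block and shapes, for illustration only.
block = torch.nn.Sequential(
    torch.nn.Linear(128, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 128),
)
x = torch.randn(4, 128, requires_grad=True)

# Non-reentrant variant (the new default proposed here): recomputation is
# driven by saved-tensor hooks rather than a nested backward call, which is
# what makes it compatible with torch.autograd.grad and with inputs that do
# not require gradients.
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
assert x.grad is not None
```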