[WIP] Enable reproducibility for distributed trainings #16907
Conversation
The documentation is not available anymore as the PR was closed or merged.
Thanks for investing your time to implement this PR! 😊
I have mostly small changes related to documentation and naming, but otherwise looks good 👍
EDIT: To enable support for TensorFlow models, you could use `enable_op_determinism` in the TensorFlow case.
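For reference, a minimal sketch of what the TensorFlow branch could look like (the helper name `enable_tf_determinism` is made up for illustration; `tf.config.experimental.enable_op_determinism` requires TF 2.8 or newer):

```python
import random

import numpy as np
import tensorflow as tf


def enable_tf_determinism(seed: int):
    """Hypothetical helper: seed the Python, NumPy and TF RNGs, then ask
    TensorFlow to pick deterministic op implementations (TF >= 2.8)."""
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    tf.config.experimental.enable_op_determinism()
```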
src/transformers/trainer_utils.py (outdated)

    torch.backends.cudnn.benchmark = False
    ...
    def set_seed(seed: int, set_seed_for_cuda: bool = True):
Related to the function name above, I'd argue that the argument here should be changed to something like `enable_determinism`. Further, I'd make the default `False`, as enabling it can cause weird errors if one uses algorithms that don't have a deterministic variant yet.
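For context, a rough sketch of what that signature could look like, with the determinism switches gated behind a default-`False` flag (illustrative only, not the code in the PR; the exact set of switches may differ):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int, enable_determinism: bool = False):
    """Seed all RNGs; optionally force deterministic CUDA/cuDNN behaviour.

    Defaulting to False avoids RuntimeErrors from ops that have no
    deterministic implementation yet.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    if enable_determinism:
        # cuBLAS needs this workspace config to be deterministic on CUDA >= 10.2.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```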
sgugger
left a comment
Thanks for your work on this!
src/transformers/trainer_utils.py (outdated)

    tf.config.experimental.enable_op_determinism()
    ...
    def set_seed(seed: int, enable_determinism: bool = True):
Suggested change:

    - def set_seed(seed: int, enable_determinism: bool = True):
    + def set_seed(seed: int, full_determinism: bool = False):
I like `full_determinism` a bit better. Since this is a new addition, the default should be set to `False`. That said, it does fix what one might consider a bug, so I'm not sure on this one. @LysandreJik do you have an opinion?
LysandreJik
left a comment
Thanks for working on this, that's an important feature! So as to not introduce a breaking change, and for clarity of the API, I'd personally vouch for not adding the enable_determinism flag to the set_seed method.
From the title of the method I understand it should set the seed, and that's it. I don't think it should do anything else. However, the enable_determinism_for_distributed_training method likely needs the seed to be set in order to benefit from full determinism, so I'd even push to have the set_seed method called inside the enable_determinism_for_distributed_training, adding a seed argument to that last method.
What do you think?
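Concretely, the resulting call pattern would look something like this (using `set_seed` and `enable_full_determinism`, the names the thread converges on below; both end up in `src/transformers/trainer_utils.py`):

```python
from transformers.trainer_utils import enable_full_determinism, set_seed

# Plain seeding: set_seed only seeds the RNGs, nothing else.
set_seed(42)

# Full determinism: the dedicated helper seeds internally and flips the
# deterministic switches, so a single call covers both concerns.
enable_full_determinism(42)
```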
I like this idea. I can implement it once we reach a conclusion on it; however, it is not clear to me how. Could you point me to which parts of the code I need to change, and what I should pay attention to so as not to break anything, if we decide to go for this idea?
sgugger
left a comment
Here are some pointers on what @LysandreJik suggests.
src/transformers/trainer_utils.py (outdated)

    set_seed(worker_seed)
    ...
    def enable_determinism_for_distributed_training():
The idea would be for this function to take the seed here:
Suggested change:

    - def enable_determinism_for_distributed_training():
    + def enable_full_determinism(seed: int):
and then call set_seed inside (instead of set_seed calling this function).
(Also changing the name to be a bit shorter.)
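Putting those pointers together, the helper could end up shaped roughly like this (a sketch under the assumptions discussed above; the environment variables and torch switches shown are the usual determinism knobs, not necessarily the exact final set):

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int):
    """Seed the Python, NumPy and PyTorch RNGs (unchanged: it does one thing)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def enable_full_determinism(seed: int):
    """Seed everything via set_seed, then turn on the determinism switches.

    Expect a slowdown, and RuntimeErrors from ops that have no
    deterministic implementation.
    """
    set_seed(seed)

    # Environment variables consulted by CUDA/cuBLAS for deterministic kernels.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"

    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```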
src/transformers/trainer_utils.py (outdated)

    if enable_determinism:
        enable_determinism_for_distributed_training()
And this part here would then disappear; it would be the other way around.
@sgugger Thanks for the pointers and sorry for not being so clear. I would like to know in which places the new function should be called. With the latest commits, I have already addressed your pointers; now I am waiting for your feedback on where to call `enable_full_determinism`.
There can be an added flag in the `TrainingArguments`, which the `Trainer` would then use to decide whether to call `enable_full_determinism`.
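For illustration, how such a flag could be consumed once training starts (the `full_determinism` field is the one the commit history above ends up adding; the wiring inside the `Trainer` is simplified here):

```python
from transformers import TrainingArguments
from transformers.trainer_utils import enable_full_determinism, set_seed

args = TrainingArguments(output_dir="out", seed=42, full_determinism=True)

# Simplified version of what the Trainer can do with the flag at init time:
if args.full_determinism:
    enable_full_determinism(args.seed)
else:
    set_seed(args.seed)
```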
@sgugger I think I have addressed all your comments. Is there anything left to be done for this PR?
sgugger
left a comment
Thanks, just one last comment on the doc!
Co-authored-by: Sylvain Gugger <[email protected]>
Is it normal that 3 tests suddenly fail after a commit that only changes a docstring? I couldn't understand why the tests are failing.
Those tests are just flaky and have no link to your PR. Thanks again for all your work on this!
Enable reproducibility for distributed trainings (#16907)

* add seed worker and set_deterministic_seed_for_cuda function to enforce reproducability
* change function name to enable determinism, add docstrings, reproducability support for tf
* change function name to enable_determinism_for_distributed_training
* revert changes in set_seed and call set_seed within enable_full_determinism
* add one position argument for seed_worker function
* add full_determinism flag in training args and call enable_full_determinism when it is true
* add enable_full_determinism to documentation
* apply make fixup after the last commit
* Update src/transformers/training_args.py

Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
@sgugger @hasansalimkanmaz I had a question about this PR: why is it necessary to set `CUDA_LAUNCH_BLOCKING=1`?
@alexcoca It's required to make some CUDA algorithms deterministic if the CUDA version is older than 10.2. I suppose it could be replaced by a CUDA version check somehow, only applying it if it's an old version?
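A sketch of what such a version check could look like (the helper name is made up; it assumes `torch.version.cuda` reports the CUDA toolkit version the wheel was built against):

```python
import os

import torch
from packaging import version


def configure_cuda_determinism_env():
    """Only force blocking kernel launches on CUDA < 10.2, where it is needed
    for determinism; newer toolkits can rely on CUBLAS_WORKSPACE_CONFIG."""
    cuda = torch.version.cuda  # e.g. "11.3", or None on CPU-only builds
    if cuda is not None and version.parse(cuda) < version.parse("10.2"):
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    else:
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
```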
@saattrupdan I would go for this approach, because running the CUDA programs in blocking (synchronous) mode will definitely slow things down beyond belief. I implemented this PR myself without it.
I experimented with training a dialogue state tracking model on the SGD corpus, starting from Google's v1.1 T5 (220M parameters). I allowed the model to train for roughly two epochs and evaluated task-oriented performance every 2k steps (max train steps was 12k). I ran 4 experiments: 2 in which I set the seed, and an additional 2 where I did roughly the same but without the blocking flag. I guess the moral of the story here is that one could:
@sgugger?
Agreed for the first one. For the second one, we could avoid overriding an existing value the user may already have set.
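If that second point is about the environment variables above, `os.environ.setdefault` would do it (a sketch, not the actual follow-up patch):

```python
import os

# Only fill in a default; a value the user already exported wins.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":16:8")
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")
```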
Yes, I agree with the above! I'm at ACL next week, but I'll try to open a small PR to address this the week after!
Thanks, @alexcoca, for noticing this and for your time.


What does this PR do?
This PR ensures reproducibility for distributed trainings by setting the seed for dataloader workers and setting environment variables for CUDA.
This PR is motivated by this issue.
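In sketch form, the two pieces the description mentions: a `worker_init_fn` that reseeds each dataloader worker, and the CUDA environment variables (simplified relative to what lands in `src/transformers/trainer_utils.py`):

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id: int):
    """Derive a per-worker seed from the main process RNG so that dataloader
    workers shuffle/augment data reproducibly across runs."""
    worker_seed = torch.initial_seed() % 2**32
    torch.manual_seed(worker_seed)  # the library reseeds everything here via set_seed


# Environment variables that make CUDA/cuBLAS kernels deterministic.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"

# Usage: hand the worker hook (and a seeded generator) to the DataLoader.
generator = torch.Generator().manual_seed(42)
dataset = TensorDataset(torch.arange(100).float())
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=2,
                    worker_init_fn=seed_worker, generator=generator)
```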
Who can review?
@saattrupdan @sgugger I am looking forward to your feedback.