Add `strategy` argument to Trainer #8597
Conversation
Codecov Report

```
@@            Coverage Diff            @@
##           master    #8597     +/-  ##
========================================
- Coverage      93%      89%      -4%
========================================
  Files         178      178
  Lines       15668    15695     +27
========================================
- Hits        14526    13943    -583
- Misses       1142     1752    +610
```
Since training type plugins are themselves in beta, I have a naming question: training type isn't only for training, but also for other stages like evaluation and prediction. People could be confused why the plugin name references training if it also applies during these other situations. With that in mind, is there another name we should formalize this under? I fully acknowledge renaming existing training type plugins would be super annoying, but it'll be much harder to change once this is on the Trainer constructor.
@ananthsub I fully agree. This comes back from when we introduced it. Back then there was mainly training and validation (which was considered to be only a part of training). How would you call it though? Some kind of
for more information, see https://pre-commit.ci
Title changed from "`accelerator_strategy` argument to Trainer" to "`strategy` argument to Trainer"
LGTM!
Great work! Once merged, IMO we should do two things:
- update the docs
- send a message in our community Slack notifying people of this change in general, and the motivations behind it!
LG!
Co-authored-by: Rohit Gupta <[email protected]>
```python
raise MisconfigurationException(
    f"You have passed `Trainer(strategy={self.strategy})` but have"
    f" also passed `Trainer(distributed_backend={distributed_backend})`."
    f"HINT: Use just `Trainer(strategy={self.strategy})` instead."
)
```
Missing whitespace here
```python
rank_zero_deprecation(
    f"Passing {accelerator} `strategy` to the `accelerator` flag in Trainer has been deprecated"
    f" in v1.5 and will be removed in v1.7. Use `Trainer(strategy={accelerator})` instead."
)
```
I thought we weren't going to deprecate the previous `accelerator` and instead just print a warning.
I think the `accelerator` flag is still there... it's just that passing one of the strategies to it is deprecated.
What I understood from our offline discussion was that support for `gpus=N` and `accelerator="ddp"` would not be deprecated and removed, as it's widely used, but a warning would be printed suggesting adopting the new flags.
Flags `gpus`, `tpu_cores`, etc. will still be supported, but passing training strategies to `accelerator` will be deprecated. This decision will also help with the internal cleanup of `AcceleratorConnector`.
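The behaviour described here can be sketched in plain Python. This is a hypothetical illustration, not Lightning's internal `AcceleratorConnector` logic: the helper name `resolve_accelerator` and the strategy set are made up, but the warning text mirrors the deprecation message in this PR.

```python
import warnings

# Illustrative subset of training strategy names (not exhaustive).
_TRAINING_STRATEGIES = {"dp", "ddp", "ddp2", "ddp_spawn"}

def resolve_accelerator(accelerator):
    """Hypothetical helper: keep supporting the legacy `accelerator` flag,
    but warn and reroute when a training strategy is passed through it."""
    if accelerator in _TRAINING_STRATEGIES:
        warnings.warn(
            f"Passing {accelerator} `strategy` to the `accelerator` flag in Trainer"
            f" has been deprecated in v1.5 and will be removed in v1.7."
            f" Use `Trainer(strategy={accelerator})` instead.",
            DeprecationWarning,
        )
        return {"strategy": accelerator}
    # Genuine accelerator values (e.g. hardware backends) pass through untouched.
    return {"accelerator": accelerator}
```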
Co-authored-by: Rohit Gupta <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
Supports #6090
Related Issue #9053
The `strategy` argument supports passing training type aliases (`ddp`, `ddp_spawn`), `TrainingTypeRegistry` plugins (`"ddp_spawn_find_unused_parameters_false"`) and custom plugin objects (`DDPPlugin()`).

At the moment, there's a single `accelerator` flag tied to both Accelerators and Training Type plugins. We wish to have them decoupled!
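To make the registry-alias case concrete, here is a minimal pure-Python sketch of a string-keyed plugin registry. The class bodies below are illustrative assumptions, not Lightning's actual `TrainingTypeRegistry` or `DDPPlugin` implementations; they only show how an alias like `"ddp_spawn_find_unused_parameters_false"` can map to a pre-configured plugin object.

```python
class TrainingTypeRegistry:
    """Toy registry mapping string aliases to plugin classes plus init kwargs."""

    def __init__(self):
        self._registry = {}

    def register(self, name, plugin_cls, **init_kwargs):
        self._registry[name] = (plugin_cls, init_kwargs)

    def get(self, name):
        # Instantiate a fresh plugin with the kwargs stored at registration time.
        plugin_cls, kwargs = self._registry[name]
        return plugin_cls(**kwargs)

class DDPPlugin:
    """Stand-in for the real plugin; only the illustrated flag is modeled."""

    def __init__(self, find_unused_parameters=True):
        self.find_unused_parameters = find_unused_parameters

registry = TrainingTypeRegistry()
registry.register(
    "ddp_spawn_find_unused_parameters_false",
    DDPPlugin,
    find_unused_parameters=False,
)
plugin = registry.get("ddp_spawn_find_unused_parameters_false")
```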
Alternate flags to set Training Types:
- `accelerator`: `Optional[Union[str, Accelerator]] = None`
- `distributed_backend`: `Optional[str] = None` (deprecated, use `accelerator` instead)
- `plugins`: `Optional[Union[List[Union[Plugin, ClusterEnvironment, str]], Plugin, ClusterEnvironment, str]] = None`

What's the difference between passing a training type to `accelerator`, `distributed_backend`, or `plugins`?
`accelerator` and `distributed_backend` only support `DistributedType`, whereas `plugins` supports custom Training Types.

Exceptions:
- `Trainer(distributed_backend="ddp_cpu", strategy="ddp_spawn")`
- `Trainer(accelerator="ddp", strategy="ddp_spawn")`
- `Trainer(plugins="ddp_find_unused_parameters_false", strategy="ddp_spawn")`
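The mutual-exclusion check behind these exceptions can be sketched in plain Python. This is a standalone illustration under assumed names (`check_strategy_conflict` is hypothetical, not Lightning's API); the error message mirrors the one quoted earlier in this review.

```python
class MisconfigurationException(Exception):
    """Stand-in for Lightning's exception of the same name."""

def check_strategy_conflict(strategy=None, accelerator=None,
                            distributed_backend=None, plugins=None):
    """Once `strategy` is set, a training type must not also arrive
    via one of the legacy flags."""
    if strategy is None:
        return
    legacy_flags = {
        "accelerator": accelerator,
        "distributed_backend": distributed_backend,
        "plugins": plugins,
    }
    for name, value in legacy_flags.items():
        if value is not None:
            raise MisconfigurationException(
                f"You have passed `Trainer(strategy={strategy})` but have"
                f" also passed `Trainer({name}={value})`."
                f" HINT: Use just `Trainer(strategy={strategy})` instead."
            )
```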
Deprecations (deprecated in v1.5 & will be removed in v1.6):
- passing training types to the `accelerator` flag
- passing training types to the `plugins` flag

Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃