-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate Trainer.devices
in favor of Trainer.num_devices
and Trainer.device_ids
#12151
Conversation
7c7ec33
to
de52d05
Compare
Trainer.devices
in favor of Trainer.num_devices
and Trainer.device_ids
Trainer.devices
in favor of Trainer.num_devices
and Trainer.device_ids
de52d05
to
a5ce5a7
Compare
Trainer.devices
in favor of Trainer.num_devices
and Trainer.device_ids
Trainer.devices
in favor of Trainer.num_devices
and Trainer.device_ids
05ca9c5
to
ff4b735
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, small comment
ff4b735
to
ead59b3
Compare
930e479
to
122d546
Compare
f0c3099
to
b7873c8
Compare
f27d46b
to
84ea0c5
Compare
84ea0c5
to
0cc5b05
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delayed review.
@@ -2010,6 +2010,23 @@ def should_rank_save_checkpoint(self) -> bool: | |||
def num_nodes(self) -> int: | |||
return getattr(self.strategy, "num_nodes", 1) | |||
|
|||
@property | |||
def device_ids(self) -> List[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have preferred if this was implemented on the accelerator connector. It is IMO not the responsibility of the Trainer to compute the device_ids off the strategy. Note the the strategy is owned by the connector, not the trainer. As the trainer has already many responsibilities, we should delegate as much as possible to the concrete components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing @awaelchli!
My personal thoughts: the accelerator connector should stay internal, and accelerator connector should not expose those device related public properties.
Note the the strategy is owned by the connector, not the trainer.
Personally I feel the accelerator connector does a lot of checks/set up/initialization and configuration of the accelerator and strategy. After things are set up already, strategy/accelerator are already accessible on the Trainer, while the connector stays internal.
I think delegating computations using accelerator/strategy to the accelerator_connector will make us add many public properties to the internal-facing accelerator_connector. Deriving from strategy/accelerator directly are more straight forward, and they(strategy, accelerator) are very external facing as well.
I'm very new to this and my feelings might be wrong. Feel free to add more thoughts to issue #12171!
What does this PR do?
Part of #12171
Does your PR introduce any breaking changes? If yes, please list them.
N/A
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
cc @four4fish @awaelchli