2/n inter batch parallelism #9047
Conversation
Codecov Report
```diff
@@            Coverage Diff            @@
##           master    #9047    +/-   ##
========================================
  Coverage      89%      89%
========================================
  Files         175      175
  Lines       14463    14497      +34
========================================
+ Hits        12812    12844      +32
- Misses       1651     1653       +2
```
A few inline comments while trying to understand the reasoning behind some of the complexity in this diff and the other diffs in the stack.
```diff
@@ -280,6 +280,8 @@ def _training_step(
     training_step_output = self.trainer.accelerator.training_step(step_kwargs)
     self.trainer.accelerator.post_training_step()

+    del step_kwargs
```
Why is this needed?
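One plausible rationale (an assumption, not confirmed in this thread) is that `del step_kwargs` drops the last strong reference to the batch so its memory can be reclaimed before later work in the step. A minimal sketch with hypothetical `Batch` and `step` names:

```python
import weakref

# Hypothetical illustration: deleting the last local reference lets the
# object be collected before the function returns, which matters when the
# object holds large tensors.
class Batch:
    pass

def step(batch):
    ref = weakref.ref(batch)
    del batch  # analogous to `del step_kwargs`: release memory early
    return ref

r = step(Batch())
print(r() is None)  # True on CPython: the batch was collected inside step()
```

Without the `del`, the local name would keep the object alive until the frame exits.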
```diff
@@ -91,8 +91,9 @@ def advance(
     if batch is None:
         raise StopIteration

     with self.trainer.profiler.profile("evaluation_batch_to_device"):
         batch = self.trainer.accelerator.batch_to_device(batch, dataloader_idx=dataloader_idx)
+    if not self.trainer.data_connector.evaluation_data_fetcher.store_on_device:
```
Do we plan to remove this?
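The branch under discussion moves the batch to the device in the loop only when the fetcher did not already store it there. A minimal sketch of that control flow, using hypothetical `Fetcher`, `to_device`, and `advance` names:

```python
# Sketch only: `store_on_device` decides whether the device transfer happens
# inside the fetcher (already done) or here in the loop (done now).
class Fetcher:
    def __init__(self, store_on_device: bool):
        self.store_on_device = store_on_device

def to_device(batch):
    # Stand-in for accelerator.batch_to_device
    return {**batch, "device": "gpu"}

def advance(batch, fetcher):
    if not fetcher.store_on_device:
        batch = to_device(batch)
    return batch

assert advance({"x": 1}, Fetcher(store_on_device=False)) == {"x": 1, "device": "gpu"}
assert advance({"x": 1}, Fetcher(store_on_device=True)) == {"x": 1}
```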
```diff
@@ -47,22 +47,27 @@ class SimpleDataFetcher(AbstractDataFetcher):
     def fetching_function(self):
         while True:
             try:
-                yield next(self.dataloader_iter), False
+                return next(self.dataloader_iter), False
```
Why this change?
I am having a hard time understanding why we have a while loop that will only run for one iteration. It's not clear why this wasn't replaced with an if statement, or why the while loop wasn't removed entirely.
Also, it seems this function has now become semantically equivalent to implementing next on a regular iterator, so why do we still need this fetching_function here?
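The reviewer's point can be illustrated with a small sketch (illustrative names, not the actual Lightning code): once every branch of the `try` returns, the `while True` runs at most one iteration, and the function behaves like an ordinary per-call `next`:

```python
# Sketch: with `return` instead of `yield`, fetching_function is no longer a
# generator; each call produces one (batch, done) pair, and the while loop
# body executes exactly once because every path returns.
def make_fetcher(items):
    it = iter(items)

    def fetching_function():
        while True:  # runs a single iteration: both branches return
            try:
                return next(it), False
            except StopIteration:
                return None, True

    return fetching_function

fetch = make_fetcher([10, 20])
assert fetch() == (10, False)
assert fetch() == (20, False)
assert fetch() == (None, True)
```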
@aazzolini I also noticed some of the things you mention here and opened #9058 to address them. Thanks for reviewing!
What does this PR do?
This PR adds DataFetcher to `data_connector` and the loops.
Part of #8867
Part of #8316
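For context on the "inter batch parallelism" in the PR title, here is a hedged sketch of the general prefetching idea (illustrative code, not the PR's implementation): fetch the next batch while the caller processes the current one, and signal on the last batch.

```python
# Sketch: a one-batch-lookahead iterator. Yields (batch, is_last) pairs so
# the consumer can overlap work on `batch` with fetching of the next one.
def prefetching_iter(dataloader):
    it = iter(dataloader)
    try:
        batch = next(it)
    except StopIteration:
        return  # empty dataloader
    for nxt in it:
        yield batch, False  # `nxt` is already fetched while caller works
        batch = nxt
    yield batch, True  # last batch: no more data behind it

out = list(prefetching_iter([1, 2, 3]))
assert out == [(1, False), (2, False), (3, True)]
```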
Does your PR introduce any breaking changes? If yes, please list them.
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines.
Did you have fun?
Make sure you had fun coding 🙃