
Add a flavor of training_step that takes dataloader_iter as an argument #8807

Merged: 16 commits merged into Lightning-AI:master on Aug 16, 2021

Conversation

@yifuwang (Contributor) commented Aug 9, 2021

What does this PR do?

Implements the feature request #8316

Introduced FlexibleOptimizationFlow, which is a "batch processor" alternative to TrainingBatchLoop when the training_step takes dataloader_iter as an argument.
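(Illustrative only: a minimal sketch of what a user-side training_step consuming dataloader_iter might look like. The module and layer sizes below are hypothetical, and the precise contract, e.g. how optimization is driven, is defined by the PR rather than by this sketch.)

    import torch
    from pytorch_lightning import LightningModule


    class IterConsumingModel(LightningModule):  # hypothetical example module
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def training_step(self, dataloader_iter):
            # The trainer hands over the raw iterator instead of a pre-fetched batch,
            # so the step can pull (and potentially prefetch/overlap) data itself.
            batch = next(dataloader_iter)  # assumes the dataloader yields (N, 32) tensors
            return self.layer(batch).sum()

        def configure_optimizers(self):
            return torch.optim.SGD(self.parameters(), lr=0.1)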

New tests are WIP.

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@codecov (bot) commented Aug 9, 2021

Codecov Report

Merging #8807 (5f8bdd5) into master (89156b7) will decrease coverage by 4%.
The diff coverage is 96%.

@@           Coverage Diff           @@
##           master   #8807    +/-   ##
=======================================
- Coverage      93%     89%    -4%     
=======================================
  Files         173     176     +3     
  Lines       14182   14268    +86     
=======================================
- Hits        13174   12678   -496     
- Misses       1008    1590   +582     

@yifuwang force-pushed the training_step_dataloader_iter branch from 0fb8317 to 81a8010 on August 9, 2021 08:54
@yifuwang force-pushed the training_step_dataloader_iter branch 2 times, most recently from 9f78849 to 2297f23, on August 9, 2021 23:20
@yifuwang requested a review from kaushikb11 as a code owner on August 9, 2021 23:20
@yifuwang force-pushed the training_step_dataloader_iter branch from a2bc359 to 1cec6f8 on August 9, 2021 23:24
@yifuwang changed the title from "[WIP] Add a flavor of training_step that takes dataloader_iter as an argument" to "Add a flavor of training_step that takes dataloader_iter as an argument" on Aug 10, 2021
@pep8speaks (bot) commented Aug 10, 2021

Hello @yifuwang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 56:23: E203 whitespace before ':'

Comment last updated at 2021-08-12 21:17:43 UTC
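(For reference, E203 is pycodestyle's check for whitespace before a ':'. A minimal illustration, not taken from this PR's diff:)

    ages = {"alice" : 30}   # flagged: E203 whitespace before ':'
    ages = {"alice": 30}    # fixed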

@SeanNaren (Contributor)

Just catching up here, but by introducing a method that allows the dataloader to be passed into the training step, we're bypassing a loop and migrating it into the training step, right?

If so, I think I understand @tchaton's concerns about this being a risky hole to delve into without care. Shouldn't the logic exist within a loop passed into the Trainer, or as done within #8700?

@tchaton (Contributor) commented Aug 11, 2021

Dear @yifuwang @ananthsub @carmocca @awaelchli @justusschock @williamFalcon @SeanNaren @kaushikb11 @Borda

After chatting with @yifuwang, I understood there is some urgency on this feature to enable converting libraries to PL.

However, the plan is to start replacing this as soon as it is merged, in follow-up PRs, by introducing a LightningPrefetcher with best practices that could benefit all users, respect fault-tolerant training requirements, and be extended to new use cases.

The LightningPrefetcher should provide a way to pass an iterator directly to the training_step while tracking progress, entirely replacing this PR's changes.

@yifuwang committed to co-author the refactor once this PR has been merged, starting from the work in #8476 and #8700.

So let's get this PR merged asap :)

Best,
T.C
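(For context, below is a rough, hypothetical sketch of the kind of prefetching wrapper described above; the LightningPrefetcher named in the comment is a proposal, not an API introduced by this PR.)

    class OneAheadPrefetcher:
        """Hypothetical illustration: wrap an iterable and keep one batch in flight,
        so a consumer such as training_step can overlap work with the next fetch.
        A real LightningPrefetcher would additionally track progress for fault
        tolerance, which is out of scope for this sketch."""

        def __init__(self, iterable):
            self._iterator = iter(iterable)
            self._done = False
            self._next_batch = None
            self._advance()

        def _advance(self):
            try:
                self._next_batch = next(self._iterator)
            except StopIteration:
                self._done = True

        def __iter__(self):
            return self

        def __next__(self):
            if self._done:
                raise StopIteration
            batch = self._next_batch
            self._advance()  # eagerly fetch the following batch; progress could be tracked here
            return batch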

@yifuwang (Contributor, Author)

Hi Thomas, thanks for the review and feedback!

I'm fully onboard with enabling this through a broader design for data prefetching use cases (i.e. #8700). I'll be committed to helping execute the direction and replacing this before the next release.

@ananthsub (Contributor)

@tchaton I agree, thank you for summarizing this! I'll also echo @yifuwang's comment here:

> I'm fully onboard with enabling this through a broader design for data prefetching use cases (i.e. #8700). I'll be committed to helping execute the direction and replacing this before the next release.

+1000

@tchaton (Contributor) commented Aug 12, 2021

> @tchaton I agree, thank you for summarizing this! I'll also echo @yifuwang's comment here:
>
> > I'm fully onboard with enabling this through a broader design for data prefetching use cases (i.e. #8700). I'll be committed to helping execute the direction and replacing this before the next release.
>
> +1000

Thanks @ananthsub @yifuwang for confirming :)

IMO, here is the scope of the refactoring:

  • Drop IteratorBatchProcessor once the refactor is finalised.
  • Everything else within this PR seems like an appropriate refactor and should be preserved.

Finally, I created an RFC for the refactor redesign.

@awaelchli (Contributor) left a comment:

Premature accept due to urgency. I do not fully understand what's going on here; I hope there is still a bit of time for us to test it.

I trust that we will follow up with meetings and discussions about how all of this fits into Lightning's philosophy. Ideally the order would be the other way around.

@awaelchli added the "feature" (Is an improvement or enhancement) and "refactor" labels Aug 12, 2021
@awaelchli added this to the v1.5 milestone Aug 12, 2021
@yifuwang force-pushed the training_step_dataloader_iter branch from 13a03c7 to 377a340 on August 12, 2021 20:33
@tchaton (Contributor) left a comment:

LGTM!
The mergify bot added the "ready" (PRs ready to be merged) label Aug 13, 2021
@ananthsub (Contributor) left a comment:

Agreed with the feedback, and this is just the start for how:

  • we can build a foundation around how the trainer interacts with the training step (and especially multiple flavors of training steps)
  • we can bring better clarity to the individual batch loops, so the code is straightforward to read for both developers and users, especially those coming from pure PyTorch training
  • we can better select which loops to run based on trainer & module attributes (a rough sketch of this idea follows below)
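(As a rough illustration of the last bullet: a trainer could inspect the training_step signature to decide which batch processor to run. The helper below is hypothetical, not this PR's actual implementation.)

    import inspect

    def wants_dataloader_iter(model) -> bool:
        # Hypothetical helper: does the user's training_step declare a
        # ``dataloader_iter`` parameter?
        return "dataloader_iter" in inspect.signature(model.training_step).parameters

    # Loop selection could then branch on the result, e.g. running an
    # iterator-based batch processor instead of the default TrainingBatchLoop.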

The mergify bot removed the "has conflicts" label Aug 16, 2021
@ananthsub enabled auto-merge (squash) August 16, 2021 18:45
@ananthsub merged commit 938a191 into Lightning-AI:master Aug 16, 2021
awaelchli pushed a commit that referenced this pull request Aug 17, 2021:
Add a flavor of training_step that takes dataloader_iter as an argument (#8807)
Labels: feature (Is an improvement or enhancement), ready (PRs ready to be merged), refactor
Projects: none yet
Development: successfully merging this pull request may close the issue "Add a flavor of training_step that allows for expressing inter-batch parallelism"
9 participants