Feature/3521 Enable purely iteration based training #4086

Conversation

richardqiu

What does this PR do?

Fixes #3521
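
For context, a minimal sketch of the usage this PR aims to enable, assuming the BoringModel test helper and that leaving max_epochs unset (None) means "no epoch limit"; argument values are illustrative:

from pytorch_lightning import Trainer
from tests.base import BoringModel  # test helper; the exact import path may differ

# Stop after a fixed number of optimizer steps, with no epoch-based limit.
trainer = Trainer(max_steps=1000, max_epochs=None)
trainer.fit(BoringModel())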

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo and docs fixes)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team October 11, 2020 19:28
@richardqiu richardqiu changed the title Feature/3521 iteration based training Feature/3521 Enable purely iteration based training Oct 11, 2020
@rohitgr7
Contributor

Can you set your Black to ignore ' and " with -S or --skip-string-normalization?

@codecov

codecov bot commented Oct 11, 2020

Codecov Report

Merging #4086 (3c68864) into release/1.2-dev (8dfcc07) will increase coverage by 4%.
The diff coverage is 100%.

@@                Coverage Diff                @@
##           release/1.2-dev   #4086     +/-   ##
=================================================
+ Coverage               89%     93%     +4%     
=================================================
  Files                  164     117     -47     
  Lines                12218    8977   -3241     
=================================================
- Hits                 10928    8350   -2578     
+ Misses                1290     627    -663     

@mergify mergify bot requested a review from a team October 12, 2020 07:05
@williamFalcon
Contributor

@richardqiu can you write a completely separate test and try not to modify the existing tests?

A good place to put the test is in trainer/flags/test_min_max_steps_epochs.py
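
A rough sketch of what such a standalone test could look like, assuming the BoringModel helper and that max_steps alone bounds training; names and assertions are illustrative:

from pytorch_lightning import Trainer
from tests.base import BoringModel  # test helper; the exact import path may differ


def test_max_steps_only(tmpdir):
    """Training should stop after max_steps even when max_epochs is not set."""
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_steps=5,
        max_epochs=None,
        weights_summary=None,
    )
    trainer.fit(model)
    assert trainer.global_step == 5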

Contributor

@williamFalcon williamFalcon left a comment

Please add a different test and try not to touch the current tests.

@mergify mergify bot requested a review from a team October 12, 2020 10:08
@richardqiu richardqiu force-pushed the feature/3521_iteration_based_training branch from 589a471 to 0db3507 Compare October 16, 2020 13:24
@richardqiu richardqiu requested review from williamFalcon and justusschock and removed request for a team October 16, 2020 17:26
@mergify mergify bot requested a review from a team October 16, 2020 17:27
@Borda Borda added this to the 1.1 milestone Oct 20, 2020
@mergify
Contributor

mergify bot commented Oct 21, 2020

This pull request is now in conflict... :(

@tchaton
Contributor

tchaton commented Oct 23, 2020

Hey @richardqiu,

Would you mind updating your PR?

Best,
T.C

@pep8speaks

pep8speaks commented Oct 24, 2020

Hello @richardqiu! Thanks for updating this PR.

Line 535:121: E501 line too long (126 > 120 characters)
Line 536:121: E501 line too long (125 > 120 characters)
Line 843:121: E501 line too long (121 > 120 characters)
Line 861:121: E501 line too long (122 > 120 characters)

Line 182:44: W291 trailing whitespace

Line 868:1: W293 blank line contains whitespace

Comment last updated at 2021-01-27 16:15:46 UTC

Comment on lines +879 to +881
Stop training after this number of steps.
If both max_epochs and max_steps are specified, training will stop if either max_steps or max_epochs have been
reached (earliest).
Contributor

@ananthsub ananthsub Oct 24, 2020

the documentation for min/max_epochs above needs to be updated too
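
For illustration only (not the PR's final wording), the neighbouring max_epochs entry could be aligned with the quoted max_steps text along these lines:

max_epochs: Stop training once this number of epochs is reached.
    If both max_epochs and max_steps are specified, training will stop if either
    max_steps or max_epochs have been reached (earliest).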

Comment on lines +476 to +514
epochs = range(self.current_epoch, self.max_epochs) if self.max_epochs else count(self.current_epoch)
for epoch in epochs:
Contributor

What happens if self.max_epochs is None and self.max_steps is large enough that we make more than one pass through the data? The count(self.current_epoch) would return an iterable over a single value, right? Does that mean the loop would stop too early?

Author

I don't believe so, unless I'm misunderstanding. The count() iterator is indefinite and doesn't raise StopIteration.
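
For reference, itertools.count is an infinite iterator, so the epoch loop above is only terminated by max_steps (or another stopping condition):

from itertools import count

epochs = count(3)    # yields 3, 4, 5, ... indefinitely; never raises StopIteration
print(next(epochs))  # 3
print(next(epochs))  # 4
print(next(epochs))  # 5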

Comment on lines 12 to 37
trainer = Trainer(
    default_root_dir=tmpdir,
    min_epochs=0,
    max_steps=3,
    min_steps=0,
    weights_summary=None,
)
Contributor

Could you also test with a larger max_steps value? I want to cover a case where trainer.current_epoch should be > 1 even if max_epochs is None.
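
A sketch of such a case (not from the PR), assuming BoringModel and using limit_train_batches to keep epochs short so that max_steps spans several epochs:

def test_max_steps_spans_epochs(tmpdir):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=None,
        max_steps=5,
        limit_train_batches=2,  # 2 optimizer steps per epoch, so 5 steps span 3 epochs
        weights_summary=None,
    )
    trainer.fit(model)
    assert trainer.global_step == 5
    assert trainer.current_epoch > 1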

Comment on lines 27 to 41
def test_min_steps_only(tmpdir):
    """
    Tests that min_steps can be used without min_epochs
    """

    model = BoringModel()

    trainer = Trainer(
        default_root_dir=tmpdir,
        min_steps=3,
        max_epochs=2,
        weights_summary=None,
    )

    trainer.fit(model)
Contributor

  • Are there other min/max epoch test cases that could be moved into this file?
  • We should comprehensively test all possibilities for min/max epochs/steps. I think that's 8 test cases, which you could parametrize (see the sketch below).
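
One way the parametrization could be laid out (not from the PR; the combinations and assertions are illustrative):

import pytest


@pytest.mark.parametrize(
    "min_epochs,max_epochs,min_steps,max_steps",
    [
        (None, 2, None, None),  # epoch limit only
        (None, None, None, 5),  # step limit only
        (1, 2, None, 5),        # mixed epoch/step limits
        (None, 2, 3, None),     # min steps with an epoch cap
        # ... remaining combinations
    ],
)
def test_min_max_epochs_steps(tmpdir, min_epochs, max_epochs, min_steps, max_steps):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        min_epochs=min_epochs,
        max_epochs=max_epochs,
        min_steps=min_steps,
        max_steps=max_steps,
        weights_summary=None,
    )
    trainer.fit(model)
    # assert on trainer.global_step / trainer.current_epoch for each combination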

@richardqiu richardqiu force-pushed the feature/3521_iteration_based_training branch from 3c68864 to 992f4e2 Compare December 2, 2020 05:36
@richardqiu
Author

There are two failing tests in tests/trainer/test_dataloaders.py involving the changes to check_val_interval, which I'm not sure how to resolve. Advice here would be appreciated!

@justusschock
Member

@richardqiu This also comes from the rebase: we introduced the lightning optimizer in #4658, and it seems the rebase skipped the arguments added there.

@mergify
Contributor

mergify bot commented Dec 12, 2020

This pull request is now in conflict... :(

@hadim
Contributor

hadim commented Jan 24, 2021

@richardqiu are you still working on this PR?

@Borda Borda changed the base branch from master to release/1.2-dev January 26, 2021 23:16
@Borda Borda modified the milestones: 1.1.x, 1.2 Jan 26, 2021
@Borda Borda added the feature Is an improvement or enhancement label Jan 26, 2021
@Borda
Member

Borda commented Jan 26, 2021

@richardqiu As this is a feature, I am changing the target branch to the 1.2 feature branch. Would you mind rebasing the PR and resolving the conflicts? Thanks for your contribution, and feel free to ping us if you need anything or some help... 🐰

@ananthsub
Contributor

@Borda I'll take this up and re-submit it

@kaushikb11
Contributor

> @Borda I'll take this up and re-submit it

@ananthsub I believe this PR could resolve this Issue as well. Do let me know if I could help here.

@mergify mergify bot removed the has conflicts label Jan 27, 2021
@richardqiu
Author

> @Borda I'll take this up and re-submit it

Thanks, I appreciate it! Apologies for the long delays here, folks, and thanks for all the feedback I've gotten so far!

ananthsub added a commit to ananthsub/pytorch-lightning that referenced this pull request Jan 28, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this pull request Jan 28, 2021
Labels
feature (Is an improvement or enhancement), has conflicts
Development

Successfully merging this pull request may close these issues.

Enable training purely based on number of iterations instead of epochs
10 participants