Feature/3521 Enable purely iteration based training #4086

Conversation

richardqiu

What does this PR do?

Fixes #3521
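
For context, a minimal sketch of the usage this PR aims to enable, assuming the BoringModel test helper and that leaving max_epochs unset (None) means "no epoch limit"; argument values are illustrative:

from pytorch_lightning import Trainer
from tests.base import BoringModel  # test helper; the exact import path may differ

# Stop after a fixed number of optimizer steps, with no epoch-based limit.
trainer = Trainer(max_steps=1000, max_epochs=None)
trainer.fit(BoringModel())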

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo and docs fixes)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team October 11, 2020 19:28
@richardqiu richardqiu changed the title Feature/3521 iteration based training Feature/3521 Enable purely iteration based training Oct 11, 2020
@rohitgr7
Contributor

Can you set your Black to ignore ' and " with -S or --skip-string-normalization?

@codecov

codecov bot commented Oct 11, 2020

Codecov Report

Merging #4086 (3c68864) into release/1.2-dev (8dfcc07) will increase coverage by 4%.
The diff coverage is 100%.

@@                Coverage Diff                @@
##           release/1.2-dev   #4086     +/-   ##
=================================================
+ Coverage               89%     93%     +4%     
=================================================
  Files                  164     117     -47     
  Lines                12218    8977   -3241     
=================================================
- Hits                 10928    8350   -2578     
+ Misses                1290     627    -663     

@mergify mergify bot requested a review from a team October 12, 2020 07:05
@williamFalcon
Contributor

@richardqiu can you write a completely separate test and try not to modify the existing tests?

A good place to put the test is in trainer/flags/test_min_max_steps_epochs.py
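
A rough sketch of what such a standalone test could look like, assuming the BoringModel helper and that max_steps alone bounds training; names and assertions are illustrative:

from pytorch_lightning import Trainer
from tests.base import BoringModel  # test helper; the exact import path may differ


def test_max_steps_only(tmpdir):
    """Training should stop after max_steps even when max_epochs is not set."""
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_steps=5,
        max_epochs=None,
        weights_summary=None,
    )
    trainer.fit(model)
    assert trainer.global_step == 5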

Contributor

@williamFalcon williamFalcon left a comment

Please add a different test and try not to touch the current tests.

@mergify mergify bot requested a review from a team October 12, 2020 10:08
@richardqiu richardqiu force-pushed the feature/3521_iteration_based_training branch from 589a471 to 0db3507 Compare October 16, 2020 13:24
@richardqiu richardqiu requested review from williamFalcon and justusschock and removed request for a team October 16, 2020 17:26
@mergify mergify bot requested a review from a team October 16, 2020 17:27
@Borda Borda added this to the 1.1 milestone Oct 20, 2020
@mergify
Contributor

mergify bot commented Oct 21, 2020

This pull request is now in conflict... :(

@tchaton
Contributor

tchaton commented Oct 23, 2020

Hey @richardqiu,

Would you mind updating your PR?

Best,
T.C

@pep8speaks

pep8speaks commented Oct 24, 2020

Hello @richardqiu! Thanks for updating this PR.

Line 535:121: E501 line too long (126 > 120 characters)
Line 536:121: E501 line too long (125 > 120 characters)
Line 843:121: E501 line too long (121 > 120 characters)
Line 861:121: E501 line too long (122 > 120 characters)

Line 182:44: W291 trailing whitespace

Line 868:1: W293 blank line contains whitespace

Comment last updated at 2021-01-27 16:15:46 UTC

Comment on lines +879 to +881
Stop training after this number of steps.
If both max_epochs and max_steps are specified, training will stop if either max_steps or max_epochs have been
reached (earliest).
Contributor

@ananthsub ananthsub Oct 24, 2020

the documentation for min/max_epochs above needs to be updated too
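
For illustration only (not the PR's final wording), the neighbouring max_epochs entry could be aligned with the quoted max_steps text along these lines:

max_epochs: Stop training once this number of epochs is reached.
    If both max_epochs and max_steps are specified, training will stop if either
    max_steps or max_epochs have been reached (earliest).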

Comment on lines +476 to +514
epochs = range(self.current_epoch, self.max_epochs) if self.max_epochs else count(self.current_epoch)
for epoch in epochs:
Contributor

What happens if self.max_epochs is None and self.max_steps is large enough that we make more than one pass through the data? The count(self.current_epoch) would return an iterable over a single value, right? Does that mean the loop would stop too early?

Author

I don't believe so, unless I'm misunderstanding. The count() iterator is indefinite and doesn't raise StopIteration.
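
For reference, itertools.count is an infinite iterator, so the epoch loop above is only terminated by max_steps (or another stopping condition):

from itertools import count

epochs = count(3)    # yields 3, 4, 5, ... indefinitely; never raises StopIteration
print(next(epochs))  # 3
print(next(epochs))  # 4
print(next(epochs))  # 5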

Comment on lines 12 to 37
trainer = Trainer(
    default_root_dir=tmpdir,
    min_epochs=0,
    max_steps=3,
    min_steps=0,
    weights_summary=None,
)
Contributor

Could you also test with a larger max_steps value? I want to cover a case where trainer.current_epoch should be > 1 even if max_epochs is None.
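
A sketch of such a case (not from the PR), assuming BoringModel and using limit_train_batches to keep epochs short so that max_steps spans several epochs:

def test_max_steps_spans_epochs(tmpdir):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        max_epochs=None,
        max_steps=5,
        limit_train_batches=2,  # 2 optimizer steps per epoch, so 5 steps span 3 epochs
        weights_summary=None,
    )
    trainer.fit(model)
    assert trainer.global_step == 5
    assert trainer.current_epoch > 1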

Comment on lines 27 to 41
def test_min_steps_only(tmpdir):
    """
    Tests that min_steps can be used without min_epochs
    """

    model = BoringModel()

    trainer = Trainer(
        default_root_dir=tmpdir,
        min_steps=3,
        max_epochs=2,
        weights_summary=None,
    )

    trainer.fit(model)
Contributor

  • Are there other min/max epoch test cases that could be moved into this file?
  • We should comprehensively test all possibilities for min/max epochs/steps. I think that's 8 test cases, which you could parametrize (see the sketch below).
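
One way the parametrization could be laid out (not from the PR; the combinations and assertions are illustrative):

import pytest


@pytest.mark.parametrize(
    "min_epochs,max_epochs,min_steps,max_steps",
    [
        (None, 2, None, None),  # epoch limit only
        (None, None, None, 5),  # step limit only
        (1, 2, None, 5),        # mixed epoch/step limits
        (None, 2, 3, None),     # min steps with an epoch cap
        # ... remaining combinations
    ],
)
def test_min_max_epochs_steps(tmpdir, min_epochs, max_epochs, min_steps, max_steps):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        min_epochs=min_epochs,
        max_epochs=max_epochs,
        min_steps=min_steps,
        max_steps=max_steps,
        weights_summary=None,
    )
    trainer.fit(model)
    # assert on trainer.global_step / trainer.current_epoch for each combination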

@richardqiu richardqiu force-pushed the feature/3521_iteration_based_training branch from 3c68864 to 992f4e2 Compare December 2, 2020 05:36
@richardqiu
Author

There are two failing tests in tests/trainer/test_dataloaders.py involving the changes to check_val_interval, which I'm not sure how to resolve. Advice here would be appreciated!

@justusschock
Member

@richardqiu This also comes from the rebase: we introduced the lightning optimizer in #4658, and it seems the rebase skipped the arguments added there.

@mergify
Contributor

mergify bot commented Dec 12, 2020

This pull request is now in conflict... :(

@hadim
Contributor

hadim commented Jan 24, 2021

@richardqiu are you still working on this PR?

@Borda Borda changed the base branch from master to release/1.2-dev January 26, 2021 23:16
@Borda Borda modified the milestones: 1.1.x, 1.2 Jan 26, 2021
@Borda Borda added the feature Is an improvement or enhancement label Jan 26, 2021
@Borda
Member

Borda commented Jan 26, 2021

@richardqiu As this is a feature, I am changing the target branch to the 1.2 feature branch. Would you mind rebasing the PR and resolving the conflicts? Thanks for your contribution, and feel free to ping us if you need anything or some help... 🐰

@ananthsub
Contributor

@Borda I'll take this up and re-submit it

@kaushikb11
Contributor

> @Borda I'll take this up and re-submit it

@ananthsub I believe this PR could resolve this Issue as well. Do let me know if I could help here.

@mergify mergify bot removed the has conflicts label Jan 27, 2021
@richardqiu
Author

> @Borda I'll take this up and re-submit it

Thanks, I appreciate it! Apologies for the long delays here, folks, and thanks for all the feedback I've gotten so far!

ananthsub added a commit to ananthsub/pytorch-lightning that referenced this pull request Jan 28, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this pull request Jan 28, 2021
Labels
feature (Is an improvement or enhancement), has conflicts
Development

Successfully merging this pull request may close these issues.

Enable training purely based on number of iterations instead of epochs
10 participants