Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove trainer hidden state | sanity refactor [1 / n] #7437

Merged
merged 19 commits into from
May 11, 2021

Conversation

awaelchli
Copy link
Contributor

@awaelchli awaelchli commented May 7, 2021

What does this PR do?

Restores sanity around trainer state.
These attributes are now owned by the train loop and are read only on the trainer:

  • global_step
  • current_epoch
  • max/min_steps
  • max/min_epochs
  • batch_idx
  • total_batch_idx

Exception: Tuner is allowed to reach into anything.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@awaelchli awaelchli added this to the v1.4 milestone May 7, 2021
@codecov
Copy link

codecov bot commented May 7, 2021

Codecov Report

Merging #7437 (ab13f17) into master (8660d8c) will decrease coverage by 4%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #7437    +/-   ##
=======================================
- Coverage      92%     88%    -4%     
=======================================
  Files         199     199            
  Lines       13030   13047    +17     
=======================================
- Hits        11989   11421   -568     
- Misses       1041    1626   +585     

Comment on lines 93 to 96
def should_skip_training(self):
should_by_max_steps = self.trainer.max_steps is not None and self.trainer.global_step >= self.trainer.max_steps
should_by_epoch = self.trainer.max_epochs is not None and self.trainer.current_epoch >= self.trainer.max_epochs
should_by_max_steps = self.max_steps is not None and self.global_step >= self.max_steps
should_by_epoch = self.max_epochs is not None and self.current_epoch >= self.max_epochs
return should_by_max_steps or should_by_epoch or self.trainer.num_training_batches == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: bool signature
and you can return early after each line instead of handling all the ors at the end

Copy link
Contributor Author

@awaelchli awaelchli May 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does Python evaluate the other side of the and if the first one is already False? I though not? That would be weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you asking me to do this:

if condition1
    return True # return early
if condition2
    return True # return early
...
return False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in anticipation of upcoming changes to the train loop I will hold back on extensive typing for now

@mergify mergify bot added the has conflicts label May 8, 2021
Copy link
Contributor

@tchaton tchaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM !

@awaelchli awaelchli enabled auto-merge (squash) May 10, 2021 07:53
@mergify mergify bot removed the has conflicts label May 10, 2021
@awaelchli awaelchli added the ready PRs ready to be merged label May 10, 2021
Copy link
Member

@Borda Borda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks much cleaner now :]

Comment on lines +491 to +512
def global_step(self) -> int:
return self.train_loop.global_step

@property
def current_epoch(self) -> int:
return self.train_loop.current_epoch

@property
def max_epochs(self) -> Optional[int]:
return self.train_loop.max_epochs

@property
def min_epochs(self) -> Optional[int]:
return self.train_loop.min_epochs

@property
def max_steps(self) -> Optional[int]:
return self.train_loop.max_steps

@property
def min_steps(self) -> Optional[int]:
return self.train_loop.min_steps
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this be deprecated in future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. The user is allowed to access it and it's useful, but it should be read-only :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is batch_idx intentionally left out of these read-only properties?

Copy link
Contributor Author

@awaelchli awaelchli May 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's not meant to be on the trainer because 1) it is accessible to the user through the hooks already 2) it anyway wouldn't be clear what the meaning of this variable is since it is specific to the loop.

@awaelchli awaelchli merged commit ad9118f into master May 11, 2021
@awaelchli awaelchli deleted the refactor/trainer-state-sanity-0 branch May 11, 2021 09:09
@ruotianluo
Copy link
Contributor

The link in the CHANGELOG is wrong.

@carmocca
Copy link
Contributor

carmocca commented Jun 6, 2021

@ruotianluo Do you want to send a patch with the correct link?

@ruotianluo ruotianluo mentioned this pull request Jun 6, 2021
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants