[1/N] Define dataclasses for progress tracking #6603

Merged
merged 14 commits into from
May 19, 2021

Conversation

ananthsub
Contributor

@ananthsub ananthsub commented Mar 20, 2021

What does this PR do?

For #6429

This PR introduces dataclasses for progress tracking:

  • one for the base
  • one for the base loop
  • one for the trainer, which has a member for each of the different loop stages (train/val/test/predict)
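
A minimal sketch of how the three dataclasses listed above could fit together. The class names (BaseProgressSketch, LoopProgressSketch, TrainerProgressSketch), the increment helpers, and the inheritance layout are assumptions made for illustration; only the three counter field names come from the snippet under review further down, and the actual code in this PR may differ.

```python
from dataclasses import dataclass, field


@dataclass
class BaseProgressSketch:
    """Hypothetical base tracker; only the field names come from the PR."""

    total_epochs_processed: int = 0
    total_batches_processed: int = 0
    batches_processed_this_epoch: int = 0


@dataclass
class LoopProgressSketch(BaseProgressSketch):
    """Hypothetical per-loop tracker with assumed increment helpers."""

    def increment_batch(self) -> None:
        # One batch has been read and stepped on.
        self.total_batches_processed += 1
        self.batches_processed_this_epoch += 1

    def increment_epoch(self) -> None:
        # Close the epoch and reset the per-epoch counter.
        self.total_epochs_processed += 1
        self.batches_processed_this_epoch = 0


@dataclass
class TrainerProgressSketch:
    """Hypothetical trainer-level tracker: one member per loop stage."""

    # default_factory gives every stage its own independent tracker.
    train: LoopProgressSketch = field(default_factory=LoopProgressSketch)
    val: LoopProgressSketch = field(default_factory=LoopProgressSketch)
    test: LoopProgressSketch = field(default_factory=LoopProgressSketch)
    predict: LoopProgressSketch = field(default_factory=LoopProgressSketch)
```

Using default_factory rather than a shared default instance keeps the train/val/test/predict counters from aliasing one another.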

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@SeanNaren
Contributor

thanks ananth! any chance you have a branch showing how the final state would look?

Contributor

@tchaton tchaton left a comment

LGTM, but the validation LoopProgress won't entirely work, as validation can be performed within the training loop.
Also, some training type plugins such as DeepSpeed will need to access the TrainLoopProgress, as their increment logic diverges from the current one.

"""
total_epochs_processed: int = 0
total_batches_processed: int = 0
batches_processed_this_epoch: int = 0
Contributor

Any other suggestions for this variable name (batches_processed_this_epoch)?

@Borda Borda added the feature Is an improvement or enhancement label Mar 28, 2021
@Borda Borda added this to the 1.3 milestone Mar 28, 2021
Comment on lines 27 to 29
total_epochs_processed: int = 0
total_batches_processed: int = 0
batches_processed_this_epoch: int = 0
Member

do we really need the _processed suffix if it is in all the vars? whenever you use the attribute it becomes very long and leads to ugly line breaks (due to formatting)

Suggested change
- total_epochs_processed: int = 0
- total_batches_processed: int = 0
- batches_processed_this_epoch: int = 0
+ total_epochs: int = 0
+ total_batches: int = 0
+ batches_this_epoch: int = 0

Contributor Author

@Borda - sorry for the delay. I want to be as explicit about the naming as possible. for example, there are also fields like this: https://github.com/PyTorchLightning/pytorch-lightning/blob/29357ba94eea0af101a41553fe82d71630af3c78/pytorch_lightning/trainer/data_loading.py#L47

from the name alone, it's impossible to tell what num_training_batches refers to. is it the batches processed? this epoch? for the whole training? only from reading through the code can one say that it's an estimate based on the dataloader length and the limit on train batches per epoch, and that it can also be infinite.

my hope for processed is that it's clear that we've read the batch and run the step on it. this is also terminology i've seen used elsewhere to good effect
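
To make the distinction above concrete, here is a rough sketch that contrasts an up-front estimate of batches per epoch with a counter of batches actually processed. The function estimate_batches_per_epoch and its limit parameter are hypothetical and simplified; this is not the linked Lightning code, only an illustration of why the two quantities deserve different names.

```python
import math
from typing import Optional, Union


def estimate_batches_per_epoch(
    dataloader_len: Optional[int], limit: Union[int, float]
) -> Union[int, float]:
    """Hypothetical estimate made before any batch is read."""
    if dataloader_len is None:
        # Length unknown (e.g. an iterable dataset): the estimate is infinite.
        return math.inf
    if isinstance(limit, int):
        # An integer limit is read as an absolute cap on batches.
        return min(dataloader_len, limit)
    # A float limit is read as a fraction of the dataloader.
    return int(dataloader_len * limit)


# The "processed" counter only moves once a batch has been read and stepped on.
batches_processed_this_epoch = 0
for _batch in range(7):  # stand-in for iterating a dataloader
    batches_processed_this_epoch += 1
```

The estimate can be infinite or simply wrong (a loop may stop early), whereas the processed counter is always an exact count of completed work.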

@edenlightning edenlightning modified the milestones: 1.3, 1.4 Apr 16, 2021
@ananthsub ananthsub requested a review from kaushikb11 as a code owner May 2, 2021 01:46
@codecov

codecov bot commented May 2, 2021

Codecov Report

Merging #6603 (9fd7841) into master (608de6a) will decrease coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #6603   +/-   ##
======================================
- Coverage      88%     88%   -0%     
======================================
  Files         197     198    +1     
  Lines       12871   12909   +38     
======================================
+ Hits        11325   11347   +22     
- Misses       1546    1562   +16     

@ananthsub
Contributor Author

ananthsub commented May 2, 2021

LGTM, but the validation LoopProgress won't entirely work, as validation can be performed within the training loop.
Also, some training type plugins such as DeepSpeed will need to access the TrainLoopProgress, as their increment logic diverges from the current one.

@tchaton the evaluation loop will own its loop progress tracker, so even if the train loop triggers validation, the validation loop will update its stats accordingly. this lets us know how many passes through the validation data we've done, which is something we don't currently report, i think. Could you describe the requirements for the DeepSpeed plugin and how the increment logic will be different?
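
A rough sketch of the ownership idea described above, assuming each loop holds its own tracker. The names LoopProgressSketch, run_validation, run_train_epoch, and val_every_n_batches are hypothetical and not part of this PR; only the counter field names match the snippet under review.

```python
from dataclasses import dataclass


@dataclass
class LoopProgressSketch:
    """Hypothetical tracker owned by a single loop."""

    total_epochs_processed: int = 0
    total_batches_processed: int = 0
    batches_processed_this_epoch: int = 0


def run_validation(val_progress: LoopProgressSketch, num_val_batches: int) -> None:
    # The validation loop updates only its own tracker.
    val_progress.batches_processed_this_epoch = 0
    for _ in range(num_val_batches):
        val_progress.total_batches_processed += 1
        val_progress.batches_processed_this_epoch += 1
    # One full pass through the validation data is done.
    val_progress.total_epochs_processed += 1


def run_train_epoch(
    train_progress: LoopProgressSketch,
    val_progress: LoopProgressSketch,
    num_train_batches: int,
    val_every_n_batches: int,
    num_val_batches: int,
) -> None:
    train_progress.batches_processed_this_epoch = 0
    for _ in range(num_train_batches):
        train_progress.total_batches_processed += 1
        train_progress.batches_processed_this_epoch += 1
        # Validation triggered from inside the training loop.
        if train_progress.batches_processed_this_epoch % val_every_n_batches == 0:
            run_validation(val_progress, num_val_batches)
    train_progress.total_epochs_processed += 1


train, val = LoopProgressSketch(), LoopProgressSketch()
run_train_epoch(train, val, num_train_batches=10, val_every_n_batches=5, num_val_batches=3)
```

In this sketch, val.total_epochs_processed counts passes through the validation data independently of how many training epochs have finished.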

@pep8speaks

pep8speaks commented May 7, 2021

Hello @ananthsub! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-19 20:00:16 UTC

@ananthsub ananthsub added design Includes a design discussion refactor labels May 7, 2021
@awaelchli
Contributor

awaelchli commented May 7, 2021

Can you rename _progress to progress.py? It does not follow our naming scheme otherwise. We can later move it to the loop submodule folder when we have the loop interface classes.

@mergify mergify bot added the has conflicts label May 7, 2021
@carmocca carmocca enabled auto-merge (squash) May 19, 2021 19:58
@carmocca carmocca merged commit 9f5d495 into Lightning-AI:master May 19, 2021
@ananthsub ananthsub deleted the feat-progress-1 branch May 19, 2021 21:05
Labels
design Includes a design discussion feature Is an improvement or enhancement refactor
8 participants