-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1/N] Define dataclasses for progress tracking #6603
Conversation
thanks ananth! any chance you have a branch showing how the final state would look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but the validation LoopProgress
won't entirely work as a validation can be performed within training loop.
Also, some training type plugin such as DeepSpeed will need to access the TrainLoopProgress the increment logic diverge from current one.
""" | ||
total_epochs_processed: int = 0 | ||
total_batches_processed: int = 0 | ||
batches_processed_this_epoch: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any other suggestion for this variable name ? batches_processed_this_epoch
total_epochs_processed: int = 0 | ||
total_batches_processed: int = 0 | ||
batches_processed_this_epoch: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need the _processed if it in all vars and whenever you use the attribute becomes very long and yield in not nice like breaking (due to formating)
total_epochs_processed: int = 0 | |
total_batches_processed: int = 0 | |
batches_processed_this_epoch: int = 0 | |
total_epochs: int = 0 | |
total_batches: int = 0 | |
batches_this_epoch: int = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Borda - sorry for the delay. I want to be as explicit about the naming as possible. for example, there are also fields like this: https://github.com/PyTorchLightning/pytorch-lightning/blob/29357ba94eea0af101a41553fe82d71630af3c78/pytorch_lightning/trainer/data_loading.py#L47
from the name alone, it's impossible to tell what num_training_batches
refers to. is it the batches processed? this epoch? for the whole training? only from reading through the code can one say that its an estimation based on the dataloader length and limit train batches per epoch, but that it can also be infinite.
my hope for processed
is that its clear that we've read the batch and run the step on it. this is also terminology i've seen used elsewhere to good effect
Codecov Report
@@ Coverage Diff @@
## master #6603 +/- ##
======================================
- Coverage 88% 88% -0%
======================================
Files 197 198 +1
Lines 12871 12909 +38
======================================
+ Hits 11325 11347 +22
- Misses 1546 1562 +16 |
@tchaton the evaluation loop will own its loop progress tracker, so even if the train loop triggers validation, the validation loop will update its stats accordingly. this lets us know how many passes through the validation data we've done, which is something we don't currently report i think. Could you describe the requirements for deepspeed plugin and how the increment logic will be different? |
Hello @ananthsub! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-05-19 20:00:16 UTC |
Can you rename _progress to progress.py? It does not follow our naming scheme otherwise. We can later move it to the loop submodule folder when we have a loop interface classes. |
13652ab
to
12be935
Compare
8669b8b
to
402fe24
Compare
What does this PR do?
For #6429
This PR introduces dataclasses for progress tracking:
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:
Did you have fun?
Make sure you had fun coding 🙃