Enable training purely based on number of iterations instead of epochs #3521
Comments
@williamFalcon should we move to 1.1?
Hey, I can't see in the code where the default values for min/max_epochs are set.
Hi! Has anyone taken this issue? @williamFalcon @edenlightning
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
Are there still plans to enable this feature? It seems particularly helpful for dealing with IterableDatasets.
cc @edenafek
Closing this issue then?
Sorry folks, been a bit busy professionally as of late. I think I should be able to update the PR again within the next week or so. Thanks!
I would also be interested in a training loop purely based on the number of iterations. Currently we set the max epoch argument to a very high number, but it feels a bit hacky.
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
I was wondering if it would be possible to revisit this idea and look at some enhancements to it. It seems that even if we are training purely based on iterations, there is still a notion of epochs built into Lightning, which results in a very long context switch between "epochs". For example, if I only have one sample in my training set and I want to train on that one sample for 700 iterations, I still get an "epochs" progress bar which processes the single sample, then waits for a while, then increments the epoch number. I think it would be better to do away with the idea of epochs entirely if the target application calls for a purely iteration-based training loop.
🚀 Feature
Enable training purely based on number of iterations instead of epochs
Motivation
This can be useful for certain training runs. Without this feature, the user must set an unreachably high value for `max_epochs` and set `max_steps` to the desired iteration count. With this setup, the trainer will break from the training loop based on `max_steps`, since we'd never reach `max_epochs`. For example, Detectron2 uses iteration-based training in its train loop.
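As a concrete illustration of the workaround described above, a minimal sketch might look like the following; the 10,000-step budget is a placeholder, and the `LightningModule` passed to `fit` is assumed to be defined elsewhere:

```python
import pytorch_lightning as pl

# Workaround today: set max_epochs so high it can never be reached,
# so max_steps is always the condition that ends training.
trainer = pl.Trainer(
    max_epochs=1_000_000,  # deliberately unreachable
    max_steps=10_000,      # the iteration budget we actually care about
)

# trainer.fit(model)  # `model` would be any LightningModule defined elsewhere
```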
Pitch
The solution could be pretty simple. We can make `min_epochs` and `max_epochs` Optional. If all of `min_epochs`/`max_epochs` and `max_steps` are unset, use the defaults we have today (`min_epochs=1`, `max_epochs=1000`). If only `max_steps` is set, use that (and keep `min_epochs`/`max_epochs` as None). If all are set, stop based on whichever condition is hit first.

Specifically, we'd need to:
- `max_epochs`: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L340-L375

Are there other spots I'm missing?
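A rough sketch of how this resolution could behave is below. This is illustrative pseudologic under the assumptions stated in the pitch, not the actual Trainer implementation; the helper names `resolve_run_length` and `should_stop` are hypothetical:

```python
from typing import Optional, Tuple

def resolve_run_length(
    min_epochs: Optional[int],
    max_epochs: Optional[int],
    max_steps: Optional[int],
) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Hypothetical helper: fall back to today's defaults only when
    neither an epoch bound nor a step bound was provided."""
    if max_epochs is None and max_steps is None:
        min_epochs, max_epochs = 1, 1000  # current defaults
    return min_epochs, max_epochs, max_steps

def should_stop(
    current_epoch: int,
    global_step: int,
    max_epochs: Optional[int],
    max_steps: Optional[int],
) -> bool:
    """Stop on whichever configured limit is hit first; unset limits never trigger."""
    hit_epoch_limit = max_epochs is not None and current_epoch >= max_epochs
    hit_step_limit = max_steps is not None and global_step >= max_steps
    return hit_epoch_limit or hit_step_limit
```

For example, `resolve_run_length(None, None, 10_000)` would leave the epoch bounds as None, and `should_stop` would only fire once the global step reaches 10,000.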
Alternatives
Without touching the Trainer code, a super hacky solution would be setting `max_epochs` to an outrageously large value and setting `max_steps` to my desired iteration count. Then we'd break from the train loop according to `max_steps`, since we'd meet that condition first instead of `max_epochs`. However, this could be better handled.