
Enable training purely based on number of iterations instead of epochs #3521

Closed
ananthsub opened this issue Sep 16, 2020 · 12 comments · Fixed by #5726
Labels
design (Includes a design discussion) · feature (Is an improvement or enhancement) · good first issue (Good for newcomers) · help wanted (Open to be worked on) · won't fix (This will not be worked on)
Milestone

Comments

ananthsub (Contributor) commented Sep 16, 2020

🚀 Feature

Enable training purely based on number of iterations instead of epochs

Motivation

This can be useful for certain training runs. Without this feature, the user must set an unreachably high value for max_epochs and set max_steps to the desired iteration count. With that setup, the trainer breaks out of the training loop based on max_steps, since max_epochs is never reached. For example, Detectron2 uses iteration-based training in its train loop.

Pitch

The solution could be pretty simple. We can make min_epochs and max_epochs Optional. If all of min_epochs, max_epochs, and max_steps are unset, use the defaults we have today (min_epochs=1, max_epochs=1000). If only max_steps is set, use that (and keep min/max_epochs as None). If all are set, stop based on whichever condition is hit first (a sketch of this stopping rule follows below).

Specifically, we'd need to:

Are there other spots I'm missing?
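
For illustration, here is a minimal sketch of the "whichever condition is hit first" stopping rule described in the pitch. The `should_stop` helper and its signature are hypothetical, not the actual Trainer internals:

```python
from typing import Optional

def should_stop(global_step: int, current_epoch: int,
                max_steps: Optional[int], max_epochs: Optional[int]) -> bool:
    """Return True once the first configured limit is reached (illustrative only)."""
    # Nothing configured: fall back to today's default of max_epochs=1000.
    if max_steps is None and max_epochs is None:
        max_epochs = 1000
    if max_steps is not None and global_step >= max_steps:
        return True
    if max_epochs is not None and current_epoch >= max_epochs:
        return True
    return False
```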

Alternatives

Without touching the Trainer code, a super hacky solution would be to set max_epochs to an outrageously large value and set max_steps to my desired iteration count. Then we'd break out of the train loop according to max_steps, since that condition would be met long before max_epochs. However, this could be handled better.
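
For concreteness, the workaround looks roughly like this; the numbers are placeholders and `model` stands in for any LightningModule:

```python
import pytorch_lightning as pl

# Workaround: make max_epochs effectively unreachable so the run
# always ends via max_steps.
trainer = pl.Trainer(
    max_epochs=1_000_000,  # never reached in practice
    max_steps=90_000,      # the iteration budget we actually want
)
# trainer.fit(model)  # model is assumed to be a LightningModule defined elsewhere
```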

ananthsub added the feature and help wanted labels Sep 16, 2020
edenlightning added the design label Sep 16, 2020
edenlightning added this to the 0.9.x milestone Sep 17, 2020
edenlightning added the good first issue label Sep 17, 2020
edenlightning (Contributor) commented:

@williamFalcon should we move to 1.1?

Nilanshrajput (Contributor) commented:

Hey, I can't see in the code where the default values for min/max_epochs are set.

richardqiu commented:

Hi! Has anyone taken this issue? @williamFalcon @edenlightning

stale bot commented Nov 4, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label Nov 4, 2020
stale bot closed this as completed Nov 11, 2020
Iwontbecreative commented:

Are there still plans to enable this feature? It seems particularly helpful for dealing with IterableDatasets.
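
For context, a toy example of why epochs are awkward with an IterableDataset that has no fixed length (the class and shapes here are illustrative):

```python
import torch
from torch.utils.data import IterableDataset

class Stream(IterableDataset):
    """An unbounded stream: there is no natural notion of an epoch."""
    def __iter__(self):
        while True:
            yield torch.randn(32)  # e.g. samples arriving from a live source
```

With data like this, only a step budget (max_steps) gives a well-defined stopping point.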

williamFalcon (Contributor) commented:

cc @edenafek

williamFalcon reopened this Nov 12, 2020
stale bot removed the won't fix label Nov 12, 2020
richardqiu commented:

Sorry folks, been a bit busy professionally as of late. I think I should be able to update the PR again within the next week or so. Thanks!

Devansh-Walia commented:

Closing this issue then?

edenlightning removed this from the 1.1 milestone Nov 30, 2020
hadim (Contributor) commented Dec 23, 2020

I would also be interested in a training loop purely based on the number of iterations. Currently we set the max_epochs argument to a very high number, but it feels a bit hacky.

stale bot commented Jan 23, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label Jan 23, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this issue Jan 28, 2021
stale bot closed this as completed Jan 30, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this issue Jan 31, 2021
ananthsub reopened this Jan 31, 2021
ananthsub added this to the 1.2 milestone Jan 31, 2021
ananthsub linked a pull request Jan 31, 2021 that will close this issue
stale bot closed this as completed Feb 7, 2021
Queuecumber (Contributor) commented:

I was wondering if it would be possible to revisit this idea and look at some enhancements to it. It seems that even when we train purely based on iterations, there is still a notion of epochs built into Lightning, which results in a very long context switch between "epochs". For example, if I have only one sample in my training set and I want to train on that one sample for 700 iterations, I still get an "epochs" progress bar that processes the single sample, then waits for a while, then increments the epoch number. I think it would be better to do away with the idea of epochs entirely when the target application calls for a purely iteration-based training loop.
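
One possible workaround for the single-sample case described above (illustrative only, not a Lightning feature): wrap the sample in an endless IterableDataset so the whole run is a single "epoch", and end the run with max_steps so the epoch-boundary overhead never occurs.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class RepeatForever(IterableDataset):
    """Yield the same sample endlessly so no epoch boundary is ever reached."""
    def __init__(self, sample):
        self.sample = sample

    def __iter__(self):
        while True:
            yield self.sample

sample = torch.randn(3, 224, 224)  # stand-in for the single training example
loader = DataLoader(RepeatForever(sample), batch_size=1)

# Passing this loader to a Trainer configured with max_steps=700 would end the
# run after 700 iterations without ever finishing an "epoch".
```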
