
Enable training purely based on number of iterations instead of epochs #3521

Closed
ananthsub opened this issue Sep 16, 2020 · 12 comments · Fixed by #5726
Labels
design (Includes a design discussion) · feature (Is an improvement or enhancement) · good first issue (Good for newcomers) · help wanted (Open to be worked on) · won't fix (This will not be worked on)
Milestone

Comments

ananthsub (Contributor) commented Sep 16, 2020

🚀 Feature

Enable training purely based on number of iterations instead of epochs

Motivation

This can be useful for certain training runs. Without this feature, the user must set an unreachably high value for max_epochs and set max_steps to the desired iteration count. With that setup, the trainer breaks out of the training loop based on max_steps, since max_epochs is never reached. For example, Detectron2 uses iteration-based training in its train loop.

Pitch

The solution could be pretty simple. We can make min_epochs and max_epochs Optional. If all of min_epochs, max_epochs, and max_steps are unset, use the defaults we have today (min_epochs=1, max_epochs=1000). If only max_steps is set, use that (and keep min/max_epochs as None). If all are set, stop based on whichever condition is hit first (a sketch of this stopping rule follows below).

Specifically, we'd need to:

Are there other spots I'm missing?
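
For illustration, here is a minimal sketch of the "whichever condition is hit first" stopping rule described in the pitch. The `should_stop` helper and its signature are hypothetical, not the actual Trainer internals:

```python
from typing import Optional

def should_stop(global_step: int, current_epoch: int,
                max_steps: Optional[int], max_epochs: Optional[int]) -> bool:
    """Return True once the first configured limit is reached (illustrative only)."""
    # Nothing configured: fall back to today's default of max_epochs=1000.
    if max_steps is None and max_epochs is None:
        max_epochs = 1000
    if max_steps is not None and global_step >= max_steps:
        return True
    if max_epochs is not None and current_epoch >= max_epochs:
        return True
    return False
```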

Alternatives

Without touching the Trainer code, a super hacky solution would be to set max_epochs to an outrageously large value and set max_steps to my desired iteration count. Then we'd break out of the train loop according to max_steps, since that condition would be met long before max_epochs. However, this could be handled better.
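
For concreteness, the workaround looks roughly like this; the numbers are placeholders and `model` stands in for any LightningModule:

```python
import pytorch_lightning as pl

# Workaround: make max_epochs effectively unreachable so the run
# always ends via max_steps.
trainer = pl.Trainer(
    max_epochs=1_000_000,  # never reached in practice
    max_steps=90_000,      # the iteration budget we actually want
)
# trainer.fit(model)  # model is assumed to be a LightningModule defined elsewhere
```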

ananthsub added the feature and help wanted labels Sep 16, 2020
edenlightning added the design label Sep 16, 2020
edenlightning added this to the 0.9.x milestone Sep 17, 2020
edenlightning added the good first issue label Sep 17, 2020
edenlightning (Contributor) commented:

@williamFalcon should we move to 1.1?

Nilanshrajput (Contributor) commented:

Hey, I can't see in the code where the default values for min/max_epochs are set.

richardqiu commented:

Hi! Has anyone taken this issue? @williamFalcon @edenlightning

stale bot commented Nov 4, 2020

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label Nov 4, 2020
stale bot closed this as completed Nov 11, 2020
Iwontbecreative commented:

Are there still plans to enable this feature? It seems particularly helpful for dealing with IterableDatasets.
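
For context, a toy example of why epochs are awkward with an IterableDataset that has no fixed length (the class and shapes here are illustrative):

```python
import torch
from torch.utils.data import IterableDataset

class Stream(IterableDataset):
    """An unbounded stream: there is no natural notion of an epoch."""
    def __iter__(self):
        while True:
            yield torch.randn(32)  # e.g. samples arriving from a live source
```

With data like this, only a step budget (max_steps) gives a well-defined stopping point.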

williamFalcon (Contributor) commented:

cc @edenafek

williamFalcon reopened this Nov 12, 2020
stale bot removed the won't fix label Nov 12, 2020
richardqiu commented:

Sorry folks, been a bit busy professionally as of late. I think I should be able to update the PR again within the next week or so. Thanks!

Devansh-Walia commented:

Closing this issue then?

edenlightning removed this from the 1.1 milestone Nov 30, 2020
hadim (Contributor) commented Dec 23, 2020

I would also be interested in a training loop purely based on the number of iterations. Currently we set the max_epochs argument to a very high number, but it feels a bit hacky.

stale bot commented Jan 23, 2021

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

stale bot added the won't fix label Jan 23, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this issue Jan 28, 2021
stale bot closed this as completed Jan 30, 2021
ananthsub added a commit to ananthsub/pytorch-lightning that referenced this issue Jan 31, 2021
ananthsub reopened this Jan 31, 2021
ananthsub added this to the 1.2 milestone Jan 31, 2021
ananthsub linked a pull request Jan 31, 2021 that will close this issue
stale bot closed this as completed Feb 7, 2021
Queuecumber (Contributor) commented:

I was wondering if it would be possible to revisit this idea and look at some enhancements to it. It seems that even when we train purely based on iterations, there is still a notion of epochs built into Lightning, which results in a very long context switch between "epochs". For example, if I have only one sample in my training set and I want to train on that one sample for 700 iterations, I still get an "epochs" progress bar that processes the single sample, then waits for a while, then increments the epoch number. I think it would be better to do away with the idea of epochs entirely when the target application calls for a purely iteration-based training loop.
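
One possible workaround for the single-sample case described above (illustrative only, not a Lightning feature): wrap the sample in an endless IterableDataset so the whole run is a single "epoch", and end the run with max_steps so the epoch-boundary overhead never occurs.

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class RepeatForever(IterableDataset):
    """Yield the same sample endlessly so no epoch boundary is ever reached."""
    def __init__(self, sample):
        self.sample = sample

    def __iter__(self):
        while True:
            yield self.sample

sample = torch.randn(3, 224, 224)  # stand-in for the single training example
loader = DataLoader(RepeatForever(sample), batch_size=1)

# Passing this loader to a Trainer configured with max_steps=700 would end the
# run after 700 iterations without ever finishing an "epoch".
```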
