[RFC] Support a Trainer.train() API #10888
Comments
Do you think users might want to implement different training logic when they call `trainer.train()`? I guess ...
No, I think the Trainer entry-point functions are higher-level compositions which can run one or more RunningStages: https://github.com/PyTorchLightning/pytorch-lightning/blob/a28b4cd0c0bba30c21cae571e650877f66cf5588/pytorch_lightning/trainer/states.py#L34. As you note, if the user wants some logic to happen in ...
Some questions: ...
No, I'd expect ...
The need would be far less, but I think a dedicated entry point is clearer for users and provides greater confidence that the framework doesn't initialize or check anything related to validation (including validation sanity checks).
I think we could keep the same behavior of not checking validation if ... Then users have a really clear path for onboarding: ...
One could argue that ...
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!
Could you please provide a model use case?
It's a sensible proposal, but it has the big drawback of creating confusion with `trainer.fit`. However, I see the advantage for external loop customization or for orchestrating multiple calls. Would you implement the loop class used as a copy of the ...?
After reflecting on this, @ananthsub, I believe this shouldn't be added. First, because the Trainer API is final, but more importantly because it would force bad practices on the user. I am 100% sure @williamFalcon and co thought hard about this at the beginning, and the fact that this option doesn't exist was intentional from the start. IMO, the best practice induced by the trainer.fit default is to perform sanity checking. Furthermore, opting out is quite simple but should be the responsibility of the user, e.g. Trainer(limit_val_batches=0) or no validation_dataloader. +1 for added confusion. @awaelchli @carmocca I would recommend closing this RFC. Thanks @ananthsub for your time and effort proposing this :)
The existing solution, e.g. ...
The solution is indeed sufficient, although improved documentation would be welcome. For those interested, another use case is training models to reconstruct shapes or scenes, e.g. DeepSDF (https://openaccess.thecvf.com/content_CVPR_2019/papers/Park_DeepSDF_Learning_Continuous_Signed_Distance_Functions_for_Shape_Representation_CVPR_2019_paper.pdf). I believe neural radiance fields would have a similar use case, and they are very popular.
🚀 Feature
Add a new entry point to the Trainer which runs only the training loop with no validation.
Motivation
This makes it clear that if users only define `training_step` and `train_dataloader`, then they can call train without any risk of errors due to not implementing validation hooks (though the framework checks this today).

Another motivation is that users who do implement validation steps/dataloaders may only want to run training without validation (for example, in the case of online training). Today, those users would need to ensure they set `limit_val_batches=0` before calling `trainer.fit`.
Finally, such a feature makes it easier to interleave train/validate/test/predict calls. For example, past requests have been made to run the test loop after each validation pass. In conjunction with #10444, this makes writing more complex patterns far simpler with Lightning.
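For illustration, a rough sketch of such interleaving, assuming `model` and the dataloaders are defined elsewhere; `trainer.train()` here is the entry point proposed in this RFC, while `trainer.validate()` and `trainer.test()` already exist:

```python
import pytorch_lightning as pl

# model, train_loader, val_loader, test_loader are assumed to be defined elsewhere.
trainer = pl.Trainer(max_epochs=1)

for _ in range(3):
    trainer.train(model, train_dataloaders=train_loader)  # proposed in this RFC; not in Lightning today
    trainer.validate(model, dataloaders=val_loader)       # existing entry point
    trainer.test(model, dataloaders=test_loader)          # existing entry point
```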
This is slightly different from loop customization. In this case, I don't want to change any of the fundamental building blocks, but I may want to change the order/sequencing in which they're called.
Pitch
Offer a top-level function on the Trainer:
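The exact signature from the original proposal is not preserved here; a minimal sketch, assuming parameters that mirror the existing `trainer.fit()` entry point, might look like this:

```python
from typing import Optional, Union

from torch.utils.data import DataLoader

import pytorch_lightning as pl


class Trainer(pl.Trainer):
    def train(
        self,
        model: "pl.LightningModule",
        train_dataloaders: Optional[Union[DataLoader, "pl.LightningDataModule"]] = None,
        datamodule: Optional["pl.LightningDataModule"] = None,
        ckpt_path: Optional[str] = None,
    ) -> None:
        """Run only the training loop: no validation loop and no validation sanity check.

        Parameter names are assumptions mirroring ``trainer.fit()``; they are not taken
        from the original proposal.
        """
        raise NotImplementedError  # body intentionally omitted; the RFC is about the entry point
```

Usage would then be `trainer.train(model, train_dataloaders=...)`, with nothing validation-related ever initialized.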
Alternatives
One could try to work around this as follows:
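The original snippet is not preserved here; a rough sketch of such a workaround, again assuming `model` and the dataloaders are defined elsewhere, could be:

```python
import pytorch_lightning as pl

# Disable validation (and the sanity check) up front, then use fit() as a "train only" call.
trainer = pl.Trainer(max_epochs=1, limit_val_batches=0, num_sanity_val_steps=0)
trainer.fit(model, train_dataloaders=train_loader)

# To run validation later, the user has to reach back into trainer attributes
# (mutating them directly is not a documented workflow) or construct a new Trainer.
trainer.limit_val_batches = 1.0
trainer.validate(model, dataloaders=val_loader)
```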
However, this is somewhat clunky to write, and requires users to dig through the various trainer properties/attributes to reset state across calls, which is not straightforward.
Additional context
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers, leveraging PyTorch Lightning, Transformers, and Hydra.
cc @Borda @justusschock @kaushikb11 @awaelchli @ananthsub @ninginthecloud @jjenniferdai @rohitgr7