Need check_val_every_n_steps in Trainer #5565

Closed
del2z opened this issue Jan 19, 2021 · 4 comments
Labels
feature (Is an improvement or enhancement), help wanted (Open to be worked on)

Comments


del2z commented Jan 19, 2021

🚀 Feature

Add a check_val_every_n_steps argument to Trainer.__init__ so that validation metrics are computed every n training steps.

Motivation

For many tasks, large models are trained for a fixed number of steps rather than complete epochs, especially pretrained models in CV and NLP. Step-based arguments such as max_steps and log_every_n_steps are therefore often more convenient than epoch-based ones. However, the Trainer API only offers check_val_every_n_epoch for scheduling validation. An additional argument such as check_val_every_n_steps in the Trainer constructor would be very helpful.

Pitch

Trainer.__init__(..., check_val_every_n_epoch=1, check_val_every_n_steps=100, ...)
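
To make the request concrete, here is a minimal sketch of the scheduling rule the proposed argument implies. It does not use Lightning: check_val_every_n_steps is not a Trainer argument at the time of this issue, and the helper validation_steps below is hypothetical. It assumes one training step per batch (no gradient accumulation) and a fixed number of batches per epoch.

```python
def validation_steps(max_steps, batches_per_epoch, check_val_every_n_steps):
    """Return (epoch, batch_idx, global_step) for each point where validation would run.

    Hypothetical semantics: validate whenever the number of completed
    training steps is a multiple of check_val_every_n_steps, regardless
    of where the epoch boundaries fall.
    """
    points = []
    for global_step in range(1, max_steps + 1):
        if global_step % check_val_every_n_steps == 0:
            # Locate the batch that was just processed (0-based indices).
            epoch, batch_idx = divmod(global_step - 1, batches_per_epoch)
            points.append((epoch, batch_idx, global_step))
    return points


# 40 batches per epoch, validate every 100 steps, train for 300 steps.
for epoch, batch_idx, step in validation_steps(300, 40, 100):
    print(f"validate after step {step} (epoch {epoch}, batch_idx {batch_idx})")
```

With 40 batches per epoch, an interval of 100 steps spans more than two epochs, which is exactly the case an epoch-based argument cannot express.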

del2z added the feature and help wanted labels on Jan 19, 2021
Author

del2z commented Jan 19, 2021

Another confusing concept is batch_idx in training_step, validation_step, and test_step. A detailed example or illustration would help explain it. In my experience, batch_idx is rarely used when developing models.
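
As an illustration of what batch_idx refers to, here is a simplified plain-Python training loop. It is a sketch, not Lightning's actual loop: run_epochs is a hypothetical helper, and the global step counter assumes one optimizer step per batch. The point it shows is that batch_idx is the position of the batch within the current epoch and restarts at 0 every epoch, while the global step keeps counting.

```python
def run_epochs(num_epochs, batches):
    """Yield (epoch, batch_idx, global_step) as a simple training loop would see them."""
    global_step = 0
    for epoch in range(num_epochs):
        # batch_idx comes from enumerating the dataloader, so it resets each epoch.
        for batch_idx, _batch in enumerate(batches):
            yield epoch, batch_idx, global_step
            global_step += 1


records = list(run_epochs(num_epochs=2, batches=["a", "b", "c"]))
for epoch, batch_idx, global_step in records:
    print(f"epoch={epoch} batch_idx={batch_idx} global_step={global_step}")
```

A typical use inside training_step is a periodic action within an epoch, for example doing extra logging only when batch_idx is a multiple of some interval.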

@rohitgr7
Contributor

There is val_check_interval for that.

del2z closed this as completed Feb 1, 2021
@yuvalkirstain

@rohitgr7 val_check_interval can't exceed a single epoch, so it does not support evaluating every n steps when n is larger than the number of batches in the dataloader.
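
The limitation can be sketched with a simplified model of how val_check_interval is interpreted: a float is a fraction of the epoch, an int is a number of training batches. The function resolve_val_check_batch is hypothetical and is not Lightning's source code; the error for an oversized int mirrors the behavior described in this thread, and the exact exception type and message in Lightning may differ.

```python
def resolve_val_check_batch(val_check_interval, num_training_batches):
    """Return how many training batches pass between validation runs.

    Simplified model (an assumption, not Lightning's implementation):
    - float in [0.0, 1.0]: fraction of one training epoch
    - int: number of training batches, which must fit inside one epoch
    """
    if isinstance(val_check_interval, float):
        if not 0.0 <= val_check_interval <= 1.0:
            raise ValueError("a float val_check_interval must be in [0.0, 1.0]")
        return max(1, int(num_training_batches * val_check_interval))
    if val_check_interval > num_training_batches:
        # The case raised in this comment: n exceeds the batches in one epoch.
        raise ValueError(
            f"val_check_interval ({val_check_interval}) exceeds the number of "
            f"training batches per epoch ({num_training_batches})"
        )
    return val_check_interval


print(resolve_val_check_batch(0.25, 40))  # validate four times per epoch
print(resolve_val_check_batch(20, 40))    # validate every 20 batches
try:
    resolve_val_check_batch(100, 40)      # longer than one epoch
except ValueError as err:
    print(f"rejected: {err}")
```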

@rohitgr7
Contributor

rohitgr7 commented Feb 1, 2022

Hey @yuvalkirstain, not yet.
Here is the tracking issue: #8135
