Fix num_sanity_val_steps is clipped to limit_val_batches #2917
Conversation
Just want to confirm: should num_sanity_val_steps be limited by limit_val_batches?
num_sanity_val_steps should be an int or -1; this is really meant to be a sanity check, and floats would break it. You need this min because if your dataset is smaller than limit_val_batches it will crash.
Not talking about changing that. With a val dataloader of 20 batches and Trainer(num_sanity_val_steps=-1, limit_val_batches=0.1) on master, the sanity check runs for 20 batches, but I suggest it should run for only 2. Also with ...
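To make the numbers in this exchange concrete, here is a minimal sketch of the suggested behaviour; the variable names are hypothetical and only illustrate the arithmetic, not the Trainer internals.

```python
# Illustrative sketch of the example above; plain arithmetic, not Lightning code.
val_dataloader_len = 20      # batches in the val dataloader
limit_val_batches = 0.1      # Trainer(limit_val_batches=0.1)
num_sanity_val_steps = -1    # Trainer(num_sanity_val_steps=-1), i.e. "use all val batches"

# limit_val_batches is applied first, giving the effective number of val batches
num_val_batches = int(val_dataloader_len * limit_val_batches)   # 2

# suggested behaviour: clip the sanity steps to that limited count
sanity_steps = num_val_batches if num_sanity_val_steps == -1 \
    else min(num_sanity_val_steps, num_val_batches)
print(sanity_steps)          # 2, instead of 20 on master at the time
```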
pytorch_lightning/trainer/trainer.py (Outdated)

using_val_step = ref_model.val_dataloader is not None and self.is_overridden('validation_step')
should_sanity_check = using_val_step and self.num_sanity_val_steps > 0 and self.limit_val_batches > 0

# run tiny validation (if validation defined)
# to make sure program won't crash during val
if should_sanity_check:
    self.reset_val_dataloader(ref_model)
    self._num_sanity_val_steps = [min(self.num_sanity_val_steps, val_batches)
                                  for val_batches in self.num_val_batches]
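For context, the clipping on the last line above produces one value per validation dataloader. A standalone sketch of that pattern (plain Python with hard-coded counts; in the Trainer, num_val_batches is filled by reset_val_dataloader):

```python
# Standalone sketch of the per-dataloader clipping pattern from the snippet above.
num_sanity_val_steps = 2
num_val_batches = [5, 1, 30]   # e.g. three val dataloaders of different lengths

num_sanity_val_batches = [min(num_sanity_val_steps, n) for n in num_val_batches]
print(num_sanity_val_batches)  # [2, 1, 2] -> never more steps than a loader actually has
```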
Since we already have fields for num_training_batches etc., should we call this simply num_sanity_val_batches for consistency?
I guess the reason you made it protected is that you want to differentiate between the user input arg and the internal list.
Yeah, that is one reason; the other is when using lr_find or scale_batch_size, or any other case where trainer.fit is called more than once. In such cases I don't think changing the init parameters themselves is a good idea, and tests would fail too, which is why I used _num_sanity_val_steps.
Actually there are two ways to handle this:
1. create _num_sanity_val_steps once and reuse it while initializing progressbar.total in the other PR, or
2. recompute num_batches = [min(self.num_sanity_val_steps, val_batches) for val_batches in self.num_val_batches] while calculating progressbar.total and avoid _num_sanity_val_steps entirely.
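A rough sketch contrasting the two options from a progress bar's point of view; the attribute and function names here are assumptions for illustration, not the actual API:

```python
# Hypothetical sketch of the two options above (names are illustrative only).

def option_1_total(trainer):
    # option 1: reuse the list computed once during the sanity check
    return sum(trainer.num_sanity_val_batches)

def option_2_total(trainer):
    # option 2: recompute the clipped counts whenever the progress bar needs its total
    return sum(min(trainer.num_sanity_val_steps, n) for n in trainer.num_val_batches)

class FakeTrainer:
    num_sanity_val_steps = 2
    num_val_batches = [5, 1, 30]
    num_sanity_val_batches = [2, 1, 2]   # what option 1 would cache

t = FakeTrainer()
assert option_1_total(t) == option_2_total(t) == 5
```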
I prefer 1., since this is consistent with the other totals we compute internally. If possible, I would also prefer that we name this internal variable num_sanity_val_batches, also for consistency with the other totals. No strong preference though; the important part is that the implementation is clean and fulfills our needs :)
ping me if you need more help with this pr.
@awaelchli yeah, I was thinking of changing it to num_sanity_val_batches. I just need clarification on #2917 (comment).
Hello @rohitgr7! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-20 23:06:17 UTC
@rohitgr7 I fixed one test. Let's try to finish this and unblock the other PR. If @williamFalcon disagrees we can always change it. Anyway, I guess the important part is that we keep the list internally, which easily supports counts for multiple dataloaders and lets us access them elsewhere.
Codecov Report
@@           Coverage Diff           @@
##           master   #2917   +/-   ##
======================================
  Coverage      90%     90%
======================================
  Files          81      81
  Lines        7734    7734
======================================
  Hits         6980    6980
  Misses        754     754
force-pushed from 4bee960 to f006ebf
👍
This pull request is now in conflict... :(
LGTM
This pull request is now in conflict... :(
@Borda can we merge this?
I have not checked it, but it seems you have 3 approvals, so yes...
What does this PR do?
Fixes #2882
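For readers coming from the issue, a hedged before/after illustration of the intent (the Trainer arguments are real; the batch counts follow the example discussed above):

```python
from pytorch_lightning import Trainer

# Suppose the validation dataloader has 20 batches.
trainer = Trainer(num_sanity_val_steps=-1, limit_val_batches=0.1)

# Before this PR: the sanity check ran over all 20 val batches.
# After this PR:  the sanity steps are clipped to the limited val batches,
#                 i.e. 2 batches per val dataloader.
```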
Before submitting
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃