
Fix num_sanity_val_steps is clipped to limit_val_batches #2917

Merged (11 commits, Aug 21, 2020)

Conversation

rohitgr7 (Contributor) commented Aug 11, 2020

What does this PR do?

Fixes #2882

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

rohitgr7 (Contributor, Author) commented Aug 11, 2020

Just want to confirm: should num_sanity_val_steps==-1 depend on limit_val_batches? I suggest it should, but I wanted to confirm before making changes. I will fix and add new tests accordingly if required :)

Borda added the "bug" label on Aug 11, 2020
williamFalcon (Contributor)

num_sanity_val_steps should be an int or -1.

This is really meant to be a sanity check... floats will break this.

You need this min because if your dataset is smaller than limit_val_batches, it will crash.

rohitgr7 (Contributor, Author)

I'm not talking about changing num_sanity_val_steps to a float or anything like that. Let me give an example:

    # val_dataloader_len = 20
    Trainer(num_sanity_val_steps=-1, limit_val_batches=0.1)

On master this runs for 20 batches, but I suggest it should run for only 2: limit_val_batches should be taken into account when num_sanity_val_steps=-1.

Also, on master, a float limit_val_batches doesn't work with num_sanity_val_steps != -1.
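
For illustration, a minimal sketch of the clipping behavior proposed above (the helper name resolve_sanity_val_batches and the pre-resolved num_val_batches list are assumptions for illustration, not the actual PR code):

    # Hypothetical helper sketching the proposed semantics; not the merged implementation.
    # Assumes limit_val_batches has already been resolved to an int batch count per
    # val dataloader (e.g. 0.1 * 20 batches -> 2), as Lightning does for num_val_batches.
    def resolve_sanity_val_batches(num_sanity_val_steps, num_val_batches):
        if num_sanity_val_steps == -1:
            # -1 means "run all val batches", but still respect limit_val_batches
            return list(num_val_batches)
        return [min(num_sanity_val_steps, n) for n in num_val_batches]

    # val_dataloader_len = 20, limit_val_batches=0.1 -> num_val_batches == [2]
    assert resolve_sanity_val_batches(-1, [2]) == [2]   # 2 batches, not 20
    assert resolve_sanity_val_batches(5, [2]) == [2]    # clipped by the limit
    assert resolve_sanity_val_batches(2, [10]) == [2]   # clipped by the arg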

The diff hunk under review:

    using_val_step = ref_model.val_dataloader is not None and self.is_overridden('validation_step')
    should_sanity_check = using_val_step and self.num_sanity_val_steps > 0 and self.limit_val_batches > 0

    # run tiny validation (if validation defined)
    # to make sure program won't crash during val
    if should_sanity_check:
        self.reset_val_dataloader(ref_model)
        self._num_sanity_val_steps = [min(self.num_sanity_val_steps, val_batches)
                                      for val_batches in self.num_val_batches]
awaelchli (Contributor) commented Aug 14, 2020

Since we already have fields for num_training_batches etc., should we call this simply num_sanity_val_batches for consistency? I guess the reason you made it protected is that you want to differentiate between the user input arg and the internal list.

rohitgr7 (Contributor, Author)

Yeah, that is one reason. The other is when using lr_find or scale_batch_size, or any other case where trainer.fit is called more than once. In such cases I don't think mutating the init parameters themselves is a good idea, and tests would fail too; that's why I used _num_sanity_val_steps.

Actually there are two ways to handle this:

  1. Create _num_sanity_val_steps once and reuse it while initializing progressbar.total in the other PR (see the sketch after this list).
  2. Or recompute num_batches = [min(self.num_sanity_val_steps, val_batches) for val_batches in self.num_val_batches] while calculating progressbar.total and avoid _num_sanity_val_steps.
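
A minimal sketch of option 1, assuming the attribute names used in this thread (an illustration, not the merged diff):

    # Sketch: compute the clipped per-dataloader list once and reuse it elsewhere,
    # leaving the user-facing num_sanity_val_steps argument untouched.
    class Trainer:
        def __init__(self, num_sanity_val_steps=2):
            self.num_sanity_val_steps = num_sanity_val_steps  # user arg, never mutated
            self.num_val_batches = []          # resolved from limit_val_batches
            self._num_sanity_val_steps = []    # internal per-dataloader totals

        def reset_val_dataloader(self):
            # ... after self.num_val_batches is known ...
            self._num_sanity_val_steps = [
                n if self.num_sanity_val_steps == -1
                else min(self.num_sanity_val_steps, n)
                for n in self.num_val_batches
            ]

    # A progress bar can then read trainer._num_sanity_val_steps for its total
    # instead of recomputing the min() per dataloader (option 2).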

awaelchli (Contributor)

I prefer 1, since it is consistent with the other totals we compute internally. If possible, I would also prefer naming this internal variable num_sanity_val_batches, again for consistency with the other totals. No strong preference though; the important part is that the implementation is clean and fulfills our needs :)

awaelchli (Contributor)

Ping me if you need more help with this PR.

rohitgr7 (Contributor, Author) commented Aug 16, 2020

@awaelchli yeah, I was thinking of changing it to num_sanity_val_batches. I just need clarification on #2917 (comment).


pep8speaks commented Aug 19, 2020

Hello @rohitgr7! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-20 23:06:17 UTC

awaelchli (Contributor)

@rohitgr7 I fixed one test. Let's try to finish this and unblock the other PR. If @williamFalcon disagrees, we can always change it. Anyway, the important part is that we keep the list internally: that way we easily support the counts for multiple dataloaders and can access them elsewhere.
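
To make the multiple-dataloader point concrete, a toy example (the dataloader sizes here are invented for illustration):

    # With two val dataloaders of different lengths, a single int cannot represent
    # the per-dataloader sanity totals, but a list can.
    num_val_batches = [20, 5]
    num_sanity_val_steps = 10
    sanity_totals = [min(num_sanity_val_steps, n) for n in num_val_batches]
    assert sanity_totals == [10, 5]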


codecov bot commented Aug 19, 2020

Codecov Report

Merging #2917 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #2917   +/-   ##
======================================
  Coverage      90%     90%           
======================================
  Files          81      81           
  Lines        7734    7734           
======================================
  Hits         6980    6980           
  Misses        754     754           

rohitgr7 force-pushed the fix_val_sanity_check branch from 4bee960 to f006ebf on August 19, 2020 18:44
rohitgr7 changed the title from "[WIP] Fix num_sanity_val_steps according to limit_val_batches" to "Fix num_sanity_val_steps is clipped to limit_val_batches" on Aug 19, 2020
awaelchli (Contributor) left a comment

👍


mergify bot commented Aug 19, 2020

This pull request is now in conflict... :(


SkafteNicki (Member) left a comment

LGTM

Borda added this to the 0.9.x milestone on Aug 20, 2020

rohitgr7 (Contributor, Author)

@Borda can we merge this?

Borda (Member) commented Aug 21, 2020

> @Borda can we merge this?

I have not checked it, but it seems you have three approvals, so yes...

Labels: bug (Something isn't working)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Int num_sanity_val_steps is always replaced by float limit_val_batches

7 participants