Reset epoch progress with batch size scaler #13846

Merged

Conversation

@cschell cschell (Contributor) commented Jul 26, 2022

What does this PR do?

  1. Adds a test that reproduces the bug reported in "scale_batch_size does not work anymore?" (#13696).
  2. Fixes the reported bug. I'm not entirely convinced by my fix; it seems like this could easily break again: I fixed the issue by calling the reset functions of two deeply nested objects (see the sketch below). Ideally there would be a dedicated method for resetting the Trainer's state so that _run_binsearch_scaling and _run_power_scaling don't have to know about the Trainer's internals. It works fine for now, though, and since the issue is now covered by a test, any regression should at least get noticed.

Fixes #13696
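
Roughly, the approach looks like the sketch below. The helper name and the attribute paths are illustrative assumptions rather than the exact lines of the merged diff; the point is only that the reset calls reach into nested Trainer internals from within the scaling routines.

```python
# Sketch only: attribute paths are illustrative assumptions, not the literal
# change merged into src/pytorch_lightning/tuner/batch_size_scaling.py.
import pytorch_lightning as pl


def _reset_progress(trainer: "pl.Trainer") -> None:
    """Reset the fit-loop progress trackers after a batch-size scaling trial.

    Each trial in _run_power_scaling / _run_binsearch_scaling runs a short
    fit, which advances the Trainer's internal progress counters. Without a
    reset, the real trainer.fit() afterwards believes the requested epochs
    already completed and exits almost immediately (#13696).
    """
    trainer.fit_loop.epoch_progress.reset()
    trainer.fit_loop.epoch_loop.batch_progress.reset()
```

A dedicated Trainer-level reset method would keep this knowledge out of the tuner, which is the follow-up suggested in the description above.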

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

🦦

@github-actions github-actions bot added the pl (Generic label for PyTorch Lightning package) label Jul 26, 2022
@codecov codecov bot commented Jul 26, 2022

Codecov Report

Merging #13846 (dabf495) into master (6a999f1) will decrease coverage by 3%.
The diff coverage is 86%.

@@            Coverage Diff             @@
##           master   #13846      +/-   ##
==========================================
- Coverage      79%      76%      -3%     
==========================================
  Files         111      332     +221     
  Lines        7258    26909   +19651     
==========================================
+ Hits         5740    20450   +14710     
- Misses       1518     6459    +4941     

@rohitgr7 rohitgr7 (Contributor) left a comment

will be fixed by #11089, but test could help :)

Review threads (outdated, resolved):
  • src/pytorch_lightning/tuner/batch_size_scaling.py (2 threads)
  • tests/tests_pytorch/tuner/test_scale_batch_size.py
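
For context, a regression test for this failure mode could look roughly like the following. The model, Trainer arguments, and final assertion are assumptions made for the sketch, not the test that was merged into tests/tests_pytorch/tuner/test_scale_batch_size.py.

```python
# Rough sketch of a regression test for #13696; names and the final assertion
# are illustrative, not the merged test.
from torch.utils.data import DataLoader

from pytorch_lightning import Trainer
from pytorch_lightning.demos.boring_classes import BoringModel, RandomDataset


class TunableModel(BoringModel):
    def __init__(self, batch_size: int = 2):
        super().__init__()
        self.batch_size = batch_size  # attribute the batch size finder scales

    def train_dataloader(self):
        return DataLoader(RandomDataset(32, 64), batch_size=self.batch_size)


def test_tune_then_fit_runs_all_epochs(tmp_path):
    model = TunableModel()
    trainer = Trainer(
        default_root_dir=tmp_path,
        max_epochs=2,
        auto_scale_batch_size=True,
        enable_progress_bar=False,
    )
    trainer.tune(model)  # runs the batch size finder trials

    # Before the fix, leftover epoch progress made this fit exit early.
    trainer.fit(model)
    assert trainer.current_epoch == 2
```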
@rohitgr7 rohitgr7 added this to the pl:1.6.x milestone Jul 26, 2022
@rohitgr7 rohitgr7 added the bug (Something isn't working) and tuner labels Jul 26, 2022
@carmocca carmocca added the community (This PR is from the community) label Jul 26, 2022
@carmocca carmocca modified the milestones: pl:1.6.x, pl:1.7.x Jul 28, 2022
@rohitgr7 rohitgr7 changed the title from "Fix auto batch tuner" to "Reset epoch progress with batch size scaler" Aug 22, 2022
@rohitgr7 rohitgr7 requested a review from otaj as a code owner August 22, 2022 19:31
Review threads (outdated, resolved): src/pytorch_lightning/tuner/batch_size_scaling.py (2 threads)
@mergify mergify bot added the ready (PRs ready to be merged) label Aug 23, 2022
@mergify mergify bot added the has conflicts label and removed the ready (PRs ready to be merged) label Aug 25, 2022
@mergify mergify bot added the ready (PRs ready to be merged) label and removed the has conflicts label Aug 26, 2022
@rohitgr7 rohitgr7 merged commit 70deac2 into Lightning-AI:master Aug 26, 2022
rohitgr7 added a commit that referenced this pull request Aug 27, 2022
Co-authored-by: Christian Schell
Co-authored-by: Rohit Gupta
lexierule pushed a commit that referenced this pull request Aug 31, 2022
Co-authored-by: Christian Schell
Co-authored-by: Rohit Gupta
Labels
bug (Something isn't working), community (This PR is from the community), pl (Generic label for PyTorch Lightning package), ready (PRs ready to be merged), tuner
Development

Successfully merging this pull request may close these issues.

scale_batch_size does not work anymore?
5 participants