
[Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader #6015

Merged · 4 commits · Feb 16, 2021

Conversation

tchaton (Contributor) commented Feb 16, 2021

What does this PR do?

There was a speed regression when using the TPU accelerator. On inspection, we realised that during the accelerator refactor we missed migrating the process_dataloader hook to the TrainingTypePlugin. As a result, the model was never wrapped in a parallel dataloader when tpu_cores > 1. This PR adds the correct logic.

A bigger discussion needs to take place about how to catch these corner cases, which regression/benchmarking tests would surface. In the interim, we should fix this particular error for the release.
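To illustrate the shape of the fix, here is a minimal, self-contained sketch of the hook-delegation pattern described above. The class and method names other than `process_dataloader` and `TrainingTypePlugin` (both named in this PR) are simplified stand-ins, not the actual Lightning implementation; in a real TPU setup the wrapper would be something like torch_xla's parallel loader rather than the toy class shown here.

```python
class TrainingTypePlugin:
    """Simplified stand-in for Lightning's TrainingTypePlugin base class."""

    def process_dataloader(self, dataloader):
        # Default behaviour: pass the dataloader through unchanged.
        return dataloader


class ParallelLoaderWrapper:
    """Toy stand-in for a TPU parallel loader (e.g. torch_xla's
    MpDeviceLoader in a real deployment)."""

    def __init__(self, dataloader):
        self.dataloader = dataloader

    def __iter__(self):
        return iter(self.dataloader)


class TPUSpawnPlugin(TrainingTypePlugin):
    """Hypothetical TPU plugin that wraps the loader when tpu_cores > 1."""

    def __init__(self, tpu_cores):
        self.tpu_cores = tpu_cores

    def process_dataloader(self, dataloader):
        # This wrapping is what was silently skipped after the accelerator
        # refactor, because the hook was never invoked.
        if self.tpu_cores > 1:
            return ParallelLoaderWrapper(dataloader)
        return dataloader


class Accelerator:
    """Simplified accelerator that owns a training-type plugin."""

    def __init__(self, training_type_plugin):
        self.training_type_plugin = training_type_plugin

    def process_dataloader(self, dataloader):
        # The essence of the fix: delegate to the plugin's hook instead of
        # returning the dataloader directly.
        return self.training_type_plugin.process_dataloader(dataloader)
```

With this delegation in place, `Accelerator(TPUSpawnPlugin(tpu_cores=8)).process_dataloader(loader)` returns a wrapped loader, while a single-core plugin returns the loader untouched.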

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@tchaton tchaton added this to the 1.2 milestone Feb 16, 2021
@tchaton tchaton self-assigned this Feb 16, 2021
Borda (Member) left a comment:

can we get a test for it?
also, edit chlog...?

@Borda Borda added the bug Something isn't working label Feb 16, 2021
@SeanNaren SeanNaren added _Will priority: 0 High priority task labels Feb 16, 2021
codecov bot commented Feb 16, 2021

Codecov Report

Merging #6015 (2401e3d) into master (fcfa7fa) will decrease coverage by 0%.
The diff coverage is 33%.

@@           Coverage Diff           @@
##           master   #6015    +/-   ##
=======================================
- Coverage      90%     90%    -0%     
=======================================
  Files         170     159    -11     
  Lines       11784   11209   -575     
=======================================
- Hits        10664   10140   -524     
+ Misses       1120    1069    -51     

@Borda Borda added the ready PRs ready to be merged label Feb 16, 2021
CHANGELOG.md — review thread (outdated, resolved)
@SeanNaren SeanNaren changed the title [HotFix] Forgot to call process_dataloader for tpu_cores > 1 [Hot Fix] Ensure process_dataloader is called when tpu_cores > 1 to use Parallel DataLoader Feb 16, 2021
SeanNaren (Contributor) commented:

I made an edit to the main post. I think we need some benchmark tests for TPUs that ensure we stay within reasonable limits for speed/accuracy. We test for model changes, but it would be good to have a general test for catching these regressions.

@tchaton tchaton enabled auto-merge (squash) February 16, 2021 21:57
@tchaton tchaton merged commit a52be5b into master Feb 16, 2021
@tchaton tchaton deleted the hotfix_tpus branch February 16, 2021 22:02
williamFalcon (Contributor) commented:

Nice... yeah, we need a test for this so we don't regress in the future.

Labels: bug (Something isn't working) · priority: 0 (High priority task) · ready (PRs ready to be merged)
Projects: none
6 participants