Avoid wrapping prediction dataloader twice on TPU #16571

Merged 2 commits on Feb 3, 2023

Conversation


@Liyang90 Liyang90 commented Jan 31, 2023

Avoid wrapping the dataloader with MpDeviceLoader more than once, and add batch_sampler back to the wrapped dataloader.

What does this PR do?

Fixes #16572

When doing prediction on TPU with:

trainer = Trainer(accelerator='tpu', devices=8)
trainer.predict(model=model, dataloaders=dm)

the code crashes because the prediction dataloader is wrapped with MpDeviceLoader twice: once in DataConnector._reset_eval_dataloader() and once in PredictionLoop.advance().

To avoid double wrapping, a check is added in TPUSpawnStrategy.process_dataloader() that returns the dataloader unchanged if it is already an MpDeviceLoader.
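
The idea behind the check can be sketched without a TPU. The snippet below is only an illustration, not the code from this PR: `StandInDeviceLoader` is a hypothetical stand-in for MpDeviceLoader, and the free function `process_dataloader` and its `device` argument are assumptions made for the example. It shows that a second call from another call site returns the same wrapper instead of nesting a new one.

```python
class StandInDeviceLoader:
    """Hypothetical stand-in for MpDeviceLoader (not the real torch_xla class)."""

    def __init__(self, loader, device):
        self._loader = loader
        self._device = device

    def __iter__(self):
        # A real device loader would move each batch to the device;
        # this stand-in simply passes batches through.
        return iter(self._loader)


def process_dataloader(dataloader, device="xla:0"):
    """Wrap `dataloader` once; return it unchanged if it is already wrapped."""
    if isinstance(dataloader, StandInDeviceLoader):
        return dataloader
    return StandInDeviceLoader(dataloader, device)


batches = [[1, 2], [3, 4]]
once = process_dataloader(batches)   # first call site
twice = process_dataloader(once)     # second call site: no new wrapper is created
```

Because the function is idempotent, it does not matter how many call sites invoke it on the same dataloader.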

The batch_sampler is also added to the wrapped dataloader to avoid a warning in prediction_epoch_loop.
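
A minimal sketch of what re-exposing batch_sampler on the wrapper could look like. Everything here is hypothetical and for illustration only: `PlainLoader` stands in for a torch DataLoader, `StandInDeviceLoader` stands in for MpDeviceLoader, and `wrap_with_batch_sampler` is an invented helper name. The actual change in this PR may differ in detail.

```python
class PlainLoader:
    """Hypothetical stand-in for a DataLoader that exposes `batch_sampler`."""

    def __init__(self, data, batch_size):
        self.data = data
        # Each entry is the list of dataset indices that form one batch.
        self.batch_sampler = [
            list(range(start, min(start + batch_size, len(data))))
            for start in range(0, len(data), batch_size)
        ]

    def __iter__(self):
        for indices in self.batch_sampler:
            yield [self.data[i] for i in indices]


class StandInDeviceLoader:
    """Hypothetical stand-in for MpDeviceLoader; it hides the inner loader's attributes."""

    def __init__(self, loader):
        self._loader = loader

    def __iter__(self):
        return iter(self._loader)


def wrap_with_batch_sampler(dataloader):
    """Wrap the loader and copy `batch_sampler` onto the wrapper (None if absent)."""
    wrapped = StandInDeviceLoader(dataloader)
    wrapped.batch_sampler = getattr(dataloader, "batch_sampler", None)
    return wrapped


loader = PlainLoader(list("abcde"), batch_size=2)
wrapped = wrap_with_batch_sampler(loader)
```

Code that inspects `wrapped.batch_sampler` then sees the same object as on the inner loader instead of a missing attribute.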

Does your PR introduce any breaking changes? If yes, please list them.

None

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Yes
Make sure you had fun coding 🙃

@github-actions (bot) added the "pl" label (Generic label for PyTorch Lightning package) on Jan 31, 2023
@Liyang90 changed the title from "Update tpu_spawn.py" to "Avoid wrapping prediction dataloader twice on TPU" on Jan 31, 2023
@awaelchli added the "accelerator: tpu" (Tensor Processing Unit) and "bug" (Something isn't working) labels on Jan 31, 2023
@awaelchli added this to the v1.9.x milestone on Jan 31, 2023
@awaelchli self-assigned this on Jan 31, 2023
@awaelchli added the "community" label (This PR is from the community) on Jan 31, 2023
@carmocca added the "data handling" label (Generic data-related topic) on Jan 31, 2023
@github-actions (bot) added the "fabric" label (lightning.fabric.Fabric) on Jan 31, 2023
@mergify (bot) added the "has conflicts" label on Feb 1, 2023
@mergify (bot) added the "ready" (PRs ready to be merged) and "has conflicts" labels and removed the "has conflicts" and "ready" labels on Feb 1, 2023
@github-actions (bot) added the "app (removed)" label (Generic label for Lightning App package) on Feb 2, 2023
@mergify (bot) added the "has conflicts" label and removed the "ready" label on Feb 2, 2023
@github-actions (bot) removed the "app (removed)" label on Feb 2, 2023
@mergify (bot) added the "ready" label and removed the "has conflicts" and "ready" labels on Feb 2, 2023

carmocca commented Feb 2, 2023

CI status: blocked by #16613

@github-actions (bot) removed the "ci" label (Continuous Integration) on Feb 2, 2023
@awaelchli enabled auto-merge (squash) on February 2, 2023 19:04
@carmocca disabled auto-merge on February 2, 2023 19:19
@Borda merged commit e20172d into Lightning-AI:master on Feb 3, 2023
@carmocca mentioned this pull request on Feb 3, 2023
Borda pushed 5 commits that referenced this pull request on Feb 9, 2023, and lantiga pushed 1 commit that referenced this pull request on Feb 10, 2023. Each commit carries the same message trailer:
Co-authored-by: Carlos Mocholí <[email protected]>

(cherry picked from commit e20172d)
@Liyang90 deleted the xla_prediction branch on March 11, 2023 01:19
Labels
accelerator: tpu, bug, community, data handling, fabric, pl, ready
Development

Successfully merging this pull request may close these issues.

trainer.predict() would crash on TPU
4 participants