
wrap tpu tests with process decorator #2582

Closed
wants to merge 11 commits into from

Conversation

williamFalcon
Contributor

What does this PR do?

Fixes # (issue)

Before submitting

  • Was this discussed/approved via a GitHub issue? (not needed for typo and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together? Otherwise, we ask you to create a separate PR for every change.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@mergify mergify bot requested a review from a team July 11, 2020 13:22
@williamFalcon
Contributor Author

williamFalcon commented Jul 11, 2020

@Borda let's see if this works...
@dlibenzi @zcain117

@codecov

codecov bot commented Jul 11, 2020

Codecov Report

Merging #2582 into master will increase coverage by 2%.
The diff coverage is 18%.

@@           Coverage Diff            @@
##           master   #2582     +/-   ##
========================================
+ Coverage      89%     91%     +2%     
========================================
  Files          80      70     -10     
  Lines        7531    5770   -1761     
========================================
- Hits         6738    5252   -1486     
+ Misses        793     518    -275     

@Borda Borda added the bug Something isn't working label Jul 11, 2020
@@ -23,6 +24,7 @@


@pytest.mark.skipif(not TPU_AVAILABLE, reason="test requires TPU machine")
@dutils.pl_multi_process_test
Member


When was this function added? And never used until now?

Contributor Author


I added it a few days ago when the spawn stuff came out...
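For context, a decorator like the `dutils.pl_multi_process_test` applied in the diff above can be sketched as follows. This is a hypothetical illustration, not Lightning's actual implementation: the idea is simply to run the test body in a child process so that a crash or hang in the TPU runtime kills only that process, not the whole pytest session.

```python
import multiprocessing
from functools import wraps


def pl_multi_process_test(func):
    """Hypothetical sketch of a multi-process test decorator.

    The wrapped test runs in a child process; failures are reported back
    to the parent through a queue. Note: the closure-based target relies
    on the default 'fork' start method on Linux.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        queue = multiprocessing.Queue()

        def target(q):
            try:
                func(*args, **kwargs)
                q.put(None)            # success sentinel
            except Exception as err:   # report the failure to the parent
                q.put(repr(err))

        proc = multiprocessing.Process(target=target, args=(queue,))
        proc.start()
        result = queue.get(timeout=600)  # bound the wait on a stuck TPU run
        proc.join()
        assert result is None, f"test failed in child process: {result}"

    return wrapper
```

A test decorated this way fails in the parent (via the assertion) whenever the child process raises, while a hung child is bounded by the queue timeout instead of blocking the suite indefinitely.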

@mergify mergify bot requested a review from a team July 11, 2020 13:46
@williamFalcon williamFalcon changed the title add proc for tests [wip] add proc for tests Jul 11, 2020
@williamFalcon williamFalcon changed the title [wip] add proc for tests [wip] wrap tpu tests with process decorator Jul 11, 2020
@williamFalcon
Contributor Author

@dlibenzi maybe the bug is about when we download data?

The current approach is:

  • download on global rank = 0
  • then spawn the multi-gpu stuff (.spawn())
    (this means inside the spawn call we don't download data)

Trace here (https://github.com/PyTorchLightning/pytorch-lightning/pull/2582/checks?check_run_id=861612147)

    dataloader = dataloader_fx()
  File "/pytorch-lightning/tests/base/model_valid_dataloaders.py", line 29, in val_dataloader__long
    num_samples=15000, digits=(0, 1, 2, 5, 8)), batch_size=32)
  File "/pytorch-lightning/tests/base/datasets.py", line 161, in __init__
    download=download
  File "/pytorch-lightning/tests/base/datasets.py", line 64, in __init__
    self.data, self.targets = torch.load(os.path.join(self.cached_folder_path, data_file))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 854, in _load
    result = unpickler.load()
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 846, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 834, in load_tensor
    storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/94756640: file read failed

Exception in device=TPU:4: unexpected EOF, expected 7791310 more bytes. The file might be corrupted.
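The download-then-spawn pattern described above can be guarded so that no worker reads the cached file before rank 0 has finished writing it. Below is a minimal sketch of that idea (not Lightning's actual code): rank 0 writes to a temp file and publishes it with an atomic rename, and all ranks meet at a barrier before reading. The `multiprocessing.Barrier` stands in for a distributed barrier such as `xm.rendezvous`; the file name `mnist_cache.pt` is made up for the example.

```python
import multiprocessing
import os
import tempfile


def _worker(rank, barrier, cache_path, results):
    # Sketch of "download on global rank 0, everyone else waits".
    # Rank 0 fully writes the file, then publishes it atomically; the
    # barrier guarantees no rank reads before the write is complete,
    # avoiding the partial-read failure shown in the traceback above.
    if rank == 0:
        tmp_path = cache_path + ".part"
        with open(tmp_path, "wb") as f:
            f.write(b"dataset-bytes")
        os.replace(tmp_path, cache_path)  # readers never see a half-written file
    barrier.wait()  # stand-in for xm.rendezvous / a distributed barrier
    with open(cache_path, "rb") as f:
        results[rank] = f.read()


world_size = 4
barrier = multiprocessing.Barrier(world_size)
manager = multiprocessing.Manager()
results = manager.dict()
cache_path = os.path.join(tempfile.mkdtemp(), "mnist_cache.pt")

procs = [
    multiprocessing.Process(target=_worker, args=(rank, barrier, cache_path, results))
    for rank in range(world_size)
]
for p in procs:
    p.start()
for p in procs:
    p.join()
```

Without the barrier (or with the download happening inside `.spawn()` on every rank), a worker can open the cache while it is still being written, which matches the `PytorchStreamReader failed reading file` / `unexpected EOF` errors in the trace.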

@mergify
Contributor

mergify bot commented Jul 22, 2020

This pull request is now in conflict... :(

@Borda Borda added this to the 0.9.0 milestone Aug 6, 2020
@Borda Borda changed the title [wip] wrap tpu tests with process decorator wrap tpu tests with process decorator Aug 11, 2020
@Borda
Member

Borda commented Aug 11, 2020

@williamFalcon is this still valid? adding barriers looks reasonable...

@mergify
Contributor

mergify bot commented Aug 13, 2020

This pull request is now in conflict... :(

@edenlightning edenlightning modified the milestones: 0.9.0, 0.9.x Aug 20, 2020
@mergify
Contributor

mergify bot commented Sep 9, 2020

This pull request is now in conflict... :(

@Borda
Member

Borda commented Sep 21, 2020

implemented in #2632

@Borda Borda closed this Sep 21, 2020
@williamFalcon williamFalcon deleted the tpu8tests branch October 4, 2020 17:37