compatibility TM v0.4 #8206

Merged: 5 commits merged into release/1.3.x from tm/v0.4 on Jun 30, 2021
Conversation

@Borda (Member) commented Jun 29, 2021

What does this PR do?

Fixes #<issue_number>

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following checklist:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

Borda added the bug (Something isn't working) and metrics labels on Jun 29, 2021
Borda added this to the v1.3.x milestone on Jun 29, 2021
Borda changed the base branch from master to release/1.3.x on June 29, 2021 at 20:44
Borda marked this pull request as ready for review on June 29, 2021 at 20:51
@kaushikb11 (Contributor) commented:

@Borda Why is this required?

@Borda (Member, Author) commented Jun 29, 2021

> @Borda Why is this required?

TorchMetrics (TM) 0.4 dropped some arguments, hence this compatibility handling...
cc: @awaelchli
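
For context, here is a minimal sketch (not from this PR's diff) of the kind of version gate such a compatibility fix typically needs; `make_accuracy` and `old_kwarg` are hypothetical names standing in for whichever arguments TorchMetrics 0.4 actually removed:

```python
from packaging.version import Version

import torchmetrics

# True when the installed torchmetrics has already dropped the old arguments.
_TM_GE_0_4 = Version(torchmetrics.__version__) >= Version("0.4.0")


def make_accuracy(**kwargs):
    """Build an Accuracy metric, tolerating both old and new torchmetrics."""
    if _TM_GE_0_4:
        # `old_kwarg` is a hypothetical stand-in for an argument removed in
        # 0.4; drop it so callers written against the old API keep working.
        kwargs.pop("old_kwarg", None)
    return torchmetrics.Accuracy(**kwargs)
```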

@codecov bot commented Jun 29, 2021

Codecov Report

Merging #8206 (80902bb) into release/1.3.x (e5fcc3d) will increase coverage by 2%.
The diff coverage is 83%.

@@              Coverage Diff               @@
##           release/1.3.x   #8206    +/-   ##
==============================================
+ Coverage             86%     87%    +2%     
==============================================
  Files                200     200            
  Lines              13069   13096    +27     
==============================================
+ Hits               11213   11434   +221     
+ Misses              1856    1662   -194     

Borda added the ready (PRs ready to be merged) label on Jun 30, 2021

@Borda (Member, Author) commented Jun 30, 2021

@tchaton is this also the logging you were talking about?

E       torch.multiprocessing.spawn.ProcessRaisedException: 
E       
E       -- Process 0 terminated with the following error:
E       Traceback (most recent call last):
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
E           fn(i, *args)
E         File "/home/runner/work/pytorch-lightning/pytorch-lightning/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 175, in new_process
E           self.transfer_distrib_spawn_state_on_fit_end(results)
E         File "/home/runner/work/pytorch-lightning/pytorch-lightning/pytorch_lightning/plugins/training_type/ddp_spawn.py", line 252, in transfer_distrib_spawn_state_on_fit_end
E           atomic_save(self.on_save(self.lightning_module.state_dict()), last_path)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1259, in state_dict
E           module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/metric.py", line 421, in state_dict
E           with self.sync_context(dist_sync_fn=self.dist_sync_fn):
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/contextlib.py", line 113, in __enter__
E           return next(self.gen)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/metric.py", line 299, in sync_context
E           cache = self.sync(
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/metric.py", line 272, in sync
E           self._sync_dist(dist_sync_fn, process_group=process_group)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/metric.py", line 213, in _sync_dist
E           output_dict = apply_to_collection(
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/utilities/data.py", line 195, in apply_to_collection
E           return elem_type({k: apply_to_collection(v, dtype, function, *args, **kwargs) for k, v in data.items()})
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/utilities/data.py", line 195, in <dictcomp>
E           return elem_type({k: apply_to_collection(v, dtype, function, *args, **kwargs) for k, v in data.items()})
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/utilities/data.py", line 191, in apply_to_collection
E           return function(data, *args, **kwargs)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torchmetrics/utilities/distributed.py", line 120, in gather_all_tensors
E           torch.distributed.barrier(group=group)
E         File "/opt/hostedtoolcache/Python/3.8.10/x64/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 2531, in barrier
E           work.wait()
E       RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:598] Connection closed by peer [10.1.0.19]:33684
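
What the trace shows: `Metric.state_dict()` in torchmetrics 0.4 enters a sync context that calls `torch.distributed.barrier()`, and since the checkpoint save in `transfer_distrib_spawn_state_on_fit_end` appears to run on rank 0 only, the other ranks never join the collective. A minimal sketch (not from this PR) that reproduces the same class of failure:

```python
import torch.distributed as dist
import torch.multiprocessing as mp


def run(rank: int, world_size: int) -> None:
    # Each spawned process joins the same gloo process group.
    dist.init_process_group(
        "gloo",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    if rank == 0:
        # barrier() is a collective: it blocks until *every* rank calls it.
        # Rank 1 never does, exits, and tears down its connection, so this
        # fails with "Connection closed by peer" just like the trace above.
        dist.barrier()


if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)
```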

@tchaton (Contributor) commented Jun 30, 2021

> @tchaton is this also the logging you were talking about?
> [full traceback quoted above]

Yes.

@awaelchli (Contributor) commented:

one test failing, not yet sure what's up

@pep8speaks commented Jun 30, 2021

Hello @Borda! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-06-30 17:12:46 UTC

awaelchli force-pushed the tm/v0.4 branch 2 times, most recently from fd867aa to 84a4567 on June 30, 2021 at 16:48
lexierule merged commit afc69e4 into release/1.3.x on Jun 30, 2021
lexierule deleted the tm/v0.4 branch on June 30, 2021 at 18:10
Labels: bug (Something isn't working), ready (PRs ready to be merged)
9 participants