Destroy process group in DDP destructor #8080

Closed
wants to merge 37 commits into from
c700cab
Add ddp training type teardown
carmocca Jun 22, 2021
e5602c9
Update CHANGELOG
carmocca Jun 22, 2021
0b94b6c
Use destructor
carmocca Jun 23, 2021
aaf32ab
Update CHANGELOG.md
carmocca Jun 23, 2021
0444d54
RPC destructor
carmocca Jun 23, 2021
5d4f811
Update pytorch_lightning/plugins/training_type/ddp.py
carmocca Jun 23, 2021
bf8766d
Why do you not work :(
carmocca Jun 23, 2021
48bcb7e
Missing condition
carmocca Jun 23, 2021
5d6fa39
Merge branch 'master' into bug/teardown-ddp-process-group
carmocca Jun 23, 2021
21ad2d8
Fix deepspeed test
carmocca Jun 24, 2021
bbc489e
GC collect in conftest
carmocca Jun 24, 2021
5b06fd2
Do not show warnings for special tests
carmocca Jun 24, 2021
5e69ed8
Needs to run on 1.8
carmocca Jun 24, 2021
aed51a2
Run torch 1.8
carmocca Jun 24, 2021
e0a3e87
Skip test due to 'Python bus error'
carmocca Jun 24, 2021
9ee2d19
Debug NCCL
carmocca Jun 24, 2021
3588aaa
shm size
carmocca Jun 24, 2021
067bf1a
Disable warnings for special tests
carmocca Jun 24, 2021
6060b05
Remove NCCL_DEBUG statement
carmocca Jun 24, 2021
f0fa1b7
Try smaller shm size
carmocca Jun 24, 2021
6dd7038
Revert "Skip test due to 'Python bus error'"
carmocca Jun 24, 2021
53082bf
Merge branch 'ci/gpu-tests-torch-1.8' into bug/teardown-ddp-process-g…
carmocca Jun 24, 2021
73e62f8
README and adjust versions
carmocca Jun 24, 2021
902ef02
Avoid self.on_gpu call
carmocca Jun 24, 2021
4ce0f9a
empty cache cleanup
carmocca Jun 24, 2021
990b2e9
Merge branch 'master' into bug/teardown-ddp-process-group
carmocca Jun 24, 2021
738daa5
More garbage collection
carmocca Jun 24, 2021
236aa97
Unroll parametrizations
awaelchli Jun 24, 2021
ffa532d
Do not reuse mock
carmocca Jun 24, 2021
7a12354
Remove abbreviation
carmocca Jun 25, 2021
4e14803
Merge branch 'master' into bug/teardown-ddp-process-group
carmocca Jun 25, 2021
c59da71
Merge branch 'master' into bug/teardown-ddp-process-group
carmocca Jun 29, 2021
74d8a7d
Has initialized ddp
carmocca Jun 29, 2021
f457aad
Merge branch 'master' into bug/teardown-ddp-process-group
carmocca Jul 2, 2021
3c2ac10
Merge master
carmocca Jul 2, 2021
9ee6ee2
Merge master
carmocca Jul 2, 2021
fc6338e
Unnecessary annotation
carmocca Jul 2, 2021
Use destructor
carmocca committed Jun 23, 2021
commit 0b94b6c269cd04c3ec495a0beebd58bcda949b29
3 changes: 1 addition & 2 deletions pytorch_lightning/plugins/training_type/ddp.py
@@ -374,7 +374,6 @@ def register_plugins(cls, plugin_registry: Dict) -> None:
             find_unused_parameters=False
         )
 
-    def teardown(self) -> None:
+    def __del__(self) -> None:
         if torch_distrib.is_initialized():
             torch_distrib.destroy_process_group()
-        super().teardown()
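
For readers following the change, here is a minimal, torch-free sketch of the pattern this commit moves to: guarded cleanup of global process-group state in `__del__`, which runs even when the run is aborted with `SystemExit`. `FakeProcessGroup`, `Plugin`, and `run_and_exit` are hypothetical stand-ins for illustration only, not Lightning or PyTorch APIs. The sketch assumes CPython, where the object is finalized once the last reference is dropped.

```python
import gc


class FakeProcessGroup:
    """Hypothetical stand-in for the global process-group state (not a torch API)."""

    _initialized = False

    @classmethod
    def init(cls) -> None:
        cls._initialized = True

    @classmethod
    def is_initialized(cls) -> bool:
        return cls._initialized

    @classmethod
    def destroy(cls) -> None:
        cls._initialized = False


class Plugin:
    """Hypothetical plugin that owns the process group for its lifetime."""

    def __init__(self) -> None:
        FakeProcessGroup.init()

    def __del__(self) -> None:
        # Same guard as in the diff: only destroy the group if it exists.
        if FakeProcessGroup.is_initialized():
            FakeProcessGroup.destroy()


def run_and_exit() -> None:
    plugin = Plugin()  # the only reference lives in this frame
    assert FakeProcessGroup.is_initialized()
    raise SystemExit()  # abort early, as the updated test does in setup()


try:
    run_and_exit()
except SystemExit:
    pass

# Once the exception and its traceback are released, the frame holding
# `plugin` is freed; gc.collect() is called as well to cover reference cycles.
gc.collect()
cleaned_up = not FakeProcessGroup.is_initialized()
```

A caveat with this design: Python does not guarantee when, or whether, `__del__` runs (for example at interpreter shutdown), which may be why later commits in this PR add explicit garbage collection to the test setup.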
4 changes: 3 additions & 1 deletion tests/accelerators/test_ddp.py
@@ -109,6 +109,7 @@ class TestModel(BoringModel):
 
         def setup(self, stage: Optional[str] = None) -> None:
             assert torch.distributed.is_initialized()
+            raise SystemExit()
 
     model = TestModel()
     trainer = Trainer(
@@ -117,7 +118,8 @@ def setup(self, stage: Optional[str] = None) -> None:
         accelerator="ddp",
         gpus=1,
     )
-    trainer.fit(model)
+    with pytest.raises(SystemExit):
+        trainer.fit(model)
 
 
 @RunIf(min_gpus=2, min_torch="1.8.1", special=True)