
[bugfix] Apex never instantiated. #7274

Merged
merged 18 commits into master from apex_resolve_bug on Apr 30, 2021

Conversation

tchaton
Contributor

@tchaton tchaton commented Apr 29, 2021

What does this PR do?

This PR adds a dispatch hook to the accelerator, which is needed to properly instantiate Apex with ddp_spawn.

Fixes #7271
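
For context, a minimal sketch of the shape of the change is below. Only the signature and docstring of the new hook appear in the diffs quoted later in this thread, so the forwarding body shown here is an assumption for illustration, not the merged code.

# Illustrative sketch only; the dispatch body is assumed, not taken from the diff.
class Accelerator:
    def __init__(self, precision_plugin, training_type_plugin):
        self.precision_plugin = precision_plugin
        self.training_type_plugin = training_type_plugin

    def setup_optimizers(self, trainer) -> None:
        ...  # creates optimizers from the LightningModule (stubbed here)

    def pre_dispatch(self, trainer) -> None:
        # existing hook (see the quoted diff further down)
        self.setup_optimizers(trainer)
        self.precision_plugin.pre_dispatch()

    def dispatch(self, trainer) -> None:
        # new hook: runs after pre_dispatch, once the model sits on its device,
        # so a plugin such as Apex can be configured against the CUDA model
        self.training_type_plugin.dispatch(trainer)
        self.precision_plugin.dispatch(trainer)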

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

@tchaton tchaton added this to the v1.3 milestone Apr 29, 2021
@tchaton tchaton self-assigned this Apr 29, 2021
@tchaton tchaton changed the title [bugfix] Apex was broken [bugfix] Apex never instantiated. Apr 29, 2021
@codecov

codecov bot commented Apr 29, 2021

Codecov Report

Merging #7274 (b48017d) into master (44fd017) will decrease coverage by 0%.
The diff coverage is 100%.

@@          Coverage Diff           @@
##           master   #7274   +/-   ##
======================================
- Coverage      91%     91%   -0%     
======================================
  Files         200     200           
  Lines       12850   12854    +4     
======================================
- Hits        11730   11722    -8     
- Misses       1120    1132   +12     

@tchaton tchaton added bug Something isn't working priority: 0 High priority task labels Apr 29, 2021
@SeanNaren
Contributor

SeanNaren commented Apr 29, 2021

Just to ensure I understand at a high level: because the model is only moved to the device after connect is called in the precision plugin, you've introduced a dispatch hook that runs after pre_dispatch, which allows the precision plugin to be called after the model has been moved to the correct device? This fixes the Apex issue, since the model has to be moved to CUDA first.

EDIT: after speaking to @tchaton, he confirmed this, and also mentioned that it now works for ddp_spawn, which is pretty neat :)
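
For readers unfamiliar with Apex, the ordering constraint described above comes from apex.amp itself: the model must already be on the GPU when amp.initialize is called, and DDP wrapping should happen afterwards. A minimal standalone sketch (plain Apex, no Lightning, assuming Apex is installed and a process group is already set up):

import torch
from apex import amp  # NVIDIA Apex, assumed installed

model = torch.nn.Linear(32, 2).cuda()                     # 1. move the model to the GPU first
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")   # 2. then let Apex patch model/optimizer
model = torch.nn.parallel.DistributedDataParallel(model)              # 3. only then wrap in DDP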

@@ -107,6 +107,11 @@ def pre_dispatch(self, trainer: 'pl.Trainer') -> None:
        self.setup_optimizers(trainer)
        self.precision_plugin.pre_dispatch()

    def dispatch(self, trainer: 'pl.Trainer') -> None:
        """Hook to do something before the training/evaluation/prediction starts."""
Contributor

It might be a good idea to clarify somewhere here that this happens after accelerator setup? Otherwise this looks the same as pre_dispatch.

Contributor

@tchaton
As the order in which the hooks are executed could be confusing.

Contributor

I find the pre/dispatch/post confusing now :/

Contributor

Yes, we should think about the naming of these hooks. But more importantly, I think we can do a better job at formally defining what these hooks are supposed to do. Maybe another action item for 1.3 is to do a full pass over the plugins and improve all these docs. That would help everyone: 1. implementing plugins, 2. fixing them, 3. reviewing plugin PRs.

Contributor

n00b question: would this be easier if the precision plugin was owned by the training type plugin instead of the accelerator?

Contributor

Yes, I think that could make it easier to interleave these operations between one plugin and the other. Here in this PR we see that the precision plugin needs to configure the model before it is wrapped, and needs to overwrite the reference in the training plugin. This really breaks the contract that these plugins currently have with each other.

Contributor Author

Yes, we should definitely refactor this and move optimizers, lr_schedulers to the training_type_plugin.
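
A purely hypothetical sketch of the ownership change discussed in this thread (this is not how Lightning was structured at the time of this PR): if the training-type plugin owned the precision plugin, the device move, precision configuration, and wrapping could be interleaved in one place instead of the precision plugin overwriting references held by the accelerator.

class TrainingTypePlugin:  # hypothetical, for illustration only
    def __init__(self, precision_plugin):
        self.precision_plugin = precision_plugin
        self.model = None

    def setup(self, model, optimizers):
        model = model.cuda()                                                     # 1. device move
        model, optimizers = self.precision_plugin.configure(model, optimizers)   # 2. precision (e.g. Apex)
        self.model = self.wrap(model)                                            # 3. DDP or other wrapping
        return self.model, optimizers

    def wrap(self, model):
        return model  # a DDP plugin would return DistributedDataParallel(model)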

Contributor

@SeanNaren SeanNaren left a comment

Approved, but I would definitely like to see @justusschock's or @awaelchli's comments!

@@ -107,6 +107,11 @@ def pre_dispatch(self, trainer: 'pl.Trainer') -> None:
        self.setup_optimizers(trainer)
        self.precision_plugin.pre_dispatch()

    def dispatch(self, trainer: 'pl.Trainer') -> None:
        """Hook to do something before the training/evaluation/prediction starts."""
Contributor

I find the pre/dispatch/post confusing now :/

Comment on lines 46 to 54
def dispatch(self, trainer: "pl.Trainer") -> None:
if not self._connected:
accelerator = trainer.accelerator
model, optimizers = self.configure_apex(accelerator.lightning_module, accelerator.optimizers)
self.reinit_scheduler_properties(optimizers, accelerator.lr_schedulers)
self.model = model
self.optimizers = optimizers
self._connected = True
return super().dispatch(trainer)
Contributor

It's not specific to Apex. Any optimizer that defines state which isn't lazily instantiated needs to handle the device move?

Contributor Author

Hey @ananthsub, not sure I fully follow you :)

Member

@ananthsub I think this is handled by @awaelchli in #7277
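
To unpack the point above with a toy example (hypothetical optimizer, not code from this PR or from PyTorch): an optimizer that allocates its state eagerly at construction captures tensors on whatever device the parameters live on at that moment, so creating it before the device move leaves its state on the CPU.

import torch

class EagerMomentumSGD(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))
        for group in self.param_groups:
            for p in group["params"]:
                # state allocated now, on whatever device p currently lives on
                self.state[p]["buf"] = torch.zeros_like(p)

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                buf = self.state[p]["buf"]
                buf.mul_(group["momentum"]).add_(p.grad)   # fails if buf is on CPU and grad on CUDA
                p.add_(buf, alpha=-group["lr"])

model = torch.nn.Linear(4, 1)
opt = EagerMomentumSGD(model.parameters())   # momentum buffers are created on the CPU here
model.cuda()                                 # parameters are moved in place...
# ...but opt.state[p]["buf"] stays on the CPU, so the first step() with CUDA
# gradients raises a device-mismatch error unless the optimizer handles the move.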

@tchaton
Contributor Author

tchaton commented Apr 29, 2021

Hey @ananthsub,

I will make another PR to rename them: on_predispatch, on_dispatch, etc.

Best,
T.C

Contributor

@awaelchli awaelchli left a comment

Cautious approval.
Please see my comments :))

@@ -107,6 +107,11 @@ def pre_dispatch(self, trainer: 'pl.Trainer') -> None:
        self.setup_optimizers(trainer)
        self.precision_plugin.pre_dispatch()

    def dispatch(self, trainer: 'pl.Trainer') -> None:
        """Hook to do something before the training/evaluation/prediction starts."""
Contributor

Yes, we should think about the naming of these hooks. But more importantly, I think we can do a better job at formally defining what these hooks are supposed to do. Maybe another action item for 1.3 is to do a full pass over the plugins and improve all these docs. That would help everyone: 1. implementing plugins, 2. fixing them, 3. reviewing plugin PRs.

Comment on lines +103 to +105
@pytest.mark.parametrize("amp_level", ['O2'])
def test_amp_apex_ddp_fit(amp_level, tmpdir):

Contributor

@awaelchli awaelchli Apr 29, 2021

We have another Apex test in
tests/models/test_amp.py::test_amp_with_apex; it uses 1 GPU.
@tchaton, your fix only applies to multi-GPU due to the dispatch, am I right? Single GPU seems to be OK.

Contributor Author

Yes.
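
For reference, a rough sketch of what a multi-GPU Apex test along these lines could look like. The BoringModel helper import and the exact Trainer arguments are assumptions based on the 1.3-era API; the test actually merged in this PR may differ.

import pytest
import torch
from pytorch_lightning import Trainer
from tests.helpers.boring_model import BoringModel  # assumed test helper


@pytest.mark.skipif(torch.cuda.device_count() < 2, reason="requires 2 GPUs")
@pytest.mark.parametrize("amp_level", ["O2"])
def test_amp_apex_ddp_fit(amp_level, tmpdir):
    model = BoringModel()
    trainer = Trainer(
        default_root_dir=tmpdir,
        gpus=2,
        accelerator="ddp_spawn",
        precision=16,
        amp_backend="apex",
        amp_level=amp_level,
        fast_dev_run=True,
    )
    trainer.fit(model)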

Member

@justusschock justusschock left a comment

Also carefully approving. I get that this is necessary, but I don't like simply assigning optimiser and model to every class :)

@mergify mergify bot removed the has conflicts label Apr 30, 2021
@pep8speaks

pep8speaks commented Apr 30, 2021

Hello @tchaton! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-30 16:50:17 UTC

@mergify mergify bot removed the has conflicts label Apr 30, 2021
@tchaton tchaton added the admin Requires admin privileges to merge label Apr 30, 2021
@lexierule lexierule merged commit 16d6c98 into master Apr 30, 2021
@lexierule lexierule deleted the apex_resolve_bug branch April 30, 2021 17:16
Labels
admin (Requires admin privileges to merge), bug (Something isn't working), priority: 0 (High priority task)
Development

Successfully merging this pull request may close these issues.

Lightning + DDP + apex.amp fails to initialize
8 participants