4/n Move Accelerator into strategy - remove X_step() from accelerator #10890

four4fish · 2021-12-02T01:48:16Z

What does this PR do?

Remove training_step() from Accelerator and call strategy.training_step() directly.
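
A minimal sketch of the change described above, using toy classes rather than the actual Lightning classes. Before, the loop calls a pass-through method on the accelerator; after, it calls the strategy directly. Every name here (`ToyModel`, `ToyStrategy`, `AcceleratorBefore`, `AcceleratorAfter`, `run_step_before`, `run_step_after`) is a hypothetical stand-in chosen for illustration.

```python
class ToyModel:
    """Hypothetical user module; only training_step matters for this sketch."""

    def training_step(self, batch, batch_idx):
        return sum(batch) + batch_idx


class ToyStrategy:
    """Hypothetical stand-in for a strategy / training-type plugin."""

    def __init__(self, model):
        self.model = model

    def training_step(self, *args, **kwargs):
        return self.model.training_step(*args, **kwargs)


class AcceleratorBefore:
    """Before: the accelerator exposes a method that only forwards the call."""

    def __init__(self, strategy):
        self.strategy = strategy

    def training_step(self, *args, **kwargs):
        # Pure pass-through, no accelerator-specific logic.
        return self.strategy.training_step(*args, **kwargs)


class AcceleratorAfter:
    """After: the pass-through is gone; callers go to the strategy."""

    def __init__(self, strategy):
        self.strategy = strategy


def run_step_before(accelerator, batch, batch_idx):
    # Old call site: one extra hop through the accelerator.
    return accelerator.training_step(batch, batch_idx)


def run_step_after(strategy, batch, batch_idx):
    # New call site: the strategy is called directly.
    return strategy.training_step(batch, batch_idx)
```

Both call sites produce the same result; the difference is only that the accelerator no longer sits in the call chain.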

In the following graph, each color marks one piece of logic.

[RFC] Should we define training_step() in the Parallel plugin, so that DDPSpawning, DP and DDP don't need to duplicate the logic? We would then need a training_step() override in Horovod instead.
cc: @awaelchli @ananthsub @carmocca @justusschock

Removed TPUSpawning.training_step(), since its logic is the same as in the superclass DDPSpawn.

Part of #10648

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

codecov · 2021-12-02T07:28:02Z

Codecov Report

Merging #10890 (3d6fa8b) into master (6043179) will decrease coverage by 4%.
The diff coverage is 51%.

❗ Current head 3d6fa8b differs from pull request most recent head 23b5e94. Consider uploading reports for the commit 23b5e94 to get more accurate results

@@           Coverage Diff            @@
##           master   #10890    +/-   ##
========================================
- Coverage      92%      88%    -4%     
========================================
  Files         177      177            
  Lines       16484    16570    +86     
========================================
- Hits        15132    14589   -543     
- Misses       1352     1981   +629

awaelchli · 2021-12-02T10:23:42Z

[RFC] Should we define training_step() in the Parallel plugin, so that DDPSpawning, DP and DDP don't need to duplicate the logic? We would then need a training_step() override in Horovod instead.

I would keep the overrides for all plugins that wrap the model with a class like DistributedDataParallel for example. This way, the definition of how the model gets wrapped lies in the plugin together with how the training_step gets called through that wrapper.
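
A minimal sketch of this point, using toy classes instead of the real plugins and without any actual distributed communication. `ToyWrapper` stands in for a wrapper class such as DistributedDataParallel; `BasePlugin`, `WrappingPlugin`, `ToyModule` and the `setup()` method are hypothetical names chosen for illustration. The wrapping and the call through the wrapper sit side by side in the same plugin.

```python
class ToyModule:
    """Hypothetical user module."""

    def training_step(self, batch, batch_idx):
        return {"loss": float(sum(batch))}


class ToyWrapper:
    """Stand-in for a model wrapper; it only counts calls and forwards them."""

    def __init__(self, module):
        self.module = module
        self.forward_calls = 0

    def __call__(self, *args, **kwargs):
        # A real wrapper would hook in gradient synchronization here.
        self.forward_calls += 1
        return self.module.training_step(*args, **kwargs)


class BasePlugin:
    """Plugin that does not wrap the model: call the module directly."""

    def __init__(self, module):
        self.module = module
        self.model = module

    def training_step(self, *args, **kwargs):
        return self.module.training_step(*args, **kwargs)


class WrappingPlugin(BasePlugin):
    """Plugin that wraps the model and keeps its own training_step override."""

    def setup(self):
        # How the model gets wrapped ...
        self.model = ToyWrapper(self.module)

    def training_step(self, *args, **kwargs):
        # ... lives next to how the step is routed through that wrapper.
        return self.model(*args, **kwargs)
```

If `WrappingPlugin` simply inherited `BasePlugin.training_step`, the step would bypass the wrapper without raising any error, which is the kind of silent failure an explicit override avoids.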

four4fish · 2021-12-02T18:05:12Z

[RFC] Should we define training_step() in the Parallel plugin, so that DDPSpawning, DP and DDP don't need to duplicate the logic? We would then need a training_step() override in Horovod instead.

I would keep the overrides for all plugins that wrap the model with a class like DistributedDataParallel for example. This way, the definition of how the model gets wrapped lies in the plugin together with how the training_step gets called through that wrapper.

@awaelchli I agree. Even duplicated logic is better than failing silently or inheriting behavior silently. In the last step of the refactor, we can regroup the logic and flatten the inheritance.

awaelchli

nice!

four4fish requested review from awaelchli, Borda, carmocca, justusschock, kaushikb11, rohitgr7, SeanNaren, tchaton and williamFalcon as code owners December 2, 2021 01:48

four4fish marked this pull request as draft December 2, 2021 01:48

four4fish added accelerator plugin refactor breaking change labels Dec 2, 2021

four4fish added this to the 1.6 milestone Dec 2, 2021

four4fish marked this pull request as ready for review December 2, 2021 07:12

justusschock requested changes Dec 2, 2021

pytorch_lightning/plugins/training_type/ddp.py (outdated, resolved)

pytorch_lightning/plugins/training_type/ddp.py (outdated, resolved)

mergify bot added the has conflicts label Dec 2, 2021

four4fish marked this pull request as draft December 3, 2021 01:29

This was referenced Dec 3, 2021

Unroll dict input before call Accelerator X_steps and update function type #10907

Closed

Unroll dict input before call Accelerator X_steps #10908

Merged

remove training_step() from accelerator

900d3f6

four4fish force-pushed the b4 branch from d7dcd0a to 900d3f6 Compare December 4, 2021 01:12

four4fish added 2 commits December 3, 2021 17:18

remove function from accelerator

c0b2d01

fix mypy

b537956

four4fish marked this pull request as ready for review December 4, 2021 01:40

mergify bot removed the has conflicts label Dec 4, 2021

four4fish marked this pull request as draft December 4, 2021 02:26

four4fish and others added 3 commits December 3, 2021 18:54

remove test, val, predict step

c593c9f

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f38289

for more information, see https://pre-commit.ci

Merge branch 'b4' of https://github.com/four4fish/pytorch-lightning i…

b9eb1c7

…nto b4

four4fish changed the title ~~4/n Move Accelerator into strategy - remove training_step() from accelerator~~ 4/n Move Accelerator into strategy - remove X_step() from accelerator Dec 4, 2021

four4fish added 3 commits December 3, 2021 19:19

update

47c4da3

fix typo

0e0b6ae

fix type

3d6fa8b

four4fish marked this pull request as ready for review December 4, 2021 03:29

ananthsub approved these changes Dec 4, 2021

four4fish requested a review from justusschock December 4, 2021 06:08

mergify bot added has conflicts and removed has conflicts labels Dec 4, 2021

awaelchli approved these changes Dec 5, 2021

pytorch_lightning/loops/optimization/optimizer_loop.py (outdated, resolved)

awaelchli self-assigned this Dec 5, 2021

awaelchli enabled auto-merge (squash) December 5, 2021 02:26

carmocca approved these changes Dec 6, 2021

justusschock approved these changes Dec 6, 2021

mergify bot added the ready and has conflicts labels Dec 6, 2021

awaelchli mentioned this pull request Dec 6, 2021

Re-design call_hook interface #10575

Merged

12 tasks

tchaton reviewed Dec 6, 2021

pytorch_lightning/loops/optimization/manual_loop.py (outdated, resolved)

daniellepintz reviewed Dec 6, 2021

pytorch_lightning/trainer/trainer.py (outdated, resolved)

awaelchli added 2 commits December 6, 2021 17:51

Merge branch 'master' into b4

e0de129

call hook on ttp instead of acc

dd30690

awaelchli force-pushed the b4 branch from 3969136 to dd30690 Compare December 6, 2021 16:57

mergify bot removed the has conflicts label Dec 6, 2021

daniellepintz reviewed Dec 6, 2021

pytorch_lightning/trainer/trainer.py (resolved)

awaelchli merged commit 63bb4ec into Lightning-AI:master Dec 6, 2021

daniellepintz mentioned this pull request Dec 8, 2021

Remove _call_accelerator_hook Trainer method #10999

Merged

12 tasks
