Lightning Lite core and tests #10175

awaelchli · 2021-10-27T08:20:05Z

What does this PR do?

Part of #9987

This is the V1 for the new Lightning Lite package. This PR adds the main Lite components and tests. Planned to be released as part of 1.5.
For the docs, see the follow up #10176.

Demo

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

I made sure I had fun coding 🙃

Part of #1 (it's a lie, this is just here to avoid noisy GitHub bot)

tests/lite/test_lite.py

tchaton

Awesome !

tests/lite/test_parity.py

… Remove tpu spawn

…Lightning/pytorch-lightning into lightning-lite/lite-core

carmocca

(Requesting changes to allow myself to fully review)

ananthsub

if someone calls run twice, does the full accelerator setup happen again? if so, is that always desirable?

pytorch_lightning/lite/lite.py

carmocca

🚀🚀🚀🚀

pytorch_lightning/lite/wrappers.py

…ightning-lite/lite-core

aazzolini · 2021-10-28T19:57:49Z

Why do we need a run() method and an abstract class?

It looks completely out of place to me and misleading. I really like the simplicitly of LightningLite as a concrete class.
The run method here is completely out of place. The docstring is wrong, because nothing gets accelerated because it is inside the "run" function, no other places call the "run" method.

I believe it would be great if we could remove the "run" method from here and make LightingLite a concrete class.

aazzolini · 2021-10-28T20:03:34Z

Dear @aazzolini,

Let's keep PR conversation there.

About #10208, you are right, it could be an Iterator. Let's us investigate if there were some design constrains forcing a DataLoader here.

About #10207, the reason for a run function is 2 folds. First, it won't work with all ddp_spawn related plugins without it e.g TPU, secondly it is required for DeepSpeed to perform sharding instantiation under the context. The run function was the simplest API enabling all plugins to work together with the minimal code changes required for the users from raw PyTorch / across plugins. We invested multiple designs and this one was the simplest and less opiniated from all.

The problem is that the current POC doesn't exercise this use case. It's bad because it's not clear and i don't think it can be done in a clean way. If we ask the user to implement run() and then we ask the user to directly call run() there's no opportunity to incercept anything in between. Does it mean that we will have to add yet another wrapper around LighntiungLite as well?

I'd be inclined to ask you folks to exercise these use cases before moving ahead with this prototype and closing on a public API because I'm almost certain it would have to change in order to accomodate these very use cases.

justusschock · 2021-10-29T10:01:56Z

If we ask the user to implement run() and then we ask the user to directly call run() there's no opportunity to incercept anything in between

Why would we have to intercept anything there? All our logic lives either before or after the user-defined run-method.

Does it mean that we will have to add yet another wrapper around LighntiungLite as well?

Definitely not. This is meant as an entry point to lightning for people not willing to fully convert. It will provide some features like hardware acceleration but it won't at any point offer all the features Lightning does. That was never the plan.

awaelchli · 2021-10-29T10:17:23Z

@aazzolini The reason we have a run method under a class is to avoid an inconvenience for the user to manually change their code if they switch from (for example) Lite(gpus=8) to Lite(tpu_cores=8). What would happen if we didn't have a run() method? Let's see:

# python script
# training on GPUs

def main():
    lite = LightningLite(gpus=2)  # no run method 

    lite.setup()
    # train loop 


if __name__ == "__main__"
   main()

Everything works here. It runs as a script or even in a Jupyter notebook. The user want's to try if TPUs have better performance for their use case. They think all they have to do is

lite = LightningLite(tpu_cores=8)  # no run method

Will that wok? No. The TPU code here would require a function to spawn processes from. This is annoying, because we would need to rewrite the code into this:

def main():
    lite = LightningLite(tpu_cores=?)  # no run method 

    lite.setup()
    # train loop 


if __name__ == "__main__"
   xmp.spawn(main, ..., nprocs=8, method="fork")

Lightning does all that for you and can do so because the users code is not just "anywhere" but in the LightningModule. We wanted to have the same convenience with the run() method in Lite. In the above case, under the hood, Lite will do this:

xmp.spawn(self.run, nprocs=..., method=...) for the user when they call lite.run().

By forcing the user to put their code into the run method, Lite can consistently choose the right operation based on the accelerator choices of the user.

How much effort is it to convert the above code to a class + run method?

class Lite(LightningLite):
    # rename main to run and indent your code. Done!
    def run():
        ...


if __name__ == "__main__"
   # main()
   lite = Lite(gpus=...)

Plus (big plus imo): No launcher utilities. The user just runs python train.py as they would normally, or run their cell in the notebook as usual.

Beyond that, there are benefits to converting to Lightning from here, as there is already a class + method which mimics what the LightningModule would look like.

…Lightning/pytorch-lightning into lightning-lite/lite-core

awaelchli added 2 commits October 27, 2021 09:25

lightning lite package and tests

782c70f

update changelog

be39098

awaelchli added this to the v1.5 milestone Oct 27, 2021

awaelchli added feature Is an improvement or enhancement priority: 0 High priority task fabric lightning.fabric.Fabric labels Oct 27, 2021

awaelchli mentioned this pull request Oct 27, 2021

Lightning Lite docs #10176

Merged

11 tasks

kaushikb11 reviewed Oct 27, 2021

View reviewed changes

tests/lite/test_lite.py Show resolved Hide resolved

awaelchli mentioned this pull request Oct 27, 2021

Lightning Lite Examples #9987

Merged

23 tasks

awaelchli marked this pull request as ready for review October 27, 2021 15:54

awaelchli requested review from Borda, carmocca, justusschock, rohitgr7, SeanNaren, tchaton and williamFalcon as code owners October 27, 2021 15:54

justusschock approved these changes Oct 27, 2021

View reviewed changes

tchaton approved these changes Oct 27, 2021

View reviewed changes

tests/lite/test_parity.py Outdated Show resolved Hide resolved

mergify bot added the ready PRs ready to be merged label Oct 27, 2021

tchaton and others added 7 commits October 27, 2021 17:28

Merge branch 'master' into lightning-lite/lite-core

34b0e89

update

e45f736

Docstrings and CHANGELOG

0deceba

Fixes to previous commit. Mention devices=auto (not yet implemented).…

5d14e83

… Remove tpu spawn

Fix test

11862e8

Fix test

ffed5ce

Merge branch 'lightning-lite/lite-core' of https://github.com/PyTorch…

538b969

…Lightning/pytorch-lightning into lightning-lite/lite-core

carmocca previously requested changes Oct 27, 2021

View reviewed changes

mergify bot removed the ready PRs ready to be merged label Oct 27, 2021

ananthsub reviewed Oct 28, 2021

View reviewed changes

awaelchli added 3 commits October 28, 2021 18:30

add isinstance check

ba7ac5f

add bfloat16

f04b398

rename params_on_cpu

229b024

awaelchli commented Oct 28, 2021

View reviewed changes

pytorch_lightning/lite/lite.py Show resolved Hide resolved

carmocca added 5 commits October 28, 2021 20:10

Pass down the barrier name

95db246

Add back __del__

0c8e914

Fix mypy

a93278d

Fix test

65e289b

Add worker init fn

d408228

carmocca approved these changes Oct 28, 2021

View reviewed changes

carmocca reviewed Oct 28, 2021

View reviewed changes

pytorch_lightning/lite/wrappers.py Outdated Show resolved Hide resolved

carmocca and others added 5 commits October 28, 2021 21:15

Forgot to pass the global rank

952e11c

add back skip of expensive spawn test

50d5124

resolve todo in _LiteModule

13fb58a

Merge remote-tracking branch 'origin/lightning-lite/lite-core' into l…

2e92fe1

…ightning-lite/lite-core

Add seed everything test

9a1e93f

fix type error

f47c2ad

awaelchli and others added 6 commits October 29, 2021 15:07

Merge branch 'master' into lightning-lite/lite-core

5baffe6

update

280dd4d

Merge branch 'lightning-lite/lite-core' of https://github.com/PyTorch…

46070bc

…Lightning/pytorch-lightning into lightning-lite/lite-core

update lambda_closure -> closure

947f9ca

Merge branch 'master' into lightning-lite/lite-core

05ea78d

update

015051a

tchaton enabled auto-merge (squash) October 29, 2021 21:15

tchaton merged commit 9d136a9 into master Oct 29, 2021

tchaton deleted the lightning-lite/lite-core branch October 29, 2021 21:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lightning Lite core and tests #10175

Lightning Lite core and tests #10175

awaelchli commented Oct 27, 2021 •

edited

Loading

tchaton left a comment

carmocca left a comment

ananthsub left a comment

carmocca left a comment

aazzolini commented Oct 28, 2021

aazzolini commented Oct 28, 2021

justusschock commented Oct 29, 2021

awaelchli commented Oct 29, 2021 •

edited

Loading

Lightning Lite core and tests #10175

Lightning Lite core and tests #10175

Conversation

awaelchli commented Oct 27, 2021 • edited Loading

What does this PR do?

Demo

Before submitting

PR review

Did you have fun?

tchaton left a comment

Choose a reason for hiding this comment

carmocca left a comment

Choose a reason for hiding this comment

ananthsub left a comment

Choose a reason for hiding this comment

carmocca left a comment

Choose a reason for hiding this comment

aazzolini commented Oct 28, 2021

aazzolini commented Oct 28, 2021

justusschock commented Oct 29, 2021

awaelchli commented Oct 29, 2021 • edited Loading

awaelchli commented Oct 27, 2021 •

edited

Loading

awaelchli commented Oct 29, 2021 •

edited

Loading