Meta device initialization for FSDP in Trainer #18385

awaelchli · 2023-08-24T13:59:07Z

What does this PR do?

Same approach as in #18122

from lightning.pytorch import Trainer

trainer = Trainer(strategy="fsdp")

with trainer.init_module(empty_init=True):
    model = Model()  # `Model` is your LightningModule; it is now on the meta device (no memory occupied)
...
# materialization, parameter initialization, and sharding happen in `strategy.setup`
trainer.fit(model)

This lets you instantiate very large models that wouldn't fit in memory (CPU or GPU) as quickly as possible. No memory is allocated for the weights on either the CPU or the GPU. Instead, the parameters are materialized and initialized with random weights directly when the model gets wrapped and sharded in FSDPStrategy.setup().
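The same idea can be approximated in plain PyTorch. The sketch below is a hedged illustration, not the code in FSDPStrategy.setup(): it assumes PyTorch 2.0 or newer (for the `torch.device` context manager), uses `to_empty()` to allocate uninitialized storage, and then calls `reset_parameters()` on every submodule that defines it. The helper name `materialize` is hypothetical; in the actual strategy this step is delegated to FSDP while it wraps and shards the model.

```python
import torch
from torch import nn


def materialize(module: nn.Module, device: str = "cpu") -> nn.Module:
    """Hypothetical helper: allocate real storage for a meta-device module and
    re-initialize it by calling reset_parameters() on every submodule that defines it."""
    # to_empty() replaces meta tensors with uninitialized tensors on `device`
    module.to_empty(device=device)
    for submodule in module.modules():
        reset = getattr(submodule, "reset_parameters", None)
        if callable(reset):
            reset()  # fills the uninitialized storage with fresh random weights
    return module


# Build the model on the meta device: shapes and dtypes are known, but no memory is allocated
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))

print(all(p.is_meta for p in model.parameters()))  # True

model = materialize(model)
print(any(p.is_meta for p in model.parameters()))  # False
```

Creating the model on the meta device is nearly free regardless of its size; memory is only paid for at the point of materialization, which is why the strategy can defer it until the model is being sharded.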

Notes:

This feature is made possible by pytorch/pytorch#104187 ([RFC] Revisiting Meta Device Initialization with reset_parameters()), available in the PyTorch 2.1 nightly builds.
Requirement: your submodules must define a reset_parameters() method that can be called to initialize their parameters. Most built-in PyTorch layers already do; if you have a custom layer, you need to add this method yourself. This PR also updates our Transformer example, whose reset_parameters() needed fixing.
Documentation will be updated in a follow-up, both for Fabric and Trainer.
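For a custom layer that creates its own nn.Parameter objects, the reset_parameters() requirement above amounts to keeping all initialization logic in that method. A minimal sketch follows; the layer `ScaledLinear` and its initialization scheme are hypothetical, chosen only to mirror how nn.Linear initializes itself. Calling reset_parameters() from __init__ keeps eager construction working, and on the meta device the init calls run without allocating any storage.

```python
import math

import torch
from torch import nn


class ScaledLinear(nn.Module):
    """Hypothetical custom layer: a linear map followed by a learned per-feature scale."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        # torch.empty respects the active device, so under torch.device("meta") nothing is allocated
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.scale = nn.Parameter(torch.empty(out_features))
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # All initialization lives here so it can be re-run after materialization
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        nn.init.ones_(self.scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.weight.t()) * self.scale


with torch.device("meta"):
    layer = ScaledLinear(16, 8)

# Materialize and initialize, roughly what happens to each submodule during setup
layer.to_empty(device="cpu")
layer.reset_parameters()
print(layer.scale.sum().item())  # 8.0
```

If the initialization were done inline in __init__ instead, it would run only once on meta tensors and the materialized weights would be left uninitialized.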

cc @Borda @awaelchli @carmocca


github-actions · 2023-08-24T23:48:42Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu (macOS-11, lightning, 3.8, 1.11)	success	✅
pl-cpu (macOS-11, lightning, 3.9, 1.12)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 1.13)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 2.0)	success	✅
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.12)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.0)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11)	success	✅
pl-cpu (windows-2022, lightning, 3.9, 1.12)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 1.13)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.0)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (macOS-11, pytorch, 3.8, 1.13)	success	✅
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13)	success	✅
pl-cpu (windows-2022, pytorch, 3.8, 1.13)	success	✅

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py, tests/tests_pytorch/strategies/test_fsdp.py.

🟢 pytorch_lightning: Azure GPU

Check ID	Status
[pytorch-lightning (GPUs) (testing Lightning | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=171443&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e)	success
[pytorch-lightning (GPUs) (testing PyTorch | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=171443&view=logs&jobId=3f274fac-2e11-54ca-487e-194c91f3ae9f)	success

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py, tests/tests_pytorch/strategies/test_fsdp.py.

🟢 pytorch_lightning: Benchmarks

Check ID	Status
lightning.Benchmarks	success	✅

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py.

🟢 pytorch_lightning: Docs

Check ID	Status
docs-checks (pytorch, doctest)	success	✅
make-html (pytorch)	success	✅

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py.

🟢 install

Check ID	Status
install-pkg (ubuntu-22.04, app, 3.8)	success	✅
install-pkg (ubuntu-22.04, app, 3.10)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.8)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.10)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.8)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.10)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.8)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.10)	success	✅
install-pkg (ubuntu-22.04, notset, 3.8)	success	✅
install-pkg (ubuntu-22.04, notset, 3.10)	success	✅
install-pkg (macOS-12, app, 3.8)	success	✅
install-pkg (macOS-12, app, 3.10)	success	✅
install-pkg (macOS-12, fabric, 3.8)	success	✅
install-pkg (macOS-12, fabric, 3.10)	success	✅
install-pkg (macOS-12, pytorch, 3.8)	success	✅
install-pkg (macOS-12, pytorch, 3.10)	success	✅
install-pkg (macOS-12, lightning, 3.8)	success	✅
install-pkg (macOS-12, lightning, 3.10)	success	✅
install-pkg (macOS-12, notset, 3.8)	success	✅
install-pkg (macOS-12, notset, 3.10)	success	✅
install-pkg (windows-2022, app, 3.8)	success	✅
install-pkg (windows-2022, app, 3.10)	success	✅
install-pkg (windows-2022, fabric, 3.8)	success	✅
install-pkg (windows-2022, fabric, 3.10)	success	✅
install-pkg (windows-2022, pytorch, 3.8)	success	✅
install-pkg (windows-2022, pytorch, 3.10)	success	✅
install-pkg (windows-2022, lightning, 3.8)	success	✅
install-pkg (windows-2022, lightning, 3.10)	success	✅
install-pkg (windows-2022, notset, 3.8)	success	✅
install-pkg (windows-2022, notset, 3.10)	success	✅

These checks are required after the changes to src/lightning/pytorch/demos/transformer.py, src/lightning/pytorch/strategies/fsdp.py.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.


awaelchli added 4 commits August 24, 2023 15:35

transformer patch

aecb71a

update pos embedding

5a55da2

x

d20ffb8

meta

c0b0697

awaelchli added this to the 2.1 milestone Aug 24, 2023

awaelchli added the feature, strategy: fsdp, and pl labels Aug 24, 2023

github-actions bot added the pl label Aug 24, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

9195e2f

for more information, see https://pre-commit.ci

awaelchli commented Aug 24, 2023


src/lightning/pytorch/demos/transformer.py (outdated)

awaelchli and others added 8 commits August 24, 2023 16:16

missing chlog

604c32a

consistent titles

7cf56d0

Merge branch 'master' into feature/fsdp-meta

4f4e714

add test

e876fa1

test

4aa609d

update

d8b3c9b

[pre-commit.ci] auto fixes from pre-commit.com hooks

11ae197

for more information, see https://pre-commit.ci

revert

b877e73

awaelchli marked this pull request as ready for review August 24, 2023 23:48

awaelchli requested review from carmocca, justusschock, Borda and williamFalcon as code owners August 24, 2023 23:48

awaelchli commented Aug 24, 2023


src/lightning/pytorch/demos/transformer.py (outdated)

update

ec948aa

Borda approved these changes Aug 25, 2023


carmocca approved these changes Aug 25, 2023


src/lightning/pytorch/strategies/fsdp.py (outdated)

mergify bot added the ready label Aug 25, 2023

Update src/lightning/pytorch/demos/transformer.py

708f77e

Co-authored-by: Jirka Borovec <[email protected]>

awaelchli commented Aug 25, 2023


src/lightning/pytorch/strategies/fsdp.py (outdated)

src/lightning/pytorch/strategies/fsdp.py (outdated)

Apply suggestions from code review

d2456bd

awaelchli merged commit 3d41313 into master Aug 25, 2023

awaelchli deleted the feature/fsdp-meta branch August 25, 2023 18:18
