Support empty weight initialization in Fabric.init_module()
#17627
Conversation
e915926 to 4910cb6 (compare)
Codecov Report
Additional details and impacted files

```
@@            Coverage Diff             @@
##           master   #17627      +/-   ##
==========================================
- Coverage      84%      61%      -23%
==========================================
  Files         419      415        -4
  Lines       31721    31662       -59
==========================================
- Hits        26634    19382     -7252
- Misses       5087    12280     +7193
```
Implementation looks good. I would personally go for empty_init.
Co-authored-by: Carlos Mocholí <[email protected]>
```python
# TODO: Use the meta device and reset parameters after https://github.com/pytorch/pytorch/issues/90465
# is resolved. For now, the module will get moved to the device in `setup_module`.
empty_init_context = (
    _EmptyInit(enabled=(empty_init is not False)) if _TORCH_GREATER_EQUAL_1_13 else nullcontext()
)
with empty_init_context, self.precision.init_context(), self.module_sharded_context():
    ...  # rest of the context manager elided
```
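For readers unfamiliar with the mechanism, here is a hypothetical, minimal sketch of the general "skip initialization" technique (not Lightning's actual _EmptyInit implementation): a TorchFunctionMode that turns common in-place initializer calls into no-ops, so newly created parameters keep whatever memory was allocated for them.

```python
import torch
from torch.overrides import TorchFunctionMode  # available in torch >= 1.13


class SkipInit(TorchFunctionMode):
    """Hypothetical sketch: intercept in-place initializers and skip them."""

    _SKIPPED = {"uniform_", "normal_", "kaiming_uniform_", "kaiming_normal_", "zero_"}

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        if getattr(func, "__name__", "") in self._SKIPPED:
            # Return the tensor untouched instead of filling it with values.
            return args[0]
        return func(*args, **kwargs)


with SkipInit():
    # Memory is allocated, but `reset_parameters()` effectively becomes a no-op.
    layer = torch.nn.Linear(4096, 4096)
```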
@awaelchli We might want to reconsider doing empty_init=True by default on FSDP. I just went down a rabbit hole chasing NaNs on FSDP when all I needed to change was empty_init=False. Since the default was to skip initialization and I wasn't loading a full checkpoint, the weights were initialized to garbage.
Since there's no way for us to know if the user will load a full checkpoint after initialization, I find it safer to not do this by default.
Context: Lightning-AI/litgpt#193
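To make the failure mode concrete, a small illustrative sketch (not code from this PR): with initialization skipped, a parameter contains whatever bytes the allocator hands back, which is only safe if every value is later overwritten, for example by loading a full checkpoint.

```python
import torch

# What "empty init" effectively gives you: allocated but uninitialized memory.
weight = torch.empty(4096, 4096)

# The contents are arbitrary; extreme values or NaNs produce garbage outputs downstream.
print(weight.abs().max())

# Only safe if the tensor is fully overwritten afterwards, e.g.:
# weight.copy_(checkpoint["weight"])  # `checkpoint` is a hypothetical loaded state dict
```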
That wasn't the intention, sorry. The plan was to do this automatically if we have fake tensor / support for materialization (torchdistx-style). In the meantime, it would be safer to do:
```diff
-_EmptyInit(enabled=(empty_init is not False)) if _TORCH_GREATER_EQUAL_1_13 else nullcontext()
+_EmptyInit(enabled=bool(empty_init)) if _TORCH_GREATER_EQUAL_1_13 else nullcontext()
```
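The difference only matters when empty_init is left as None (i.e. the user passed nothing); a quick sketch of the two expressions' behavior:

```python
# Behavior of the two expressions when `empty_init` is not passed (None):
for empty_init in (None, True, False):
    old = empty_init is not False  # None -> True: skipping init is the default
    new = bool(empty_init)         # None -> False: regular init is the default
    print(f"empty_init={empty_init!r}: old={old}, suggested={new}")
```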
The comment above there, "# TODO: Use the meta device and reset parameters after https://github.com/pytorch/pytorch/issues/90465", is what I ultimately wanted to achieve.
Same. While it's a cool feature, having this default to true is very disruptive and hard to debug.
What does this PR do?
Fixes #17616
Adds a toggle on Fabric.init_module that speeds up initialization and memory allocation for a large model. Useful for finetuning / loading weights into a large model.
See how Fabric.init_module(empty_weights=True) can be applied in lit-llama to minimize boilerplate logic: Lightning-AI/lit-llama#360
cc @Borda @carmocca @justusschock @awaelchli
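For context, a minimal usage sketch of the feature, assuming the parameter ends up being called empty_init (as suggested in the review; the description above still uses empty_weights) and using a stand-in model and a hypothetical checkpoint path:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(accelerator="cuda", devices=1)
fabric.launch()

# Skip the expensive random initialization since the weights get overwritten anyway.
with fabric.init_module(empty_init=True):
    model = torch.nn.Linear(8192, 8192)  # stand-in for a large model

# Only safe because a full checkpoint is loaded right after:
state_dict = torch.load("path/to/checkpoint.pt")  # hypothetical checkpoint path
model.load_state_dict(state_dict)
model = fabric.setup(model)
```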