True half-precision support in Fabric #17287
Conversation
Adds support for true half-precision training and inference with Fabric, via the new settings precision="16-true" (float16) and precision="bf16-true" (bfloat16).
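For illustration, a minimal sketch of selecting the new modes (the option strings follow Fabric's 2.0 naming scheme):

```python
from lightning.fabric import Fabric

# True half precision: weights, activations, and gradients are all kept
# in the half-precision dtype (no float32 master copy of the weights).
fabric = Fabric(precision="16-true")      # float16
# fabric = Fabric(precision="bf16-true")  # bfloat16
```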
To reduce peak memory usage, you can initialize the model directly in the desired precision, as in the sketch below:
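A hedged sketch of this pattern; it assumes Fabric's init_module context manager and uses a plain torch.nn.Linear as a stand-in for a real model:

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(precision="bf16-true")

# Parameters are allocated in bfloat16 from the start, so a full
# float32 copy of the weights never materializes in memory.
with fabric.init_module():
    model = torch.nn.Linear(4096, 4096)

print(model.weight.dtype)  # torch.bfloat16
model = fabric.setup(model)
```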
For training, choosing precision="16-true" will most likely cause numerical instabilities, so mixed-precision training is still preferable in many cases. This PR mainly offers more flexibility for inference, especially with bfloat16. See Lit-LLaMA for an example: https://github.com/Lightning-AI/lit-llama
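To make the trade-off concrete, a hedged sketch contrasting the two regimes ("16-mixed" is Fabric's existing mixed-precision setting):

```python
from lightning.fabric import Fabric

# Training: mixed precision keeps float32 weights and runs selected ops
# in float16 via autocast, which is usually more numerically stable.
train_fabric = Fabric(precision="16-mixed")

# Inference: true bfloat16 halves the weight memory and needs no
# float32 copies; bfloat16 keeps float32's range but has fewer
# mantissa bits, which inference typically tolerates well.
infer_fabric = Fabric(precision="bf16-true")
```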
cc @Borda @carmocca @justusschock @awaelchli