[TPU] Proper half-precision implementation for XLA #18213

awaelchli · 2023-08-01T21:24:10Z

What does this PR do?

Successful test run
https://github.com/Lightning-AI/lightning/actions/runs/5802318415/job/15728813744?pr=18213

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet list:

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

cc @Borda @carmocca @JackCaoG @steventk-g @Liyang90 @justusschock @awaelchli


github-actions · 2023-08-04T12:35:20Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu (macOS-11, lightning, 3.8, 1.11)	success	✅
pl-cpu (macOS-11, lightning, 3.9, 1.12)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 1.13)	success	✅
pl-cpu (macOS-11, lightning, 3.10, 2.0)	success	✅
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.12)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.0)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11)	success	✅
pl-cpu (windows-2022, lightning, 3.9, 1.12)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 1.13)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.0)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest)	success	✅
pl-cpu (macOS-11, pytorch, 3.8, 1.13)	success	✅
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13)	success	✅
pl-cpu (windows-2022, pytorch, 3.8, 1.13)	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, tests/tests_pytorch/graveyard/test_tpu.py, tests/tests_pytorch/models/test_tpu.py, tests/tests_pytorch/plugins/precision/test_xla.py, tests/tests_pytorch/plugins/precision/test_xlabf16.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py.

🟢 pytorch_lightning: Azure GPU

Check ID	Status
[pytorch-lightning (GPUs) (testing Lightning | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=169505&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e)	success
[pytorch-lightning (GPUs) (testing PyTorch | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=169505&view=logs&jobId=3f274fac-2e11-54ca-487e-194c91f3ae9f)	success

These checks are required after the changes to src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, tests/tests_pytorch/graveyard/test_tpu.py, tests/tests_pytorch/models/test_tpu.py, tests/tests_pytorch/plugins/precision/test_xla.py, tests/tests_pytorch/plugins/precision/test_xlabf16.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py.

🟢 pytorch_lightning: Benchmarks

Check ID	Status
lightning.Benchmarks	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py.

🟢 fabric: Docs

Check ID	Status
make-doctest (fabric)	success	✅
make-html (fabric)	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py.

🟢 pytorch_lightning: Docs

Check ID	Status
make-doctest (pytorch)	success	✅
make-html (pytorch)	success	✅

These checks are required after the changes to src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, docs/source-pytorch/accelerators/tpu_intermediate.rst.

🟢 lightning_fabric: CPU workflow

Check ID	Status
fabric-cpu (macOS-11, lightning, 3.8, 1.11)	success	✅
fabric-cpu (macOS-11, lightning, 3.9, 1.12)	success	✅
fabric-cpu (macOS-11, lightning, 3.10, 1.13)	success	✅
fabric-cpu (macOS-11, lightning, 3.10, 2.0)	success	✅
fabric-cpu (macOS-11, lightning, 3.8, 1.11, oldest)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.9, 1.12)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.10, 1.13)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.10, 2.0)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest)	success	✅
fabric-cpu (windows-2022, lightning, 3.8, 1.11)	success	✅
fabric-cpu (windows-2022, lightning, 3.9, 1.12)	success	✅
fabric-cpu (windows-2022, lightning, 3.10, 1.13)	success	✅
fabric-cpu (windows-2022, lightning, 3.10, 2.0)	success	✅
fabric-cpu (windows-2022, lightning, 3.8, 1.11, oldest)	success	✅
fabric-cpu (macOS-11, fabric, 3.8, 1.13)	success	✅
fabric-cpu (ubuntu-20.04, fabric, 3.8, 1.13)	success	✅
fabric-cpu (windows-2022, fabric, 3.8, 1.13)	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, tests/tests_fabric/graveyard/test_tpu.py, tests/tests_fabric/plugins/precision/test_xla.py, tests/tests_fabric/plugins/precision/test_xlabf16.py, tests/tests_fabric/test_connector.py.

🟢 lightning_fabric: Azure GPU

Check ID	Status
[lightning-fabric (GPUs) (testing Fabric | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=169507&view=logs&jobId=3f274fac-2e11-54ca-487e-194c91f3ae9f)	success
[lightning-fabric (GPUs) (testing Lightning | latest)](https://dev.azure.com/Lightning-AI/72ab7ed8-b00f-4b6e-b131-3388f7ffafa7/_build/results?buildId=169507&view=logs&jobId=47e66f3c-897a-5428-da11-bf5c7745762e)	success

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, tests/tests_fabric/graveyard/test_tpu.py, tests/tests_fabric/plugins/precision/test_xla.py, tests/tests_fabric/plugins/precision/test_xlabf16.py, tests/tests_fabric/test_connector.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py.

🟢 install

Check ID	Status
install-pkg (ubuntu-22.04, app, 3.8)	success	✅
install-pkg (ubuntu-22.04, app, 3.10)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.8)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.10)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.8)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.10)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.8)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.10)	success	✅
install-pkg (ubuntu-22.04, notset, 3.8)	success	✅
install-pkg (ubuntu-22.04, notset, 3.10)	success	✅
install-pkg (macOS-12, app, 3.8)	success	✅
install-pkg (macOS-12, app, 3.10)	success	✅
install-pkg (macOS-12, fabric, 3.8)	success	✅
install-pkg (macOS-12, fabric, 3.10)	success	✅
install-pkg (macOS-12, pytorch, 3.8)	success	✅
install-pkg (macOS-12, pytorch, 3.10)	success	✅
install-pkg (macOS-12, lightning, 3.8)	success	✅
install-pkg (macOS-12, lightning, 3.10)	success	✅
install-pkg (macOS-12, notset, 3.8)	success	✅
install-pkg (macOS-12, notset, 3.10)	success	✅
install-pkg (windows-2022, app, 3.8)	success	✅
install-pkg (windows-2022, app, 3.10)	success	✅
install-pkg (windows-2022, fabric, 3.8)	success	✅
install-pkg (windows-2022, fabric, 3.10)	success	✅
install-pkg (windows-2022, pytorch, 3.8)	success	✅
install-pkg (windows-2022, pytorch, 3.10)	success	✅
install-pkg (windows-2022, lightning, 3.8)	success	✅
install-pkg (windows-2022, lightning, 3.10)	success	✅
install-pkg (windows-2022, notset, 3.8)	success	✅
install-pkg (windows-2022, notset, 3.10)	success	✅

These checks are required after the changes to src/lightning/fabric/_graveyard/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/__init__.py, src/lightning/fabric/plugins/precision/__init__.py, src/lightning/fabric/plugins/precision/xla.py, src/lightning/fabric/plugins/precision/xlabf16.py, src/lightning/pytorch/_graveyard/tpu.py, src/lightning/pytorch/plugins/__init__.py, src/lightning/pytorch/plugins/precision/__init__.py, src/lightning/pytorch/plugins/precision/xla.py, src/lightning/pytorch/plugins/precision/xlabf16.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py.

🟢 link-check

Check ID	Status
check-md-links / markdown-link-check	success	✅

These checks are required after the changes to src/lightning/pytorch/CHANGELOG.md.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.


src/lightning/pytorch/trainer/connectors/accelerator_connector.py

carmocca

In a follow-up we should add support for casting the model weights to half too.

src/lightning/pytorch/trainer/connectors/accelerator_connector.py

awaelchli · 2023-08-09T21:41:43Z

In a follow-up we should add support for casting the model weights to half too.

My interpretation was that this happens automatically, as described here, but for completeness we can add the conversion anyway if it doesn't hurt.


carmocca · 2023-08-11T16:08:46Z

src/lightning/fabric/plugins/precision/xla.py

+        else:
+            self._desired_dtype = torch.float32
+
+    def convert_input(self, data: Any) -> Any:


Shouldn't this be removed too? I don't think the Trainer tests were failing due to a bug in the Trainer. Fabric should be affected by this change too, and the Fabric test coverage probably doesn't include such a failure.

Probably yes. I can add an integration test, if there isn't one already, to find out.
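To make this thread concrete, here is a minimal sketch, assuming the plugin maps each precision flag to a desired dtype (as the `else: self._desired_dtype = torch.float32` branch in the diff suggests) and requests true half precision from torch_xla through the `XLA_USE_F16` / `XLA_USE_BF16` environment variables. The class name `XLAPrecisionSketch`, the string dtypes, and the environment-variable mechanism are assumptions for illustration, not Lightning's actual code. In line with the later commit "remove input and output conversion", the sketch has no `convert_input` hook.

```python
import os
from typing import Literal, get_args

# Hypothetical precision flags, modeled on Lightning's "32-true", "16-true" and "bf16-true" strings.
PrecisionFlag = Literal["32-true", "16-true", "bf16-true"]


class XLAPrecisionSketch:
    """Illustrative stand-in for an XLA half-precision plugin; not Lightning's actual class."""

    def __init__(self, precision: str) -> None:
        supported = get_args(PrecisionFlag)
        if precision not in supported:
            raise ValueError(f"precision={precision!r} is not supported; choose one of {supported}")
        self.precision = precision
        # Assumption: half precision is requested from torch_xla via environment
        # variables, and the plugin records the dtype it expects. A real plugin would
        # store torch.float16 / torch.bfloat16 / torch.float32; plain strings keep
        # this sketch free of a torch dependency.
        if precision == "16-true":
            os.environ["XLA_USE_F16"] = "1"
            self._desired_dtype = "float16"
        elif precision == "bf16-true":
            os.environ["XLA_USE_BF16"] = "1"
            self._desired_dtype = "bfloat16"
        else:
            self._desired_dtype = "float32"
    # Deliberately no convert_input/convert_output: inputs and outputs pass through unchanged.


plugin = XLAPrecisionSketch("bf16-true")
print(plugin._desired_dtype)  # bfloat16
```

Because `os.environ` is process-wide, constructing such a plugin changes state for the whole process, not just for one model.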

awaelchli added 2 commits August 1, 2023 23:13

consolidate xla precision plugin

0e15c56

graveyard

9b7a6f7

awaelchli changed the title ~~consolidate xla precision plugin~~ Proper half-precision implementation for XLA Aug 1, 2023

github-actions bot added the fabric (lightning.fabric.Fabric) label Aug 1, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

c553b27

for more information, see https://pre-commit.ci

awaelchli mentioned this pull request Aug 1, 2023

XLA's 16-bit precision plugin selection is incorrect #18172

Closed

awaelchli added 4 commits August 1, 2023 23:31

Update test

6ce2c2a

Merge remote-tracking branch 'origin/bugfix/xla-precision-plugin' into bugfix/xla-precision-plugin

e96b23a

tests

1c26498

extend graveyard

8937ea2

awaelchli added the accelerator: tpu (Tensor Processing Unit), strategy: xla, bug (Something isn't working), and feature (Is an improvement or enhancement) labels Aug 1, 2023

awaelchli commented Aug 2, 2023


src/lightning/fabric/connector.py (outdated, resolved)

awaelchli changed the title ~~Proper half-precision implementation for XLA~~ [TPU] Proper half-precision implementation for XLA Aug 4, 2023

awaelchli added 4 commits August 4, 2023 11:55

single

d26d5ae

Merge branch 'master' into bugfix/xla-precision-plugin

76fde06

include xlafsdp

e8ba67e

mypy

7ff03db

awaelchli marked this pull request as ready for review August 4, 2023 12:34

awaelchli requested review from carmocca and justusschock as code owners August 4, 2023 12:34

awaelchli changed the title ~~[TPU] Proper half-precision implementation for XLA~~ [TPU] Proper half-precision implementation for XLA (WIP) Aug 4, 2023

awaelchli added this to the 2.1 milestone Aug 4, 2023

bring changes to trainer

e0387b0

awaelchli requested a review from williamFalcon as a code owner August 4, 2023 13:14

github-actions bot added the pl (Generic label for PyTorch Lightning package) label Aug 4, 2023

tests

62d9b4c

pre-commit-ci bot and others added 4 commits August 4, 2023 13:40

[pre-commit.ci] auto fixes from pre-commit.com hooks

1c6f5c2

for more information, see https://pre-commit.ci

set default

2a7ae9e

Merge remote-tracking branch 'origin/bugfix/xla-precision-plugin' into bugfix/xla-precision-plugin

74016f2

fix

70a5e63

awaelchli commented Aug 4, 2023


src/lightning/pytorch/trainer/connectors/accelerator_connector.py (resolved)

awaelchli added 3 commits August 4, 2023 23:59

Merge branch 'master' into bugfix/xla-precision-plugin

5f7952f

Merge branch 'master' into bugfix/xla-precision-plugin

c30f376

update doc

3ce1fd0

awaelchli changed the title ~~[TPU] Proper half-precision implementation for XLA (WIP)~~ [TPU] Proper half-precision implementation for XLA Aug 8, 2023

changelog

3d99526

awaelchli requested review from edenlightning and lantiga as code owners August 8, 2023 22:27

mergify bot added the has conflicts label Aug 9, 2023

carmocca approved these changes Aug 9, 2023


src/lightning/pytorch/trainer/connectors/accelerator_connector.py (resolved)

Merge branch 'master' into bugfix/xla-precision-plugin

28ce0fe

mergify bot removed the has conflicts label Aug 9, 2023

[pre-commit.ci] auto fixes from pre-commit.com hooks

a2b9c6b

for more information, see https://pre-commit.ci

mergify bot added the has conflicts label Aug 10, 2023

awaelchli added 2 commits August 11, 2023 13:53

Merge branch 'master' into bugfix/xla-precision-plugin

66bd979

debug xla

bea829a

mergify bot removed the has conflicts label Aug 11, 2023

pre-commit-ci bot and others added 2 commits August 11, 2023 11:59

[pre-commit.ci] auto fixes from pre-commit.com hooks

5f5f644

for more information, see https://pre-commit.ci

remove input and output conversion

6f636e0

Borda approved these changes Aug 11, 2023


mergify bot added the ready (PRs ready to be merged) label Aug 11, 2023

awaelchli merged commit 7fe8756 into master Aug 11, 2023

awaelchli deleted the bugfix/xla-precision-plugin branch August 11, 2023 15:37

carmocca reviewed Aug 11, 2023


awaelchli mentioned this pull request Aug 11, 2023

Integration tests for XLA precision #18286

Merged
