[TPU] For XLA Strategy, added function arg to control `broadcast_master_param()` #17522

gkroiz · 2023-04-28T17:44:06Z

What does this PR do?

Adds a boolean `sync_module_states` argument to `XLAStrategy` that controls whether `broadcast_master_param()` is called. Broadcasting the master parameters is not always needed. For example, when weights are randomly initialized and the same seed is set on every device, all devices already hold identical parameters, so `broadcast_master_param()` is redundant.

The motivation for this change is that `broadcast_master_param()` adds time to training and should be skipped when it is not needed.
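
The reasoning above can be illustrated with a small, framework-free sketch. It does not use Lightning or torch_xla: `init_weights`, `broadcast_from_rank_zero`, and `setup_replicas` are hypothetical stand-ins that mimic per-device initialization and the optional broadcast step, and the sketch assumes each device seeds its own generator before creating weights.

```python
import random
from typing import List


def init_weights(seed: int, size: int = 4) -> List[float]:
    """Mimic random weight initialization on one device (hypothetical helper)."""
    rng = random.Random(seed)  # each device seeds its own generator
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def broadcast_from_rank_zero(replicas: List[List[float]]) -> List[List[float]]:
    """Stand-in for a parameter broadcast: every rank receives rank 0's weights."""
    return [list(replicas[0]) for _ in replicas]


def setup_replicas(seeds: List[int], sync_module_states: bool = True) -> List[List[float]]:
    """Initialize one replica per device, optionally syncing them afterwards."""
    replicas = [init_weights(seed) for seed in seeds]
    if sync_module_states:
        replicas = broadcast_from_rank_zero(replicas)
    return replicas


def all_equal(replicas: List[List[float]]) -> bool:
    """Return True when every replica holds the same weights as the first one."""
    return all(r == replicas[0] for r in replicas)


# Same seed everywhere: replicas already match, so the broadcast can be skipped.
same_seed = setup_replicas([42, 42, 42, 42], sync_module_states=False)
# Different seeds: skipping the broadcast leaves the replicas out of sync.
mixed_seed = setup_replicas([0, 1, 2, 3], sync_module_states=False)
# Different seeds with the broadcast enabled: replicas are brought back in sync.
synced = setup_replicas([0, 1, 2, 3], sync_module_states=True)
```

In this toy model, disabling the sync is only safe in the first case, which matches the condition stated in the description: the seed must be defined and identical on all devices.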

Before submitting

Was this discussed/agreed via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

src/lightning/fabric/strategies/xla.py

carmocca

We should add a test like this one: https://github.com/Lightning-AI/lightning/pull/17370/files#diff-664848c0cba1d81c012e047af0b592829a44870a55e75c24884569e81e3e62beR41

gkroiz · 2023-05-03T22:59:34Z

> We should add a test like this one: https://github.com/Lightning-AI/lightning/pull/17370/files#diff-664848c0cba1d81c012e047af0b592829a44870a55e75c24884569e81e3e62beR41

I added a test for the Fabric XLAStrategy.
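
The test itself is not shown in this conversation. As a rough illustration only, a check of this kind could look like the sketch below. `ToyStrategy` and `broadcast_call_count` are hypothetical names, not Lightning's API, and the broadcast is replaced by a `unittest.mock.Mock` so the sketch runs without a TPU.

```python
from unittest import mock


class ToyStrategy:
    """Hypothetical stand-in for a strategy with a `sync_module_states` flag."""

    def __init__(self, broadcast_fn, sync_module_states: bool = True) -> None:
        self._broadcast_fn = broadcast_fn
        self._sync_module_states = sync_module_states

    def setup_module(self, module):
        # Only pay for the broadcast when syncing was requested.
        if self._sync_module_states:
            self._broadcast_fn(module)
        return module


def broadcast_call_count(sync_module_states: bool) -> int:
    """Return how many times the broadcast ran during `setup_module`."""
    broadcast = mock.Mock()
    strategy = ToyStrategy(broadcast, sync_module_states=sync_module_states)
    module = object()  # placeholder for a model
    assert strategy.setup_module(module) is module
    return broadcast.call_count
```

Asserting on the call count for both flag values covers the default path and the skip path in one place.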

carmocca

LGTM! Last bit is to add a CHANGELOG entry for both Fabric and PyTorch

gkroiz requested review from awaelchli, carmocca, justusschock and williamFalcon as code owners April 28, 2023 17:44

github-actions bot added the `fabric` (lightning.fabric.Fabric) and `pl` (Generic label for PyTorch Lightning package) labels Apr 28, 2023

carmocca reviewed May 3, 2023

carmocca reviewed May 3, 2023

gkroiz and others added 2 commits May 3, 2023 22:33

Added Fabric test for broadcast_master_params()

c45d2d9

[pre-commit.ci] auto fixes from pre-commit.com hooks

147804f

for more information, see https://pre-commit.ci

carmocca approved these changes May 4, 2023

Borda approved these changes May 4, 2023

mergify bot added the `ready` (PRs ready to be merged) label May 4, 2023

gkroiz and others added 2 commits May 4, 2023 07:31

Fixed seed_everything() usage

f3ba3a0

[pre-commit.ci] auto fixes from pre-commit.com hooks

c65d56f

for more information, see https://pre-commit.ci

awaelchli reviewed May 4, 2023

carmocca and others added 3 commits May 4, 2023 19:40

CHANGELOG and test fix

a0a6d00

typo

3bab2c6

Merge branch 'master' into tpu_broadcast_params

1313fb1

justusschock approved these changes May 5, 2023

Borda and others added 2 commits May 5, 2023 19:04

Merge branch 'master' into tpu_broadcast_params

cd4fab1

Changed broadcast_master_params to sync_module_states

32a1729

gkroiz changed the title ~~[TPU] For XLA Strategy, added function arg to control broadcast_master_params()~~ [TPU] For XLA Strategy, added function arg to control broadcast_master_param() May 5, 2023

carmocca enabled auto-merge (squash) May 5, 2023 17:38

carmocca merged commit 8e6f24b into Lightning-AI:master May 5, 2023

This was referenced May 5, 2023

Added strategy options to config + general cleanup pytorch-tpu/stable-diffusion#2

Merged

Updates to Lightning SD test scripts GoogleCloudPlatform/ml-testing-accelerators#892

Merged

gkroiz deleted the tpu_broadcast_params branch June 13, 2023 15:35
