Remove automatic sharding support with Fabric.run or fabric.launch(fn) #17832

Merged: 7 commits, Jun 15, 2023

@carmocca (Contributor) commented Jun 15, 2023

What does this PR do?

When the pattern fabric.launch(fn) (required for XLA) is used with fabric.init_module() and FSDP, the following error appears:

Traceback (most recent call last):
  ...
  File "/home/carlos/lit-parrot/pretrain/openwebtext.py", line 87, in main
    with fabric.init_module():
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/carlos/lightning/src/lightning/fabric/fabric.py", line 640, in init_module
    with self._strategy.module_init_context(empty_init=empty_init):
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/carlos/lightning/src/lightning/fabric/strategies/fsdp.py", line 281, in module_init_context
    with empty_init_context, self.precision.init_context(), self.module_sharded_context():
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/carlos/lightning/src/lightning/fabric/strategies/fsdp.py", line 289, in module_sharded_context
    with enable_wrap(
  File "/usr/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/carlos/venv/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 271, in enable_wrap
    with _ConfigAutoWrap(**kwargs):
  File "/home/carlos/venv/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 435, in __enter__
    self.enable_autowrap_context(self.kwargs)
  File "/home/carlos/venv/lib/python3.8/site-packages/torch/distributed/fsdp/wrap.py", line 415, in enable_autowrap_context
    raise NotImplementedError(
NotImplementedError: You are already within an autowrap context and we currently do not supported nested autowrap.

Resolves #17462 (comment)
Fixes Lightning-AI/litgpt#154

I didn't find any mention of this functionality in the docs, so they shouldn't need an update.

cc @Borda @justusschock @awaelchli @carmocca

@carmocca added the labels bug (Something isn't working), breaking change (Includes a breaking change), strategy: deepspeed, fabric (lightning.fabric.Fabric), and strategy: fsdp (Fully Sharded Data Parallel) Jun 15, 2023
@carmocca added this to the 2.1 milestone Jun 15, 2023
@carmocca requested a review from awaelchli as a code owner June 15, 2023 04:14
@carmocca self-assigned this Jun 15, 2023
@carmocca requested a review from justusschock as a code owner June 15, 2023 04:14
github-actions bot (Contributor) commented Jun 15, 2023

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-11, lightning, 3.8, 1.11) success
pl-cpu (macOS-11, lightning, 3.9, 1.12) success
pl-cpu (macOS-11, lightning, 3.10, 1.13) success
pl-cpu (macOS-11, lightning, 3.10, 2.0) success
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11) success
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.12) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.0) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
pl-cpu (windows-2022, lightning, 3.8, 1.11) success
pl-cpu (windows-2022, lightning, 3.9, 1.12) success
pl-cpu (windows-2022, lightning, 3.10, 1.13) success
pl-cpu (windows-2022, lightning, 3.10, 2.0) success
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
pl-cpu (macOS-11, pytorch, 3.8, 1.13) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13) success
pl-cpu (windows-2022, pytorch, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 pytorch_lightning: Benchmarks
Check ID Status
lightning.Benchmarks success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 fabric: Docs
Check ID Status
make-doctest (fabric) success
make-html (fabric) success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 lightning_fabric: CPU workflow
Check ID Status
fabric-cpu (macOS-11, lightning, 3.8, 1.11) success
fabric-cpu (macOS-11, lightning, 3.9, 1.12) success
fabric-cpu (macOS-11, lightning, 3.10, 1.13) success
fabric-cpu (macOS-11, lightning, 3.10, 2.0) success
fabric-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11) success
fabric-cpu (ubuntu-20.04, lightning, 3.9, 1.12) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 2.0) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
fabric-cpu (windows-2022, lightning, 3.8, 1.11) success
fabric-cpu (windows-2022, lightning, 3.9, 1.12) success
fabric-cpu (windows-2022, lightning, 3.10, 1.13) success
fabric-cpu (windows-2022, lightning, 3.10, 2.0) success
fabric-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
fabric-cpu (macOS-11, fabric, 3.8, 1.13) success
fabric-cpu (ubuntu-20.04, fabric, 3.8, 1.13) success
fabric-cpu (windows-2022, fabric, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/fabric.py, tests/tests_fabric/helpers/models.py, tests/tests_fabric/plugins/precision/test_double_integration.py, tests/tests_fabric/strategies/test_fsdp_integration.py.

🟢 lightning_fabric: Azure GPU
Check ID Status
lightning-fabric (GPUs) success

These checks are required after the changes to src/lightning/fabric/fabric.py, tests/tests_fabric/helpers/models.py, tests/tests_fabric/plugins/precision/test_double_integration.py, tests/tests_fabric/strategies/test_fsdp_integration.py.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 install
Check ID Status
install-pkg (ubuntu-22.04, app, 3.8) success
install-pkg (ubuntu-22.04, app, 3.10) success
install-pkg (ubuntu-22.04, fabric, 3.8) success
install-pkg (ubuntu-22.04, fabric, 3.10) success
install-pkg (ubuntu-22.04, pytorch, 3.8) success
install-pkg (ubuntu-22.04, pytorch, 3.10) success
install-pkg (ubuntu-22.04, lightning, 3.8) success
install-pkg (ubuntu-22.04, lightning, 3.10) success
install-pkg (ubuntu-22.04, notset, 3.8) success
install-pkg (ubuntu-22.04, notset, 3.10) success
install-pkg (macOS-12, app, 3.8) success
install-pkg (macOS-12, app, 3.10) success
install-pkg (macOS-12, fabric, 3.8) success
install-pkg (macOS-12, fabric, 3.10) success
install-pkg (macOS-12, pytorch, 3.8) success
install-pkg (macOS-12, pytorch, 3.10) success
install-pkg (macOS-12, lightning, 3.8) success
install-pkg (macOS-12, lightning, 3.10) success
install-pkg (macOS-12, notset, 3.8) success
install-pkg (macOS-12, notset, 3.10) success
install-pkg (windows-2022, app, 3.8) success
install-pkg (windows-2022, app, 3.10) success
install-pkg (windows-2022, fabric, 3.8) success
install-pkg (windows-2022, fabric, 3.10) success
install-pkg (windows-2022, pytorch, 3.8) success
install-pkg (windows-2022, pytorch, 3.10) success
install-pkg (windows-2022, lightning, 3.8) success
install-pkg (windows-2022, lightning, 3.10) success
install-pkg (windows-2022, notset, 3.8) success
install-pkg (windows-2022, notset, 3.10) success

These checks are required after the changes to src/lightning/fabric/fabric.py.

🟢 link-check
Check ID Status
check-md-links / markdown-link-check success

These checks are required after the changes to src/lightning/fabric/CHANGELOG.md.


Thank you for your contribution! 💜

Note: This comment is automatically generated and is refreshed every 180 seconds for 60 minutes. If you have any other questions, contact carmocca for help.

@carmocca changed the title from "Remove sharding support with Fabric.run or using fabric.launch(fn)" to "Remove automatic sharding support with Fabric.run or fabric.launch(fn)" Jun 15, 2023
@mergify bot added the ready (PRs ready to be merged) label Jun 15, 2023
codecov bot commented Jun 15, 2023

Codecov Report

Merging #17832 (4dd94d8) into master (577c494) will increase coverage by 2%.
The diff coverage is 100%.

❗ Current head 4dd94d8 differs from pull request most recent head dd23533. Consider uploading reports for the commit dd23533 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #17832      +/-   ##
==========================================
+ Coverage      59%      61%      +2%     
==========================================
  Files         839      420     -419     
  Lines       63788    31914   -31874     
==========================================
- Hits        37909    19465   -18444     
+ Misses      25879    12449   -13430     

@carmocca enabled auto-merge (squash) June 15, 2023 15:30
@carmocca merged commit f78db4c into master Jun 15, 2023
@carmocca deleted the carmocca/fabric-old-sharded-ctx branch June 15, 2023 16:02
Labels: breaking change (Includes a breaking change), bug (Something isn't working), fabric (lightning.fabric.Fabric), ready (PRs ready to be merged), strategy: deepspeed, strategy: fsdp (Fully Sharded Data Parallel)