Use fsdp module to initialize precision scaler for fsdp native #14092

Merged: 9 commits from bug/fsdp_native_precision into master on Aug 13, 2022

Conversation

rohitgr7
Contributor

@rohitgr7 rohitgr7 commented Aug 8, 2022

What does this PR do?

Fairscale shouldn't be a requirement when using fsdp_native, so we should rely on the fsdp package from torch itself.
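For context, a minimal sketch of what the change amounts to: have the native-FSDP precision plugin build its 16-bit grad scaler from `torch.distributed.fsdp` rather than fairscale, so fairscale never needs to be installed for fsdp_native. The class name below is illustrative, not necessarily the one added by this PR.

```python
# Sketch only: the plugin class name is hypothetical; the real one in this PR may differ.
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler  # ships with torch >= 1.12

from pytorch_lightning.plugins import NativeMixedPrecisionPlugin


class FSDPNativeMixedPrecisionPlugin(NativeMixedPrecisionPlugin):
    """Precision plugin for native FSDP that relies only on torch, not fairscale."""

    def __init__(self, precision, device, scaler=None):
        # Default to torch's sharded grad scaler for 16-bit training.
        if scaler is None and precision == 16:
            scaler = ShardedGradScaler()
        super().__init__(precision=precision, device=device, scaler=scaler)
```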

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

cc @Borda @carmocca @justusschock @awaelchli @akihironitta @rohitgr7

@rohitgr7 rohitgr7 added the bug (Something isn't working), precision: amp (Automatic Mixed Precision), and strategy: fsdp (Fully Sharded Data Parallel) labels Aug 8, 2022
@rohitgr7 rohitgr7 added this to the pl:1.7.x milestone Aug 8, 2022
@rohitgr7 rohitgr7 force-pushed the bug/fsdp_native_precision branch from 3421b92 to 655c60e Compare August 8, 2022 15:08
@github-actions github-actions bot added the pl (Generic label for PyTorch Lightning package) label Aug 8, 2022
@rohitgr7 rohitgr7 self-assigned this Aug 8, 2022
Contributor

@carmocca carmocca left a comment


The fix looks good.

The inheritance hierarchy calls for a refactor here, but the suggested patch seems like the path of least resistance for a bug fix.
I wonder if the introduced plugin should be marked as protected for that reason.

Also, an orthogonal question: should we deprecate fairscale's version soon, or is it still going to be used as a cutting-edge testing ground? Do you have any suggestions in this regard, @SeanNaren?

Review suggestions on src/pytorch_lightning/CHANGELOG.md (outdated, resolved)
@rohitgr7
Contributor Author

rohitgr7 commented Aug 9, 2022

I wonder if the introduced plugin should be marked as protected for that reason.

I don't think so, since users should have access to all of the plugins we use within strategies for customization.
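For instance (a hedged sketch reusing the hypothetical FSDPNativeMixedPrecisionPlugin from the sketch in the PR description above; exact class and argument names may differ), a user could construct the precision plugin with a customized scaler and pass it to the Trainer explicitly:

```python
from pytorch_lightning import Trainer
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

# FSDPNativeMixedPrecisionPlugin is the hypothetical class sketched in the PR description above.
# Customize the scaler and pass the plugin explicitly instead of relying on the default.
scaler = ShardedGradScaler(init_scale=2**10, growth_interval=1000)
precision_plugin = FSDPNativeMixedPrecisionPlugin(precision=16, device="cuda", scaler=scaler)

trainer = Trainer(
    accelerator="gpu",
    devices=4,
    strategy="fsdp_native",
    plugins=[precision_plugin],
)
```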

Contributor

@awaelchli awaelchli left a comment


In retrospect, we should have named the fsdp strategy FairscaleFSDP or something like that. Everything else we have implies that it is "native" anyway.

@mergify mergify bot added the ready (PRs ready to be merged) label and removed the has conflicts label Aug 12, 2022
@rohitgr7 rohitgr7 enabled auto-merge (squash) August 13, 2022 07:02
@rohitgr7 rohitgr7 merged commit 48c23e5 into master Aug 13, 2022
@rohitgr7 rohitgr7 deleted the bug/fsdp_native_precision branch August 13, 2022 07:52
awaelchli added a commit that referenced this pull request Aug 17, 2022
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>
lexierule pushed a commit that referenced this pull request Aug 17, 2022
* update version and changelog for 1.7.2 release

* Reset all results on epoch end (#14061)

Co-authored-by: Carlos Mocholí <[email protected]>

* Skip ddp fork tests on windows (#14121)

* Fix device placement when `.cuda()` called without specifying index (#14128)

* Convert subprocess test to standalone test (#14101)

* Fix entry point test for Python 3.10 (#14154)

* Fix flaky test caused by weak reference (#14157)

* Fix saving hyperparameters in a composition where parent is not a LM or LDM (#14151)



Co-authored-by: Rohit Gupta <[email protected]>

* Remove DeepSpeed version restriction from Lite (#13967)

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Update onnxruntime requirement from <=1.12.0 to <1.13.0 in /requirements (#14083)

Updates the requirements on [onnxruntime](https://github.com/microsoft/onnxruntime) to permit the latest version.
- [Release notes](https://github.com/microsoft/onnxruntime/releases)
- [Changelog](https://github.com/microsoft/onnxruntime/blob/master/docs/ReleaseManagement.md)
- [Commits](microsoft/onnxruntime@v0.1.4...v1.12.1)

---
updated-dependencies:
- dependency-name: onnxruntime
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update gcsfs requirement from <2022.6.0,>=2021.5.0 to >=2021.5.0,<2022.8.0 in /requirements (#14079)

Update gcsfs requirement in /requirements

Updates the requirements on [gcsfs](https://github.com/fsspec/gcsfs) to permit the latest version.
- [Release notes](https://github.com/fsspec/gcsfs/releases)
- [Commits](fsspec/gcsfs@2021.05.0...2022.7.1)

---
updated-dependencies:
- dependency-name: gcsfs
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix a bug that caused spurious `AttributeError` when multiple `DataLoader` classes are imported (#14117)


fix

* CI: Replace `_` of in GHA workflow filenames with `-` (#13917)

* Rename workflow files

* Update docs

* Fix azure badges

* Update the main readme

* bad rebase

* Update doc

* CI: Update Windows version from 2019 to 2022 (#14129)

Update windows

* CI/CD: Add CUDA version to docker image tags (#13831)

* append cuda version to tags

* revertme: push to hub

* Update docker readme

* Build base-conda-py3.9-torch1.12-cuda11.3.1

* Use new images in conda tests

* revertme: push to hub

* Revert "revertme: push to hub"

This reverts commit 0f7d534.

* Revert "revertme: push to hub"

This reverts commit 46a05fc.

* Run conda if workflow edited

* Run gpu testing if workflow edited

* Use new tags in release/Dockerfile

* Build base-cuda and PL release images with all combinations

* Update release docker

* Update conda from py3.9-torch1.12 to py3.10-torch.1.12

* Fix ubuntu version

* Revert conda

* revertme: push to hub

* Don't build Python 3.10 for now...

* Fix pl release builder

* does updating the version contribute to the error? docker/buildx#456

* Update actions' versions

* Update slack user to notify

* Don't use 11.6.0 to avoid bagua incompatibility

* Don't use 11.1, and use 11.1.1

* Update .github/workflows/ci-pytorch_test-conda.yml

Co-authored-by: Luca Medeiros <[email protected]>

* Update trigger

* Ignore artifacts from tutorials

* Trim docker images to distribute

* Add an image for tutorials

* Update conda image 3.8x1.10

* Try different conda variants

* No need to set cuda for conda jobs

* Update who to notify ipu failure

* Don't push

* update filename

Co-authored-by: Luca Medeiros <[email protected]>

* Avoid entry_points deprecation warning (#14052)

Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* Configure the check-group app (#14165)

Co-authored-by: Jirka <[email protected]>

* Profile batch transfer and gradient clipping hooks (#14069)

Co-authored-by: Rohit Gupta <[email protected]>

* Avoid false positive warning about using `sync_dist` when using torchmetrics (#14143)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Avoid raising the sampler warning if num_replicas=1 (#14097)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

Co-authored-by: otaj <[email protected]>

* Remove skipping logic in favor of path filtering (#14170)

* Support checkpoint save and load with Stochastic Weight Averaging (#9938)

Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Carlos Mocholi <[email protected]>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* Use fsdp module to initialize precision scaler for fsdp native (#14092)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>

* add more issues types (#14174)

* add more issues types

* Update .github/ISSUE_TEMPLATE/config.yml

Co-authored-by: Mansy <[email protected]>

* typo

Co-authored-by: Adrian Wälchli <[email protected]>

Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>
Co-authored-by: Adrian Wälchli <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>

* CI: clean building docs (#14216)

* CI: clean building docs

* group

* .

* CI: docker focus on PL only (#14246)

* CI: docker focus on PL only

* group

* Allowed setting attributes on `DataLoader` and `BatchSampler` when instantiated inside `*_dataloader` hooks (#14212)


Co-authored-by: otaj <[email protected]>

* Revert "Remove skipping logic in favor of path filtering (#14170)" (#14244)

* Update defaults for WandbLogger's run name and project name (#14145)

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Rohit Gupta <[email protected]>
Co-authored-by: Jirka <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: Luca Medeiros <[email protected]>
Co-authored-by: Adam J. Stewart <[email protected]>
Co-authored-by: otaj <[email protected]>
Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kushashwa Ravi Shrimali <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Kaushik B <[email protected]>
Co-authored-by: Mansy <[email protected]>