Better check for programmatic lightningignore #16080
⚡ Required checks status: All passing 🟢
Groups summary:
* 🟢 lightning_app: Tests workflow
* 🟢 lightning_app: Examples
* 🟢 lightning_app: Azure
* 🟢 lightning_app: Docs
* 🟢 mypy
* 🟢 install

Thank you for your contribution! 💜
LGTM !
Co-authored-by: Jirka Borovec <[email protected]> (cherry picked from commit b1ce263)
* Add function to remove checkpoint to allow override for extended classes (#16067) (cherry picked from commit 10cc677)
* minor fix: indent spaces in comment-out (#16076) (cherry picked from commit 385e5e2)
* ci: print existing candidates (#16077) (cherry picked from commit 9e89aed)
* [App] Fix bug where previously deleted apps cannot be re-run from the CLI (#16082) (cherry picked from commit 5f7403e)
* Better check for programmatic lightningignore (#16080)
  Co-authored-by: Jirka Borovec <[email protected]> (cherry picked from commit b1ce263)
* [App] Removing single quote (#16079) (cherry picked from commit 005b6f2)
* version 1.8.5.post0
* skip example test that relies on unreleased lite code
  The examples use LightningLite syntax without the run method, which is only available in master
* fix can't instantiate abstract class
  [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  fix
* skip bagua

Co-authored-by: Sean Naren <[email protected]>
Co-authored-by: Qiushi Pan <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: awaelchli <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
* Remove the deprecated profiler imports (#16059)
* Revert "Load app before setting LIGHTNING_DISPATCHED" (#16064)
  Revert "Load app before setting LIGHTNING_DISPATCHED (#16057)"
  This reverts commit 8d3339a.
* [App] Hot fix: Resolve detection of python debugger (#16068)
  Co-authored-by: thomas <[email protected]>
  Co-authored-by: Carlos Mocholí <[email protected]>
* Load the app before setting `LIGHTNING_DISPATCHED` (#16071)
* fix(cloud): detect and ignore venv (#16056)
  Co-authored-by: Ethan Harris <[email protected]>
* Add function to remove checkpoint to allow override for extended classes (#16067)
* Drop FairScale sharded parity tests (#16069)
* minor fix: indent spaces in comment-out (#16076)
* ci: print existing candidates (#16077)
* [App] Fix bug where previously deleted apps cannot be re-run from the CLI (#16082)
* Better check for programmatic lightningignore (#16080)
  Co-authored-by: Jirka Borovec <[email protected]>
* [App] Removing single quote (#16079)
* [App] PoC: Add support for Request (#16047)
* Have checkgroup pull the latest runs (#16033)
* Update Multinode Warning (#16091)
* [App] Serve datatypes with better client code (#16018)
* docs: add PT version (#16010)
* docs: add PT version
* stable
  Co-authored-by: Adrian Wälchli <[email protected]>
  Co-authored-by: Adrian Wälchli <[email protected]>
* add 1.13.1 to adjust versions (#16099)
* Remove redundant `find_unused_parameters=False` in Lite (#16026)
* [App] Add display name property to the work (#16095)
  Co-authored-by: thomas <[email protected]>
* Fix detection of whether app is running in cloud (#16045)
* [App] Add work.delete (#16103)
  Co-authored-by: thomas <[email protected]>
* [App] Improve the autoscaler UI (#16063)
  [App] Improve the autoscaler UI (#16063)
* Re-enable Lite CLI on Windows + PyTorch 1.13 (#15645)
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Justus Schock <[email protected]>
* [App] Min replica=0 would break autoscaler component (#16092)
* fixing the bug where num_replica=0 would fail
* changelog
* [App] Scale out/in interval for autoscaler (#16093)
* Adding arguments for scale out/in interval
* Tests
* Set the default work start method to spawn on MacOS (#16089)
* [App] Add status endpoint, enable `ready` (#16075)
  Co-authored-by: thomas chaton <[email protected]>
* Clarify `work.stop()` limitation (#16073)
* fix merge errors
* Update torchvision requirement from <=0.14.0,>=0.11.1 to >=0.11.1,<0.15.0 in /requirements (#16108)
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Jirka <[email protected]>
* CI: settle file names (#16098)
* CI: settle file names
* rename
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Fix test failing on master due to bad auto-merge (#16118)
* fix merge error

Co-authored-by: Carlos Mocholí <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
Co-authored-by: thomas <[email protected]>
Co-authored-by: Yurij Mikhalevich <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
Co-authored-by: Qiushi Pan <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: Sherin Thomas <[email protected]>
Co-authored-by: Justus Schock <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka <[email protected]>
What does this PR do?
Fixes a regression where the `LightningTrainerMultiNode` component fails to start because the work is re-created several times even after the app has been dispatched.
Avoiding the use of an environment variable fixes the issue. `self._backend is not None` can act as a proxy for this, as suggested by @tchaton.
This should get tested by re-enabling the multinode tests.
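
For reviewers, here is a rough sketch of the idea, not the actual diff. `SketchMultiNode`, `work_factory`, `maybe_create_works`, and the `dispatched` property are hypothetical names made up for illustration. The only part taken from this PR is the check `self._backend is not None`, and the sketch assumes the runtime assigns `_backend` once the app has been dispatched.

```python
from typing import Any, Callable, List, Optional


class SketchMultiNode:
    """Hypothetical stand-in for a component that creates one work per node."""

    def __init__(self, work_factory: Callable[[int], Any], num_nodes: int) -> None:
        # Assumption: the runtime sets `_backend` when the app is dispatched.
        self._backend: Optional[Any] = None
        self._work_factory = work_factory
        self._num_nodes = num_nodes
        self.works: List[Any] = []

    @property
    def dispatched(self) -> bool:
        # Instance state is used as the proxy instead of an environment variable.
        return self._backend is not None

    def maybe_create_works(self) -> None:
        # Skip re-creation once the app has been dispatched.
        if self.dispatched:
            return
        self.works = [self._work_factory(i) for i in range(self._num_nodes)]


if __name__ == "__main__":
    node = SketchMultiNode(lambda i: f"work-{i}", num_nodes=2)
    node.maybe_create_works()      # before dispatch: works are created
    node._backend = object()       # simulate the runtime attaching a backend
    node.maybe_create_works()      # after dispatch: no re-creation
    print(node.works)
```

Because the check reads state on the instance rather than the process environment, it does not depend on the order in which the environment variable is set relative to loading the app.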
Does your PR introduce any breaking changes? If yes, please list them.
None
cc @Borda