[App] Introduce auto scaler #15769
Merged
akihironitta added the feature (Is an improvement or enhancement), app (removed) (Generic label for Lightning App package), and components labels on Nov 22, 2022
akihironitta force-pushed the feat/load-balancer-component branch from 8ce484c to f5a6081 on November 22, 2022 13:54
akihironitta force-pushed the feat/load-balancer-component branch from f5a6081 to 0d67fd2 on November 24, 2022 10:16
akihironitta changed the title from [App] Introduce load balancer to [App] Introduce auto scaler on Nov 24, 2022
tchaton approved these changes on Dec 6, 2022
LGTM!
mergify bot added the ready label (PRs ready to be merged) and removed the has conflicts and ready labels on Dec 6, 2022
Blocked by the failure of cloud-e2e tests running on Azure.
Borda requested review from aniketmaurya, carmocca and nohalon and removed request for aniketmaurya on December 7, 2022 11:09
Borda reviewed on Dec 7, 2022
Co-authored-by: Jirka Borovec <[email protected]>
akihironitta commented on Dec 7, 2022
Borda approved these changes on Dec 7, 2022
Borda pushed a commit that referenced this pull request on Dec 7, 2022
* Exlucde __pycache__ in setuptools
* Add load balancer example
* wip
* Update example
* rename
* remove prints
* _LoadBalancer -> LoadBalancer
* AutoScaler(work)
* change var name
* remove locust
* Update docs
* include autoscaler in api ref
* docs typo
* docs typo
* docs typo
* docs typo
* remove unused loadtest
* remove unused device_type
* clean up
* clean up
* clean up
* Add docstring
* type
* env vars to args
* expose an API for users to override to customise autoscaling logic
* update example
* comment
* udpate var name
* fix scale mechanism and clean up
* Update exampl
* ignore mypy
* Add test file
* .
* update impl and update tests
* Update changlog
* .
* revert docs
* update test
* update state to keep calling 'flow.run()'
  Co-authored-by: Aniket Maurya <[email protected]>
* Add aiohttp to base requirements
* Update docs
  Co-authored-by: Luca Antiga <[email protected]>
* Use deserializer utility
* fake trigger
* wip: protect /system/* with basic auth
* read password at runtime
* Change env var name
* import torch as optional
* Don't overcreate works
* simplify imports
* Update example
* aiohttp
* Add work_args work_kwargs
* More docs
* remove FIXME
* Apply Jirka's suggestions
  Co-authored-by: Jirka Borovec <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* clean example device
* add comment on init threshold value
* bad merge
* nit: logging format
* {in,out}put_schema -> {in,out}put_type
* lowercase
* docs on seconds
* process_time -> processing_time
* Dont modify work state from flow
* Update tests
* worker_url -> endpoint
* fix exampl
* Fix default scale logic
* Fix default scale logic
* Fix num_pending_works
* Update num_pending_works
* Fix bug creating too many works
* Remove up/downscale_threshold args
* Update example
* Add typing
* Fix example in docstring
* Fix default scale logic
* Update src/lightning_app/components/auto_scaler.py
  Co-authored-by: Noha Alon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* rename method
* rename locvar
* Add todo
* docs ci
* docs ci
* asdfafsdasdf pls docs
* Apply suggestions from code review
  Co-authored-by: Ethan Harris <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* .
* doc
* Update src/lightning_app/components/auto_scaler.py
  Co-authored-by: Noha Alon <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 24983a0.
* Revert "Update src/lightning_app/components/auto_scaler.py" This reverts commit 56ea78b.
* Remove redefinition
* Remove load balancer run blocker
* raise RuntimeError
* remove has_sent
* lower the default timeout_batching from 10 to 1
* remove debug
* update the default timeout_batching
* .
* tighten condition
* fix endpoint
* typo in runtimeerror cond
* async lock update severs
* add a test
* {in,out}put_type typing
* Update examples/app_server_with_auto_scaler/app.py
  Co-authored-by: Jirka Borovec <[email protected]>
* Update .actions/setup_tools.py

Co-authored-by: Aniket Maurya <[email protected]>
Co-authored-by: Luca Antiga <[email protected]>
Co-authored-by: Jirka Borovec <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Noha Alon <[email protected]>
Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Akihiro Nitta <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
(cherry picked from commit 64b19fb)
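The commit subjects above name the moving parts of the auto scaler (an overridable scaling policy, a count of pending works to avoid creating too many works, and a `timeout_batching` setting whose default was lowered from 10 to 1) without showing how they interact. The following is a minimal, standalone sketch of one way such pieces could fit together. It is not the code merged in this PR and does not use the `lightning_app` API: `scale_decision`, `collect_batch`, the 25% scale-in threshold, and treating `timeout_batching` as seconds are all assumptions for illustration.

```python
import asyncio


def scale_decision(
    replicas: int,
    pending_requests: int,
    pending_works: int,
    max_batch_size: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Return the desired replica count (hypothetical load-per-work policy)."""
    # Works that are still starting count as capacity, so a traffic burst does not
    # spawn a new work on every check while earlier ones are booting.
    capacity = max(replicas + pending_works, 1)
    load_per_work = pending_requests / capacity
    if load_per_work >= max_batch_size:
        desired = replicas + 1  # at least a full batch waiting per work: scale out
    elif load_per_work < 0.25 * max_batch_size:
        desired = replicas - 1  # works mostly idle (assumed 25% threshold): scale in
    else:
        desired = replicas
    # Never leave the configured [min_replicas, max_replicas] range.
    return min(max(desired, min_replicas), max_replicas)


async def collect_batch(
    queue: asyncio.Queue, max_batch_size: int, timeout_batching: float
) -> list:
    """Gather up to max_batch_size requests, waiting at most timeout_batching seconds."""
    batch = [await queue.get()]  # block until the first request arrives
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_batching
    while len(batch) < max_batch_size:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            batch.append(await asyncio.wait_for(queue.get(), timeout=remaining))
        except asyncio.TimeoutError:
            break  # deadline reached: hand over a partial batch
    return batch


async def _demo() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    for request_id in range(5):
        queue.put_nowait(request_id)
    print(await collect_batch(queue, max_batch_size=3, timeout_batching=0.05))  # [0, 1, 2]
    print(await collect_batch(queue, max_batch_size=3, timeout_batching=0.05))  # [3, 4]


if __name__ == "__main__":
    # 20 queued requests over 2 works is 10 per work, above a batch of 8: prints 3
    print(scale_decision(replicas=2, pending_requests=20, pending_works=0,
                         max_batch_size=8, min_replicas=1, max_replicas=4))
    asyncio.run(_demo())
```

Counting works that are still starting as capacity is one way to address the "creating too many works" problem listed in the commits; stepping by a single replica per decision keeps the sketch simple at the cost of reacting slowly to large spikes, and a shorter `timeout_batching` trades larger batches for lower latency.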
lantiga added a commit that referenced this pull request on Dec 7, 2022
* update chlog * Tests/App: refactor examples - structure (#15770) * rename _examples dir * refactor * clean * path * add inits * skip * e2e * azure * e2e * rev * unify single depth for ignore docs req. * group (cherry picked from commit 59fa320) * feature: add `_generate_works_json` method (#15767) (cherry picked from commit 51bb845) * tests: split examples and pytests (#15774) split examples and pytests (cherry picked from commit 952b64b) * [App] Stop App when it has succeeded (#15801) (cherry picked from commit 3a99a25) * Notify the user of ignored requirements (#15799) (cherry picked from commit 9e43604) * Add code_dir argument to tracer run (#15771) (cherry picked from commit 0a12731) * [App] Add CloudMultiProcessBackend to run an children App within the Flow in the cloud (#15800) * update * update * update * update * update * update * update * update * update * update * update * update * update * update * updte * update * update * update * update * update * update * update * update * update * update * update * Update src/lightning_app/CHANGELOG.md Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> * Update src/lightning_app/utilities/port.py Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Ethan Harris <[email protected]> (cherry picked from commit 8ca6dfe) * Update lightning-utilities requirement from ==0.3.* to ==0.4.* in /requirements (#15420) Update lightning-utilities requirement in /requirements Updates the requirements on 
[lightning-utilities](https://github.com/Lightning-AI/utilities) to permit the latest version. - [Release notes](https://github.com/Lightning-AI/utilities/releases) - [Changelog](https://github.com/Lightning-AI/utilities/blob/main/CHANGELOG.md) - [Commits](Lightning-AI/utilities@v0.3.0...v0.4.0) --- updated-dependencies: - dependency-name: lightning-utilities dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (cherry picked from commit e150d08) * Ignore `num_nodes` when running MultiNode components locally (#15806) (cherry picked from commit a970f09) * lit extras (#15793) Co-authored-by: Carlos Mocholí <[email protected]> (cherry picked from commit 8ee889b) * [App] Add utility to get install command for package extras (#15809) (cherry picked from commit f171657) * [App] Enable Python Server and Gradio Serve to run on accelerated device such as GPU CUDA / MPS (#15813) (cherry picked from commit 4e64391) * Update flake8 version (#15816) (cherry picked from commit 1e56b75) * Checkgroup config fixes (#15787) (cherry picked from commit cca3432) * [App] Resolve a condition bug with spawning (#15812) Co-authored-by: Carlos Mocholí <[email protected]> (cherry picked from commit 6a2a83a) * Print the e2e app ID as early as possible (#15821) (cherry picked from commit 76cf419) * Added note about custom base images (#14125) Co-authored-by: awaelchli <[email protected]> (cherry picked from commit 70126df) * Add warning comment to cloud requirements (#15790) (cherry picked from commit be699a8) * [CLI] fix ssh listing stopped components (#15810) * [CLI] fix ssh listing stopped components * update CHANGELOG (cherry picked from commit c786b3d) * Update fairscale requirement from <=0.4.6,>=0.4.5 to >=0.4.5,<0.4.13 in /requirements (#15842) Update fairscale requirement in /requirements Updates the requirements on 
[fairscale]() to permit the latest version. --- updated-dependencies: - dependency-name: fairscale dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (cherry picked from commit 206fd06) * Bump google-github-actions/get-gke-credentials from 0 to 1 (#15843) Bumps [google-github-actions/get-gke-credentials](https://github.com/google-github-actions/get-gke-credentials) from 0 to 1. - [Release notes](https://github.com/google-github-actions/get-gke-credentials/releases) - [Changelog](https://github.com/google-github-actions/get-gke-credentials/blob/main/CHANGELOG.md) - [Commits](google-github-actions/get-gke-credentials@v0...v1) --- updated-dependencies: - dependency-name: google-github-actions/get-gke-credentials dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (cherry picked from commit ed7707e) * Bump hivemind from 1.0.1 to 1.1.2 in /requirements (#15839) Bumps [hivemind](https://github.com/learning-at-home/hivemind) from 1.0.1 to 1.1.2. - [Release notes](https://github.com/learning-at-home/hivemind/releases) - [Commits](learning-at-home/hivemind@1.0.1...1.1.2) --- updated-dependencies: - dependency-name: hivemind dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (cherry picked from commit 1d94297) * Update cloudpickle requirement from <=2.1.0,>=1.3 to >=1.3,<2.3.0 in /requirements (#15840) Update cloudpickle requirement in /requirements Updates the requirements on [cloudpickle](https://github.com/cloudpipe/cloudpickle) to permit the latest version. - [Release notes](https://github.com/cloudpipe/cloudpickle/releases) - [Changelog](https://github.com/cloudpipe/cloudpickle/blob/master/CHANGES.md) - [Commits](cloudpipe/cloudpickle@v1.3.0...v2.2.0) --- updated-dependencies: - dependency-name: cloudpickle dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> (cherry picked from commit 95d5ccb) * hotfix import torch (#15849) * fix import torch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * plugin * fix * skip * patch require * seed * warn * . * .. * skip True * 0.0.3 Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit ad4bd66) * [App] Improve cluster creation / deletion experience (#15458) Cluster creation and deletion can take a long time. Instead of having these long running operations happen in the background, they should happen in the foreground. The advantage is that failures are brought to the users attention immediately, instead of the next time they decide to run `lightning list clusters`. While the CLI waits for the cluster to run / delete, it will display cluster status changes to the user. This PR also hides the `--enable-performance` and `--edit-before-creation` creation flags, as well as the `--force` deletion flag. 
They are either not frequently used (performance mode is expensive), or prone to misuse. Co-authored-by: Neven Miculinic <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Raphael Randschau <[email protected]> (cherry picked from commit 33e1f93) * CI: freeze docs requirements [hotfix] (#15865) freeze ipy (cherry picked from commit bc528fd) * fix formatting * [App] Raise error when launching app on multiple clusters (#15484) * Error when running on multiple clusters * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert this in separate PR: keep this focused * Improve testing * fixup! Improve testing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * pass flake8 * Update changelog * Address PR feedback * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused import * Reword error message * Error if running on cluster that doesn't exist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixup! 
Error if running on cluster that doesn't exist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unsued import Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]> (cherry picked from commit c5d3bba) * Moving `lightning_api_access` out of base requirements (#15844) * moving the requirements to components extras * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * component requirements to devel * importing torch in local scope * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * skipping doctest Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> (cherry picked from commit 5864409) * [App] Fixing Sigterm Handler causing thread lock which caused KeyboardInterrupt to hang (#15881) * terminating only once * changelog (cherry picked from commit 5144160) * CI: signal lai build (#15871) (cherry picked from commit 36b953b) * CI: prune dependency for benchmarks (#15879) * prune dependency for benchmarks * drop (cherry picked from commit 993bd67) * unblock legacy checkpoints (#15798) * fixing legacy checkpoints * Apply suggestions from code review Co-authored-by: Akihiro Nitta <[email protected]> (cherry picked from commit fee52f9) * CI: update signalling (#15887) (cherry picked from commit f4fcad3) * update chlog * waiting on feedback (#15893) * waiting * builds (cherry picked from commit a86584d) * [CLI] drop name column from cluster list (#15721) * drop name column from cluster list * change create cluster to accept id as well * rename validator * remove cluster name from logs * fix merge with master * more merge with master issues (cherry picked from commit a82be2f) * Add CLI Command to Delete 
Lightning App (#15783) * initial work on deleting apps * after PR review * delete CLI working * restructred to make tests easier * revert manifest changes * added changelog, fix mypy issue * updates * Update src/lightning_app/cli/cmd_apps.py Co-authored-by: Jirka Borovec <[email protected]> * Update src/lightning_app/cli/lightning_cli_delete.py Co-authored-by: Jirka Borovec <[email protected]> * Update src/lightning_app/cli/lightning_cli_delete.py Co-authored-by: Jirka Borovec <[email protected]> * Update src/lightning_app/cli/lightning_cli_delete.py Co-authored-by: Sherin Thomas <[email protected]> * Update src/lightning_app/cli/lightning_cli_delete.py Co-authored-by: Sherin Thomas <[email protected]> * import typing * adding tests * finished adding tests * addressed code review comments * fix mypy error * make mypy happy * make mypy happy * make mypy happy * make mypy happy * fix windows cli Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Sherin Thomas <[email protected]> (cherry picked from commit b4d99e3) * [App] Support for headless apps (#15875) * Add `is_headless` when dispatching in the cloud * Bump cloud version * Add tests * Dont open app page for headless apps locally * Refactor * Update CHANGELOG.md * Support dynamic UIs at runtime * Comments * Fix * Updates * Fixes and cleanup * Fix tests * Dont open view page for headless apps * Fix test, resolve URL the right way * Remove launch * Clean * Cleanup tests * Fixes * Updates * Add test * Increase app cloud tests timeout * Increase timeout * Wait for running * Revert timeouts * Clean * Dont update if it hasnt changed * Increase timeout (cherry picked from commit 32cf1fa) * [App] Fix hanging CI (#15913) (cherry picked from commit ab022ac) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * version 1.8.4 * Direct support for compiled models (#15922) * Direct support for compiled models * Update test * Update 
src/pytorch_lightning/core/module.py Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Ethan Harris <[email protected]> (cherry picked from commit 2992002) * update chlog * CI: parameterize TPU tests (#15876) * update * param * Apply suggestions from code review (cherry picked from commit 77006a2) * [App] Add ready property to the flow (#15921) (cherry picked from commit 852089e) * [App] Enable running with spawn context (#15923) (cherry picked from commit d2a8fbf) * Fix compiler support test (#15927) (cherry picked from commit 6f54a82) * Enable back inference mode support with hpu & update links (#15918) * Enable back inference mode support with hpu * Remove unused * Update document link and address comment Signed-off-by: Jerome <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> (cherry picked from commit 6aaac8b) * [App] Introduce auto scaler (#15769) (cherry picked from commit 64b19fb) * ENG-627: Docs for CloudCompute Mount Argument (#15182) fixed conflicts (cherry picked from commit 2041908) * Fix LRScheduler import for PyTorch 2.0 (#15940) * Fix LRScheduler import for PyTorch 2.0 * Add comment for posterity (cherry picked from commit de93167) * update chlog Co-authored-by: Yurij Mikhalevich <[email protected]> Co-authored-by: thomas chaton <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Luca Antiga <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Ethan Harris <[email protected]>
Co-authored-by: Laverne Henderson <[email protected]> Co-authored-by: Raphael Randschau <[email protected]> Co-authored-by: Luca Furst <[email protected]> Co-authored-by: Sherin Thomas <[email protected]> Co-authored-by: Rick Izzo <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jerome Anand <[email protected]> Co-authored-by: Akihiro Nitta <[email protected]>
This was referenced Dec 8, 2022
justusschock added a commit that referenced this pull request on Dec 9, 2022
* Simplify enabling CPU offload in FSDP (#15832) Co-authored-by: Jirka Borovec <[email protected]> * [App] Enable running with spawn context (#15923) * Fix compiler support test (#15927) * Enable back inference mode support with hpu & update links (#15918) * Enable back inference mode support with hpu * Remove unused * Update document link and address comment Signed-off-by: Jerome <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [App] Introduce auto scaler (#15769) * ENG-627: Docs for CloudCompute Mount Argument (#15182) fixed conflicts * Fix LRScheduler import for PyTorch 2.0 (#15940) * Fix LRScheduler import for PyTorch 2.0 * Add comment for posterity * CI: fix pypi flow (#15944) * CI: fixing pypi syntax (#15943) * connect * input * [App] Remove `SingleProcessRuntime` (#15933) * Remove SingleProcessRuntime * Remove unused queues * Docs * [App] Fix bug when using structures with works (#15911) * Fix bug when using structures with works * Add test * Update CHANGELOG.md * [App] Wait for full file to be transferred in Path / Payload (#15934) * Wait for full file to be transferred in Path / Payload * Fixes * [docs] Include all components in the API reference
(#15805) * Update docs Co-authored-by: Jirka Borovec <[email protected]> * Bump playwright from 1.27.1 to 1.28.0 in /requirements (#15903) * Bump playwright from 1.27.1 to 1.28.0 in /requirements Bumps [playwright](https://github.com/Microsoft/playwright-python) from 1.27.1 to 1.28.0. - [Release notes](https://github.com/Microsoft/playwright-python/releases) - [Commits](microsoft/playwright-python@v1.27.1...v1.28.0) --- updated-dependencies: - dependency-name: playwright dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * 1.28 Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jirka <[email protected]> * [App] Add `configure_layout` method for works (#15926) * Add `configure_layout` method for works * Check for api access availability * Updates from review * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: Sherin Thomas <[email protected]> * Make gradients available for all_gather on TPU (#15003) * Make gradients available for all_gather on TPU * Modify switch and tests * Apply suggestions from code review * Modify tests * Fix test * Drop test Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> * Don't try to aggregate `requirements/__pycache__/base.txt` in setuptools (#15775) Exlucde __pycache__ in setuptools * [App] Multiprocessing-safe work pickling (#15836) * Upgrade to HPU release 1.7.1 (#15956) * Upgrade to HPU release 1.7.1 Update torch version check for hpu Signed-off-by: Jerome <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Multinode on MPS (#15748) * Fix 
restarting attribute for lr finder * update lite executor * update trainer executor * update spawn executor * add multinode component tests * add testing helpers * add lite tests * add trainer tests * update changelog * update trainer * update workflow * update tests * debug * add reason for skipif * Apply suggestions from code review * switch skipif Co-authored-by: Jirka <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Adrian Wälchli <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> * [App] Resolve PythonServer on M1 (#15949) Co-authored-by: thomas <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Lite: Fix DataLoader shuffling when using DistributedSampler (#15931) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [App] Temporarily disable ready (#15958) * Fix restarting attribute for lr finder (#15620) * [App] Improve pdb for multiprocessing (#15950) Co-authored-by: thomas <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [App] Improve debug triggering (#15951) * [App] Add automatic conversion to structures (#15961) * Make LightningModule torch.jit.script-able again (#15947) * Make LightningModule torch.jit.script-able again * remove skip Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * refactor: simplify Tensor import (#15959) * Fix ImportErrors on Multinode if package not present (#15963) * Fix typo in definition of world size in docs (#15954) * [App] Enable running an app from the Gallery (#15941) Co-authored-by: thomas <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Jirka <[email 
protected]> * Apply dynamo to training_step, validation_step, test_step, predict_step (#15957) * Apply dynamo to training_step, validation_step, test_step, predict_step * Add entry to CHANGELOG.md * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix merge conflict * rename tpu workflow Signed-off-by: Jerome <[email protected]> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: thomas chaton <[email protected]> Co-authored-by: Luca Antiga <[email protected]> Co-authored-by: Jerome Anand <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Akihiro Nitta <[email protected]> Co-authored-by: Aniket Maurya <[email protected]> Co-authored-by: Noha Alon <[email protected]> Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Akihiro Nitta <[email protected]> Co-authored-by: Rick Izzo <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jirka <[email protected]> Co-authored-by: Sherin Thomas <[email protected]> Co-authored-by: stekiri <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: Carlos Mocholí <[email protected]> Co-authored-by: Justus Schock <[email protected]> Co-authored-by: thomas <[email protected]>
carmocca added a commit that referenced this pull request Jan 4, 2023
* Simplify enabling CPU offload in FSDP (#15832) Co-authored-by: Jirka Borovec <[email protected]> * [App] Enable running with spawn context (#15923) * Fix compiler support test (#15927) * Enable back inference mode support with hpu & update links (#15918) * Enable back inference mode support with hpu * Remove unused * Update document link and address comment Signed-off-by: Jerome <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * [App] Introduce auto scaler (#15769) * Exlucde __pycache__ in setuptools * Add load balancer example * wip * Update example * rename * remove prints * _LoadBalancer -> LoadBalancer * AutoScaler(work) * change var name * remove locust * Update docs * include autoscaler in api ref * docs typo * docs typo * docs typo * docs typo * remove unused loadtest * remove unused device_type * clean up * clean up * clean up * Add docstring * type * env vars to args * expose an API for users to override to customise autoscaling logic * update example * comment * udpate var name * fix scale mechanism and clean up * Update exampl * ignore mypy * Add test file * . * update impl and update tests * Update changlog * . 
* revert docs * update test * update state to keep calling 'flow.run()' Co-authored-by: Aniket Maurya <[email protected]> * Add aiohttp to base requirements * Update docs Co-authored-by: Luca Antiga <[email protected]> * Use deserializer utility * fake trigger * wip: protect /system/* with basic auth * read password at runtime * Change env var name * import torch as optional * Don't overcreate works * simplify imports * Update example * aiohttp * Add work_args work_kwargs * More docs * remove FIXME * Apply Jirka's suggestions Co-authored-by: Jirka Borovec <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * clean example device * add comment on init threshold value * bad merge * nit: logging format * {in,out}put_schema -> {in,out}put_type * lowercase * docs on seconds * process_time -> processing_time * Dont modify work state from flow * Update tests * worker_url -> endpoint * fix exampl * Fix default scale logic * Fix default scale logic * Fix num_pending_works * Update num_pending_works * Fix bug creating too many works * Remove up/downscale_threshold args * Update example * Add typing * Fix example in docstring * Fix default scale logic * Update src/lightning_app/components/auto_scaler.py Co-authored-by: Noha Alon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rename method * rename locvar * Add todo * docs ci * docs ci * asdfafsdasdf pls docs * Apply suggestions from code review Co-authored-by: Ethan Harris <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * . 
* doc * Update src/lightning_app/components/auto_scaler.py Co-authored-by: Noha Alon <[email protected]> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks" This reverts commit 24983a0. * Revert "Update src/lightning_app/components/auto_scaler.py" This reverts commit 56ea78b. * Remove redefinition * Remove load balancer run blocker * raise RuntimeError * remove has_sent * lower the default timeout_batching from 10 to 1 * remove debug * update the default timeout_batching * . * tighten condition * fix endpoint * typo in runtimeerror cond * async lock update severs * add a test * {in,out}put_type typing * Update examples/app_server_with_auto_scaler/app.py Co-authored-by: Jirka Borovec <[email protected]> * Update .actions/setup_tools.py Co-authored-by: Aniket Maurya <[email protected]> Co-authored-by: Luca Antiga <[email protected]> Co-authored-by: Jirka Borovec <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Noha Alon <[email protected]> Co-authored-by: Ethan Harris <[email protected]> Co-authored-by: Akihiro Nitta <[email protected]> Co-authored-by: thomas chaton <[email protected]> * ENG-627: Docs for CloudCompute Mount Argument (#15182) fixed conflicts * Fix LRScheduler import for PyTorch 2.0 (#15940) * Fix LRScheduler import for PyTorch 2.0 * Add comment for posterity * CI: fix pypi flow (#15944) * CI: fixing pypi syntax (#15943) * connect * input * [App] Remove `SingleProcessRuntime` (#15933) * Remove SingleProcessRuntime * Remove unused queues * Docs * [App] Fix bug when using structures with works (#15911) * Fix bug when using structures with works * Add test * Update CHANGELOG.md * [App] Wait for full file to be transferred in Path / Payload (#15934) * Wait for full file to be transferred in Path / Payload * Fixes * [docs] Include all components in the API reference 
Labels
app (removed)
Generic label for Lightning App package
feature
Is an improvement or enhancement
ready
PRs ready to be merged
What does this PR do?
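Introduces `AutoScaler`, a component that load-balances incoming requests across a pool of works and adds or removes works as demand changes.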
Example 1 (autoscale out of the box)
Example 2 (customize the scaling logic)
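The sketch below is a rough, hypothetical illustration of the idea behind Example 2. It shows one way a pluggable scaling policy can work. A default rule adds or removes one work based on queued requests per replica, and a subclass overrides that rule. The commit history mentions exposing "an API for users to override to customise autoscaling logic". However, the names used here are assumptions for illustration only and are not the actual `AutoScaler` API in `auto_scaler.py`. These include `ScalingPolicy`, `AggressivePolicy`, `scale(replicas, pending_requests)`, `min_replicas`, `max_replicas` and `max_batch_size`. The 25% scale-in threshold is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical default policy (not the AutoScaler API): step the replica
    count up or down by one based on queued requests per replica."""

    min_replicas: int = 1
    max_replicas: int = 4
    max_batch_size: int = 8

    def clamp(self, target: int) -> int:
        # Never leave the configured [min_replicas, max_replicas] bounds.
        return max(self.min_replicas, min(self.max_replicas, target))

    def scale(self, replicas: int, pending_requests: int) -> int:
        # Average number of queued requests each replica would have to absorb.
        per_replica = pending_requests / max(replicas, 1)
        if per_replica >= self.max_batch_size:
            target = replicas + 1  # queue too deep: scale out by one
        elif per_replica < self.max_batch_size * 0.25:  # assumed threshold
            target = replicas - 1  # mostly idle: scale in by one
        else:
            target = replicas
        return self.clamp(target)


class AggressivePolicy(ScalingPolicy):
    """Hypothetical user override: jump straight to the replica count needed
    so that each replica gets at most one batch of queued requests."""

    def scale(self, replicas: int, pending_requests: int) -> int:
        needed = -(-pending_requests // self.max_batch_size)  # ceiling division
        return self.clamp(needed)


if __name__ == "__main__":
    print(ScalingPolicy().scale(replicas=1, pending_requests=20))
    print(AggressivePolicy().scale(replicas=1, pending_requests=20))
```

The shared `clamp` helper keeps the bounds check in one place. As a result, an override only has to compute a target replica count.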
Does your PR introduce any breaking changes? If yes, please list them.
None
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.
Did you have fun?
Make sure you had fun coding 🙃
(Not sure whether this app feature PR should be part of the 1.9 milestone or a 1.8.x release.)
cc @Borda