GCP Batch Integration: launch jobs directly on GCP Batch #621

Closed
wants to merge 1 commit into from


@priyaramani priyaramani commented Oct 18, 2022

Support directly scheduling jobs on GCP Batch

  • Native support for launching PyTorch jobs on GCP: Currently, you can use TorchX to launch training jobs on Kubernetes on GCP, which requires setting up Kubernetes clusters, or you can use GCP managed services such as Vertex AI. With this integration, the overhead of setting up other services goes away, and customers can launch their training jobs from TorchX directly on GCP Batch.
  • Cloud-agnostic interface: Besides serving current PyTorch customers on GCP, this lets customers on one cloud provider explore others, because their PyTorch jobs can be migrated easily from one platform to another.
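The cloud-agnostic idea in the second bullet can be illustrated with a toy sketch: one backend-neutral job description, and a registry that maps a scheduler name to a function that builds a backend-specific request. This is not the TorchX implementation. `JobSpec`, `SCHEDULERS`, `submit`, and both payload shapes are hypothetical names invented for illustration, and the dictionaries are not real Kubernetes or GCP Batch schemas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class JobSpec:
    """Backend-neutral description of a training job (hypothetical type)."""
    name: str
    image: str
    command: List[str]
    replicas: int = 1


def to_kubernetes_like(job: JobSpec) -> dict:
    # Illustrative payload only; not a real Kubernetes manifest.
    return {
        "backend": "kubernetes",
        "pods": job.replicas,
        "container": {"image": job.image, "args": job.command},
    }


def to_gcp_batch_like(job: JobSpec) -> dict:
    # Illustrative payload only; not the real GCP Batch job schema.
    return {
        "backend": "gcp_batch",
        "task_count": job.replicas,
        "container": {"image_uri": job.image, "commands": job.command},
    }


# Registry: the scheduler name is the only thing the caller changes.
SCHEDULERS: Dict[str, Callable[[JobSpec], dict]] = {
    "kubernetes": to_kubernetes_like,
    "gcp_batch": to_gcp_batch_like,
}


def submit(scheduler: str, job: JobSpec) -> dict:
    """Build the request for the chosen backend (no network calls here)."""
    try:
        build = SCHEDULERS[scheduler]
    except KeyError:
        raise ValueError(f"unknown scheduler: {scheduler!r}") from None
    return build(job)


job = JobSpec(
    name="trainer",
    image="example.com/trainer:latest",  # placeholder image name
    command=["python", "train.py"],
    replicas=2,
)
k8s_request = submit("kubernetes", job)
batch_request = submit("gcp_batch", job)
```

In this sketch, moving the same job from one platform to another means changing only the scheduler name passed to `submit`; the job description itself stays untouched.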

Addresses #410

Test plan:
Unit tests
Screen Shot 2022-10-18 at 12 30 38 PM

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Oct 18, 2022
@facebook-github-bot

@priyaramani has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@codecov

codecov bot commented Oct 21, 2022

Codecov Report

Merging #621 (1c4fedb) into main (ad4b5da) will decrease coverage by 0.09%.
The diff coverage is 91.20%.

@@            Coverage Diff             @@
##             main     #621      +/-   ##
==========================================
- Coverage   94.60%   94.50%   -0.10%     
==========================================
  Files          69       71       +2     
  Lines        4908     5024     +116     
==========================================
+ Hits         4643     4748     +105     
- Misses        265      276      +11     
| Impacted Files | Coverage Δ |
|---|---|
| `torchx/schedulers/__init__.py` | 95.23% <ø> (ø) |
| `torchx/schedulers/gcp_batch_scheduler.py` | 90.43% <90.43%> (ø) |
| `torchx/schedulers/kubernetes_scheduler.py` | 93.24% <100.00%> (-0.14%) ⬇️ |
| `torchx/util/strings.py` | 100.00% <100.00%> (ø) |
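As a sanity check on the report above, the project-level percentages can be recomputed from the Hits and Lines rows. The sketch below assumes coverage is simply hits divided by lines; `coverage_pct` is a helper name made up for this example. Under that assumption the unrounded drop works out to about 0.09 points, which matches the headline figure, while the table's 94.50% and -0.10% appear to come from truncating the head value before subtracting (an assumption about how the report is rendered, not something the report states).

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, assuming coverage = hits / lines."""
    return 100.0 * hits / lines


# Numbers copied from the Coverage Diff block in the report.
base = coverage_pct(hits=4643, lines=4908)   # main (ad4b5da)
head = coverage_pct(hits=4748, lines=5024)   # #621 (1c4fedb)
delta = head - base

# Hits and misses should add up to the line totals on both sides.
assert 4643 + 265 == 4908
assert 4748 + 276 == 5024

print(f"base={base:.2f}%  head={head:.2f}%  delta={delta:+.2f} points")
```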


@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D40486955

priyaramani added a commit that referenced this pull request Oct 25, 2022
Summary:
Support directly scheduling jobs on GCP Batch

- Native support for launching PyTorch jobs on GCP: Currently, you can use TorchX to launch training jobs on Kubernetes on GCP, which requires setting up Kubernetes clusters, or you can use GCP managed services such as Vertex AI. With this integration, the overhead of setting up other services goes away, and customers can launch their training jobs from TorchX directly on GCP Batch.
- Cloud-agnostic interface: Besides serving current PyTorch customers on GCP, this lets customers on one cloud provider explore others, because their PyTorch jobs can be migrated easily from one platform to another.

Pull Request resolved: #621

Test Plan:
Unit tests
![Screen Shot 2022-10-18 at 12 30 38 PM](https://user-images.githubusercontent.com/87679608/196532219-8da3df5c-3053-4800-9cc3-8b2f4c52acea.png)

Differential Revision: D40486955

Pulled By: priyaramani

fbshipit-source-id: 82fe7d50668871a31805116eb77b2318bb823abf

priyaramani added a commit that referenced this pull request Oct 25, 2022
fbshipit-source-id: 0a9afc9b2fe585ed9fbd30e6c06ed9fe7794db7a
priyaramani added a commit that referenced this pull request Oct 25, 2022
fbshipit-source-id: 11c7bbb81bffc959585d1120bc4a57a1e8b19d71

priyaramani added a commit that referenced this pull request Oct 26, 2022
fbshipit-source-id: 2ed7577632df1118aa414f67ca3525720790ac04

priyaramani added a commit that referenced this pull request Oct 28, 2022
fbshipit-source-id: ab6b2d2e2ac7aeaceb8904fe643537d33d90b8ef

priyaramani added a commit that referenced this pull request Oct 28, 2022
fbshipit-source-id: 5047dc749a629bce232d77d11e0c3cd6a2de1253

priyaramani added a commit that referenced this pull request Oct 28, 2022
fbshipit-source-id: fd5b3025b3debb78276ea41a5a7a1d26ee6d711a

Summary:
Support directly scheduling jobs on GCP Batch

- Native support for launching PyTorch jobs on GCP: Currently, you can use TorchX to launch training jobs on Kubernetes on GCP, which requires setting up Kubernetes clusters, or you can use GCP managed services such as Vertex AI. With this integration, the overhead of setting up other services goes away, and customers can launch their training jobs from TorchX directly on GCP Batch.
- Cloud-agnostic interface: Besides serving current PyTorch customers on GCP, this lets customers on one cloud provider explore others, because their PyTorch jobs can be migrated easily from one platform to another.

Pull Request resolved: #621

Test Plan:
Unit tests
![Screen Shot 2022-10-18 at 12 30 38 PM](https://user-images.githubusercontent.com/87679608/196532219-8da3df5c-3053-4800-9cc3-8b2f4c52acea.png)

Reviewed By: d4l3k

Differential Revision: D40486955

Pulled By: priyaramani

fbshipit-source-id: 742222936a97767891a03eae9ccd7c488665da70