Update AKS gpu cluster setup #49992

Merged: 2 commits, Jan 31, 2025

Contributor

@anson627 anson627 commented Jan 21, 2025

This pull request adds a new user guide for setting up an Azure AKS cluster with GPU nodes specifically for KubeRay, and updates references in the Kubernetes cluster setup documentation.

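New Azure AKS GPU cluster setup guide:

  • doc/source/cluster/kubernetes/user-guides/azure-aks-gpu-cluster.md: Added a detailed guide on creating an Azure AKS cluster with GPU nodes for KubeRay, including steps for creating a resource group, creating an AKS cluster, adding a GPU node group, and obtaining kubeconfig.

Documentation updates:

  • doc/source/cluster/kubernetes/user-guides/k8s-cluster-setup.md: Added a reference to the new Azure AKS GPU cluster setup guide in the list of available cluster setup guides.
  • doc/source/cluster/kubernetes/user-guides/k8s-cluster-setup.md: Updated the section for setting up an AKS cluster to include a reference to the new detailed setup guide.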

Why are these changes needed?

Updated based on latest AKS public docs

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    Made sure the steps in the quickstart run without issue on an AKS cluster.

@anson627
Contributor Author

@kevin85421 can you help take a look at this simple doc update?

@jcotant1 jcotant1 added the core Issues that should be addressed in Ray Core label Jan 23, 2025
@anson627
Contributor Author

@jcotant1 can you help to take a look?

@pcmoritz
Contributor

Thanks a lot for the contribution @anson627. Can you fix the following high-level points before we merge the PR?

  1. Sign off on your commits so the DCO passes (https://github.com/ray-project/ray/pull/49992/checks?check_run_id=36079332073)
  2. Put back the original links that were in https://github.com/ray-project/ray/pull/49992/files#diff-4b96da3370400e06b8f96f19d13bfdeb122d56f179b4404ffbf501b17781cb48L30 at the top of the new documentation page, to make it easier for people to follow and get more detailed information about the steps (or, if applicable, mention the relevant docs in each step)
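
For point 1, one common way to add the missing sign-off to commits that already exist is `git rebase --signoff`. The sketch below is only an illustration: it runs in a throwaway repository with a placeholder identity and assumes exactly two unsigned commits sit on top of a base commit. None of the names in it come from this PR.

```shell
# Throwaway repository so the sketch runs without touching real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"        # placeholder identity (assumption)
git config user.email "dev@example.com"

# One base commit plus two commits made without the -s flag.
echo base > notes.txt && git add notes.txt && git commit -q -m "base commit"
echo one >> notes.txt && git commit -q -am "first unsigned commit"
echo two >> notes.txt && git commit -q -am "second unsigned commit"

# Rewrite the last two commits, adding a Signed-off-by trailer to each.
git rebase -q --signoff HEAD~2

# Count how many of the last two commit messages now carry the trailer.
signed="$(git log -2 --format=%B | grep -c '^Signed-off-by: ')"
echo "signed-off commits: $signed"

# On a real PR branch the rewritten history then has to be force-pushed:
# git push --force-with-lease
```

Because the rebase rewrites history, the branch needs a force push afterwards; `--force-with-lease` refuses to overwrite remote commits that are not already known locally.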

Signed-off-by: Anson Qian <[email protected]>
Signed-off-by: Anson Qian <[email protected]>
@anson627
Contributor Author

@pcmoritz @csivanich thanks for getting back to me! all comments addressed

@anson627 anson627 requested a review from csivanich January 31, 2025 00:39
@pcmoritz pcmoritz added the go add ONLY when ready to merge, run all tests label Jan 31, 2025
@pcmoritz pcmoritz enabled auto-merge (squash) January 31, 2025 02:43
@pcmoritz pcmoritz merged commit c9da11d into ray-project:master Jan 31, 2025
7 checks passed
n30111 pushed a commit to minds-ai/ray that referenced this pull request Jan 31, 2025
This pull request adds a new user guide for setting up an Azure AKS
cluster with GPU nodes specifically for KubeRay, and updates references
in the Kubernetes cluster setup documentation.

New Azure AKS GPU cluster setup guide:

* [`doc/source/cluster/kubernetes/user-guides/azure-aks-gpu-cluster.md`](diffhunk://#diff-0b5f6ba4d8b02475f9b0eef738c62ac56e9a782d524b8e38eb4fdf453d283630R1-R51): Added a detailed guide on creating an Azure AKS cluster with GPU nodes for KubeRay, including steps for creating a resource group, creating an AKS cluster, adding a GPU node group, and obtaining kubeconfig.
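
The four steps named above map onto Azure CLI commands roughly as follows. This is a minimal sketch, not the content of the merged guide: the resource group, cluster, and node pool names, the region, and the GPU VM size are all placeholder assumptions, and the `run` helper is a hypothetical wrapper that only prints each command unless `DRY_RUN=0` is set.

```shell
set -e

# Placeholder values (assumptions, not taken from the guide); override via environment.
RESOURCE_GROUP="${RESOURCE_GROUP:-kuberay-rg}"
LOCATION="${LOCATION:-eastus}"
CLUSTER_NAME="${CLUSTER_NAME:-kuberay-gpu-cluster}"
GPU_POOL_NAME="${GPU_POOL_NAME:-gpupool}"
GPU_VM_SIZE="${GPU_VM_SIZE:-Standard_NC6s_v3}"

# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

aks_gpu_setup() {
  # 1. Create a resource group.
  run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"

  # 2. Create the AKS cluster with a small default (CPU) node pool.
  run az aks create --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME" \
    --node-count 1 --generate-ssh-keys

  # 3. Add a GPU node pool to the cluster.
  run az aks nodepool add --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" --name "$GPU_POOL_NAME" \
    --node-count 1 --node-vm-size "$GPU_VM_SIZE"

  # 4. Merge the cluster credentials into the local kubeconfig.
  run az aks get-credentials --resource-group "$RESOURCE_GROUP" --name "$CLUSTER_NAME"
}

aks_gpu_setup
```

Running the script as-is only prints the four `az` commands. Setting `DRY_RUN=0` executes them, which requires a logged-in Azure CLI and enough GPU quota in the chosen region.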

Documentation updates:

* [`doc/source/cluster/kubernetes/user-guides/k8s-cluster-setup.md`](diffhunk://#diff-4b96da3370400e06b8f96f19d13bfdeb122d56f179b4404ffbf501b17781cb48R11): Added a reference to the new Azure AKS GPU cluster setup guide in the list of available cluster setup guides.
* [`doc/source/cluster/kubernetes/user-guides/k8s-cluster-setup.md`](diffhunk://#diff-4b96da3370400e06b8f96f19d13bfdeb122d56f179b4404ffbf501b17781cb48L29-R31): Updated the section for setting up an AKS cluster to include a reference to the new detailed setup guide.

## Why are these changes needed?


Updated based on latest AKS public docs

## Related issue number



## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
    - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy: made sure the steps in the [quickstart](https://docs.ray.io/en/latest/cluster/kubernetes/getting-started/raycluster-quick-start.html#kuberay-raycluster-quickstart) run without issue on an AKS cluster.

---------

Signed-off-by: Anson Qian <[email protected]>

Signed-off-by: n3011 <[email protected]>