[Core] Add TPU utility functions to support slice placement groups #56723
Conversation
nrghosh left a comment:
thanks for the contribution @chiayi - just a reminder to run pre-commit linting to unblock CI checks. ./ci/lint/lint.sh code_format and ./ci/lint/lint.sh pre_commit should do it.
After building the Ray image, I manually ran the slice placement group against v6e 4x4 multi-host TPUs using the following test code:
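A minimal sketch of what such a test might look like, assuming SlicePlacementGroup takes topology, accelerator_type, and num_slices and exposes a placement_group handle (the names here are illustrative, not necessarily the final API):

```python
# Illustrative sketch only: constructor/attribute names are assumptions.
import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy
from ray.util.tpu import SlicePlacementGroup  # location per the review discussion below

ray.init()

# Reserve a single v6e 4x4 multi-host slice (assumed here: 4 hosts, 4 chips per host).
slice_pg = SlicePlacementGroup(
    topology="4x4",
    accelerator_type="TPU-V6E",
    num_slices=1,
)


@ray.remote(resources={"TPU": 4})
def tpu_host_task():
    import socket

    return socket.gethostname()


# Run one task per TPU host, all pinned to the reserved slice.
refs = [
    tpu_host_task.options(
        scheduling_strategy=PlacementGroupSchedulingStrategy(
            placement_group=slice_pg.placement_group
        )
    ).remote()
    for _ in range(4)
]
print(ray.get(refs))
```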
@ryanaoleary Please take a look when you get the chance. Thank you!
cc: @MengjinYan @edoakes |
cc: @andrewsykim @liulehui @matthewdeng for review. For the JaxTrainer we'd call …
I feel that for the Ray Train side, the … Wondering if we should create something like … and then maybe get rid of the … cc @matthewdeng
That sounds good to me; in the follow-up change to this PR that adds the multi-slice API to the Ray Train side, I could add it in a new …
python/ray/util/accelerators/tpu.py
A handle to a placement group reservation for a TPU slice.
The following definitions are added for clarity:
- Accelerator type: A string describing the accelerator type and version (e.g. TPU-V2, TPU-V6E).
do these match the existing accelerator type resource & label names? @ryanaoleary
All do except for "Accelerator version", which is just the TPU generation:
- Accelerator type -> ray.io/accelerator-type
- Accelerator version -> No exact equivalent label is set, but we can derive it from the other info
- Pod type -> ray.io/tpu-pod-type
- Accelerator topology -> ray.io/tpu-topology
(example label values are sketched below)
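For illustration only, the labels on a single v6e 4x4 host might then look like this (the values are assumptions, not taken from a real cluster):

```python
# Illustrative label values for one GKE TPU node; the keys come from the mapping
# above, the values are assumptions for a v6e slice with a 4x4 topology.
tpu_node_labels = {
    "ray.io/accelerator-type": "TPU-V6E",  # accelerator type
    "ray.io/tpu-pod-type": "v6e-16",       # pod type (assumed value)
    "ray.io/tpu-topology": "4x4",          # accelerator topology
}
```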
Oh and yeah, in the example the accelerator type names match:
GOOGLE_TPU_V6E = "TPU-V6E"
@PublicAPI
class SlicePlacementGroup:
I suggest moving SlicePlacementGroup API (and any future scheduling API) to python/ray/util/tpu and keeping python/ray/util/accelerators/tpu.py for small helper functions for getting specific information about TPUs.
Slightly confused about what the structure should be; there are currently 2 locations with a tpu.py:
- _private/accelerators/tpu.py: This is where helper functions and those called internally by other libraries are (i.e. TPUAcceleratorManager, reserve_tpu_slice, etc.)
- util/accelerators/tpu.py: This is where the public APIs are currently implemented.

Should the structure be:
- _private/accelerators/tpu.py: internal functions and helpers
- util/accelerators/tpu.py: public APIs, but only smaller helpers like get_current_pod_name for getting info about the accelerator
- util/tpu.py: a new file where we implement the APIs related to TPUs and scheduling. I should probably move some functions like reserve_tpu_slice here.

I can change it to the above; my only concern would be that the logic is getting kind of spread out/complicated.
If possible, we should combine everything in util/accelerators/tpu.py into util/tpu.py? Those are all public alpha APIs. If we want to avoid breaking anything, we could have the same functions in util/accelerators/tpu.py call util/tpu.py.
_private/accelerators/tpu.py seems fine, as it's for internal calls for TPUs as you mentioned.
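A minimal sketch of that forwarding approach, assuming the implementations move to ray.util.tpu (the function names are just the ones mentioned in this thread):

```python
# python/ray/util/accelerators/tpu.py -- sketch of a backwards-compatible shim.
# Assumes the real implementations now live in ray.util.tpu; names are illustrative.
from ray.util import tpu as _tpu


def get_current_pod_name() -> str:
    """Kept for backwards compatibility; forwards to ray.util.tpu."""
    return _tpu.get_current_pod_name()


def reserve_tpu_slice(topology: str, accelerator_type: str) -> str:
    """Kept for backwards compatibility; forwards to ray.util.tpu."""
    return _tpu.reserve_tpu_slice(topology, accelerator_type)
```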
Done in a23aed9
edoakes left a comment:
LGTM. As a next step, should we add a user guide to the Ray docs about how to use Ray w/ TPUs?
for _ in range(self.num_slices):
    # Reserving a slice is done through constructing num_workers bundles, each with a label selector for
    # the unique name of an available TPU slice.
    slice_name = reserve_tpu_slice(self._topology, accelerator_type)
Bug: TPU Reservation Logic Inconsistency
The reserve_tpu_slice function creates a placement group to reserve the TPU slice head and returns only the slice name, but the placement group reference is lost. This orphaned placement group will hold resources but won't be tracked. The subsequent placement group created at lines 183-189 doesn't include the head resource reservation that reserve_tpu_slice was supposed to provide, potentially causing scheduling issues. The function should either return both the slice name and placement group reference, or the reservation logic should be integrated directly into _reserve_slice.
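A rough sketch of the "return both" option; the changed signature and the surrounding names are assumptions, shown only to illustrate the tracking of the head placement group:

```python
# Sketch only: assumes reserve_tpu_slice is changed to also return the
# head-reservation placement group so the caller can keep it alive.
from typing import Callable, List, Tuple

from ray.util.placement_group import PlacementGroup


def reserve_slices(
    num_slices: int,
    topology: str,
    accelerator_type: str,
    reserve_tpu_slice: Callable[[str, str], Tuple[str, PlacementGroup]],
) -> Tuple[List[str], List[PlacementGroup]]:
    slice_names: List[str] = []
    head_pgs: List[PlacementGroup] = []
    for _ in range(num_slices):
        # The head placement group is returned and tracked rather than orphaned.
        name, head_pg = reserve_tpu_slice(topology, accelerator_type)
        slice_names.append(name)
        head_pgs.append(head_pg)
    return slice_names, head_pgs
```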
This is covered in a TODO comment; I plan to update it in the PR that adds multi-slice to the JaxTrainer, since it will require editing the call there.
Yeah that sounds good to me, I can write up a guide on the new API. There's this existing overview of TPUs with KubeRay: https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/tpu.html, and I could add something similar but for the util library / Ray core support under something like Ray Core > User Guides > Advanced Topics.
Follow-up docs PR: adds an API overview and example usage for the TPU utility library added in #56723. The section was added to the existing "Using TPUs with KubeRay" guide, because the utility library would be primarily used with KubeRay on GKE (the values used for default labels are set on GKE with a mutating webhook). Related issue: #55162.
Why are these changes needed?
This PR adds to the utility library for TPU slice placement group scheduling. We generalize the two-phase approach that the JaxTrainer uses to reserve and schedule the workers on the TPU slice.
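As a rough illustration of the two phases (the slice-name label key and the bundle label selector argument are assumptions here, shown only to convey the shape of the approach, not the exact API):

```python
# Illustrative sketch of the two-phase pattern; function names, the label key,
# and the bundle label selector argument are assumptions.
import ray
from ray.util.placement_group import placement_group
from ray.util.tpu import reserve_tpu_slice  # assumed final location

ray.init()

# Phase 1: reserve an available slice and learn its unique name.
slice_name = reserve_tpu_slice("4x4", "TPU-V6E")  # (topology, accelerator_type)

# Phase 2: build one bundle per TPU host in the slice, each constrained to
# nodes carrying that slice's name label, then schedule workers against it.
num_hosts, chips_per_host = 4, 4  # assumed shape of a v6e 4x4 slice
workers_pg = placement_group(
    bundles=[{"TPU": chips_per_host}] * num_hosts,
    bundle_label_selector=[{"ray.io/tpu-slice-name": slice_name}] * num_hosts,
)
ray.get(workers_pg.ready())
```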
Related issue number
#55162