Commit 6e1f067
[train][jax_trainer] add jax.distributed.shutdown() for JaxBackend (#57802)
## Description
1. This PR adds a `jax.distributed.shutdown()` call to JaxBackend to
free any leaked resources on TPU RayTrainWorkers.
2. If `jax.distributed` is not running, the call is a no-op:
https://docs.jax.dev/en/latest/_autosummary/jax.distributed.shutdown.html
3. Tested on an Anyscale workspace.
<img width="1264" height="62" alt="image"
src="https://github.com/user-attachments/assets/f28102ff-f6d1-4da0-b41a-6cc785603e72"
/>
Signed-off-by: elliot-barn <[email protected]>
1 parent 44dff2b
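
The description above can be illustrated with a minimal sketch of a best-effort shutdown helper. This is not the code from this commit: the function name `shutdown_jax_distributed` and the injectable `shutdown_fn` parameter are hypothetical, added so the helper can be exercised without JAX or a TPU. The only real API assumed is `jax.distributed.shutdown()`, which the linked docs describe as doing nothing when the distributed system is not running.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def shutdown_jax_distributed(
    shutdown_fn: Optional[Callable[[], None]] = None,
) -> bool:
    """Best-effort teardown of the JAX distributed runtime on one worker.

    `shutdown_fn` is a hypothetical injection point for testing; when it is
    omitted, the helper falls back to `jax.distributed.shutdown`.
    Returns True if the shutdown call completed, False if it was skipped
    or raised.
    """
    if shutdown_fn is None:
        try:
            import jax  # lazy import so the helper loads without JAX installed
        except ImportError:
            logger.debug("JAX is not installed; skipping distributed shutdown.")
            return False
        shutdown_fn = jax.distributed.shutdown

    try:
        shutdown_fn()
        return True
    except Exception:
        # Cleanup should never mask the worker's real exit path, so log and move on.
        logger.warning("jax.distributed.shutdown() failed", exc_info=True)
        return False
```

A backend's shutdown hook could call such a helper on every worker. Swallowing the exception is a design choice for cleanup code, on the assumption that a failed teardown should not turn an otherwise successful training run into an error.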
File tree
3 files changed, +64 -0 lines changed
- python/ray/train
  - tests
  - v2/jax
| File (diff order) | Added lines (new-file numbering) | Lines added |
|---|---|---|
| 1 | 135-138, 151 | 5 |
| 2 | 31, 36, 636-650 | 17 |
| 3 | 6, 10-13, 44-57, 79-101 | 42 |