[Collective][PR 2/6] Driver program declarative interfaces #12874
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,34 @@ | ||
| import cupy as cp | ||
| import ray | ||
| | ||
| import ray.util.collective as collective | ||
| | ||
| | ||
| @ray.remote(num_gpus=1) | ||
| class Worker: | ||
|     def __init__(self): | ||
|         self.send = cp.ones((4, ), dtype=cp.float32) | ||
| | ||
|     def compute(self): | ||
|         collective.allreduce(self.send, "177") | ||
|         return self.send | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     ray.init(num_gpus=2) | ||
| | ||
|     num_workers = 2 | ||
|     workers = [] | ||
|     for i in range(num_workers): | ||
|         w = Worker.remote() | ||
|         workers.append(w) | ||
|     _options = { | ||
|         "group_name": "177", | ||
|         "world_size": 2, | ||
|         "ranks": [0, 1], | ||
|         "backend": "nccl" | ||
|     } | ||
|     collective.declare_collective_group(workers, **_options) | ||
|     results = ray.get([w.compute.remote() for w in workers]) | ||
|     print(results) | ||
|     ray.shutdown() |
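For readers without GPUs, the sketch below illustrates what the example above would be expected to print, assuming `allreduce` applies a sum reduction (its usual default). It uses plain Python lists in place of CuPy arrays, and `simulate_allreduce_sum` is a hypothetical helper written for this illustration, not part of `ray.util.collective`.

```python
from typing import List


def simulate_allreduce_sum(buffers: List[List[float]]) -> List[List[float]]:
    """Return what each rank would hold after a sum allreduce.

    Hypothetical helper for illustration only; a real allreduce
    operates in place on device buffers.
    """
    if not buffers:
        return []
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    # Element-wise sum across ranks.
    reduced = [sum(values) for values in zip(*buffers)]
    # Every rank ends up with its own copy of the reduced result.
    return [list(reduced) for _ in buffers]


# Two workers, each starting with four ones, as in the example above.
world_size = 2
send_buffers = [[1.0] * 4 for _ in range(world_size)]
results = simulate_allreduce_sum(send_buffers)
print(results)  # [[2.0, 2.0, 2.0, 2.0], [2.0, 2.0, 2.0, 2.0]]
```

Under that assumption, each of the two workers would return an array of four 2s.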
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -40,3 +40,28 @@ def get_id(self): | ||
|             logger.warning("The NCCL ID has not been " | ||
|                            "set yet for store {}.".format(self.name)) | ||
|         return self.nccl_id | ||
| | ||
| | ||
| @ray.remote | ||
| class Info: | ||
Contributor
Can you please rename? Also, this is just a makeshift kvstore, right? Can we maybe use
Contributor
@zhisbug Let's chat offline about possible alternatives here? This may seem harmless, but I think it could easily be a source of issues later on.
Contributor
Author
Let's talk tomorrow. If the
|     """Store the group information created via `declare_collective_group`. | ||
| | ||
|     Note: Should be used as a NamedActor. | ||
|     """ | ||
| | ||
|     def __init__(self): | ||
|         self.ids = None | ||
|         self.world_size = -1 | ||
|         self.rank = -1 | ||
|         self.backend = None | ||
| | ||
|     def set_info(self, ids, world_size, rank, backend): | ||
|         """Store collective information.""" | ||
|         self.ids = ids | ||
|         self.world_size = world_size | ||
|         self.rank = rank | ||
|         self.backend = backend | ||
| | ||
|     def get_info(self): | ||
|         """Get previously stored collective information.""" | ||
|         return self.ids, self.world_size, self.rank, self.backend |
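A minimal sketch of how such a store might be written and read, kept free of Ray so it runs anywhere. `LocalInfo` mirrors the `Info` class above as a plain object; `_registry`, `declare_group`, and `lookup_group` are hypothetical stand-ins for named-actor registration and lookup, and the string actor ids are placeholders. Passing the full list of ranks as `rank` is also an assumption; the actual wiring in this PR may differ.

```python
from typing import Dict, List, Optional, Tuple


class LocalInfo:
    """Plain-Python mirror of the `Info` store (no Ray actor involved)."""

    def __init__(self) -> None:
        self.ids: Optional[List[str]] = None
        self.world_size = -1
        self.rank = -1
        self.backend: Optional[str] = None

    def set_info(self, ids, world_size, rank, backend) -> None:
        """Store collective information."""
        self.ids = ids
        self.world_size = world_size
        self.rank = rank
        self.backend = backend

    def get_info(self) -> Tuple:
        """Get previously stored collective information."""
        return self.ids, self.world_size, self.rank, self.backend


# Hypothetical stand-in for named-actor lookup: group name -> store.
_registry: Dict[str, LocalInfo] = {}


def declare_group(group_name, ids, world_size, ranks, backend) -> None:
    """Record a group declaration on the driver side (illustrative only)."""
    if len(ids) != world_size or len(ranks) != world_size:
        raise ValueError("ids and ranks must each have world_size entries")
    store = LocalInfo()
    store.set_info(ids, world_size, ranks, backend)
    _registry[group_name] = store


def lookup_group(group_name):
    """Fetch the stored declaration, as a worker might do lazily."""
    return _registry[group_name].get_info()


declare_group("177", ["actor-0", "actor-1"], 2, [0, 1], "nccl")
ids, world_size, ranks, backend = lookup_group("177")
# A worker would locate its own rank by matching its id in the stored list.
my_rank = ranks[ids.index("actor-1")]
print(my_rank, world_size, backend)
```

Keying the store by group name is what makes it behave like the makeshift key-value store the reviewers describe.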