[core] Deserialize torch.Tensors to the correct device #50134
Comments
@stephanie-wang can you explain more about your thinking re: "we cannot eliminate unnecessary copies"? (You've thought about this a lot more than I have.) One direction I was thinking is to use the …
I think we should also introduce a warning when this happens, as it's likely unexpected to the user. Something like: "Serializing GPU tensor foobar; a CPU copy will be made. To avoid this you can do ..."
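For illustration, a minimal sketch of where such a warning could be raised, assuming serialization is routed through a helper like the hypothetical `_warn_if_gpu_copy` below (this is not Ray's current code path):

```python
import warnings
import torch

def _warn_if_gpu_copy(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical hook called before a tensor is written to the object store.
    if t.is_cuda:
        warnings.warn(
            "Serializing a GPU tensor; a CPU copy will be made. "
            "To avoid this, move the tensor to CPU explicitly or use a "
            "GPU-native transfer path when one is available."
        )
        return t.detach().cpu()
    return t
```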
Awesome! Thanks for opening this issue @stephanie-wang.
Yes, I think the warning makes sense. When I tried it the first time with just one GPU, I thought the tensor would NOT be copied to CPU in between, because it magically came right out of the object store on the correct device. That's why my intuition was that direct GPU-tensor handover from actor to actor (when they share the same GPU) was already implemented.
That's basically the idea behind the proposed GPU support for the Ray Core API :) But I want to avoid doing this without the associated API changes, because it brings up questions of how we should manage the GPU data on the sending actor.
I think it would be better to have a dumb, somewhat slow, but fully reliable approach for the normal Ray Core API, and then improve on it with the GPU-native API. In any case, it is probably a good idea to have both options going forward, since the latter may take some time to stabilize.
Agree with all of the above. I'd propose the following:
Description
Ray currently serializes torch.Tensors to the object store and then deserializes them using torch's default deserialization method. This can result in deserialization to the wrong device. Ideally, on deserialization, we should place the tensor directly on the correct device. Currently we do this in Ray Compiled Graphs, but we could also support it for all Ray programs (although we cannot eliminate unnecessary copies).
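As a rough sketch of one possible approach (not the implementation proposed in this issue): Ray's public customized-serialization API, `ray.util.register_serializer`, could be used to stage the tensor through host memory on serialization and place it on the receiving worker's device on deserialization. The device-selection policy below (first visible GPU, else CPU) is an illustrative assumption, and dtype edge cases such as bfloat16 are ignored.

```python
import ray
import ray.util
import torch

def _serialize_tensor(t: torch.Tensor):
    # Stage through host memory; this is the copy that cannot be eliminated
    # without a GPU-native transport.
    return t.detach().cpu().numpy()

def _deserialize_tensor(arr):
    t = torch.from_numpy(arr)
    # Illustrative policy: place the tensor on the GPU visible to the
    # receiving worker, falling back to CPU if none is available.
    return t.to("cuda") if torch.cuda.is_available() else t

ray.util.register_serializer(
    torch.Tensor,
    serializer=_serialize_tensor,
    deserializer=_deserialize_tensor,
)
```

This trades an extra host copy for predictable placement, which matches the "dumb but reliable" option discussed in the comments above.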
Some questions to consider:
Example:
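The original example is not reproduced here; the following is a hypothetical illustration of the behavior being described, assuming a machine with at least two GPUs. The device reported by the consumer depends on torch's default deserialization rather than on the consumer's assigned GPU.

```python
import ray
import torch

@ray.remote(num_gpus=1)
class Producer:
    def make(self):
        # Created on whichever GPU Ray assigned to this actor.
        return torch.ones(4, device="cuda")

@ray.remote(num_gpus=1)
class Consumer:
    def device_of(self, t: torch.Tensor) -> str:
        # With torch's default deserialization, this may not be the device
        # the consumer actor expects.
        return str(t.device)

ray.init()
producer, consumer = Producer.remote(), Consumer.remote()
print(ray.get(consumer.device_of.remote(producer.make.remote())))
```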
Use case
No response