[RFC][Ray Core] Support zero-copy Pytorch tensor in Ray #26229

Open
jiaodong opened this issue Jun 30, 2022 · 6 comments
Labels
  • core: Issues that should be addressed in Ray Core
  • core-object-store
  • enhancement: Request for new feature and/or capability
  • P1: Issue that should be fixed within a few weeks
  • RFC: RFC issues

Comments

@jiaodong
Member

jiaodong commented Jun 30, 2022

Description

Previous work by @suquark that was reverted: #12344

Currently Ray supports zero-copy reads for numpy arrays, but not for PyTorch tensors. This feature has been requested by multiple folks we interacted with on the PyTorch side (TorchX, PyTorch Geometric, etc.), and we anticipate similar needs within Ray libraries down the road, such as AIR (training data ingest) and Serve (ModelMesh, cc: @sihanwang41).

From our chat with Yaroslav:

I’d love to have fast interoperability between Ray and PyTorch. Making training efficient on cheap/preemptible instances needs some experimentation, and Ray already has the right abstractions for it (imagine implementing a sync parameter server with backup workers: easy with Ray actors, hard with an RPC interface). In an ideal world, one would be able to use Ray to quickly:

1) send PyTorch tensor to another machine
2) receive PyTorch Tensor 
3) (if possible) receive it in “pinned” memory, so that the CPU-GPU transfer could happen without CPU involvement (i.e., what DataLoader does)
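As a concrete reference point for items 1) and 2), here is a minimal sketch with today's API (the actor and all names are illustrative, and the tensor currently gets copied during serialization rather than shared):

import ray
import torch

ray.init()

@ray.remote
class Receiver:
    def consume(self, t: torch.Tensor) -> float:
        # The tensor is deserialized on this worker; today this involves a copy.
        return float(t.sum())

receiver = Receiver.remote()
tensor = torch.ones(4, 8)
print(ray.get(receiver.consume.remote(tensor)))  # 32.0
# For item 3, tensor.pin_memory() returns a pinned copy, not a zero-copy view.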

As I discussed with @suquark, the previous implementation was less than ideal: it required importing torch, registering a custom serializer, and suppressing warnings from PyTorch, because when we deserialize a PyTorch tensor as an immutable object, PyTorch raises a warning saying that a torch tensor cannot be read-only.
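For reference, that style of hook looks roughly like the sketch below, built on ray.util.register_serializer (the reverted implementation differed in its details; this is only illustrative):

import ray
import torch

def serialize_tensor(t: torch.Tensor):
    # Hand Ray a numpy view (CPU tensors only) so the bytes land in the object store.
    return t.detach().numpy()

def deserialize_tensor(arr):
    # arr comes back as a read-only zero-copy view, so from_numpy triggers
    # PyTorch's "non-writable tensor" warning, which the old code suppressed.
    return torch.from_numpy(arr)

ray.util.register_serializer(
    torch.Tensor, serializer=serialize_tensor, deserializer=deserialize_tensor
)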

Our suggested approach is to enable "numpy read-only" support for PyTorch tensors and make fast Ray-PyTorch interoperability cleaner. Related issues on the PyTorch GitHub:

pytorch/pytorch#32868
pytorch/pytorch#44027
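Both issues trace back to the same warning; here is a minimal repro, independent of Ray, that mimics the read-only buffer the object store hands back:

import numpy as np
import torch

arr = np.zeros(4)
arr.flags.writeable = False  # mimic Ray's immutable object-store buffer
t = torch.from_numpy(arr)    # UserWarning: the given NumPy array is not writable,
                             # and PyTorch does not support non-writable tensors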

A bigger related RFC in PyTorch is TensorStore: pytorch/pytorch#64932

cc: @yaroslavvb @msaroufim

Use case

AIR training data ingest
Ray Serve

@jiaodong jiaodong added enhancement Request for new feature and/or capability RFC RFC issues labels Jun 30, 2022
@jiaodong jiaodong self-assigned this Jun 30, 2022
@stale

stale bot commented Oct 29, 2022

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public Slack channel.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Oct 29, 2022
@jiaodong jiaodong removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Oct 30, 2022
@AndyBug0

It'll benefit us a lot if Ray supports this.

@nickchomey

I'd also like this very much

@HuangLED
Contributor

+1 to supporting this.

@idthanm
Contributor

idthanm commented Jul 17, 2023

Any progress on this feature?

@alialamiidrissi

alialamiidrissi commented Jan 26, 2024

This issue is quite old but still very relevant. I found a workaround that I thought might help others.

import numpy as np
import ray
import torch

array_np = np.zeros((23, 40))
obj_ref = ray.put(array_np)
array_np = ray.get(obj_ref)               # zero-copy, read-only view into the object store
array_torch = torch.from_numpy(array_np)  # shares that buffer, no extra copy

Explanation
Ray already supports zero-copy reads for numpy arrays. Also, according to the PyTorch documentation, a tensor created with torch.from_numpy shares its memory with the source numpy array. This means it reuses the buffer already allocated by the object store instead of creating a new one.
Notes

  • When running this code, PyTorch will complain that the numpy array is read-only, but it still seems to work fine. We can make the numpy array writable with array_np.flags.writeable = True; however, this would break the immutability assumption about objects in the Ray object store (a less invasive option is to silence just the warning, as sketched after this list).
  • One can verify that the PyTorch tensors share the zero-copy memory with the following code:
array_np, array_np_2 = ray.get([obj_ref] * 2)
array_torch, array_torch_2 = torch.from_numpy(array_np), torch.from_numpy(array_np_2)
assert array_torch.data_ptr() == array_torch_2.data_ptr()  # same underlying buffer
  • There is an old merged PR that appears to solve the same problem, but the code it introduced seems to have been overwritten since: I manually ran the unit test added in that PR and it failed.
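To silence only the read-only warning without touching the writeable flag, something like the following should work (the exact message prefix is an assumption and may vary across PyTorch versions):

import warnings

import torch

with warnings.catch_warnings():
    # Match only PyTorch's non-writable-array warning; leave other warnings intact.
    warnings.filterwarnings("ignore", message="The given NumPy array is not writable")
    array_torch = torch.from_numpy(array_np)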

@jjyao jjyao added the core Issues that should be addressed in Ray Core label Mar 13, 2024
@jjyao jjyao added P1 Issue that should be fixed within a few weeks core-object-store labels Mar 13, 2024