The zero-copy behaviour is not valid for np.recarray #37573

Closed
dlee992 opened this issue Jul 19, 2023 · 10 comments
Labels
bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), triage (Needs triage, e.g. priority, bug/not-bug, and owning component)

Comments

@dlee992

dlee992 commented Jul 19, 2023

What happened + What you expected to happen

The zero-copy behaviour is not valid for np.recarray.

The output is:

INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at  
-----------------------------------np.ndarray-----------------------------------
Result: 499999500000
Original Address: 46976540344576
New Address: 46976540344576
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False

----------------------------------np.recarray-----------------------------------
Result: 0.0
Original Address: 94814134452736
New Address: 94814150452784
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Versions / Dependencies

np=1.23.5, ray=2.3.1

Reproduction script

# %%
# Step 2: Import libraries
import numpy as np
import ray

# Step 3: Initialize Ray
ray.init()

print("np.ndarray".center(80, '-'))
# Step 4: Define function to process the array
@ray.remote
def process_array(arr):
    # Perform some operations on the array
    # For example, let's calculate the sum of the elements
    return np.sum(arr)

# Step 5: Create and store array in shared memory using Ray
array_size = 1000000
shared_array_id = ray.put(np.arange(array_size))
shared_array = ray.get(shared_array_id)

# Step 6: Call the function process_array on the shared array
result = ray.get(process_array.remote(shared_array))

# Step 7: Verify the result
print("Result:", result)

# Step 8: Check memory addresses
original_address = shared_array.__array_interface__['data'][0]
new_address = ray.get(shared_array_id).__array_interface__['data'][0]
print("Original Address:", original_address)
print("New Address:", new_address)
print(shared_array.flags)


print("np.recarray".center(80, '-'))
# Step 4: Define function to process the recarray
@ray.remote
def process_recarray(recarray):
    # Perform some operations on the recarray
    # For example, let's calculate the sum of the 'value' field
    return np.sum(recarray['value'])

# Step 5: Create and store recarray in shared memory using Ray
recarray_size = 1000000
recarray_dtype = np.dtype([('id', np.int64), ('value', np.float64)])
shared_recarray_id = ray.put(np.recarray(recarray_size, dtype=recarray_dtype))
shared_recarray = ray.get(shared_recarray_id)

# Step 6: Call the function process_recarray on the shared recarray
result = ray.get(process_recarray.remote(shared_recarray))

# Step 7: Verify the result
print("Result:", result)

# Step 8: Check memory addresses
original_address = shared_recarray.__array_interface__['data'][0]
new_address = ray.get(shared_recarray_id).__array_interface__['data'][0]
print("Original Address:", original_address)
print("New Address:", new_address)
print(shared_recarray.flags)

Issue Severity

Medium: It is a significant difficulty but I can work around it.

dlee992 added the bug and triage labels on Jul 19, 2023
@fyrestone
Contributor

fyrestone commented Jul 21, 2023

Ray only builds read-only buffer objects that reference blocks of shared memory when it deserializes buffers. The upper-level framework decides whether to copy the buffer or not.

For example, if the upper-level framework needs a writable buffer but finds a read-only one when deserializing, it may copy the buffer or raise an exception.
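
The distinction described above can be observed without Ray, using only pickle protocol 5 and NumPy. Ray's documentation says it serializes NumPy arrays with pickle protocol 5 and out-of-band data, so this sketch assumes that the number of out-of-band buffers NumPy hands over is what decides whether a zero-copy read is possible. The helper name out_of_band_buffers is made up for this illustration, and the recarray is zero-filled (unlike the np.recarray(...) call in the report) so that its content is defined.

```python
import pickle

import numpy as np


def out_of_band_buffers(obj):
    """Pickle obj with protocol 5; return the payload and any out-of-band buffers.

    Hypothetical helper for illustration only, not part of Ray or NumPy.
    """
    buffers = []
    payload = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
    return payload, buffers


# Plain ndarray: NumPy passes the data to the buffer callback instead of
# embedding it in the pickle stream.
arr = np.arange(1_000)
arr_payload, arr_buffers = out_of_band_buffers(arr)
arr_restored = pickle.loads(arr_payload, buffers=arr_buffers)
print("ndarray out-of-band buffers:", len(arr_buffers))
print("ndarray shares memory:", np.shares_memory(arr, arr_restored))

# recarray with the dtype from the report (zero-filled here, an assumption
# made so the round trip can be compared byte for byte).
dtype = np.dtype([("id", np.int64), ("value", np.float64)])
rec = np.zeros(1_000, dtype=dtype).view(np.recarray)
rec_payload, rec_buffers = out_of_band_buffers(rec)
rec_restored = pickle.loads(rec_payload, buffers=rec_buffers)
print("recarray out-of-band buffers:", len(rec_buffers))
print("recarray shares memory:", np.shares_memory(rec, rec_restored))
```

With the versions from the report, the expectation is one out-of-band buffer for the ndarray and none for the recarray, which would match the differing addresses in the output. Other NumPy versions may behave differently.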

@dlee992
Author

dlee992 commented Jul 21, 2023

Thanks for your reply, @fyrestone. I can simplify my test case:

# %%
import numpy as np
import ray
from ray.util import inspect_serializability

print(np.__version__)
print(ray.__version__)

# Step 3: Initialize Ray
ray.init()


print("np.ndarray".center(80, '-'))
# Step 5: Create and store array in shared memory using Ray
array_size = 1000000
old_array = np.arange(array_size)
print(old_array.flags)
shared_array_id = ray.put(old_array)
shared_array = ray.get(shared_array_id)
print(shared_array.flags)

# Step 8: Check memory addresses
original_address = shared_array.__array_interface__['data'][0]
new_address = ray.get(shared_array_id).__array_interface__['data'][0]
print("Original Address:", original_address)
print("New Address:", new_address)


print("np.recarray".center(80, '-'))
# Step 5: Create and store recarray in shared memory using Ray
recarray_size = 1000000
recarray_dtype = np.dtype([('id', np.int64), ('value', np.float64)])
old_recarray = np.recarray(recarray_size, dtype=recarray_dtype)
print(old_recarray.flags)
shared_recarray_id = ray.put(old_recarray)
shared_recarray = ray.get(shared_recarray_id)
print(shared_recarray.flags)

# Step 8: Check memory addresses
original_address = shared_recarray.__array_interface__['data'][0]
new_address = ray.get(shared_recarray_id).__array_interface__['data'][0]
print("Original Address:", original_address)
print("New Address:", new_address)
assert original_address == new_address

Output is:

1.23.5
2.3.1
2023-07-21 05:07:50,768 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
-----------------------------------np.ndarray-----------------------------------
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  WRITEBACKIFCOPY : False

Original Address: 46946207138048
New Address: 46946207138048
----------------------------------np.recarray-----------------------------------
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False

Original Address: 94073057342448
New Address: 94073073342496
Traceback (most recent call last):
  File "/home/dli/bts/study/ray/test_recarray.py", line 44, in <module>
    assert original_address == new_address
AssertionError

As you can see, I just create a recarray, then put and get it using Ray, without doing anything related to an upper-level framework, and Ray has already copied the buffer, as the output shows.

@dlee992
Author

dlee992 commented Jul 21, 2023

So, I think this should be either a bug or a feature request for Ray.

@fyrestone
Contributor

In this case, NumPy is the upper-level framework running on Ray.

@dlee992
Author

dlee992 commented Jul 21, 2023

You mean it's still a NumPy pickle issue, unrelated to Ray?

@fyrestone
Contributor

You mean it's still a NumPy pickle issue, unrelated to Ray?

I think so. Ray only provides read-only buffers.

@dlee992
Author

dlee992 commented Jul 21, 2023

I really don't think so. As you can see from similar issues/PRs:
#30615
#26229
#17186
In this comment #30615 (comment) and in the official docs, Ray claims it supports zero-copy for np.ndarray.
And issubclass(np.recarray, np.ndarray) is True! So I think this is a Ray problem, not a NumPy issue.
Maybe cc @stephanie-wang @ericl, can you share some ideas on this?
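
The subclass relationship mentioned above can be checked directly, together with the exact type of a recarray instance. This sketch uses only NumPy and pickle, not Ray. It assumes, without confirmation from this thread, that NumPy only hands data out-of-band for arrays whose exact type is np.ndarray, and it shows one possible workaround under that assumption: viewing the recarray as a base ndarray before serializing and viewing it back afterwards.

```python
import pickle

import numpy as np

dtype = np.dtype([("id", np.int64), ("value", np.float64)])
# Zero-filled so the content is defined (an assumption; the report uses
# an uninitialized np.recarray).
rec = np.zeros(1_000, dtype=dtype).view(np.recarray)

# The subclass check passes, but the exact type is not ndarray.
print("issubclass:", issubclass(np.recarray, np.ndarray))
print("exact ndarray type:", type(rec) is np.ndarray)

# view() re-interprets the same memory as a base ndarray without copying.
plain = rec.view(np.ndarray)
print("view shares memory:", np.shares_memory(rec, plain))

# Round trip with protocol 5, collecting out-of-band buffers.
buffers = []
payload = pickle.dumps(plain, protocol=5, buffer_callback=buffers.append)
restored = pickle.loads(payload, buffers=buffers).view(np.recarray)

print("out-of-band buffers:", len(buffers))
print("restored shares memory:", np.shares_memory(rec, restored))
print("field access still works:", restored.value[:3])
```

If the assumption holds, the same pattern (put the ndarray view, then call .view(np.recarray) on the result of ray.get) might keep the zero-copy path in Ray, but that has not been verified here.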

@dlee992
Author

dlee992 commented Jul 21, 2023

Oh, sorry. You're right, @fyrestone. Thanks! I will try to forward this issue to the NumPy community.

@rkooo567
Contributor

Seems like this is an external issue?

@rkooo567
Contributor

We will close the issue for now, but please reopen it if there's an action item on our end.
