What is the problem?
When looping over rows of a shared 2D array inside a Ray worker, I see a consistent memory buildup whenever I do any work with a row. I attempted to recreate this in a loop on the main process, but there the problem does not occur.
I assumed this was a numpy bug similar to numpy/numpy#15746 and built my test both in the main process and in a Ray worker, but I can't seem to work on the shared object in my worker without causing the leak.
Is this expected behavior when touching items of a read-only array in a worker's loop, or should it behave nicely as in the main-process example (see the first half of the mprof plot)?
My guess is that over the course of the loop the entire contents of the read-only object get pulled into the worker's local memory. So perhaps what I'm asking is: is there a way around this, or do I need to work with a local copy of very big objects, or go back to loading data in batches from file?
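As an aside, the "local copy" workaround I have in mind looks like this pure-numpy sketch (Ray omitted; `shared` and `local` are illustrative names, and the read-only flag just mimics what Ray's object store hands a worker):

```python
import numpy as np

# Mimic the read-only zero-copy view Ray gives a worker.
shared = np.random.random((1000, 100)).astype(np.float32)
shared.setflags(write=False)

# Deep-copy into worker-local memory: the copy is writable and
# no longer references the shared buffer.
local = np.array(shared, copy=True)
local += 1.0  # fine on the copy; would raise ValueError on `shared`
```

The cost, of course, is that the whole array now lives twice in memory, which defeats the point of sharing it in the first place.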
Ray version and other system information (Python version, TensorFlow version, OS):
Ubuntu 14.04
Python 3.7.6 (Anaconda environment active)
[GCC 7.3.0] :: Anaconda, Inc. on linux
numpy 1.18.1
ray 0.8.2
Reproduction
I have verified my script runs in a clean environment and reproduces the issue.
I have verified the issue also occurs with the latest wheels.
I ran my script with the following commands to obtain the memory profile plot:
mprof run --include-children --multiprocess test_ray_numpy_read_only_leak_in_loop.py
mprof plot
# Similar to numpy issue 15746? Unable to get around the leak with a copy,
# so probably ndarray + pyarrow/object-store related?
# https://github.com/numpy/numpy/issues/15746
# Wed Mar 18 2020
import numpy as np  # 1.18.1
import ray  # 0.8.2

ray.init(num_cpus=1)

# fake data
features = np.random.random((1000000, 100)).astype(np.float32)

# shared memory store
feature_store_id = ray.put(features)
del features

# We may have many workers looping over the read-only shared array,
# with the intention of reducing memory.
def test_read_only(feats):
    # ensure the array is read-only; writing to a read-only
    # numpy array raises ValueError
    try:
        feats += 1.
    except ValueError:
        print("Features are properly read-only from ray shared memory store.")

###################################################
# I do not believe there is any problem here, according to the
# first half of the mprof plot.
def event_loop_local():
    # I could also use the global `features` here (bypass ray and skip
    # deleting features above) and I would have no memory leak.
    shared_features = ray.get(feature_store_id)
    test_read_only(shared_features)
    for i in range(1000000):
        # Does this leak memory?: NO (see 'mprof plot')
        features_i = shared_features[i]
        features_i = [float(v) for v in features_i]

# test as a function on the main process
print("Testing event_loop_local")
event_loop_local()
###################################################

###################################################
# Now test inside a ray worker
@ray.remote
def event_loop_remote(shared_features):
    # tested both with ray.get and with passing the ObjectID via .remote()
    # shared_features = ray.get(feature_store_id)
    test_read_only(shared_features)
    for i in range(1000000):
        # Does this leak?: YES; let's try the copy fix from numpy issue 15746
        features_i = shared_features[i]
        # Does this leak?: YES (see 'mprof plot')
        # features_i = shared_features[i].copy()
        # Does this leak?: YES
        # features_i = np.array(shared_features[i], copy=True, dtype=object)
        # Does this leak?: YES
        # features_i = shared_features[i:i+1]  # note: 2D, so iterate features_i[0]
        # None of the above prevent the work done here from leaking.
        features_i = [float(v) for v in features_i]
        # Does this leak?: YES
        # features_i = shared_features[i].copy()
        # features_i += 1.

# test inside a ray worker
print("Testing event_loop_remote.remote")
ray.get(event_loop_remote.remote(feature_store_id))
###################################################

ray.shutdown()
Expected Output:
Testing event_loop_local
Features are properly read-only from ray shared memory store.
Testing event_loop_remote.remote
(pid=13191) Features are properly read-only from ray shared memory store.
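One variant I have not yet tried against the leak (so this is speculation, not a confirmed fix) is converting a row to Python floats with `tolist()` in a single call, instead of the per-element comprehension used in the script. Pure-numpy sketch, with the read-only flag again standing in for Ray's shared view:

```python
import numpy as np

features = np.random.random((1000, 100)).astype(np.float32)
features.setflags(write=False)  # mimic the read-only shared array

# One C-level call producing plain Python floats, rather than creating
# a numpy scalar per element the way `[float(v) for v in row]` does.
row = features[0].tolist()
```

If the buildup comes from per-element numpy scalars keeping the base buffer referenced, this might behave differently; if it is object-store related, presumably it will leak all the same.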