[core] Gets timeout on randomly generated ObjectIDs #7074

Open · 1 of 2 tasks
stephanie-wang opened this issue Feb 6, 2020 · 3 comments
Labels: bug (Something that is supposed to be working; but isn't), P3 (Issue moderate in impact or severity)

Comments

@stephanie-wang (Contributor)

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS): 0.8

Normally, when retrieving a plasma object, the worker first makes sure the object has been (or will be) created by contacting the object's owner and checking whether the task that creates the object is still pending. It then tries to fetch the object, and if the object is still not available after some timeout, the object is assumed to have been lost.

For randomly generated ObjectIDs, there is no task known to create the object, so any ray.get() on such an ID will time out.
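
For contrast, here is a minimal sketch (against the same 0.8-era API; the assert is just illustrative) of why a task-returned ObjectID resolves fine while a random one does not:

import ray

ray.init()

@ray.remote
def make_value():
    return 1

# Task-returned ID: the calling worker owns the object and knows which task
# creates it, so ray.get() simply blocks until the task finishes.
owned_id = make_value.remote()
assert ray.get(owned_id) == 1

# Randomly generated ID: no task is associated with it, so the fetch path
# described above eventually times out and the object is reported as lost.
random_id = ray.ObjectID.from_random()
# ray.get(random_id)  # raises UnreconstructableError after the timeout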

Reproduction (REQUIRED)

import ray
import time

ray.init()

@ray.remote
def fulfill(id_list):
    time.sleep(11)  # 1s longer than the initial_reconstruction_timeout
    # Write directly into the object store under the pre-generated ID.
    ray.worker.global_worker.put_object(None, object_id=id_list[0])

# Wrap the ID in a list so Ray does not try to resolve it as a task argument.
random_id = ray.ObjectID.from_random()
fulfill.remote([random_id])
ray.get(random_id)

Results in:

2020-02-06 09:28:04,280 WARNING worker.py:1511 -- Local object store memory usage:
num clients with quota: 0
quota map size: 0
pinned quota map size: 0
allocated bytes: 1
allocation limit: 2655929548
pinned bytes: 77
(global lru) capacity: 2655929548
(global lru) used: 3.76516e-08%
(global lru) num objects: 1
(global lru) num evictions: 0
(global lru) bytes evicted: 0

Traceback (most recent call last):
  File "test.py", line 14, in <module>
    ray.get(random_id)
  File "/home/swang/ray/python/ray/worker.py", line 1515, in get
    raise value
ray.exceptions.UnreconstructableError: Object 6e991f1c2b7354c8729f5ea57639000000000000 is lost (either LRU evicted or deleted by user) and cannot be reconstructed. Try increasing the object store memory available with ray.init(object_store_memory=<bytes>) or setting object store limits with ray.remote(object_store_memory=<bytes>). See also: https://ray.readthedocs.io/en/latest/memory-management.html
  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
stephanie-wang added the bug label (Something that is supposed to be working; but isn't) on Feb 6, 2020
@stephanie-wang (Contributor, Author)

cc @ericl @edoakes

@ericl (Contributor) commented Feb 6, 2020

I don't think we want to support this feature, could we remove it instead?

@stephanie-wang (Contributor, Author)

Remove the from_random() API? Yeah, I guess so, but it is useful for testing.
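
A sketch of the kind of test usage in question, reusing the same internal put_object hook as the repro above (the "producer" name is illustrative):

import ray

ray.init()

# Pre-allocate an ID before any data exists; it can be handed to a consumer
# immediately, and a producer fills it in later.
future_id = ray.ObjectID.from_random()

@ray.remote
def producer(id_list):
    # Fill the pre-allocated slot directly in the object store.
    ray.worker.global_worker.put_object("ready", object_id=id_list[0])

producer.remote([future_id])
# The consumer side would then be ray.get(future_id), which is exactly the
# call that currently times out (see the repro above).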

ericl added the P3 label (Issue moderate in impact or severity) on Mar 19, 2020