Skip to content

Conversation

@robertnishihara
Copy link
Collaborator

This replaces #3629.

This makes the following changes:

  • object_store_memory -> object_store_memory_bytes
  • redis_max_memory -> redis_max_memory_bytes
  • timeout -> timeout_milliseconds

Addresses #3411.

…y -> object_store_memory_bytes, timeout -> timeout_seconds.
@robertnishihara
Copy link
Collaborator Author

Following up on comments from #3629.

@ericl I agree megabytes are more intuitive here, but I felt that object_store_memory_mb and redis_max_memory_mb would be too ambiguous and people would still need to look at the documentation to check, whereas bytes are very clear. On the other hand writing out megabytes would be acceptable but starts to get long. On the other hand, for the timeout, seconds are probably the right unit, and timeout_seconds is perfectly clear.

@atumanov Consistency with SI unit prefixes is an interesting idea, I guess that means using Mb instead of mb. However, I can't quite bring myself to name a variable object_store_memory_Mb, and unfortunately it still seems less clear than bytes.

@ericl
Copy link
Contributor

ericl commented Dec 29, 2018

@robertnishihara I believe memory_mb is common terminology, imo the gains of removing six zeros is worth it even if it wasn't...

@ericl
Copy link
Contributor

ericl commented Dec 29, 2018

@atumanov
Copy link
Contributor

@robertnishihara , actually, it would be object_store_memory_MB for megabytes. Mb would be megabits. And MiB if we want to go with million bytes. But I agree that it's awkward, given our naming conventions. One possibility is to allow specifying human readable values like 10M, 1G. It's fairly standard to enable that for command line tools. I think there might be a readily available python library.

@atumanov
Copy link
Contributor

something like this :
https://pypi.org/project/humanfriendly/

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10502/
Test FAILed.

INFINITE_RECONSTRUCTION = 2**30

# Max bytes to allocate to plasma unless overriden by the user
DEFAULT_MAX_MEMORY_MB = 20 * 1000
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to this file from services.py

@robertnishihara
Copy link
Collaborator Author

Thanks @ericl @atumanov Let's give _mb a try. I agree it is more intuitive. Hopefully people understand the abbreviation..

I really dislike the "humanfriendly" approach. It's more difficult to work with programmatically and it's so confusing that I always have to check the documentation or other examples every time I use it (e.g., for docker, is it --memory=1G or --memory=1g or --memory=1gb or any number of other possibilities).

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10506/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10505/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10507/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10508/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10509/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10510/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10511/
Test FAILed.

@atumanov
Copy link
Contributor

  1. to clarify my proposal was to try object_memory_size filtered through humanfriendly (or equiv) only at the I/O boundary with the user. It would then get immediately translated and all internal APIs could then use integral _bytes.
  2. Note that with my proposal you can still specify the number of bytes, and it's supported as a special case. This is strictly more expressive.
  3. The interface supported by humanfriendly is a Unix standard.
  4. I still think that "mb" is confusing. Technically it means "millibits". Practically, it will leave some ambiguity as to whether it's megabytes or megabits. Only contextually it will be clear that we're talking about megabytes (because it's memory). But if this same approach is used, say, for network bandwidth, then the confusion will be very real. Your call.

# Compare the requested memory size to the memory available in
# /dev/shm.
if shm_avail > object_store_memory:
if shm_avail > object_store_memory_mb:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you comparing bytes to megabytes here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! that was a bug!

# Print the object store memory using two decimal places.
object_store_memory_str = (object_store_memory / 10**7) / 10**2
logger.info("Starting the Plasma object store with {} GB memory "
object_store_memory_str = (object_store_memory_mb // 10) / 10**2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great place where a library could "pretty-print" the object store memory for you into a human readable format.

raise DeprecationWarning("The use_raylet argument is deprecated. "
"Please remove it.")

if object_store_memory is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object_store_memory_mb ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional. It is used to print a deprecation warning.

"deprecated. Please use 'object_store_memory_mb'.")
object_store_memory_mb = object_store_memory / 10**6

if redis_max_memory is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redis_max_memory_mb ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional. It is used to print a deprecation warning.

def wait(object_ids,
num_returns=1,
timeout_seconds=None,
timeout=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're trying to make it backward compatible. However, what I'm worried about is existing applications using positional arguments instead of keyword arguments. This change will break them, unfortunately, due to the mismatch of time units between timeout and timeout_seconds! I propose explicitly deprecating the previous interface. See below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could consider using @deprecated on the old interface, and make it call the new interface. What I'm worried about here is that the current approach is changing the API twice, once to add timeout_seconds and again to remove timeout. This is a noop for keyword argument users, but is a breaking change for positional argument users. Why not:

@deprecated(version='0.6.2', reason='timeout deprecated in favor of timeout_seconds')
def ray.wait(object_ids, num_returns=1, timeout=None, worker=global_worker):
    return ray.wait(object_ids = object_ids, 
        num_returns = num_returns, 
        timeout_seconds = timeout, 
        global_worker = global_worker)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully that won't happen much, since using positional arguments for keyword arguments is a bad practice.

That said, I'm not sure I understand your suggestion as Python functions can't be overloaded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I am ok with immediately invalidating the old API instead of trying to do it gracefully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I meant something like ray.future.wait(object_ids, num_returns, timeout_seconds, global_worker).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. I prefer to update the ray.wait API right away.

<< "by the redis LRU configuration. Consider increasing the memory "
"allocation via "
<< "ray.init(redis_max_memory=<max_memory_bytes>).";
<< "ray.init(redis_max_memory_mb=<max_memory_bytes>).";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: =<max_memory_megabytes>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, fixed

for _ in range(num_objects):
obj = a.create_object.remote(object_store_memory // num_objects)
obj = a.create_object.remote(
10**6 * object_store_memory_mb // num_objects)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: This is correct, but I would put parens around (10**6 * object_store_memory_mb) to ensure/express correct/desired order of operations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@robertnishihara
Copy link
Collaborator Author

Thanks @atumanov, I understand your proposal, however I still think the "human-readable" approach is too difficult to use.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10519/
Test FAILed.

@robertnishihara
Copy link
Collaborator Author

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10520/
Test PASSed.

@pcmoritz
Copy link
Contributor

pcmoritz commented Jan 1, 2019

Timeouts are in seconds everywhere in python, and the argument is called timeout, so we should adhere to this standard.

For the transition I suggest the following: Till 1.0, give a deprecation warning if timeout is given as an int; if it is given as a float, it is just interpreted as in seconds. Fortunately not too many people are using ray.wait with a timeout at the moment I think (it was a mistake to make it milliseconds initially).

Copy link
Contributor

@atumanov atumanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, esp. combined with #3706. The net effect of #3666 + #3706 will be unmodified ray.wait API, but with a breaking semantics change for the timeout parameter (seconds instead of milliseconds). The latter is done for consistency with Python unit conventions for timeouts.



def wait(object_ids, num_returns=1, timeout=None, worker=None):
def wait(object_ids, num_returns=1, timeout_seconds=None, worker=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we've agreed offline that this will go back to timeout in #3706 . Note that this will preserve the API, but will change the semantics (units) for the timeout parameter in the 0.6.2 release.

if object_store_memory > MAX_DEFAULT_MEM:
if object_store_memory_mb > ray.ray_constants.DEFAULT_MAX_MEMORY_MB:
logger.warning(
"Warning: Capping object memory store to {}GB. ".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "capping object memory store" --> "capping object store memory"


# Do some sanity checks.
if object_store_memory > system_memory:
if object_store_memory_mb > system_memory_mb:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would make this sanity check immediately after if plasma_directory is None
line: https://github.com/ray-project/ray/pull/3666/files#diff-54ac27010a06993004bec4677c7e583eR1048

cleanup=True,
resources={"CPU": 1},
object_store_memory=100 * (2**20) # 100 MB
object_store_memory_mb=100 # 100 MB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to document somewhere that we interpret MB as 10e6, not 2^20

@atumanov atumanov added the api label Jan 9, 2019
@robertnishihara
Copy link
Collaborator Author

Closing for now due to #3706.

@robertnishihara robertnishihara deleted the apiunits2 branch January 12, 2019 06:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants