Skip to content

Conversation

@guoyuhong
Copy link
Contributor

What do these changes do?

Current ray parameters appears in multiple places. Therefore, many same parameters are passed through many functions. If we need to add a single parameter, we need to add this parameter to all the functions and document this parameter every time. It's verbose and error-prone.

In this PR, all the parameters are stored in one class. The developer can add one parameter once and use it in the ray function call stack deeply.

Related issue number

N/A

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10092/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10093/
Test FAILed.

@ericl
Copy link
Contributor

ericl commented Dec 18, 2018

This is great. We can merge this as is, but should it actually be a proto so it can be used in c++ as well?

@guoyuhong
Copy link
Contributor Author

@ericl I don't really understand how do we use it c++.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10134/
Test FAILed.

Copy link
Contributor

@raulchen raulchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I left a few small comments. Will take a closer take later.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10171/
Test FAILed.

temp_dir=None,
include_log_monitor=None,
_internal_config=None,
autoscaling_config=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_internal_config should come last?

if (hasattr(self, arg)):
setattr(self, arg, kwargs[arg])
else:
raise Exception("Invalid RayParams parameter in"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise ValueError?

temp_dir=temp_dir,
_internal_config=internal_config)
.format(ray_params.node_ip_address))
ray_params.redis_port = redis_port
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be using the apply_config method here?

following parameters could be checked: redis_address,
node_ip_address, worker_path, resources, object_manager_ports,
node_manager_ports, redis_password
index (int): the index used in resources, object_manager_ports,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the index of specifically? does it indicate Raylet index? Would be ideal to have this documented.

worker_path (str): The path of the source code that will be run by the
worker.
ray_params (ray.params.RayParams): The RayParams instance. The
following parameters could be checked: address_info,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I feel like these docs aren't particularly helpful. If I wanted to know which attributes were going to be used, I can just check the code.

One thing that would be good is to note specifically how the params instance is updated (which default values are being set, etc).

config = json.loads(
ray_params._internal_config) if ray_params._internal_config else None

if ray_params.include_log_monitor is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pattern appears pretty often; would it makes sense to have some method that batches this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add a function called apply_when_none.


# Start any raylets that do not exist yet.
for i in range(len(raylet_socket_names), num_local_schedulers):
for i in range(len(raylet_socket_names), ray_params.num_local_schedulers):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use a better variable here? like worker index or something

"object_store_memory": 100 * (2**20) # 100 MB
}
node_kwargs.update(override_kwargs)
ray_params = RayParams()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just (

ray_params = RayParams(
    node_ip_address=services.get_node_ip_address(),
    **node_kwargs)

@guoyuhong
Copy link
Contributor Author

@ericl Thanks for the approve. I am working on other 2 PRs. I will come back and fix the comments soon.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10397/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10396/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10440/
Test FAILed.

@guoyuhong
Copy link
Contributor Author

For mnist_example.py failure, it is not caused by this PR. I have submitted another PR: #3648 . However, there is still failure in multi_node_docker_test.py. I will take a look later.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10443/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10454/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10455/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10451/
Test FAILed.

@guoyuhong guoyuhong force-pushed the rayPythonParam branch 2 times, most recently from 7bb3870 to 12f444f Compare December 27, 2018 15:08
@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10457/
Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10458/
Test FAILed.

@guoyuhong guoyuhong changed the title Add RayParams to refactor the parameters used by ray python. [WIP] Add RayParams to refactor the parameters used by ray python. Dec 27, 2018
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10459/
Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10460/
Test PASSed.

@guoyuhong guoyuhong changed the title [WIP] Add RayParams to refactor the parameters used by ray python. Add RayParams to refactor the parameters used by ray python. Dec 28, 2018
@guoyuhong
Copy link
Contributor Author

This Jenkins test passed, so I reverted the temporary Jenkins fix and kept the related commit.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10466/
Test FAILed.

def update_if_absent(self, **kwargs):
"""Update the settings when the target fields are None.

Attributes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Args:

def update(self, **kwargs):
"""Update the settings according to the keyword arguments.

Attributes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Args:

@robertnishihara
Copy link
Collaborator

@guoyuhong this looks great, thanks for this PR!

@robertnishihara
Copy link
Collaborator

Looking over these changes made me think:

  1. We should really deprecate this ray.worker._init stuff and instead only use ray.test.cluster_utils to simulate multiple nodes on a single machine.
  2. Instead of having ray.services.start_ray_head and ray.services.start_ray_node, we should just have start_ray_node and initialize_cluster (not sure if that's the right term), where initialize_cluster just starts all of the stuff that only gets created once per cluster (e.g., redis shards, monitor.py, monitor.cc, the web UI). There's no real reason for these processes to all be tied to a "head" node. They could live on different machines or multiple machines.

In subsequent PRs..

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/10493/
Test PASSed.

@raulchen raulchen merged commit c9b8ecc into ray-project:master Dec 29, 2018
@guoyuhong guoyuhong deleted the rayPythonParam branch January 25, 2019 10:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants