Are there asynchronous parallel examples? #896
I have a function that needs to be evaluated remotely. For instance, I have 5 servers and want to optimize the metric in parallel. Currently the only way I could think of is to run the evaluations in synchronous batches, which is less efficient. Is there a way to construct an asynchronous parallel evaluation?
Comments
@deng-cy Have you taken a look at https://ax.dev/tutorials/raytune_pytorch_cnn.html?
Thanks for your reply. Yes, I did, but it feels like a black box; it doesn't explain how to assign evaluations to my own servers.
Hi @deng-cy, thank you for reaching out! Your use case may be a good candidate for our Scheduler API, which has a tutorial here. It involves a little more setup overhead than the Service API: you define a Runner object to send trials to your external system and Metric objects to retrieve the data once each trial has run. In exchange, it lets you set maximum parallelism (among other settings) while Ax handles scheduling, new point generation, trial deployment, and polling for you automatically. Personally, this is my favorite way of using Ax, because it lets the user "set it and forget it", which I find more than worth the setup effort for most experiments. If you need any assistance setting this up, or have other questions about your specific use case, please feel free to continue to respond in this thread.
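For anyone weighing that tradeoff, here is a minimal sketch of the two classes the Scheduler needs. `submit_job` and `fetch_result` are hypothetical placeholders for your own submission and lookup logic, and the exact `Metric` interface has changed across Ax versions, so treat this as a shape rather than a drop-in implementation:

```python
import pandas as pd
from ax.core.base_trial import BaseTrial
from ax.core.data import Data
from ax.core.metric import Metric
from ax.core.runner import Runner

class MyJobRunner(Runner):
    def run(self, trial: BaseTrial) -> dict:
        # submit_job is a hypothetical helper that sends the arm's
        # parameters to your external system and returns a job handle.
        job_id = submit_job(trial.arm.parameters)  # assumes single-arm trials
        # Whatever is returned here is stored as trial.run_metadata.
        return {"job_id": job_id}

class MyMetric(Metric):
    def fetch_trial_data(self, trial: BaseTrial, **kwargs) -> Data:
        # fetch_result is a hypothetical helper that looks up the finished
        # job's objective value by the handle stored at run() time.
        mean = fetch_result(trial.run_metadata["job_id"])
        df = pd.DataFrame.from_records([{
            "arm_name": trial.arm.name,
            "metric_name": self.name,
            "mean": mean,
            "sem": None,  # None = noise level unknown to Ax
            "trial_index": trial.index,
        }])
        return Data(df=df)
```

The Scheduler then polls trial statuses and calls these hooks for you, which is what enables the "set it and forget it" workflow described above.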
Thanks for your reply! I think the Scheduler would work, but it felt too complicated, since I would need to write a whole class even though I only want to customize a little bit. I solved the issue with Python's multiprocessing and the Service API.
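The snippet itself isn't preserved in this thread, but a minimal sketch of one way to combine Python's multiprocessing with the Service API might look like the following (the quadratic objective is a stand-in for a real remote evaluation):

```python
from multiprocessing import Pool

from ax.service.ax_client import AxClient

def evaluate(parameters):
    # Stand-in objective; replace with a call out to your own servers.
    return {"objective": (parameters["x1"] - 1.0) ** 2}

if __name__ == "__main__":
    ax_client = AxClient()
    ax_client.create_experiment(
        parameters=[{"name": "x1", "type": "range", "bounds": [-5.0, 5.0]}],
        objective_name="objective",
        minimize=True,
    )
    with Pool(processes=4) as pool:
        for _ in range(5):  # five batches of four trials each
            trials, _ = ax_client.get_next_trials(max_trials=4)
            # Evaluate the batch in parallel worker processes.
            results = pool.map(evaluate, list(trials.values()))
            for trial_index, raw_data in zip(trials.keys(), results):
                ax_client.complete_trial(trial_index=trial_index, raw_data=raw_data)
```

Note that, like the Ray example later in this thread, this pattern is still synchronous within each batch.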
@deng-cy nice job! I'll keep this in mind. I take it you meant the Service API (not the Server API), correct? Thanks for including a code snippet.
@sgbaird Yeah, Service API; I corrected it.
This is great, and I'm trying to use the code, but I have two questions:
I tried using Ray, but apparently it is not set up for multi-objective problems yet, am I right?
@bernardo-suez I believe you are correct. This also doesn't seem to be on Ray's roadmap. In other words, if you want to do asynchronous multi-objective optimization in Ax, adapting the code yourself seems necessary.
I think this will be specific to whatever server you're using. For example, sending a job to Amazon AWS vs. Google Cloud vs. a university HPC cluster will each take a different form. In other words, it's specific to your "external client". If your external client doesn't have a way of communicating with the machine running the optimization algorithm (which could be the same machine, by the way), then this is something you'll need to implement yourself. For example, you could communicate via a Google Sheets page, an external database (SQL, MongoDB, etc.), or an MQTT server; see the sketch below for the database option.
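As one concrete illustration, here is a minimal sketch of the external-database option using MongoDB via pymongo. The URI, collection names, and the `post_result`/`poll_finished_trials` helpers are illustrative placeholders, not part of Ax:

```python
from pymongo import MongoClient

# A shared collection that worker machines write into and that the
# machine running the optimization polls.
client = MongoClient("mongodb://localhost:27017")
results = client["ax_experiments"]["results"]

def post_result(trial_index, raw_data):
    # Called on a worker machine once its trial finishes.
    results.insert_one({"trial_index": trial_index, "raw_data": raw_data})

def poll_finished_trials():
    # Called on the machine running Ax to collect completed trials,
    # which can then be reported via ax_client.complete_trial.
    return list(results.find({}))
```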
Looping back to this: I played around with this a while back. Here's a simple example of using "plain" Ray (i.e., not Ray Tune) to do work in parallel. This one computes the square of a number.
```python
import ray

# Start Ray. This creates some processes that can do work in parallel.
ray.init(num_cpus=2)

# Decorate the function to signify that it can be run in parallel (as a
# "task"). Ray will load-balance different `square` tasks automatically.
@ray.remote
def square(x):
    return x * x

# Create some parallel work using a list comprehension, then block until
# the results are ready with `ray.get`.
results = ray.get([square.remote(x) for x in range(100)])

ray.shutdown()
```

Applying this to Ax Service API batch optimization:
```python
import ray
from ax.service.ax_client import AxClient
from ax.utils.measurement.synthetic_functions import branin

batch_size = 2
num_trials = 11

ax_client = AxClient()
ax_client.create_experiment(
    parameters=[
        {"name": "x1", "type": "range", "bounds": [-5.0, 10.0]},
        {"name": "x2", "type": "range", "bounds": [0.0, 15.0]},
    ],
    objective_name="branin",
    minimize=True,
    # Cap parallelism at batch_size for all steps of the generation
    # strategy, and allow candidates to be generated out of order.
    choose_generation_strategy_kwargs={
        "num_trials": num_trials,
        "max_parallelism_override": batch_size,
        "enforce_sequential_optimization": False,
    },
)

@ray.remote
def evaluate(parameters):
    return {"branin": branin(parameters["x1"], parameters["x2"])}

n = 0
while n < num_trials:
    curr_batch_size = batch_size if n + batch_size < num_trials else num_trials - n
    trial_mapping, optimization_complete = ax_client.get_next_trials(curr_batch_size)
    n = n + curr_batch_size

    # Start running trials in a queue (new trials will start as resources are freed).
    futures = [evaluate.remote(parameters) for parameters in trial_mapping.values()]

    # Wait for all trials in the batch to complete before continuing (i.e., blocking).
    results = ray.get(futures)

    # Report the completion of trials to the Ax client.
    for trial_index, raw_data in zip(trial_mapping.keys(), results):
        ax_client.complete_trial(trial_index=trial_index, raw_data=raw_data)

ray.shutdown()
```

Copied from scripts/ray_get_reproducer.py and ax_batch_reproducer.py. Perhaps the logic can be adjusted to handle asynchronous cases; a sketch of one way follows.
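For example, here is a minimal sketch of an asynchronous variant that replaces the batched while-loop above (and reuses the `ax_client`, `evaluate`, `batch_size`, and `num_trials` definitions from that snippet). `ray.wait` blocks until at least one in-flight future is ready, so each trial is completed and replaced as soon as it finishes instead of waiting on the whole batch:

```python
# Fill an initial queue of batch_size in-flight trials.
trial_mapping, _ = ax_client.get_next_trials(max_trials=batch_size)
future_to_trial = {
    evaluate.remote(parameters): trial_index
    for trial_index, parameters in trial_mapping.items()
}

completed = 0
while completed < num_trials:
    # Block until at least one in-flight trial finishes.
    ready, _ = ray.wait(list(future_to_trial), num_returns=1)
    for future in ready:
        trial_index = future_to_trial.pop(future)
        ax_client.complete_trial(trial_index=trial_index, raw_data=ray.get(future))
        completed += 1
    # Top the queue back up so trials stay in flight until the budget is spent.
    if completed + len(future_to_trial) < num_trials:
        new_trials, _ = ax_client.get_next_trials(max_trials=1)
        for trial_index, parameters in new_trials.items():
            future_to_trial[evaluate.remote(parameters)] = trial_index

ray.shutdown()
```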