Support multiple availability zones in AWS (fix #2177)#2254
Support multiple availability zones in AWS (fix #2177)#2254ericl merged 7 commits intoray-project:masterfrom
Conversation
|
Test FAILed. |
ericl
left a comment
There was a problem hiding this comment.
Looks good. Could you add an example usage and a comment to the aws-full example yaml?
|
|
||
| subnet_ids = [s.subnet_id for s in subnets] | ||
| subnet_descr = [(s.subnet_id, s.availability_zone) for s in subnets] | ||
| if "SubnetIds" not in config["head_node"]: |
There was a problem hiding this comment.
Does it make sense to keep the map_public_ip_on_launch filter, at least for the head node? I know we don't require it for worker nodes any more.
There was a problem hiding this comment.
This check is already performed on line 149; the later checks I believe are redundant so I removed them.
| "Key": k, | ||
| "Value": v, | ||
| }) | ||
| subnet_ids = conf.pop('SubnetIds') |
| "Key": k, | ||
| "Value": v, | ||
| }) | ||
| subnet_ids = conf.pop('SubnetIds') |
There was a problem hiding this comment.
comment here that SubnetIds is not a real config key, and we need to resolve it before handing it to AWS
There was a problem hiding this comment.
agreed, comment added
|
Test PASSed. |
|
Test PASSed. |
* 'master' of https://github.com/ray-project/ray: (157 commits) Fix build failure while using make -j1. Issue 2257 (ray-project#2279) Cast locator with index type (ray-project#2274) fixing zero length partitions (ray-project#2237) Make actor handles work in Python mode. (ray-project#2283) [xray] Add error table and push error messages to driver through node manager. (ray-project#2256) addressing comments (ray-project#2210) Re-enable some actor tests. (ray-project#2276) Experimental: enable automatic GCS flushing with configurable policy. (ray-project#2266) [xray] Sets good object manager defaults. (ray-project#2255) [tune] Update Trainable doc to expose interface (ray-project#2272) [rllib] Add a simple REST policy server and client example (ray-project#2232) [asv] Pushing to s3 (ray-project#2246) [rllib] Remove need to pass around registry (ray-project#2250) Support multiple availability zones in AWS (fix ray-project#2177) (ray-project#2254) [rllib] Add squash_to_range model option (ray-project#2239) Mitigate randomly building failure: adding gen_local_scheduler_fbs to raylet lib. (ray-project#2271) [rllib] Refactor Multi-GPU for PPO (ray-project#1646) [rllib] Envs for vectorized execution, async execution, and policy serving (ray-project#2170) [Dataframe] Change pandas and ray.dataframe imports (ray-project#1942) [Java] Replace binary rewrite with Remote Lambda Cache (SerdeLambda) (ray-project#2245) ...
What do these changes do?
When launching a node, picks a subnet round-robin from the configured subnets (previously: only one subnet allowed). The subnets default to all those matching a specified availability zones (previously: a single subnet matching the single availability zone).
It might be cleaner to push some of this logic up into the root autoscaler, so other node providers can benefit. This looks to be a more substantial change and isn't needed for my use case so I haven't done it, but I'd welcome anyone else who wants to give it a stab :) (Note I expect it to be a little tricky to get right as the semantics of AZs can vary between providers.)
Related issue number
Fixes #2177.