[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml #14551

xcharleslin · 2021-03-09T01:40:03Z

What is the problem?

Cluster config docs say that you can put any valid AWS field in NodeConfig. But, putting SubnetId is silently ignored.

This issue has already been raised before; see #12530 . This new issue is not to find a workaround, but to fix the documentation/implementation mismatch.

My suggested fix (only to start discussion, not volunteering 😅 ) is the following:

Move any fields that are Ray-specific concepts out of the NodeConfig dict and into the Provider dict. (This is already true today for most fields; I believe SubnetIds is the sole violation of this.)
No hidden behaviour or silent failures in the NodeConfig dict. Either pass through all AWS fields to the underlying create_instance call, or explicitly raise an error for the ones we don't want the user passing in.

Reproduction (REQUIRED)

(From #12530.)

# ~/Desktop/ray_cluster.yaml
cluster_name: default

min_workers: 0
max_workers: 2

docker:
    image: "rayproject/ray:latest-gpu"
    container_name: "ray_container"
    pull_before_run: True
    run_options: []

provider:
    type: aws
    region: eu-west-1
    availability_zone: eu-west-1a,eu-west-1b
    cache_stopped_nodes: False # If not present, the default is True.

auth:
    ssh_user: ubuntu

head_node:
    # VpcId:
    SubnetId: subnet-***********qwe
    # CidrBlock: 10.19.0.0/16
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 100


worker_nodes:
    SubnetId: subnet-***********qwe
    InstanceType: m5a.large
    ImageId: ami-006ff58f5247c50eb # Deep Learning AMI (Ubuntu) Version 30

ray up -y ~/Desktop/ray_cluster.yaml

I have verified my script runs in a clean environment and reproduces the issue.
I have verified the issue also occurs with the latest wheels.

The text was updated successfully, but these errors were encountered:

xcharleslin · 2021-03-09T01:41:32Z

cc @wuisawesome , @richardliaw

wuisawesome · 2021-03-09T01:48:56Z

"support all AWS fields" is vague.

provider seems like a logical place to put the SubnetIds (plural) field, but for the sake of backwards compatibility, we should consider leaving it where it is, and just not clobber a user provided SubnetId

cc @AmeerHajAli

xcharleslin · 2021-03-09T01:52:58Z

"support all AWS fields" is vague.

@wuisawesome good point, rephrased for precision :)

xcharleslin · 2021-03-22T19:40:56Z

@pcmoritz please triage so this doesn't get forgotten ;)

xieus · 2023-05-22T21:41:55Z

@architkulkarni I am hitting the same issue on cloud launcher. If users have more than one subnets, it is not easy to deploy a Ray cluster to the target subnet, all by chance. Could we please prioritize the fix?

architkulkarni · 2023-05-22T22:02:25Z

Thanks @xieus , I'll see if we can prioritize it.

architkulkarni · 2023-05-22T22:25:08Z

@xieus does this unblock you in the meantime? #12530 (comment)

wuisawesome · 2023-05-22T22:56:31Z

@xieus the short term mitigation here is to provide a SubnetIds: ["<your subnet id>"] field in the node_config.

xieus · 2023-05-23T03:22:08Z

Right. I tried that but only providing SubnetIds doesn't work. We have to provide SecurityGroupIds, and then either set use_internal_ips as true, or make the target subnet's mapPublicIpOnLaunch property true (after reading the cloud launcher source codes). All of the desirable parameters are hidden from users.

I would recommend updating Ray OSS documents or at least this thread with an example, which shows the following setup:
(1) Two VPCs, two subnets per VPC;
(2) Each VPC has its own SG, not the default SG;
(3) A sample yaml file that specifies one subnet (can be any random subnet)..

This setup could give user a good example if they want to expand to more complicated setup. @architkulkarni @wuisawesome

xieus · 2023-05-23T12:46:04Z

I followed this yaml file: https://docs.ray.io/en/latest/cluster/vms/references/ray-cluster-configuration.html#full-configuration, which works only in the scenario of unspecified SubnetId, and cloud launch picks one for user although the criteria is random from user perspective. We currently don't have a working example that allows users to specify a subnet and just work.

xcharleslin added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Mar 9, 2021

richardliaw added P2 Important issue, but not time-critical and removed triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Apr 21, 2021

richardliaw added this to the Serverless Autoscaling milestone Apr 21, 2021

AmeerHajAli added the infra autoscaler, ray client, kuberay, related issues label Mar 26, 2022

architkulkarni self-assigned this May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml #14551

[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml #14551

xcharleslin commented Mar 9, 2021 •

edited

Loading

xcharleslin commented Mar 9, 2021 •

edited

Loading

wuisawesome commented Mar 9, 2021

xcharleslin commented Mar 9, 2021 •

edited

Loading

xcharleslin commented Mar 22, 2021

xieus commented May 22, 2023

architkulkarni commented May 22, 2023

architkulkarni commented May 22, 2023

wuisawesome commented May 22, 2023

xieus commented May 23, 2023

xieus commented May 23, 2023 •

edited

Loading

[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml #14551

[autoscaler] SubnetId, a valid AWS field, is being ignored in cluster yaml #14551

Comments

xcharleslin commented Mar 9, 2021 • edited Loading

What is the problem?

Reproduction (REQUIRED)

xcharleslin commented Mar 9, 2021 • edited Loading

wuisawesome commented Mar 9, 2021

xcharleslin commented Mar 9, 2021 • edited Loading

xcharleslin commented Mar 22, 2021

xieus commented May 22, 2023

architkulkarni commented May 22, 2023

architkulkarni commented May 22, 2023

wuisawesome commented May 22, 2023

xieus commented May 23, 2023

xieus commented May 23, 2023 • edited Loading

xcharleslin commented Mar 9, 2021 •

edited

Loading

xcharleslin commented Mar 9, 2021 •

edited

Loading

xcharleslin commented Mar 9, 2021 •

edited

Loading

xieus commented May 23, 2023 •

edited

Loading