[autoscaler] Ray Cluster Launcher on AWS | Minimizing Permissions #9327

Open
VishDev12 opened this issue Jul 7, 2020 · 10 comments
Labels
docs An issue or change related to documentation infra autoscaler, ray client, kuberay, related issues

@VishDev12
Contributor

VishDev12 commented Jul 7, 2020

This (non) issue takes a brief look at how we can minimize the permissions granted to the Ray Cluster Launcher when using it with AWS.

The cluster launcher works by launching a single head node and using that node to launch the cluster’s worker nodes. If you’re using the launcher with AWS for the first time, an Instance Profile is auto-created and a role with full EC2 and S3 permissions is attached to it; this role also has the sts:AssumeRole permission.

This works seamlessly for basic use-cases, but if you need to grant AWS permissions to the worker nodes – to allow them to access S3, for example – you’re going to need to make a few changes. While we’re doing that, let’s also trim down the EC2 and S3 permissions granted to the head node.

Example Use Case

Let’s say we need a setup that has the following properties:

  • The Ray Cluster Launcher is allowed to launch instances only in the us-west-1 region.

  • The head and the worker nodes will have access to the ray-data S3 bucket.

Breakdown

  • The console you’re using to launch the cluster (launchpad) needs permissions to launch instances in the us-west-1 region. It also needs to assign an IAM role to the head node.

  • The head node needs similar permissions, since it has to launch worker nodes in the same region and pass an IAM role to each one. It also needs access to the ray-data S3 bucket.

  • The worker nodes will only need permissions to access the ray-data bucket.

Steps

1. Create an IAM role to assign to the head node

Role name: ray-head-v1

If you create this role for EC2 on the AWS console, an instance profile will be automatically created.

If you create this role using the AWS CLI, then create an instance profile of the same name and assign the role to it as below.

aws iam create-instance-profile --instance-profile-name ray-head-v1
aws iam add-role-to-instance-profile --instance-profile-name ray-head-v1 --role-name ray-head-v1

The AWS console page for this role also lists the ARN of the instance profile. To look it up with the CLI instead:

aws iam list-instance-profiles | grep ray-head-v1

2. Create an IAM role to assign to the worker node

Role name: ray-worker-v1

Follow the same procedure as the previous step.

3. Create an IAM policy that will allow EC2 instance launches

Policy name: ray-ec2-launcher

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:us-west-1::image/ami-*"
        },
        {
            "Effect": "Allow",
            "Action": "ec2:RunInstances",
            "Resource": [
                "arn:aws:ec2:us-west-1:<aws-account-number>:instance/*",
                "arn:aws:ec2:us-west-1:<aws-account-number>:network-interface/*",
                "arn:aws:ec2:us-west-1:<aws-account-number>:subnet/*",
                "arn:aws:ec2:us-west-1:<aws-account-number>:key-pair/*",
                "arn:aws:ec2:us-west-1:<aws-account-number>:volume/*",
                "arn:aws:ec2:us-west-1:<aws-account-number>:security-group/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:TerminateInstances",
                "ec2:DeleteTags",
                "ec2:StartInstances",
                "ec2:CreateTags",
                "ec2:StopInstances"
            ],
            "Resource": "arn:aws:ec2:us-west-1:<aws-account-number>:instance/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
              "arn:aws:iam::<aws-account-number>:instance-profile/ray-head-v1",
              "arn:aws:iam::<aws-account-number>:instance-profile/ray-worker-v1"
            ]
        }
    ]
}
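
The region and `<aws-account-number>` placeholders recur many times in this policy, so it can be convenient to generate the document rather than edit it by hand. The following is a minimal Python sketch under that assumption. The function name `build_ec2_launcher_policy`, the `pass_role_kind` parameter, and the account ID `123456789012` are illustrative and not part of Ray or AWS tooling. The `pass_role_kind` switch is there because a commenter further down this thread reports needing `role/` ARNs rather than `instance-profile/` ARNs in the `iam:PassRole` statement.

```python
import json


def build_ec2_launcher_policy(account_id, region="us-west-1",
                              role_names=("ray-head-v1", "ray-worker-v1"),
                              pass_role_kind="instance-profile"):
    """Return the ray-ec2-launcher policy from step 3 as a dict.

    pass_role_kind is "instance-profile" (as in the original post) or
    "role" (as suggested later in the thread).
    """
    ec2 = f"arn:aws:ec2:{region}:{account_id}"
    launch_kinds = ["instance", "network-interface", "subnet",
                    "key-pair", "volume", "security-group"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            # The image ARN in the original policy has an empty account field.
            {"Effect": "Allow", "Action": "ec2:RunInstances",
             "Resource": f"arn:aws:ec2:{region}::image/ami-*"},
            {"Effect": "Allow", "Action": "ec2:RunInstances",
             "Resource": [f"{ec2}:{kind}/*" for kind in launch_kinds]},
            {"Effect": "Allow",
             "Action": ["ec2:TerminateInstances", "ec2:DeleteTags",
                        "ec2:StartInstances", "ec2:CreateTags",
                        "ec2:StopInstances"],
             "Resource": f"{ec2}:instance/*"},
            {"Effect": "Allow", "Action": ["ec2:Describe*"], "Resource": "*"},
            {"Effect": "Allow", "Action": ["iam:PassRole"],
             "Resource": [f"arn:aws:iam::{account_id}:{pass_role_kind}/{name}"
                          for name in role_names]},
        ],
    }


# Placeholder account ID; replace with your own.
policy = build_ec2_launcher_policy("123456789012")
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the console or saved to a file and passed to the AWS CLI when creating the policy.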

4. Create a policy to access the S3 bucket

Policy name: ray-s3-access

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:*"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::ray-data/*",
                "arn:aws:s3:::ray-data"
            ]
        }
    ]
}
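
Both resource entries matter here: bucket-level actions such as `s3:ListBucket` are evaluated against the bucket ARN, while object-level actions such as `s3:GetObject` are evaluated against the `/*` object ARN. A minimal Python sketch that builds this policy for any bucket name follows. `build_s3_access_policy` is a hypothetical helper name, not an AWS or Ray API.

```python
import json


def build_s3_access_policy(bucket):
    """Return the ray-s3-access policy from step 4 for the given bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Action": ["s3:*"],
            "Effect": "Allow",
            "Resource": [
                f"arn:aws:s3:::{bucket}/*",  # objects inside the bucket
                f"arn:aws:s3:::{bucket}",    # the bucket itself
            ],
        }],
    }


text = json.dumps(build_s3_access_policy("ray-data"), indent=4)
json.loads(text)  # raises ValueError if the document were not valid JSON
print(text)
```

Serializing with `json.dumps` avoids syntax slips such as trailing commas, which IAM rejects.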

5. Assign both of the above policies to the ray-head-v1 role

You can do this either through the AWS console interactively or using the CLI with:

aws iam attach-role-policy --policy-arn arn:aws:iam::<aws-account-number>:policy/ray-ec2-launcher --role-name ray-head-v1
aws iam attach-role-policy --policy-arn arn:aws:iam::<aws-account-number>:policy/ray-s3-access --role-name ray-head-v1

6. Assign the S3 access policy to the ray-worker-v1 role
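
Step 6 uses the same `aws iam attach-role-policy` command as step 5, with `ray-worker-v1` as the role name. The Python sketch below only prints the commands for steps 5 and 6 so they can be reviewed before being run. The account ID is a placeholder, and `attach_commands` and `ATTACHMENTS` are hypothetical names introduced for illustration.

```python
import shlex

# Which policies each role receives in steps 5 and 6.
ATTACHMENTS = {
    "ray-head-v1": ["ray-ec2-launcher", "ray-s3-access"],
    "ray-worker-v1": ["ray-s3-access"],
}


def attach_commands(account_id, attachments):
    """Build one `aws iam attach-role-policy` argv list per (role, policy)."""
    commands = []
    for role, policies in attachments.items():
        for policy in policies:
            commands.append([
                "aws", "iam", "attach-role-policy",
                "--policy-arn", f"arn:aws:iam::{account_id}:policy/{policy}",
                "--role-name", role,
            ])
    return commands


# Placeholder account ID; replace with your own.
for argv in attach_commands("123456789012", ATTACHMENTS):
    print(shlex.join(argv))
```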

7. Assign the ray-ec2-launcher policy to a launchpad role/user

This step is optional. It limits the permissions of the role/user that operates the Ray cluster launcher, which is useful if, for example, you're an AWS administrator and want to allow one of your users to launch Ray clusters and nothing else.

8. Edit your cluster config YAML file

Under head_node:, add:

IamInstanceProfile:
  Arn: arn:aws:iam::<aws-account-number>:instance-profile/ray-head-v1

Under worker_nodes:, add:

IamInstanceProfile:
  Arn: arn:aws:iam::<aws-account-number>:instance-profile/ray-worker-v1
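
Note that `IamInstanceProfile.Arn` takes an instance-profile ARN, not a role ARN, even though the role and its instance profile share a name here. A small Python check like the following could catch a mix-up before `ray up` is run. The helper name `iam_arn_kind` and the account ID are made up for illustration, and the pattern only covers the standard `aws` partition.

```python
import re

# Matches role and instance-profile ARNs in the standard "aws" partition.
_IAM_ARN = re.compile(r"^arn:aws:iam::(\d{12}):(role|instance-profile)/(.+)$")


def iam_arn_kind(arn):
    """Return "role" or "instance-profile"; raise ValueError otherwise."""
    match = _IAM_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a role or instance-profile ARN: {arn!r}")
    return match.group(2)


# Placeholder account ID; replace with your own.
head_arn = "arn:aws:iam::123456789012:instance-profile/ray-head-v1"
if iam_arn_kind(head_arn) != "instance-profile":
    raise SystemExit("IamInstanceProfile.Arn must be an instance-profile ARN")
print("ok:", head_arn)
```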

Summary

While the ray-ec2-launcher policy has reduced permissions compared to the original, it can be whittled down further by listing the specific AMIs, subnets, key pairs, etc. that the cluster launcher is allowed to use, instead of using wildcards.
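
As a rough illustration of that narrowing, the Python sketch below replaces a wildcard resource such as `subnet/*` with explicit IDs. The function name `pin_resources`, the account ID, and the subnet IDs are hypothetical placeholders; substitute the IDs your cluster config actually uses.

```python
def pin_resources(resources, kind, ids):
    """Replace the '<kind>/*' wildcard ARN with one ARN per explicit ID."""
    pinned = []
    for arn in resources:
        prefix, _, tail = arn.rpartition("/")
        if tail == "*" and prefix.endswith(":" + kind):
            pinned.extend(f"{prefix}/{resource_id}" for resource_id in ids)
        else:
            pinned.append(arn)  # leave every other resource untouched
    return pinned


# Two entries from the second RunInstances statement (placeholder account ID).
resources = [
    "arn:aws:ec2:us-west-1:123456789012:instance/*",
    "arn:aws:ec2:us-west-1:123456789012:subnet/*",
]
for arn in pin_resources(resources, "subnet", ["subnet-0abc", "subnet-0def"]):
    print(arn)
```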

@richardliaw
Contributor

Maybe we can post this to the docs somewhere?

@VishDev12
Contributor Author

Yeah, that sounds like a good idea; do you mean something like linking this issue from there?

@richardliaw
Contributor

Yeah, maybe this should be added to this page: https://docs.ray.io/en/master/cluster/aws-tips.html#aws-cluster

@WillCodeCo

WillCodeCo commented May 31, 2021

This almost worked for me, but I needed to change the ARNs for iam:PassRole to use:

  • role instead of instance-profile
{
...
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": [
                "arn:aws:iam::<aws-account-number>:role/ray-head-v1",
                "arn:aws:iam::<aws-account-number>:role/ray-worker-v1"
            ]
        }
}

@mlubej

mlubej commented Jul 15, 2022

If you're following this guide to launch a cluster of Spot instances and, in the unlikely case, your account doesn't yet have the service-linked role for Spot (e.g. because you don't have access to a root user), create it with:

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

@zahababu

zahababu commented Oct 6, 2023

Maybe we can post this to the docs somewhere?

Is there a good beginner's guide to Ray?

@Michalos88

This should be put somewhere in docs. I find the default permissions too open for a production setting.

@Mystorius

Mystorius commented Jan 18, 2024

Hey, I am facing a problem with the guide above.

If I provide the following configuration for my worker nodes, the ray up config.yaml --yes command completes, but after I attach to the cluster I get the error ray.worker.default: UnauthorizedOperation, even though the role I used to authenticate with AWS has AdministratorAccess.

node_config:
  InstanceType: t3.micro
  IamInstanceProfile:
    Arn: arn:aws:iam::OURAWSID:instance-profile/ray-worker-v1

For the head node, however, everything works fine:

IamInstanceProfile:
  Arn: arn:aws:iam::OURAWSID:instance-profile/ray-head-v1

If I do not provide any Arn configuration for the worker nodes, the cluster also starts without any problems, but none of the created worker nodes has an IAM role attached. The worker node's EC2 instance shows nothing under "IAM Role", so my worker nodes cannot access my S3 storage, for example.

Does anyone have an idea why the config is not working?

PS: The only workaround is to manually set the IAM role on the worker using the EC2 portal, i.e. Actions -> Security -> Modify IAM role -> select ray-worker-v1, but that is not really a viable solution.

@ronaldo-valente-sgpiu

Is there a similar solution for Fargate PODs, especially when the RayCluster is being automatically created by a RayJob yaml?
