Usage with AWS S3 and Ray #59

Open

0x2b3bfa0 opened this issue Sep 16, 2023 · 5 comments

0x2b3bfa0 (Contributor) commented Sep 16, 2023

Usage

Cluster creation

ray up --yes cluster.yml
ray dashboard cluster.yml
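
ray dashboard cluster.yml keeps a tunnel open that forwards the remote dashboard to http://localhost:8265, which is the address used by the job submission command below, so it's typically left running in a separate terminal while submitting jobs.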

Job submission

git clone https://github.com/mlfoundations/datacomp
ray job submit \
--address=http://localhost:8265 \
--working-dir=datacomp \
--runtime-env-json="$(
  jq --null-input '
    {
      conda: "datacomp/environment.yml",
      env_vars: {
        AWS_ACCESS_KEY_ID: env.AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY: env.AWS_SECRET_ACCESS_KEY,
        AWS_SESSION_TOKEN: env.AWS_SESSION_TOKEN
      }
    }
  '
)" \
-- \
python download_upstream.py \
--subjob_size=11520 \
--thread_count=128 \
--processes_count=1 \
--distributor=ray \
--metadata_dir=/tmp/metadata \
--data_dir=s3://datacomp-small \
--scale=small
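
The jq invocation above builds the --runtime-env-json payload from whatever AWS credentials are exported in the local shell (it reads them via env.*), so they must be set before submitting. A minimal sketch with placeholder values:

# Placeholders only; substitute real credentials from your own account or
# credential helper before running the ray job submit command above.
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."  # only needed for temporary credentials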

Note

Image shards will be saved to the datacomp-small AWS S3 bucket specified with the --data_dir option.
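
The destination bucket is assumed to exist already; if it doesn't, something along these lines creates it in the same region as the cluster (bucket name taken from the example above):

aws s3 mb s3://datacomp-small --region us-east-1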

Cluster deletion

ray down --yes cluster.yml

Configuration

Sample cluster.yml

cluster_name: datacomp-downloader

min_workers: 0
max_workers: 10
upscaling_speed: 1.0

docker:
  run_options: [--dns=127.0.0.1]
  image: rayproject/ray:2.6.1-py310
  container_name: ray

provider:
  type: aws
  region: us-east-1
  cache_stopped_nodes: false

available_node_types:
  ray.head.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2
  ray.worker.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2

initialization_commands:
  - wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
  - sudo dpkg --install knot-resolver-release.deb
  - sudo apt-get update
  - sudo apt-get install --yes knot-resolver
  - echo $(hostname --all-ip-addresses) $(hostname) | sudo tee --append /etc/hosts
  - sudo systemctl start kresd@{1..48}.service
  - echo nameserver 127.0.0.1 | sudo tee /etc/resolv.conf
  - sudo systemctl stop systemd-resolved

setup_commands:
  - sudo apt-get update
  - sudo apt-get install --yes build-essential ffmpeg

Obscure details

  • When --data_dir points to cloud storage like S3, we also have to specify a local --metadata_dir, because the downloader script doesn't support saving metadata to cloud storage.

  • An additional pip install in the setup_commands section (not shown in the sample above) is needed for compatibility with AWS S3, because the required libraries aren't included in the conda environment file; see the sketch after this list.

  • In theory, there is no need to provide additional AWS credentials if the destination bucket is in the same account as the cluster, because the cluster already has full S3 access through an instance profile.

    • In practice, while the cluster has a default instance profile that grants full S3 access, it doesn't seem to work as intended (probably due to rate limiting on the IMDS endpoint), and I ended up having to pass my local AWS credentials as environment variables.

  • The Python version in environment.yml must match the Python version of the Ray cluster: make sure that docker.image in cluster.yml uses exactly the same Python version as the environment.yml from this project.
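
As a sketch of the extra S3-related install mentioned above: the exact package set isn't shown in the sample configuration, so s3fs is assumed here, since the downloader writes shards through fsspec-style s3:// paths.

setup_commands:
  - sudo apt-get update
  - sudo apt-get install --yes build-essential ffmpeg
  # Assumed extra step for writing to s3:// destinations; replace with
  # whatever packages your environment actually needs.
  - pip install s3fs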

0x2b3bfa0 (Contributor, Author) commented:

🔔 @Vaishaal & @rom1504

A comment from 0x2b3bfa0 was marked as off-topic.

0x2b3bfa0 closed this as not planned on Sep 19, 2023.

rom1504 commented Sep 19, 2023

Hey, why did you close it?

I think it's a good improvement, and people will review the PRs soon.

0x2b3bfa0 (Contributor, Author) commented Sep 19, 2023

Hello! I closed the issue because it wasn't quite actionable, but rather a “note to my future self” that could eventually become documentation. 🙈 I'll reopen it if you wish, though.

0x2b3bfa0 reopened this on Sep 19, 2023.
0x2b3bfa0 (Contributor, Author) commented:

Alternative version, without containers.

cluster_name: datacomp-downloader

min_workers: 0
max_workers: 10
upscaling_speed: 1.0

provider:
  type: aws
  region: us-east-1
  cache_stopped_nodes: false

available_node_types:
  ray.head.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2
  ray.worker.default:
    resources: {}
    node_config:
      InstanceType: m5.12xlarge
      ImageId: ami-068d304eca3399469
      BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
            DeleteOnTermination: true
            VolumeSize: 200
            VolumeType: gp2

initialization_commands:
  # Knot Resolver
  - wget https://secure.nic.cz/files/knot-resolver/knot-resolver-release.deb
  - sudo dpkg --install knot-resolver-release.deb
  - rm knot-resolver-release.deb
  - sudo apt-get update
  - sudo apt-get install --yes knot-resolver
  - echo $(hostname --all-ip-addresses) $(hostname) | sudo tee --append /etc/hosts
  - sudo systemctl start kresd@{1..48}.service
  - echo nameserver 127.0.0.1 | sudo tee /etc/resolv.conf
  - sudo systemctl stop systemd-resolved
  # Anaconda
  - sudo mkdir /opt/miniconda3 && sudo chown $USER /opt/miniconda3
  - wget https://repo.anaconda.com/miniconda/Miniconda3-py39_22.11.1-1-Linux-x86_64.sh
  - bash Miniconda3-py39_22.11.1-1-Linux-x86_64.sh -f -b -p /opt/miniconda3
  - rm Miniconda3-py39_22.11.1-1-Linux-x86_64.sh
  - /opt/miniconda3/bin/conda init bash
  # Ray
  - conda create --yes --name=ray python=3.10.8
  - echo conda activate ray >> ~/.bashrc
  - pip install ray[all]==2.7.0

setup_commands:
  - sudo apt-get update
  - sudo apt-get install --yes build-essential ffmpeg
