Torchx integration #321

Closed
wants to merge 2 commits into from
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
runs
balance_bot.xml
cleanrl/ppo_continuous_action_isaacgym/isaacgym/examples
cleanrl/ppo_continuous_action_isaacgym/isaacgym/isaacgym
27 changes: 16 additions & 11 deletions Dockerfile
@@ -1,4 +1,6 @@
FROM nvidia/cuda:11.4.2-runtime-ubuntu20.04
FROM nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04

RUN rm /etc/apt/sources.list.d/cuda.list

# install ubuntu dependencies
ENV DEBIAN_FRONTEND=noninteractive
@@ -11,18 +13,21 @@ RUN mkdir cleanrl_utils && touch cleanrl_utils/__init__.py
RUN pip install poetry
COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock
COPY README.md README.md
COPY ./cleanrl/ppo_continuous_action_isaacgym /cleanrl/ppo_continuous_action_isaacgym
RUN poetry config virtualenvs.create false
RUN poetry install
RUN poetry install --with atari
RUN poetry install --with pybullet
RUN poetry install --with envpool,jax
RUN poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# install mujoco
RUN apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf
RUN poetry install --with mujoco
RUN poetry run python -c "import mujoco_py"
# # install mujoco
# RUN apt-get -y install wget unzip software-properties-common \
# libgl1-mesa-dev \
# libgl1-mesa-glx \
# libglew-dev \
# libosmesa6-dev patchelf
# RUN poetry install --with mujoco
# RUN poetry run python -c "import mujoco_py"

COPY entrypoint.sh /usr/local/bin/
RUN chmod 777 /usr/local/bin/entrypoint.sh
38 changes: 38 additions & 0 deletions Dockerfile.torchx
@@ -0,0 +1,38 @@
FROM nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04

RUN rm /etc/apt/sources.list.d/cuda.list

# install ubuntu dependencies
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get -y install python3-pip xvfb ffmpeg git build-essential python-opengl
RUN ln -s /usr/bin/python3 /usr/bin/python

# install python dependencies
RUN mkdir cleanrl_utils && touch cleanrl_utils/__init__.py
RUN pip install poetry
COPY pyproject.toml pyproject.toml
COPY poetry.lock poetry.lock
COPY README.md README.md
COPY ./cleanrl/ppo_continuous_action_isaacgym /cleanrl/ppo_continuous_action_isaacgym
# ENV POETRY_VIRTUALENVS_CREATE=false

# install mujoco
RUN apt-get update && apt-get -y install wget unzip software-properties-common \
libgl1-mesa-dev \
libgl1-mesa-glx \
libglew-dev \
libosmesa6-dev patchelf

# RUN poetry install --with dev,atari,pybullet,procgen,pytest,mujoco,docs,jax,optuna,envpool,pettingzoo,cloud
RUN poetry install --with atari
# RUN poetry run pip install --upgrade "jax[cuda]==0.3.17" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
RUN poetry run pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
# RUN poetry run python -c "import mujoco_py"

COPY entrypoint.sh /usr/local/bin/
RUN chmod 777 /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]

# copy local files
COPY ./cleanrl /cleanrl
9 changes: 9 additions & 0 deletions docs/cloud/submit-experiments.md
@@ -121,3 +121,12 @@ poetry run python -m cleanrl_utils.submit_exp \
docker buildx inspect --bootstrap
python -m cleanrl_utils.submit_exp -b --archs linux/arm64,linux/amd64
```


### Torchx Support (Experimental)

```
poetry run torchx run --scheduler local_docker utils.python --gpu 1 --script cleanrl/cleanrl.py
poetry run torchx run --scheduler aws_batch --scheduler_args queue=c5a-large,image_repo=vwxyzjn/cleanrl utils.python --script cleanrl/ppo.py
poetry run torchx status aws_batch://torchx/c5a-large:torchx_utils_python-pn9sx3wzq0qcwd
```
2 changes: 1 addition & 1 deletion entrypoint.sh
@@ -3,4 +3,4 @@ Xvfb :1 -screen 0 1024x768x24 -ac +extension GLX +render -noreset &> xvfb.log &
 export DISPLAY=:1
 set -e
 # bash -c "echo vm.overcommit_memory=1 >> /etc/sysctl.conf" && sysctl -p
-exec "$@"
+exec poetry run "$@"
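
Note on the `entrypoint.sh` change: the following is a minimal sketch, not the project's script, of what prefixing the container command with a wrapper does. It assumes the motivation is that `Dockerfile.torchx` leaves `POETRY_VIRTUALENVS_CREATE=false` commented out, so dependencies would live in a Poetry-managed virtualenv and commands need to go through `poetry run`. The names `fake_poetry_run`, `old_entrypoint`, `new_entrypoint`, and `CLEANRL_SKETCH_IN_VENV` are hypothetical stand-ins so the sketch runs without Poetry installed.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for `poetry run`: runs the given command with a marker
# variable set, loosely mimicking "run this inside the project environment".
fake_poetry_run() {
  env CLEANRL_SKETCH_IN_VENV=1 "$@"
}

# Old behaviour: run the container command exactly as given.
# (The real script uses `exec` so the command replaces the shell process;
# it is omitted here so the sketch can keep running afterwards.)
old_entrypoint() {
  "$@"
}

# New behaviour: the same command, prefixed with the wrapper.
new_entrypoint() {
  fake_poetry_run "$@"
}

# The child command reports whether it saw the marker variable.
old_out=$(old_entrypoint sh -c 'echo "${CLEANRL_SKETCH_IN_VENV:-0}"')
new_out=$(new_entrypoint sh -c 'echo "${CLEANRL_SKETCH_IN_VENV:-0}"')

# Quoting "$@" keeps each argument intact, including ones with spaces.
args_out=$(new_entrypoint printf '%s|' "a b" c)

echo "old=$old_out new=$new_out args=$args_out"
```

One consequence of this pattern is that every command passed to the image is now resolved inside the wrapper's environment, so an image built without Poetry on `PATH` would fail at startup.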