[RLlib-contrib] ES (evolutionary strategies). (ray-project#36625)
avnishn authored and Andrew Xue committed Oct 10, 2023
1 parent a0adfb5 commit 1f18cda
Showing 12 changed files with 1,270 additions and 4 deletions.
21 changes: 17 additions & 4 deletions .buildkite/pipeline.ml.yml
@@ -456,7 +456,7 @@
- conda deactivate
- conda create -n rllib_contrib python=3.8 -y
- conda activate rllib_contrib
- (cd rllib_contrib/a2c && pip install -r requirements.txt && pip install -e ".[development"])
- (cd rllib_contrib/a2c && pip install -r requirements.txt && pip install -e ".[development]")
- ./ci/env/env_info.sh
- pytest rllib_contrib/a2c/tests/
- python rllib_contrib/a2c/examples/a2c_cartpole_v1.py --run-as-test
@@ -468,7 +468,7 @@
- conda deactivate
- conda create -n rllib_contrib python=3.8 -y
- conda activate rllib_contrib
- (cd rllib_contrib/a3c && pip install -r requirements.txt && pip install -e ".[development"])
- (cd rllib_contrib/a3c && pip install -r requirements.txt && pip install -e ".[development]")
- ./ci/env/env_info.sh
- pytest rllib_contrib/a3c/tests/test_a3c.py

@@ -479,7 +479,7 @@
- conda deactivate
- conda create -n rllib_contrib python=3.8 -y
- conda activate rllib_contrib
- (cd rllib_contrib/alpha_star && pip install -r requirements.txt && pip install -e ".[development"])
- (cd rllib_contrib/alpha_star && pip install -r requirements.txt && pip install -e ".[development]")
- ./ci/env/env_info.sh
- pytest rllib_contrib/alpha_star/tests/
- python rllib_contrib/alpha_star/examples/multi-agent-cartpole-alpha-star.py --run-as-test
@@ -544,6 +544,18 @@
- pytest rllib_contrib/ddpg/tests/
- python rllib_contrib/ddpg/examples/ddpg_pendulum_v1.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: ES Tests"
conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_RLLIB_CONTRIB_AFFECTED"]
commands:
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- conda deactivate
- conda create -n rllib_contrib python=3.8 -y
- conda activate rllib_contrib
- (cd rllib_contrib/es && pip install -r requirements.txt && pip install -e ".[development]")
- ./ci/env/env_info.sh
- pytest rllib_contrib/es/tests/
- python rllib_contrib/es/examples/es_cartpole_v1.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: Leela Chess Zero Tests"
conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_RLLIB_CONTRIB_AFFECTED"]
commands:
@@ -632,10 +644,11 @@
- python rllib_contrib/r2d2/examples/r2d2_stateless_cartpole.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: SimpleQ Tests"
- (cd rllib_contrib/simple_q && pip install -r requirements.txt && pip install -e ".[development]")
- cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
- conda deactivate
- conda create -n rllib_contrib python=3.8 -y
- conda activate rllib_contrib
- (cd rllib_contrib/simple_q && pip install -r requirements.txt && pip install -e ".[development]")
- ./ci/env/env_info.sh
- pytest rllib_contrib/simple_q/tests/
- python rllib_contrib/simple_q/examples/simple_q_cartpole_v1.py --run-as-test
17 changes: 17 additions & 0 deletions rllib_contrib/es/README.md
@@ -0,0 +1,17 @@
# ES (Evolution Strategies)

[ES](https://arxiv.org/abs/1703.03864) is a class of black-box optimization algorithms that offers an alternative to popular MDP-based RL techniques such as Q-learning and policy gradients. It is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
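
To make the black-box idea concrete, here is a minimal sketch of an evolution strategy on a toy objective. It is not the `rllib_es` implementation: the function names, the toy objective, and all hyperparameters are illustrative assumptions, and it uses plain mirrored (antithetic) sampling with a fixed step size. The optimizer only ever calls the objective; it needs no gradients, value function, or discounting.

```python
import random


def evolution_strategy(objective, theta, sigma=0.1, lr=0.05,
                       population=50, iterations=200, seed=0):
    """Maximize `objective` using only function evaluations (hypothetical helper)."""
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        grad = [0.0] * dim
        for _ in range(population):
            # Sample one Gaussian perturbation and evaluate it in both directions.
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            plus = objective([t + sigma * e for t, e in zip(theta, eps)])
            minus = objective([t - sigma * e for t, e in zip(theta, eps)])
            for i in range(dim):
                grad[i] += (plus - minus) * eps[i]
        # Gradient estimate: sum((f+ - f-) * eps) / (2 * population * sigma).
        scale = lr / (2 * population * sigma)
        theta = [t + scale * g for t, g in zip(theta, grad)]
    return theta


def toy_objective(x):
    """Negative squared distance to a fixed target; the maximum is at the target."""
    target = [3.0, -2.0]
    return -sum((a - b) ** 2 for a, b in zip(x, target))


solution = evolution_strategy(toy_objective, [0.0, 0.0])
print(solution)
```

In an RL setting the objective would be the return of an episode played with the perturbed policy weights, which is why the method only needs whole-episode scores.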


## Installation

```
conda create -n rllib-es python=3.10
conda activate rllib-es
pip install -r requirements.txt
pip install -e '.[development]'
```

## Usage

[ES Example]()
50 changes: 50 additions & 0 deletions rllib_contrib/es/examples/es_cartpole_v1.py
@@ -0,0 +1,50 @@
import argparse

from rllib_es.es import ES, ESConfig

import ray
from ray import air, tune
from ray.rllib.utils.test_utils import check_learning_achieved


def get_cli_args():
    """Create CLI parser and return parsed arguments"""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-as-test", action="store_true", default=False)
    args = parser.parse_args()
    print(f"Running with following CLI args: {args}")
    return args


if __name__ == "__main__":
    args = get_cli_args()

    ray.init()

    config = (
        ESConfig()
        .rollouts(num_rollout_workers=2)
        .framework("torch")
        .environment("CartPole-v1")
        .training(noise_size=25000000, episodes_per_batch=50)
    )

    stop_reward = 100

    tuner = tune.Tuner(
        ES,
        param_space=config.to_dict(),
        run_config=air.RunConfig(
            stop={
                "sampler_results/episode_reward_mean": stop_reward,
                "timesteps_total": 500000,
            },
            failure_config=air.FailureConfig(fail_fast="raise"),
        ),
    )
    results = tuner.fit()

    if args.run_as_test:
        check_learning_achieved(
            results, stop_reward, metric="sampler_results/episode_reward_mean"
        )
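
The `noise_size=25000000` setting in the example above is easier to read with a picture of a shared noise table in mind. The sketch below assumes that `noise_size` is the number of entries in a table of pre-generated Gaussian noise that workers index into, which is how the ES paper's approach is commonly implemented. `ToyNoiseTable` and its methods are hypothetical names for illustration, not the `rllib_es` classes.

```python
import random


class ToyNoiseTable:
    """Hypothetical stand-in for a shared table of pre-generated Gaussian noise."""

    def __init__(self, size, seed=123):
        # Every process that builds the table from the same seed gets identical noise.
        rng = random.Random(seed)
        self.noise = [rng.gauss(0.0, 1.0) for _ in range(size)]

    def sample_index(self, rng, dim):
        # Pick a start offset that leaves room for `dim` consecutive entries.
        return rng.randrange(0, len(self.noise) - dim + 1)

    def get(self, index, dim):
        # A perturbation is a slice of the table starting at `index`.
        return self.noise[index:index + dim]


table = ToyNoiseTable(size=10_000)  # tiny compared with the example's noise_size
worker_rng = random.Random(7)
num_params = 8  # assumed number of policy parameters

# A worker samples an offset and perturbs its weights with that slice.
idx = table.sample_index(worker_rng, num_params)
perturbation = table.get(idx, num_params)

# The driver can rebuild the same perturbation from the integer offset alone.
rebuilt = table.get(idx, num_params)
print(idx, rebuilt == perturbation)
```

Under this assumption, workers only need to send back an offset and a score per episode instead of a full parameter-sized noise vector, at the cost of holding the table in memory.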
18 changes: 18 additions & 0 deletions rllib_contrib/es/pyproject.toml
@@ -0,0 +1,18 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

[project]
name = "rllib-es"
authors = [{name = "Anyscale Inc."}]
version = "0.1.0"
description = ""
readme = "README.md"
requires-python = ">=3.7, <3.11"
dependencies = ["gymnasium", "ray[rllib]==2.5.0"]

[project.optional-dependencies]
development = ["pytest>=7.2.2", "pre-commit==2.21.0", "tensorflow==2.11.0", "torch==1.12.0"]
2 changes: 2 additions & 0 deletions rllib_contrib/es/requirements.txt
@@ -0,0 +1,2 @@
tensorflow==2.11.0
torch==1.12.0
9 changes: 9 additions & 0 deletions rllib_contrib/es/src/rllib_es/es/__init__.py
@@ -0,0 +1,9 @@
from rllib_es.es.es import ES, ESConfig
from rllib_es.es.es_tf_policy import ESTFPolicy
from rllib_es.es.es_torch_policy import ESTorchPolicy

from ray.tune.registry import register_trainable

__all__ = ["ES", "ESConfig", "ESTFPolicy", "ESTorchPolicy"]

register_trainable("rllib-contrib-es", ES)