[RLlib-contrib] ES (evolutionary strategies). #36625

Merged
merged 11 commits into from
Oct 4, 2023
21 changes: 17 additions & 4 deletions .buildkite/pipeline.ml.yml
@@ -456,7 +456,7 @@
    - conda deactivate
    - conda create -n rllib_contrib python=3.8 -y
    - conda activate rllib_contrib
    - (cd rllib_contrib/a2c && pip install -r requirements.txt && pip install -e ".[development"])
    - (cd rllib_contrib/a2c && pip install -r requirements.txt && pip install -e ".[development]")
    - ./ci/env/env_info.sh
    - pytest rllib_contrib/a2c/tests/
    - python rllib_contrib/a2c/examples/a2c_cartpole_v1.py --run-as-test
@@ -468,7 +468,7 @@
    - conda deactivate
    - conda create -n rllib_contrib python=3.8 -y
    - conda activate rllib_contrib
    - (cd rllib_contrib/a3c && pip install -r requirements.txt && pip install -e ".[development"])
    - (cd rllib_contrib/a3c && pip install -r requirements.txt && pip install -e ".[development]")
    - ./ci/env/env_info.sh
    - pytest rllib_contrib/a3c/tests/test_a3c.py

@@ -479,7 +479,7 @@
    - conda deactivate
    - conda create -n rllib_contrib python=3.8 -y
    - conda activate rllib_contrib
    - (cd rllib_contrib/alpha_star && pip install -r requirements.txt && pip install -e ".[development"])
    - (cd rllib_contrib/alpha_star && pip install -r requirements.txt && pip install -e ".[development]")
    - ./ci/env/env_info.sh
    - pytest rllib_contrib/alpha_star/tests/
    - python rllib_contrib/alpha_star/examples/multi-agent-cartpole-alpha-star.py --run-as-test
@@ -544,6 +544,18 @@
    - pytest rllib_contrib/ddpg/tests/
    - python rllib_contrib/ddpg/examples/ddpg_pendulum_v1.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: ES Tests"
  conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_RLLIB_CONTRIB_AFFECTED"]
  commands:
    - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
    - conda deactivate
    - conda create -n rllib_contrib python=3.8 -y
    - conda activate rllib_contrib
    - (cd rllib_contrib/es && pip install -r requirements.txt && pip install -e ".[development]")
    - ./ci/env/env_info.sh
    - pytest rllib_contrib/es/tests/
    - python rllib_contrib/es/examples/es_cartpole_v1.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: Leela Chess Zero Tests"
  conditions: ["NO_WHEELS_REQUIRED", "RAY_CI_RLLIB_CONTRIB_AFFECTED"]
  commands:
@@ -632,10 +644,11 @@
    - python rllib_contrib/r2d2/examples/r2d2_stateless_cartpole.py --run-as-test

- label: ":exploding_death_star: RLlib Contrib: SimpleQ Tests"
    - (cd rllib_contrib/simple_q && pip install -r requirements.txt && pip install -e ".[development]")
    - cleanup() { if [ "${BUILDKITE_PULL_REQUEST}" = "false" ]; then ./ci/build/upload_build_info.sh; fi }; trap cleanup EXIT
    - conda deactivate
    - conda create -n rllib_contrib python=3.8 -y
    - conda activate rllib_contrib
    - (cd rllib_contrib/simple_q && pip install -r requirements.txt && pip install -e ".[development]")
    - ./ci/env/env_info.sh
    - pytest rllib_contrib/simple_q/tests/
    - python rllib_contrib/simple_q/examples/simple_q_cartpole_v1.py --run-as-test
17 changes: 17 additions & 0 deletions rllib_contrib/es/README.md
@@ -0,0 +1,17 @@
# ES (Evolution Strategies)

[ES](https://arxiv.org/abs/1703.03864) is a class of black-box optimization algorithms that offers an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients. It is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation.
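
To make the black-box idea concrete, here is a minimal sketch of one common ES variant, written with the standard library only. It is not the implementation shipped in this package: the function name `evolution_strategies`, its parameters, and the default hyperparameters are hypothetical, and it uses simple z-score normalization of returns rather than the rank-based fitness shaping described in the paper. The `1/sigma` factor of the gradient estimate is assumed to be folded into the learning rate.

```python
import random


def evolution_strategies(objective, theta, iterations=200, population=50,
                         sigma=0.1, learning_rate=0.05, seed=0):
    """Maximize `objective` using only function evaluations (no gradients).

    Hypothetical helper for illustration; not part of rllib_es.
    """
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iterations):
        # Sample Gaussian perturbations and score each perturbed parameter vector.
        noises = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(population)]
        returns = [objective([t + sigma * e for t, e in zip(theta, eps)])
                   for eps in noises]
        # Normalize returns so the update does not depend on the reward scale.
        mean = sum(returns) / population
        std = (sum((r - mean) ** 2 for r in returns) / population) ** 0.5 or 1.0
        advantages = [(r - mean) / std for r in returns]
        # Step along the noise directions, weighted by normalized return.
        for j in range(dim):
            step = sum(a * eps[j] for a, eps in zip(advantages, noises)) / population
            theta[j] += learning_rate * step
    return theta


if __name__ == "__main__":
    # Toy objective with its maximum at (3, -1).
    def toy(p):
        return -((p[0] - 3.0) ** 2) - (p[1] + 1.0) ** 2

    print(evolution_strategies(toy, [0.0, 0.0], iterations=300))
```

The optimizer only ever calls `objective` on whole parameter vectors, which is why this family of methods needs neither a value function nor temporal discounting.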


## Installation

```
conda create -n rllib-es python=3.10
conda activate rllib-es
pip install -r requirements.txt
pip install -e '.[development]'
```

## Usage

[ES Example]()
50 changes: 50 additions & 0 deletions rllib_contrib/es/examples/es_cartpole_v1.py
@@ -0,0 +1,50 @@
import argparse

from rllib_es.es import ES, ESConfig

import ray
from ray import air, tune
from ray.rllib.utils.test_utils import check_learning_achieved


def get_cli_args():
    """Create CLI parser and return parsed arguments"""
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-as-test", action="store_true", default=False)
    args = parser.parse_args()
    print(f"Running with following CLI args: {args}")
    return args


if __name__ == "__main__":
    args = get_cli_args()

    ray.init()

    config = (
        ESConfig()
        .rollouts(num_rollout_workers=2)
        .framework("torch")
        .environment("CartPole-v1")
        .training(noise_size=25000000, episodes_per_batch=50)
    )

    stop_reward = 100

    tuner = tune.Tuner(
        ES,
        param_space=config.to_dict(),
        run_config=air.RunConfig(
            stop={
                "sampler_results/episode_reward_mean": stop_reward,
                "timesteps_total": 500000,
            },
            failure_config=air.FailureConfig(fail_fast="raise"),
        ),
    )
    results = tuner.fit()

    if args.run_as_test:
        check_learning_achieved(
            results, stop_reward, metric="sampler_results/episode_reward_mean"
        )
18 changes: 18 additions & 0 deletions rllib_contrib/es/pyproject.toml
@@ -0,0 +1,18 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]

[project]
name = "rllib-es"
authors = [{name = "Anyscale Inc."}]
version = "0.1.0"
description = ""
readme = "README.md"
requires-python = ">=3.7, <3.11"
dependencies = ["gymnasium", "ray[rllib]==2.5.0"]

[project.optional-dependencies]
development = ["pytest>=7.2.2", "pre-commit==2.21.0", "tensorflow==2.11.0", "torch==1.12.0"]
2 changes: 2 additions & 0 deletions rllib_contrib/es/requirements.txt
@@ -0,0 +1,2 @@
tensorflow==2.11.0
torch==1.12.0
9 changes: 9 additions & 0 deletions rllib_contrib/es/src/rllib_es/es/__init__.py
@@ -0,0 +1,9 @@
from rllib_es.es.es import ES, ESConfig
from rllib_es.es.es_tf_policy import ESTFPolicy
from rllib_es.es.es_torch_policy import ESTorchPolicy

from ray.tune.registry import register_trainable

__all__ = ["ES", "ESConfig", "ESTFPolicy", "ESTorchPolicy"]

register_trainable("rllib-contrib-es", ES)