Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepare for v1.0.0 release #314

Merged
merged 14 commits into from
Nov 14, 2022
3 changes: 2 additions & 1 deletion .github/issue_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,8 @@

## Checklist
- [ ] I have installed dependencies via `poetry install` (see [CleanRL's installation guideline](https://docs.cleanrl.dev/get-started/installation/).
- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo (required)
- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo.
- [ ] I have checked the [documentation site](https://docs.cleanrl.dev/) and found not relevant information in [GitHub issues](https://github.com/vwxyzjn/cleanrl/issues).

## Current Behavior
<!--- Tell us what happens instead of the expected behavior -->
Expand Down
6 changes: 3 additions & 3 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,16 @@
- [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
- [ ] I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
- [ ] I have contacted [vwxyzjn](https://github.com/vwxyzjn) to obtain access to the [openrlbenchmark W&B team](https://wandb.ai/openrlbenchmark) (**required**).
- [ ] I have tracked applicable experiments in [openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl) with `--capture-video` flag toggled on (**required**).
- [ ] I have added additional documentation and previewed the changes via `mkdocs serve`.
- [ ] I have explained note-worthy implementation details.
- [ ] I have explained the logged metrics.
- [ ] I have added links to the original paper and related papers (if applicable).
- [ ] I have added links to the PR related to the algorithm.
- [ ] I have added links to the PR related to the algorithm variant.
- [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).
- [ ] I have added the learning curves (in PNG format).
- [ ] I have added links to the tracked experiments.
- [ ] I have updated the overview sections at the [docs](https://docs.cleanrl.dev/rl-algorithms/overview/) and the [repo](https://github.com/vwxyzjn/cleanrl#overview)
- [ ] I have updated the tests accordingly (if applicable).
Expand Down
2 changes: 2 additions & 0 deletions .github/workflows/pre-commit.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,3 +22,5 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
- uses: pre-commit/[email protected]
with:
extra_args: --hook-stage manual --all-files
9 changes: 7 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@ You may also use a prebuilt development environment hosted in Gitpod:
| | [`ppo_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_continuous_actionpy)
| | [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_lstmpy)
| | [`ppo_atari_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy)
| | [`ppo_atari_envpool_xla_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool_xla_jax.py), [docs](/rl-algorithms/ppo/#ppo_atari_envpool_xla_jaxpy)
| | [`ppo_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_procgenpy)
| | [`ppo_atari_multigpu.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_multigpu.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy)
| | [`ppo_pettingzoo_ma_atari.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)
Expand All @@ -138,14 +139,18 @@ You may also use a prebuilt development environment hosted in Gitpod:
| ✅ [Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) | [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_actionpy) |
| | [`td3_continuous_action_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action_jax.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_action_jaxpy) |
| ✅ [Phasic Policy Gradient (PPG)](https://arxiv.org/abs/2009.04416) | [`ppg_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppg_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppg/#ppg_procgenpy) |
| ✅ [Random Network Distillation (RND)](https://arxiv.org/abs/1810.12894) | [`ppo_rnd_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_rnd_envpool.py), [docs](/rl-algorithms/ppo-rnd/#ppo_rnd_envpoolpy) |


## Open RL Benchmark

CleanRL has a sub project called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. Here are some screenshots.
To make our experimental data transparent, CleanRL participates in a related project called [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark), which contains tracked experiments from popular DRL libraries such as ours, [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3), [openai/baselines](https://github.com/openai/baselines), [jaxrl](https://github.com/ikostrikov/jaxrl), and others.

Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide an dataset API for researchers to easily access the data (see [repo](https://github.com/openrlbenchmark/openrlbenchmark)).

![](docs/static/o1.png)
![](docs/static/o2.png)
![](docs/static/o3.png)
![](docs/static/o1.png)


## Support and get involved
Expand Down
2 changes: 1 addition & 1 deletion benchmark/ppo.sh
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
--workers 1


poetry install --with envpool
poetry install --with envpool,jax
poetry run python -m cleanrl_utils.benchmark \
--env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 \
--command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \
Expand Down
14 changes: 9 additions & 5 deletions cleanrl_utils/benchmark.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,11 @@ def parse_args():
parser.add_argument("--num-seeds", type=int, default=3,
help="the number of random seeds")
parser.add_argument("--start-seed", type=int, default=1,
help="the number of random seeds")
parser.add_argument('--workers', type=int, default=0,
help='the number of eval workers to run benchmark experimenets (skips evaluation when set to 0)')
help="the number of the starting seed")
parser.add_argument("--workers", type=int, default=0,
help="the number of workers to run benchmark experimenets")
parser.add_argument("--auto-tag", type=lambda x: bool(strtobool(x)), default=True, nargs="?", const=True,
help="if toggled, the runs will be tagged with the output from `git describe --tags` (e.g., v1.0.0b2-11-g5db4db7)")
help="if toggled, the runs will be tagged with git tags, commit, and pull request number if possible")
args = parser.parse_args()
# fmt: on
return args
Expand Down Expand Up @@ -78,7 +78,9 @@ def autotag() -> str:
for env_id in args.env_ids:
commands += [" ".join([args.command, "--env-id", env_id, "--seed", str(args.start_seed + seed)])]

print(commands)
print("======= commands to run:")
for command in commands:
print(command)

if args.workers > 0:
from concurrent.futures import ThreadPoolExecutor
Expand All @@ -87,3 +89,5 @@ def autotag() -> str:
for command in commands:
executor.submit(run_experiment, command)
executor.shutdown(wait=True)
else:
print("not running the experiments because --workers is set to 0; just printing the commands to run")
4 changes: 4 additions & 0 deletions docs/blog/.authors.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
costa:
name: Costa Huang
description: Lead dev of CleanRL
avatar: https://avatars.githubusercontent.com/u/5555347
3 changes: 3 additions & 0 deletions docs/blog/.meta.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# comments: true
# hide:
# - feedback
1 change: 1 addition & 0 deletions docs/blog/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Blog
Loading