Commit c37a3ec

Prepare for v1.0.0 release (#314)
* Prepare for v1.0.0 release
* add missing documentation
* update docs on open rl benchmark
* change site name
* update documentation
* point reproducibility script to master
* revert mkdocs change
* add docs
* Check requirements.txt exports in pre-commit CI
* check all files in pre-commit
* fix docs for dqn
* v1.0.0 blog (#315)
* v1.0.0 blog
* add changes
* support insider
* properly link github usernames
* Add a note on support gymnasium
* fix typo
* add link for google jax
* fix typo
* add release note and highlight jax's performance
* fix typo
* highlight performance
* Address comments
* update blog
* update description
* remove words
* quick change
* quick change
* quick change
* fix typo
* omit `dqn_jax.py` from the announcement
1 parent 19a0907 commit c37a3ec

33 files changed, +497 -152 lines changed

.github/issue_template.md

+2 -1
@@ -3,7 +3,8 @@

 ## Checklist
 - [ ] I have installed dependencies via `poetry install` (see [CleanRL's installation guideline](https://docs.cleanrl.dev/get-started/installation/).
-- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo (required)
+- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo.
+- [ ] I have checked the [documentation site](https://docs.cleanrl.dev/) and found not relevant information in [GitHub issues](https://github.com/vwxyzjn/cleanrl/issues).

 ## Current Behavior
 <!--- Tell us what happens instead of the expected behavior -->

.github/pull_request_template.md

+3 -3
@@ -16,16 +16,16 @@
 - [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
 - [ ] I have updated the tests accordingly (if applicable).

-If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
+If you are adding new algorithm variants or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR.
 - [ ] I have contacted [vwxyzjn](https://github.com/vwxyzjn) to obtain access to the [openrlbenchmark W&B team](https://wandb.ai/openrlbenchmark) (**required**).
 - [ ] I have tracked applicable experiments in [openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl) with `--capture-video` flag toggled on (**required**).
 - [ ] I have added additional documentation and previewed the changes via `mkdocs serve`.
 - [ ] I have explained note-worthy implementation details.
 - [ ] I have explained the logged metrics.
 - [ ] I have added links to the original paper and related papers (if applicable).
-- [ ] I have added links to the PR related to the algorithm.
+- [ ] I have added links to the PR related to the algorithm variant.
 - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
-- [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).
+- [ ] I have added the learning curves (in PNG format).
 - [ ] I have added links to the tracked experiments.
 - [ ] I have updated the overview sections at the [docs](https://docs.cleanrl.dev/rl-algorithms/overview/) and the [repo](https://github.com/vwxyzjn/cleanrl#overview)
 - [ ] I have updated the tests accordingly (if applicable).

.github/workflows/pre-commit.yml

+2
@@ -22,3 +22,5 @@ jobs:
       with:
         python-version: ${{ matrix.python-version }}
     - uses: pre-commit/[email protected]
+      with:
+        extra_args: --hook-stage manual --all-files

README.md

+7 -2
@@ -122,6 +122,7 @@ You may also use a prebuilt development environment hosted in Gitpod:
 | | [`ppo_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_continuous_actionpy)
 | | [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_lstmpy)
 | | [`ppo_atari_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy)
+| | [`ppo_atari_envpool_xla_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool_xla_jax.py), [docs](/rl-algorithms/ppo/#ppo_atari_envpool_xla_jaxpy)
 | | [`ppo_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_procgenpy)
 | | [`ppo_atari_multigpu.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_multigpu.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy)
 | | [`ppo_pettingzoo_ma_atari.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)
@@ -138,14 +139,18 @@ You may also use a prebuilt development environment hosted in Gitpod:
 |[Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) | [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_actionpy) |
 | | [`td3_continuous_action_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action_jax.py), [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_action_jaxpy) |
 |[Phasic Policy Gradient (PPG)](https://arxiv.org/abs/2009.04416) | [`ppg_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppg_procgen.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppg/#ppg_procgenpy) |
+|[Random Network Distillation (RND)](https://arxiv.org/abs/1810.12894) | [`ppo_rnd_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_rnd_envpool.py), [docs](/rl-algorithms/ppo-rnd/#ppo_rnd_envpoolpy) |
+

 ## Open RL Benchmark

-CleanRL has a sub project called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. Here are some screenshots.
+To make our experimental data transparent, CleanRL participates in a related project called [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark), which contains tracked experiments from popular DRL libraries such as ours, [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3), [openai/baselines](https://github.com/openai/baselines), [jaxrl](https://github.com/ikostrikov/jaxrl), and others.
+
+Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide an dataset API for researchers to easily access the data (see [repo](https://github.com/openrlbenchmark/openrlbenchmark)).

+![](docs/static/o1.png)
 ![](docs/static/o2.png)
 ![](docs/static/o3.png)
-![](docs/static/o1.png)


 ## Support and get involved

benchmark/ppo.sh

+1 -1
@@ -73,7 +73,7 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
     --workers 1


-poetry install --with envpool
+poetry install --with envpool,jax
 poetry run python -m cleanrl_utils.benchmark \
     --env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 \
     --command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \

cleanrl_utils/benchmark.py

+9 -5
@@ -17,11 +17,11 @@ def parse_args():
     parser.add_argument("--num-seeds", type=int, default=3,
         help="the number of random seeds")
     parser.add_argument("--start-seed", type=int, default=1,
-        help="the number of random seeds")
-    parser.add_argument('--workers', type=int, default=0,
-        help='the number of eval workers to run benchmark experimenets (skips evaluation when set to 0)')
+        help="the number of the starting seed")
+    parser.add_argument("--workers", type=int, default=0,
+        help="the number of workers to run benchmark experimenets")
     parser.add_argument("--auto-tag", type=lambda x: bool(strtobool(x)), default=True, nargs="?", const=True,
-        help="if toggled, the runs will be tagged with the output from `git describe --tags` (e.g., v1.0.0b2-11-g5db4db7)")
+        help="if toggled, the runs will be tagged with git tags, commit, and pull request number if possible")
     args = parser.parse_args()
     # fmt: on
     return args
@@ -78,7 +78,9 @@ def autotag() -> str:
         for env_id in args.env_ids:
             commands += [" ".join([args.command, "--env-id", env_id, "--seed", str(args.start_seed + seed)])]

-    print(commands)
+    print("======= commands to run:")
+    for command in commands:
+        print(command)

     if args.workers > 0:
         from concurrent.futures import ThreadPoolExecutor
@@ -87,3 +89,5 @@ def autotag() -> str:
         for command in commands:
             executor.submit(run_experiment, command)
         executor.shutdown(wait=True)
+    else:
+        print("not running the experiments because --workers is set to 0; just printing the commands to run")
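
Reviewer note: the net effect of the two hunks above is that `--workers 0` becomes a dry run that only lists the generated commands. The sketch below approximates that flow in isolation. It is not the code in `cleanrl_utils/benchmark.py`: `build_commands` and `dispatch` are hypothetical helper names introduced here, and `run_experiment` is assumed to simply shell out to the command.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(command, env_ids, num_seeds, start_seed):
    """Expand one base command into one command per (seed, env_id) pair."""
    commands = []
    for seed in range(num_seeds):
        for env_id in env_ids:
            commands.append(
                " ".join([command, "--env-id", env_id, "--seed", str(start_seed + seed)])
            )
    return commands


def run_experiment(command):
    # Assumption: the real runner may behave differently; this one just runs the command.
    subprocess.run(shlex.split(command), check=False)


def dispatch(commands, workers):
    """Print every command, then run them only when workers > 0.

    Returns True if the commands were executed, False for a dry run.
    """
    print("======= commands to run:")
    for command in commands:
        print(command)

    if workers > 0:
        executor = ThreadPoolExecutor(max_workers=workers)
        for command in commands:
            executor.submit(run_experiment, command)
        executor.shutdown(wait=True)  # block until all submitted runs finish
        return True

    print("not running the experiments because --workers is set to 0; just printing the commands to run")
    return False


if __name__ == "__main__":
    cmds = build_commands("python ppo.py --track", ["CartPole-v1"], num_seeds=2, start_seed=1)
    dispatch(cmds, workers=0)  # dry run: prints two commands, executes nothing
```

With `workers=0` nothing is launched, which makes it cheap to check the seed and environment expansion before committing compute to a full benchmark run.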

docs/blog/.authors.yml

+4
@@ -0,0 +1,4 @@
+costa:
+  name: Costa Huang
+  description: Lead dev of CleanRL
+  avatar: https://avatars.githubusercontent.com/u/5555347

docs/blog/.meta.yml

+3
@@ -0,0 +1,3 @@
+# comments: true
+# hide:
+# - feedback

docs/blog/index.md

+1
@@ -0,0 +1 @@
+# Blog
