vwxyzjn · Nov 14, 2022 · Nov 6, 2022
diff --git a/.github/issue_template.md b/.github/issue_template.md
@@ -3,7 +3,8 @@
 
 ## Checklist
 - [ ] I have installed dependencies via `poetry install` (see [CleanRL's installation guideline](https://docs.cleanrl.dev/get-started/installation/).
-- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo (required)
+- [ ] I have checked that there is no similar [issue](https://github.com/vwxyzjn/cleanrl/issues) in the repo.
+- [ ] I have checked the [documentation site](https://docs.cleanrl.dev/) and found no relevant information in [GitHub issues](https://github.com/vwxyzjn/cleanrl/issues).
 
 ## Current Behavior
 <!--- Tell us what happens instead of the expected behavior -->

diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -16,16 +16,16 @@
 - [ ] I have updated the documentation and previewed the changes via `mkdocs serve`.
 - [ ] I have updated the tests accordingly (if applicable).
 
-If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR. 
+If you are adding new algorithm variants or your change could result in a performance difference, you may need to (re-)run tracked experiments. See https://github.com/vwxyzjn/cleanrl/pull/137 as an example PR. 
 - [ ] I have contacted [vwxyzjn](https://github.com/vwxyzjn) to obtain access to the [openrlbenchmark W&B team](https://wandb.ai/openrlbenchmark) (**required**).
 - [ ] I have tracked applicable experiments in [openrlbenchmark/cleanrl](https://wandb.ai/openrlbenchmark/cleanrl) with `--capture-video` flag toggled on (**required**).
 - [ ] I have added additional documentation and previewed the changes via `mkdocs serve`.
     - [ ] I have explained note-worthy implementation details.
     - [ ] I have explained the logged metrics.
     - [ ] I have added links to the original paper and related papers (if applicable).
-    - [ ] I have added links to the PR related to the algorithm.
+    - [ ] I have added links to the PR related to the algorithm variant.
     - [ ] I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
-    - [ ] I have added the learning curves (in PNG format with `width=500` and `height=300`).
+    - [ ] I have added the learning curves (in PNG format).
     - [ ] I have added links to the tracked experiments.
     - [ ] I have updated the overview sections at the [docs](https://docs.cleanrl.dev/rl-algorithms/overview/) and the [repo](https://github.com/vwxyzjn/cleanrl#overview)
 - [ ] I have updated the tests accordingly (if applicable).

diff --git a/.github/workflows/pre-commit.yml b/.github/workflows/pre-commit.yml
@@ -22,3 +22,5 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
       - uses: pre-commit/[email protected]
+        with:
+          extra_args: --hook-stage manual --all-files
diff --git a/README.md b/README.md
@@ -122,6 +122,7 @@ You may also use a prebuilt development environment hosted in Gitpod:
 | |  [`ppo_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_continuous_action.py),   [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_continuous_actionpy)
 | |  [`ppo_atari_lstm.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_lstm.py),   [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_lstmpy)
 | |  [`ppo_atari_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool.py),   [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpoolpy)
+| | [`ppo_atari_envpool_xla_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_envpool_xla_jax.py), [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_envpool_xla_jaxpy)
 | |  [`ppo_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_procgen.py),   [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_procgenpy)
 | |  [`ppo_atari_multigpu.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_atari_multigpu.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy)
 | | [`ppo_pettingzoo_ma_atari.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_pettingzoo_ma_atari.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)
@@ -138,14 +139,18 @@ You may also use a prebuilt development environment hosted in Gitpod:
 | ✅ [Twin Delayed Deep Deterministic Policy Gradient (TD3)](https://arxiv.org/pdf/1802.09477.pdf) |  [`td3_continuous_action.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_actionpy) |
 |  | [`td3_continuous_action_jax.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/td3_continuous_action_jax.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/td3/#td3_continuous_action_jaxpy) |
 | ✅ [Phasic Policy Gradient (PPG)](https://arxiv.org/abs/2009.04416) |  [`ppg_procgen.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppg_procgen.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/ppg/#ppg_procgenpy) |
+| ✅ [Random Network Distillation (RND)](https://arxiv.org/abs/1810.12894) |  [`ppo_rnd_envpool.py`](https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/ppo_rnd_envpool.py),  [docs](https://docs.cleanrl.dev/rl-algorithms/ppo-rnd/#ppo_rnd_envpoolpy) |
+
 
 ## Open RL Benchmark
 
-CleanRL has a sub project called Open RL Benchmark (https://benchmark.cleanrl.dev/), where we have tracked thousands of experiments across domains. The benchmark is interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. Here are some screenshots.
+To make our experimental data transparent, CleanRL participates in a related project called [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark), which contains tracked experiments from popular DRL libraries such as ours, [Stable-baselines3](https://github.com/DLR-RM/stable-baselines3), [openai/baselines](https://github.com/openai/baselines), [jaxrl](https://github.com/ikostrikov/jaxrl), and others. 
+
+Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide a dataset API for researchers to easily access the data (see [repo](https://github.com/openrlbenchmark/openrlbenchmark)).
 
+![](docs/static/o1.png)
 ![](docs/static/o2.png)
 ![](docs/static/o3.png)
-![](docs/static/o1.png)
 
 
 ## Support and get involved

diff --git a/benchmark/ppo.sh b/benchmark/ppo.sh
@@ -73,7 +73,7 @@ xvfb-run -a poetry run python -m cleanrl_utils.benchmark \
     --workers 1
 
 
-poetry install --with envpool
+poetry install --with envpool,jax
 poetry run python -m cleanrl_utils.benchmark \
     --env-ids Alien-v5 Amidar-v5 Assault-v5 Asterix-v5 Asteroids-v5 Atlantis-v5 BankHeist-v5 BattleZone-v5 BeamRider-v5 Berzerk-v5 Bowling-v5 Boxing-v5 Breakout-v5 Centipede-v5 ChopperCommand-v5 CrazyClimber-v5 Defender-v5 DemonAttack-v5 \
     --command "poetry run python ppo_atari_envpool_xla_jax.py --track --wandb-project-name envpool-atari --wandb-entity openrlbenchmark" \

diff --git a/cleanrl_utils/benchmark.py b/cleanrl_utils/benchmark.py
@@ -17,11 +17,11 @@ def parse_args():
     parser.add_argument("--num-seeds", type=int, default=3,
         help="the number of random seeds")
     parser.add_argument("--start-seed", type=int, default=1,
-        help="the number of random seeds")
-    parser.add_argument('--workers', type=int, default=0,
-        help='the number of eval workers to run benchmark experimenets (skips evaluation when set to 0)')
+        help="the number of the starting seed")
+    parser.add_argument("--workers", type=int, default=0,
+        help="the number of workers to run benchmark experiments")
     parser.add_argument("--auto-tag", type=lambda x: bool(strtobool(x)), default=True, nargs="?", const=True,
-        help="if toggled, the runs will be tagged with the output from `git describe --tags` (e.g., v1.0.0b2-11-g5db4db7)")
+        help="if toggled, the runs will be tagged with git tags, commit, and pull request number if possible")
     args = parser.parse_args()
     # fmt: on
     return args
@@ -78,7 +78,9 @@ def autotag() -> str:
         for env_id in args.env_ids:
             commands += [" ".join([args.command, "--env-id", env_id, "--seed", str(args.start_seed + seed)])]
 
-    print(commands)
+    print("======= commands to run:")
+    for command in commands:
+        print(command)
 
     if args.workers > 0:
         from concurrent.futures import ThreadPoolExecutor
@@ -87,3 +89,5 @@ def autotag() -> str:
         for command in commands:
             executor.submit(run_experiment, command)
         executor.shutdown(wait=True)
+    else:
+        print("not running the experiments because --workers is set to 0; just printing the commands to run")
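
Reviewer note: the `--workers` behavior changed in the `cleanrl_utils/benchmark.py` hunks above can be sketched as a standalone script. This is a minimal sketch, not the file itself: the function names `build_commands` and `dispatch`, the injected `run_experiment` callable, and the outer loop over seeds are assumptions made for illustration (the diff only shows the inner loop over `env_ids`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def build_commands(command: str, env_ids: List[str], num_seeds: int, start_seed: int) -> List[str]:
    """Expand one base command into one command per (seed, env_id) pair.

    Assumption: the outer loop iterates over seeds, as the indentation in the diff suggests.
    """
    commands = []
    for seed in range(num_seeds):
        for env_id in env_ids:
            commands += [" ".join([command, "--env-id", env_id, "--seed", str(start_seed + seed)])]
    return commands


def dispatch(commands: List[str], workers: int, run_experiment: Callable[[str], None]) -> bool:
    """Print every command, then run them only if workers > 0.

    Returns True if the commands were executed and False for a dry run.
    `run_experiment` is a hypothetical callable injected by the caller.
    """
    print("======= commands to run:")
    for command in commands:
        print(command)

    if workers > 0:
        executor = ThreadPoolExecutor(max_workers=workers)
        for command in commands:
            executor.submit(run_experiment, command)
        # Block until every submitted experiment has finished.
        executor.shutdown(wait=True)
        return True

    print("not running the experiments because --workers is set to 0; just printing the commands to run")
    return False


if __name__ == "__main__":
    cmds = build_commands("python ppo.py", ["CartPole-v1", "Acrobot-v1"], num_seeds=2, start_seed=1)
    # With workers=0 nothing is executed; the commands are only printed.
    dispatch(cmds, workers=0, run_experiment=lambda c: None)
```

With `workers=0` the sketch only prints the commands, which matches the new `else` branch; any positive value submits each command to the thread pool and waits for completion.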
diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
@@ -0,0 +1,4 @@
+costa:
+  name: Costa Huang
+  description: Lead dev of CleanRL
+  avatar: https://avatars.githubusercontent.com/u/5555347
diff --git a/docs/blog/.meta.yml b/docs/blog/.meta.yml
@@ -0,0 +1,3 @@
+# comments: true
+# hide:
+#   - feedback
diff --git a/docs/blog/index.md b/docs/blog/index.md
@@ -0,0 +1 @@
+# Blog