ra #7

jamesthesnake · 2023-04-12T22:15:13Z

Description

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the documentation and previewed the changes via mkdocs serve.
I have updated the tests accordingly (if applicable).

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See vwxyzjn#137 as an example PR.

Adding a paper that uses CleanRL for implementation.

* Prepare for v1.0.0 release
* add missing documentation
* update docs on open rl benchmark
* change site name
* update documentation
* point reproducibility script to master
* revert mkdocs change
* add docs
* Check requirements.txt exports in pre-commit CI
* check all files in pre-commit
* fix docs for dqn
* v1.0.0 blog (#315)
* v1.0.0 blog
* add changes
* support insider
* properly link github usernames
* Add a note on support gymnasium
* fix typo
* add link for google jax
* fix typo
* add release note and highlight jax's performance
* fix typo
* highlight performance
* Address comments
* update blog
* update description
* remove words
* quick change
* quick change
* quick change
* fix typo
* omit `dqn_jax.py` from the announcement

* Fix target update freq
* Ensure target is updated after learning start

…-upgrade. (#326)

* Add Gymnasium and dependencies
* Implement Gymnasium-compliant PPO script
* Ensure pre-commit passes
* Fix CI, add a `gymnasium_support` folder
* update lock files
* add dependencies
* update requirements.txt; fix pre-commit
* update poetry files
* Support dm control action spaces
* add dm_control support
* Enable num_envs>1
* Enable auto-install of torch based on CUDA version
* Fix pre-commit
* bump torch version
* bump wandb version
* change key for mujoco_py installation
* update CI
* update docs
* downgrade torch
* update docs
* update test cases
* set default env = HalfCheetah-v4
* directly replace `ppo_continuous_action.py`
* deprecate pybullet dependency in ppo
* remove pybullet test case
* support video recording to wandb
* update docs
* update dependency for test cases
* update test cases and add dm_control tests
* update docs
* update mkdocs base
* revert doc changes
* fix dm_control test cases
* quick docs
* fix tests on CI
* fix test case
* fix CI
* Fix CI
* update mujoco dependency
* Fix CI
* fix CI
* remove unused seed

Co-authored-by: Daniel Tan <[email protected]>
Co-authored-by: Costa Huang <[email protected]>

jax.scan for ppo + atari + envpool and corresponding docs and tests

* ppo jax scan
* reformatting
* ppo jax scan: log only last loss metrics
* ppo jax scan: reformatting
* code clean up
* additional cleanup + reformat + compute gae test
* reformat gae test file
* documentation for ppo_jax_scan
* shorten doc
* Add docs
* update docs and add the jax gae test in test_envpool
* add png for learning curve vs wall-clock time
* typo fixed
* change end-to-end test case
* fix doc links

Co-authored-by: Rujikorn <[email protected]>
Co-authored-by: Costa Huang <[email protected]>

* prototype jax + c51
* formatting changes
* bug fix target_pmfs calculation and adapt to use TrainState API
* Formatting changes and refactoring
* c51 + atari + jax prototype
* formatting changes
* jit the action selection for further speed-up
* update c51 benchmark script with c51_jax.py
* update c51 benchmark script to include jax benchmarks
* fix optimizer hyperparameter and improve performance
* use jax.lax.fori_loop to improve jit compilation speed
* update benchmark script and projection in atari script
* add documentation
* update report links to point to openrlbenchmark copy
* add c51_jax links in README
* fix document links in c51_jax scripts
* doc fixes

* Add test cases
* fix test cases

* adding rpo implementation
* Add installation warning
* add perturbed gaussian for action distribution
* update for running with Slurm and manual pr tag.
* generate results with RPO and PPO for 8M timesteps
* remove temporary files created for experiments
* add rpo docs
* remove temporary file created for experiments
* update with gym results
* preview docs change
* remove unnecessary code difference
* update to fix dm_control table
* minimize code difference
* address comments in pr-331
* fix table format
* fix table format
* add hyperparameter alpha
* add code difference highlight
* add benchmark snippets
* fix some text and update mujoco v2 results
* incorporate rpo alpha
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update with tracked exp
* update removing iframe
* Fix docs preview
* add deprecation warning
* update recommendation
* quick change
* small fix
* fix typos

Co-authored-by: Costa Huang <[email protected]>

* fix typo
* c51 typo

* initial commit
* pre-commit
* Add hub integration
* pre-commit
* use CommitOperation
* Fix pre-commit
* refactor
* push changes
* refactor
* fix pre-commit
* pre-commit
* close the env and writer after eval
* support dqn jax
* pre-commit
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* address comments
* update docs
* support dqn_atari_jax
* bug fix and docs
* Add cleanrl to the hf's `metadata`
* include huggingface integration
* test for enjoy.py
* bump version, pip install extra hack python-poetry/poetry#4842 (comment)
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* update docs
* update pre-commit
* quick fix
* bug fix
* lazy load modules to avoid dependency issues
* Add huggingface shields
* Add emoji
* Update docs
* pre-commit
* Update docs
* Update docs
* fix: use `algorithm_variant_filename` in model card reproduction script
* typo fix
* feat: add hf support for c51
* formatting fix
* support pulling variant dependencies directly
* support model saving for `ppo_atari_envpool_xla_jax_scan`
* support `ppo_atari_envpool_xla_jax_scan`
* quick change
* support 'c51_jax'
* formatting fix
* support capture video
* Add notebook
* update docs
* support `c51_atari` and `c51_atari_jax`
* typo fix
* add c51 to zoo docs
* add colab badge
* fix broken colab svg
* pypi release
* typo fix
* update pre-commit
* remove hf-integration reference

Co-authored-by: Lucain <[email protected]>
Co-authored-by: Kinal <[email protected]>
Co-authored-by: Kinal Mehta <[email protected]>

* add draft of SAC discrete implementation
* run pre-commit
* Use log softmax instead of author's log-pi code
* Revert to cleanrl SAC delay implementation (it's more stable)
* Remove docstrings and duplicate code
* Use correct clipreward wrapper
* fix bug in log softmax calculation
* adhere to cleanrl log_prob naming
* fix bug in entropy target calculation
* change layer initialization to match existing cleanrl codebase
* working minimal diff version
* implement original learning update frequency
* parameterize the entropy scale for autotuning
* add benchmarking script
* rename target entropy factor and set new default value
* add docs draft
* fix SAC-discrete links to work pre merge
* add preliminary result table for SAC-discrete
* clean up todos and add header
* minimize diff between sac_atari and sac_continuous
* add sac-discrete end2end test
* SAC-discrete docs rework
* Update SAC-discrete @100k results
* Fix doc links and unify naming in code
* update docs
* fix target update frequency (see PR #323)
* clarify comment regarding CNN encoder sharing
* fix benchmark installation
* fix eps in minimal diff version and improve code readability
* add docs for eps and finalize code
* use no_grad for actor Q-vals and re-use action-probs & log-probs in alpha loss
* update docs for new code and settings
* fix links to point to main branch
* update sac-discrete training plots
* new sac-d training plots
* update results table and fix link
* fix pong chart title
* add Jimmy Ba name as exception to code spell check
* change target_entropy_scale default value to same value as experiments
* remove blank line at end of pre-commit

Co-authored-by: Costa Huang <[email protected]>

* Added Polyak update rate (tau) for soft target network updates (vanilla and Atari implementations)
* Updated DQN docs with Polyak update info
* Quick doc fix
* Implemented Polyak update for dqn_atari_jax.py
* add tau for `dqn_jax.py`, too

Co-authored-by: Costa Huang <[email protected]>
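
For readers unfamiliar with the term, a Polyak (soft) target update blends the online network's parameters into the target network at rate tau instead of copying them outright. The snippet below is only a minimal sketch of that rule, assuming parameters are stored as plain Python lists keyed by name; `polyak_update` and the `Params` alias are hypothetical names chosen for illustration and are not taken from the DQN scripts touched by this commit.

```python
from typing import Dict, List

# Hypothetical container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def polyak_update(online: Params, target: Params, tau: float) -> Params:
    """Return new target params: tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    return {
        name: [tau * o + (1.0 - tau) * t for o, t in zip(online[name], target[name])]
        for name in target
    }


online = {"w": [1.0, 2.0], "b": [0.5]}
target = {"w": [0.0, 0.0], "b": [0.0]}

soft = polyak_update(online, target, tau=0.5)  # halfway blend of the two networks
hard = polyak_update(online, target, tau=1.0)  # tau = 1 reduces to a full copy
```

With tau = 1.0 the rule degenerates to the usual hard copy of the online network, which is why a single tau parameter can cover both update styles.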

* Update README.md
* Update examples.md
* Update .pre-commit-config.yaml

* Remove unnecessary arg
* remove insider build

* update docs
* update test cases
* update dependencies
* update PR template

* Better requirements.txt docs
* add c51 docs
* update c51 docs
* update dqn docs
* update optuna docs
* update docs for ddpg, sac, and td3
* add dependencies
* remove poetry groups
* remove poetry groups for tests
* Update PPO, RND, and RPO docs
* update PPG docs
* update docs
* trigger CI
* fix CI
* Fix dependencies
* Move up pip installation

vwxyzjn and others added 28 commits November 5, 2022 18:58

Remove unindented test script (#312)

0a60a0c

Update cleanrl-supported-papers-projects.md (#316)

19a0907

Fix DQN target update frequency (#323)

c515aef

Updated the pip install poetry lines in the docker files to contain -…

9877c0c

…-upgrade. (#326)

Update dqn.md (#329)

cb2b746

Using jax scan for PPO + atari + envpool XLA (#328)

2dd73af

chore: simply ppo gae code (#334)

d67ae0c

update paper link to point to JMLR version (#336)

639a5ef

update dqn-jax docs with CPU experiments (#335)

3dab404

bug: incorrect logic in GAE calculation (#337)

95fcdd7

Add test cases (#339)

019bff0

docs fix for ddpg and td3 to include jax implementation (#341)

3b41901

Hotfix for #331 (#342)

94a44b5

Proper description of v_min and v_max in C51 parser (#343)

3f5535c

fix: lowercase Ba name in pre-commit(#349)

3728592

fix pre-commit (#351)

d0d6bae

Remove stale algorithm reference to ppo_lstm_memory_env.py (#357)

2e41da2

Remove unnecessary arg in SAC (#362)

f5c6cda

Better contribution guide (#368)

f7a9b9f

jamesthesnake merged commit 741376f into jamesthesnake:boss Apr 12, 2023
