forked from vwxyzjn/cleanrl
ra #7
Merged
Adding a paper that uses CleanRL for implementation.
* Prepare for v1.0.0 release
* add missing documentation
* update docs on open rl benchmark
* change site name
* update documentation
* point reproducibility script to master
* revert mkdocs change
* add docs
* Check requirements.txt exports in pre-commit CI
* check all files in pre-commit
* fix docs for dqn
* v1.0.0 blog (#315)
* v1.0.0 blog
* add changes
* support insider
* properly link github usernames
* Add a note on support gymnasium
* fix typo
* add link for google jax
* fix typo
* add release note and highlight jax's performance
* fix typo
* highlight performance
* Address comments
* update blog
* update description
* remove words
* quick change
* quick change
* quick change
* fix typo
* omit `dqn_jax.py` from the announcement
* Fix target update freq
* Ensure target is update after learning start
* Add Gymnasium and dependencies
* Implement Gymnasium-compliant PPO script
* Ensure pre-commit passes
* Fix CI, add a `gymnasium_support` folder
* update lock files
* add dependencies
* update requirements.txt; fix pre-commit
* update poetry files
* Support dm control action spaces
* add dm_control support
* Enable num_envs>1
* Enable auto-install of torch based on CUDA version
* Fix pre-commit
* bump torch version
* bump wandb version
* change key for mujoco_py installation
* update CI
* update docs
* downgrade torch
* update docs
* update teset cases
* set default env = HalfCheetah-v4
* directly replace `ppo_continuous_action.py`
* deprecate pybullet dependency in ppo
* remove pybullet test case
* support video recording to wandb
* update docs
* update depdency for test cases
* update test cases and add dm_control tests
* update docs
* update mkdocs base
* revert doc changes
* fix dm_control test cases
* quick docs
* fix tests on CI
* fix test case
* fix CI
* Fix CI
* update mujoco dependency
* Fix CI
* fix CI
* remote unused seed

Co-authored-by: Daniel Tan <[email protected]>
Co-authored-by: Costa Huang <[email protected]>
jax.scan for ppo + atari + envpool and corresponding docs and tests
* ppo jax scan
* reformatting
* ppo jax scan: log only last loss metrics
* ppo jax scan: reformatting
* code clean up
* additional cleanup + reformat + compute gae test
* reformat gae test file
* documentation for ppo_jax_scan
* shorten doc
* Add docs
* update docs and add the jax gae test in test_envpool
* add png for learning curve vs wall-clock time
* typo fixed
* change end-to-end test case
* fix doc links

Co-authored-by: Rujikorn <[email protected]>
Co-authored-by: Costa Huang <[email protected]>
* prototype jax + c51
* formatting changes
* bug fix target_pmfs calculation and adapt to use TrainState API
* Formatting changes and refactoring
* c51 + atari + jax prototype
* formatting changes
* jit the action selection for further speed-up
* update c51 benchmark script with c51_jax.py
* update c51 benchmark script to include jax benchmarks
* fix optimizer hyperparameter and improve performance
* use jax.lax.fori_loop to improve jit compilation speed
* update benchmark script and projection in atari script
* add documentation
* update report links to point to openrlbenchmark copy
* add c51_jax links in README
* fix document links in c51_jax scripts
* doc fixes
* Add test cases
* fix test cases
* adding rpo implementation
* Add installation warning
* add perturbed gaussian for action distribution
* update for running with Slurm and manual pr tag.
* generate results with RPO and PPO for 8M timesteps
* remove temporary files created for experiments
* add rpo docs
* remove temporary file created for experiments
* update with gym results
* preview docs change
* remove unncessary code difference
* update to fix dm_control table
* minimize code difference
* address comments in pr-331
* fix table format
* fix table format
* add hyperparameter alpha
* add code difference highlight
* add benchmark snippets
* fix some text and update mujoco v2 results
* incorporate rpo alpha
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update docs for rpo
* update with tracked exp
* update removing iframe
* Fix docs preview
* add deprecation warning
* update recommendation
* quick change
* small fix
* fix typos

Co-authored-by: Costa Huang <[email protected]>
* fix typo
* c51 typo
* initial commit
* pre-commit
* Add hub integration
* pre-commit
* use CommitOperation
* Fix pre-commit
* refactor
* push changes
* refactor
* fix pre-commit
* pre-commit
* close the env and writer after eval
* support dqn jax
* pre-commit
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* address comments
* update docs
* support dqn_atari_jax
* bug fix and docs
* Add cleanrl to the hf's `metadata`
* include huggingface integration
* test for enjoy.py
* bump version, pip install extra hack python-poetry/poetry#4842 (comment)
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* Update cleanrl_utils/huggingface.py Co-authored-by: Lucain <[email protected]>
* update docs
* update pre-commit
* quick fix
* bug fix
* lazy load modules to avoid dependency issues
* Add huggingface shields
* Add emoji
* Update docs
* pre-commit
* Update docs
* Update docs
* fix: use `algorithm_variant_filename` in model card reproduction script
* typo fix
* feat: add hf support for c51
* formatting fix
* support pulling variant depdencies directly
* support model saving for `ppo_atari_envpool_xla_jax_scan`
* support `ppo_atari_envpool_xla_jax_scan`
* quick change
* support 'c51_jax'
* formatting fix
* support capture video
* Add notebook
* update docs
* support `c51_atari` and `c51_atari_jax`
* typo fix
* add c51 to zoo docs
* add colab badge
* fix broken colab svg
* pypi release
* typo fix
* update pre-commit
* remove hf-integration reference

Co-authored-by: Lucain <[email protected]>
Co-authored-by: Kinal <[email protected]>
Co-authored-by: Kinal Mehta <[email protected]>
* add draft of SAC discrete implementation
* run pre-commit
* Use log softmax instead of author's log-pi code
* Revert to cleanrl SAC delay implementation (it's more stable)
* Remove docstrings and duplicate code
* Use correct clipreward wrapper
* fix bug in log softmax calculation
* adhere to cleanrl log_prob naming
* fix bug in entropy target calculation
* change layer initialization to match existing cleanrl codebase
* working minimal diff version
* implement original learning update frequency
* parameterize the entropy scale for autotuning
* add benchmarking script
* rename target entropy factor and set new default value
* add docs draft
* fix SAC-discrete links to work pre merge
* add preliminary result table for SAC-discrete
* clean up todos and add header
* minimize diff between sac_atari and sac_continuous
* add sac-discrete end2end test
* SAC-discrete docs rework
* Update SAC-discrete @100k results
* Fix doc links and unify naming in code
* update docs
* fix target update frequency (see PR #323)
* clarify comment regarding CNN encoder sharing
* fix benchmark installation
* fix eps in minimal diff version and improve code readability
* add docs for eps and finalize code
* use no_grad for actor Q-vals and re-use action-probs & log-probs in alpha loss
* update docs for new code and settings
* fix links to point to main branch
* update sac-discrete training plots
* new sac-d training plots
* update results table and fix link
* fix pong chart title
* add Jimmy Ba name as exception to code spell check
* change target_entropy_scale default value to same value as experiments
* remove blank line at end of pre-commit

Co-authored-by: Costa Huang <[email protected]>
* Added Polyak update rate (tau) for soft target network updates (vanilla and Atari implementations)
* Updated DQN docs with Polyak update info
* Quick doc fix
* Implemented Polyak update for dqn_atari_jax.py
* add tau for `dqn_jax.py`, too

Co-authored-by: Costa Huang <[email protected]>
* Update README.md
* Update examples.md
* Update .pre-commit-config.yaml
* Remove unnecessary arg
* remove insider build
* update docs
* update test cases
* update dependencies
* update PR template
* Better requirements.txt docs
* add c51 docs
* update c51 docs
* update dqn docs
* update optuna docs
* update docs for ddpg, sac, and td3
* add dependencies
* remove poetry groups
* remove poetry groups for tests
* Update PPO, RND, and RPO docs
* update PPG docs
* update docs
* trigger CI
* fix CI
* Fix depdenencies
* Move up pip installation
Description
Types of changes
Checklist:
`pre-commit run --all-files` passes (required).
`mkdocs serve`.

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See vwxyzjn#137 as an example PR.

`--capture-video` flag toggled on (required).
`mkdocs serve`.
`width=500` and `height=300`.