The PPO implementation is based on the open-source code for the ICLR 2020 paper "Implementation Matters in Deep RL: A Case Study on PPO and TRPO": https://github.com/implementation-matters/code-for-paper.
All our analysis and plots are produced via Jupyter notebooks in the `analysis` folder. The failure mode example environments are defined in `analysis/envs.py`, and the corresponding experiments are analyzed in notebooks in the same folder.
The Beta policy is implemented as the `CtsBetaPolicy` class in `src/policy_gradients/models.py`.
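For orientation, the core idea behind a Beta policy is to predict the two shape parameters of a Beta distribution for each action dimension and rescale samples from (0, 1) to the environment's action bounds. The following is a minimal, hypothetical PyTorch sketch of that pattern; the class name, network sizes, and method names are illustrative and do not mirror the actual `CtsBetaPolicy` code.

```python
# Minimal sketch of a Beta policy head (illustrative only, not the repo's CtsBetaPolicy).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Beta

class BetaPolicySketch(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # Separate heads for the two Beta shape parameters (one per action dimension).
        self.alpha_head = nn.Linear(hidden, act_dim)
        self.beta_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        # softplus(.) + 1 keeps both shape parameters above 1, giving a unimodal density.
        alpha = F.softplus(self.alpha_head(h)) + 1.0
        beta = F.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

    def act(self, obs, low, high):
        dist = self.forward(obs)
        u = dist.sample()  # sample lies in (0, 1)
        low = torch.as_tensor(low, dtype=u.dtype)
        high = torch.as_tensor(high, dtype=u.dtype)
        action = low + (high - low) * u  # rescale to the environment's action bounds
        return action, dist.log_prob(u).sum(dim=-1)
```

In a Gym environment, `low` and `high` would typically come from `env.action_space.low` and `env.action_space.high`.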
We assume that the user has a machine with MuJoCo and mujoco_py properly set up. To check that MuJoCo is installed correctly, try running the following:

```python
import gym
gym.make("Humanoid-v2")
```
The dependencies are listed in the `src/requirements.txt` file and can be installed via `pip install -r requirements.txt`.
As an example, to reproduce our MuJoCo Gaussian vs. Beta policy comparison figures, run the following:
- `cd src/gaussian_vs_beta/`
- `python setup_agents.py`: the `setup_agents.py` script contains detailed experiment settings and sets up a configuration file for each agent.
- `cd ../`
- Edit the `NUM_THREADS` variable in the `run_agents.py` file according to your local machine.
- Train the agents (a sketch of this launch pattern follows the list): `python run_agents.py gaussian_vs_beta/agent_configs`
- Plot the results in the corresponding Jupyter notebook in the `analysis` folder.
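A common pattern for a launcher like `run_agents.py` is to fan the per-agent config files out over a fixed-size worker pool. The sketch below only illustrates that pattern; the training entry point (`train_agent.py --config`), the `*.json` config extension, and the exact role of `NUM_THREADS` are assumptions, so consult the actual `run_agents.py` for the real interface.

```python
# Illustrative launcher sketch: run one training process per config file,
# at most NUM_THREADS at a time. Not the repo's actual run_agents.py.
import glob
import subprocess
import sys
from multiprocessing import Pool

NUM_THREADS = 4  # adjust to the number of cores on your machine

def train_one(config_path):
    # Hypothetical training invocation; replace with the repo's real entry point.
    return subprocess.call(["python", "train_agent.py", "--config", config_path])

if __name__ == "__main__":
    config_dir = sys.argv[1]  # e.g. "gaussian_vs_beta/agent_configs"
    configs = sorted(glob.glob(f"{config_dir}/*.json"))  # config format is an assumption
    with Pool(NUM_THREADS) as pool:
        pool.map(train_one, configs)
```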
For the other MuJoCo comparisons, see the agent setup files in `src/kl_direction` and `src/base_exp`, or create your own custom agent setup file with the desired configurations (a sketch of such a setup script is shown below).
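A custom setup script essentially just emits one configuration file per (environment, policy, seed) combination for the launcher to consume. The sketch below illustrates that idea with made-up config keys (`game`, `policy`, `seed`) and an assumed JSON format; check the existing setup files (e.g. in `src/gaussian_vs_beta/`) for the keys the training code actually expects.

```python
# Illustrative setup sketch: write one JSON config per agent.
# The config keys and file format here are placeholders, not the repo's schema.
import itertools
import json
import os

out_dir = "my_experiment/agent_configs"
os.makedirs(out_dir, exist_ok=True)

envs = ["Hopper-v2", "Walker2d-v2"]
policies = ["gaussian", "beta"]
seeds = range(3)

for i, (env, policy, seed) in enumerate(itertools.product(envs, policies, seeds)):
    config = {"game": env, "policy": policy, "seed": seed}
    with open(os.path.join(out_dir, f"agent_{i}.json"), "w") as f:
        json.dump(config, f, indent=2)
```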
For more details about the code, see the README file in the original GitHub repo: https://github.com/implementation-matters/code-for-paper.