The PPO implementation is based on the open-source code for the ICLR 2020 paper "Implementation Matters in Deep RL: A Case Study on PPO and TRPO": https://github.com/implementation-matters/code-for-paper.
All our analysis and plots are produced via Jupyter notebooks in the `analysis` folder. The failure mode example environments are defined in `analysis/envs.py`, and the corresponding experiments are analyzed in notebooks in the same folder.
The Beta policy is implemented as the `CtsBetaPolicy` class in `src/policy_gradients/models.py`.
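For orientation, the core idea behind a Beta policy is to predict the two shape parameters of a Beta distribution for each action dimension and rescale samples from (0, 1) to the environment's action bounds. The following is a minimal, hypothetical PyTorch sketch of that pattern; the class name, network sizes, and method names are illustrative and do not mirror the actual `CtsBetaPolicy` code.

```python
# Minimal sketch of a Beta policy head (illustrative only, not the repo's CtsBetaPolicy).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Beta

class BetaPolicySketch(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # Separate heads for the two Beta shape parameters (one per action dimension).
        self.alpha_head = nn.Linear(hidden, act_dim)
        self.beta_head = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.body(obs)
        # softplus(.) + 1 keeps both shape parameters above 1, giving a unimodal density.
        alpha = F.softplus(self.alpha_head(h)) + 1.0
        beta = F.softplus(self.beta_head(h)) + 1.0
        return Beta(alpha, beta)

    def act(self, obs, low, high):
        dist = self.forward(obs)
        u = dist.sample()  # sample lies in (0, 1)
        low = torch.as_tensor(low, dtype=u.dtype)
        high = torch.as_tensor(high, dtype=u.dtype)
        action = low + (high - low) * u  # rescale to the environment's action bounds
        return action, dist.log_prob(u).sum(dim=-1)
```

In a Gym environment, `low` and `high` would typically come from `env.action_space.low` and `env.action_space.high`.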
We assume that the user has a machine with MuJoCo and mujoco_py properly set up. To check that MuJoCo is installed correctly, try running the following:

```python
import gym
gym.make("Humanoid-v2")
```
The dependencies are listed in the `src/requirements.txt` file and can be installed via `pip install -r requirements.txt`.
As an example, to reproduce our MuJoCo Gaussian vs. Beta policy comparison figures, run the following:
- `cd src/gaussian_vs_beta/`
- `python setup_agents.py`: the `setup_agents.py` script contains detailed experiment settings and sets up a configuration file for each agent.
- `cd ../`
- Edit the `NUM_THREADS` variable in the `run_agents.py` file according to your local machine.
- Train the agents (a sketch of this launch pattern follows the list): `python run_agents.py gaussian_vs_beta/agent_configs`
- Plot the results in the corresponding Jupyter notebook in the `analysis` folder.
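A common pattern for a launcher like `run_agents.py` is to fan the per-agent config files out over a fixed-size worker pool. The sketch below only illustrates that pattern; the training entry point (`train_agent.py --config`), the `*.json` config extension, and the exact role of `NUM_THREADS` are assumptions, so consult the actual `run_agents.py` for the real interface.

```python
# Illustrative launcher sketch: run one training process per config file,
# at most NUM_THREADS at a time. Not the repo's actual run_agents.py.
import glob
import subprocess
import sys
from multiprocessing import Pool

NUM_THREADS = 4  # adjust to the number of cores on your machine

def train_one(config_path):
    # Hypothetical training invocation; replace with the repo's real entry point.
    return subprocess.call(["python", "train_agent.py", "--config", config_path])

if __name__ == "__main__":
    config_dir = sys.argv[1]  # e.g. "gaussian_vs_beta/agent_configs"
    configs = sorted(glob.glob(f"{config_dir}/*.json"))  # config format is an assumption
    with Pool(NUM_THREADS) as pool:
        pool.map(train_one, configs)
```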
For the other MuJoCo comparisons, see the agent setup files in `src/kl_direction` and `src/base_exp`, or create your own custom agent setup file with the desired configurations (a sketch of such a setup script is shown below).
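A custom setup script essentially just emits one configuration file per (environment, policy, seed) combination for the launcher to consume. The sketch below illustrates that idea with made-up config keys (`game`, `policy`, `seed`) and an assumed JSON format; check the existing setup files (e.g. in `src/gaussian_vs_beta/`) for the keys the training code actually expects.

```python
# Illustrative setup sketch: write one JSON config per agent.
# The config keys and file format here are placeholders, not the repo's schema.
import itertools
import json
import os

out_dir = "my_experiment/agent_configs"
os.makedirs(out_dir, exist_ok=True)

envs = ["Hopper-v2", "Walker2d-v2"]
policies = ["gaussian", "beta"]
seeds = range(3)

for i, (env, policy, seed) in enumerate(itertools.product(envs, policies, seeds)):
    config = {"game": env, "policy": policy, "seed": seed}
    with open(os.path.join(out_dir, f"agent_{i}.json"), "w") as f:
        json.dump(config, f, indent=2)
```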
For more details about the code, see the README file in the original GitHub repo: https://github.com/implementation-matters/code-for-paper.