Model-Based Policy Optimization

Code to reproduce the experiments in When to Trust Your Model: Model-Based Policy Optimization.

Installation

Install MuJoCo 1.50 at ~/.mujoco/mjpro150 and copy your license key to ~/.mujoco/mjkey.txt
Clone mbpo

git clone --recursive https://github.com/jannerm/mbpo.git

Create a conda environment and install mbpo

cd mbpo
conda env create -f environment/gpu-env.yml
conda activate mbpo
pip install -e viskit
pip install -e .

Usage

Configuration files can be found in examples/config/.

mbpo run_local examples.development --config=examples.config.halfcheetah.0 --gpus=1 --trial-gpus=1

Currently only running locally is supported.

New environments

To run on a different environment, you can modify the provided template. You will also need to provide the termination function for the environment in mbpo/static. If you name the file the lowercase version of the environment name, it will be found automatically. See hopper.py for an example.

Logging

This codebase contains viskit as a submodule. You can view saved runs with:

viskit ~/ray_mbpo --port 6008

assuming you used the default log_dir.

Hyperparameters

The rollout length schedule is defined by a length-4 list in a config file. The format is [start_epoch, end_epoch, start_length, end_length], so the following:

'rollout_schedule': [20, 100, 1, 5]

corresponds to a model rollout length linearly increasing from 1 to 5 over epochs 20 to 100.

If you want to speed up training in terms of wall clock time (but possibly make the runs less sample-efficient), you can set a timeout for model training (max_model_t, in seconds) or train the model less frequently (every model_train_freq steps).

Comparing to MBPO

If you would like to compare to MBPO but do not have the resources to re-run all experiments, the learning curves found in Figure 2 of the paper (plus on the Humanoid environment) are available in this shared folder. See plot.py for an example of how to read the pickle files with the results.

Reference

@inproceedings{janner2019mbpo,
  author = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
  title = {When to Trust Your Model: Model-Based Policy Optimization},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2019}
}

Acknowledgments

The underlying soft actor-critic implementation in MBPO comes from Tuomas Haarnoja and Kristian Hartikainen's softlearning codebase. The modeling code is a slightly modified version of Kurtland Chua's PETS implementation.

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
environment		environment
examples		examples
mbpo		mbpo
softlearning		softlearning
viskit @ d7c153a		viskit @ d7c153a
.gitignore		.gitignore
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Model-Based Policy Optimization

Installation

Usage

New environments

Logging

Hyperparameters

Comparing to MBPO

Reference

Acknowledgments

About

Releases

Packages

Contributors 2

Languages

License

jannerm/mbpo

Folders and files

Latest commit

History

Repository files navigation

Model-Based Policy Optimization

Installation

Usage

New environments

Logging

Hyperparameters

Comparing to MBPO

Reference

Acknowledgments

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages