Code for the paper "Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning" - Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson, published at ICML 2021.
@inproceedings{zintgraf2021hyperx,
title={Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning},
author={Zintgraf, Luisa and Feng, Leo and Lu, Cong and Igl, Maximilian and Hartikainen, Kristian and Hofmann, Katja and Whiteson, Shimon},
booktitle={International Conference on Machine Learning (ICML)},
year={2021}}
! Important !
If you use this code with your own environments, make sure to not use
np.random
in them (e.g. to generate the tasks) because it is not thread safe (and not using it may cause duplicates across threads). Instead, use the python native random function. For an example see here.
We use PyTorch for this code, and log results using TensorboardX.
The main requirements can be found in requirements.txt
.
For the MuJoCo experiments, you need to install MuJoCo.
Make sure you have the right MuJoCo version:
For the Cheetah and Ant environments, use mujoco150
.
(You can also use mujoco200
except for AntGoal,
because there's a bug which leads to 80% of the env state being zero).
The main training loop is in metalearner.py
.
The models are in /models/
,
the code for the exploration bonuses in /exploration/
,
the RL algorithms in /algorithms/
,
and the VAE in vae.py
.
To run the experiments found in the paper, execute these commands:
- Mountain Treasure:
python main.py --env-type treasure_hunt_hyperx
- Multi-Stage GridWorld:
python main.py --env-type room_hyperx
- Sparse HalfCheetahDir:
python main.py --env-type cds_hyperx
- Sparse AntGoal:
python main.py --env-type sparse_ant_goal_hyperx
- 2D Navigation Point Robot:
python main.py --env-type pointrobot_sparse_hyperx
Additional experiments, in particular baselines, are listed in main.py
.
The results will by default be saved at ./logs
,
but you can also pass a flag with an alternative directory using --results_log_dir /path/to/dir
.
Results will be written to tensorboard event files,
and some visualisations will be printed now and then.