# Robust Learning to Simulate (RL2S)
- Install MuJoCo 1.50 at `~/.mujoco/mjpro150` and copy your license key to `~/.mujoco/mjkey.txt`.
- Install the Python dependencies: `pip install -r requirements.txt`
- All the data that RL2S needs is saved in `./l2s_dataset`.
- Training data are saved as trajectories in `./l2s_dataset/hopper/train_data/dataset_policy_{index}.pkl`.
- Training policies, test policies, and the corresponding datasets are saved in `./l2s_dataset/hopper`. You can specify the data used to train and evaluate the simulator through `train_data_index` and `test_data_index` in `l2s_demos_listing.yaml`.
- Since the dataset is large, you should generate it yourself:
  - To generate the policies, run `python run_online_sac.py -e exp_specs/online_hopper.yaml --nosrun -c 0`.
  - Then sample some policies for training and test. Place the policies into different directories as shown in this project; each policy file should be named `policy_{index}.pkl`.
  - Thereafter, run `python3 utils_script.py -d hopper -t 2 -g 0` to sample data for each policy.
  - Finally, run `python3 utils_script.py -d hopper -t 3 -g 0` to compute the mean and std of the observations, which will be used in `exp_specs/l2s_hopper.yaml`.
- The learned simulators, which will be used for policy ranking and offline policy improvement, are saved in `./l2s_dataset/leaned_dynamic/hopper`.
- All the states that have appeared in the expert datasets are saved in `./l2s_dataset/end_data/hopper.pkl`.
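The final data-generation step above computes per-dimension observation statistics. Below is a minimal sketch of that computation, assuming each dataset pickle holds a list of trajectories, each with an `observations` array; this layout is an assumption, so check `utils_script.py` for the actual format.

```python
import pickle

import numpy as np


def compute_obs_stats(dataset_path):
    """Sketch of the `-t 3` step: mean/std of all observations in a dataset.

    Assumes (hypothetically) that the pickle contains a list of trajectory
    dicts, each with an "observations" array of shape (steps, obs_dim).
    """
    with open(dataset_path, "rb") as f:
        trajectories = pickle.load(f)
    # Stack every observation from every trajectory into one (N, obs_dim) array.
    obs = np.concatenate([np.asarray(t["observations"]) for t in trajectories])
    return obs.mean(axis=0), obs.std(axis=0)
```

The resulting mean and std would then be copied into `exp_specs/l2s_hopper.yaml` for observation normalization.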
## Running
Before running experiments, check that the indices in `l2s_demo_listings.yaml` correspond to the indices of the policies in `l2s_dataset`.

To run RL2S, use a command like the one below, with `use_robust` in `l2s_hopper.yaml` set to `true`. During training, the AVD and MVD will be logged in `./l2s_logs/RL2S/.../progress.csv`.

```
python3 run_l2s.py -e exp_specs/l2s_hopper.yaml --nosrun -c 0
```
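To inspect the logged metrics after (or during) training, the CSV log can be read with the standard library. A minimal sketch is below; the exact column names for AVD and MVD are an assumption, so check the header of your `progress.csv`.

```python
import csv


def read_metrics(csv_path, columns=("AVD", "MVD")):
    """Read selected metric columns from a training log CSV.

    The column names "AVD" and "MVD" are assumed -- inspect the actual
    progress.csv header for the names used by the logger.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # One list of floats per requested column, in row order.
    return {c: [float(r[c]) for r in rows] for c in columns}
```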
For GAIL, just set `use_robust` to `false`.
Please use a command like this to get the performance of the policy in the learned simulator:

```
python3 utils_script.py -d hopper -t 0 -g 0
```
Please use a command like this to compute the Kendall rank correlation coefficient and nDCG:

```
python3 utils_script.py -d hopper -t 1
```
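For reference, the two ranking metrics can be sketched in plain Python. This is a generic illustration of the metrics themselves, not the project's exact implementation.

```python
import math


def kendall_tau(x, y):
    """Kendall rank correlation between two score lists (no tie handling)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1   # pair ordered the same way in both lists
            elif s < 0:
                discordant += 1   # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)


def ndcg(relevances):
    """nDCG of a ranking, given the relevance of each item in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(relevances))
    ideal = sum(r / math.log2(i + 2)
                for i, r in enumerate(sorted(relevances, reverse=True)))
    return dcg / ideal
```

For example, `kendall_tau([1, 2, 3], [1, 2, 3])` is `1.0` (identical orderings) and `ndcg([3, 2, 1])` is `1.0` (the ranking already matches the ideal order).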
For policy improvement, run the command below:

```
python3 run_l2s_downstream.py -e exp_specs/l2s_downstream_hopper.yaml --nosrun -c 2
```