# Robust Learning to Simulate (RL2S)
- Install MuJoCo 1.50 at `~/.mujoco/mjpro150` and copy your license key to `~/.mujoco/mjkey.txt`.
- Install the Python dependencies: `pip install -r requirements.txt`
- All the data that RL2S needs is saved in `./l2s_dataset`.
- Training data are saved as trajectories in `./l2s_dataset/hopper/train_data/dataset_policy_{index}.pkl`.
- Training policies, test policies, and the corresponding datasets are saved in `./l2s_dataset/hopper`. You can specify the data used to train and evaluate the simulator through `train_data_index` and `test_data_index` in `l2s_demos_listing.yaml`.
- Since the dataset is large, you should generate it yourself:
  - To generate the policies, run `python run_online_sac.py -e exp_specs/online_hopper.yaml --nosrun -c 0`.
  - Then sample some policies for training and test. Place the policies into different directories as shown in this project; each policy file should be named `policy_{index}.pkl`.
  - Thereafter, run `python3 utils_script.py -d hopper -t 2 -g 0` to sample data for each policy.
  - Finally, run `python3 utils_script.py -d hopper -t 3 -g 0` to compute the mean and std of the observations, which will be used in `exp_specs/l2s_hopper.yaml`.
- The learned simulators, which will be used for policy ranking and offline policy improvement, are saved in `./l2s_dataset/leaned_dynamic/hopper`.
- All the states that have appeared in the expert datasets are saved in `./l2s_dataset/end_data/hopper.pkl`.
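The final data-generation step above computes per-dimension observation statistics. Below is a minimal sketch of that computation, assuming each dataset pickle holds a list of trajectories, each with an `observations` array; this layout is an assumption, so check `utils_script.py` for the actual format.

```python
import pickle

import numpy as np


def compute_obs_stats(dataset_path):
    """Sketch of the `-t 3` step: mean/std of all observations in a dataset.

    Assumes (hypothetically) that the pickle contains a list of trajectory
    dicts, each with an "observations" array of shape (steps, obs_dim).
    """
    with open(dataset_path, "rb") as f:
        trajectories = pickle.load(f)
    # Stack every observation from every trajectory into one (N, obs_dim) array.
    obs = np.concatenate([np.asarray(t["observations"]) for t in trajectories])
    return obs.mean(axis=0), obs.std(axis=0)
```

The resulting mean and std would then be copied into `exp_specs/l2s_hopper.yaml` for observation normalization.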
## Running
Before running experiments, check that the indices in `l2s_demo_listings.yaml` correspond to the indices of the policies in `l2s_dataset`.

To run RL2S, use a command like the one below, with `use_robust` in `l2s_hopper.yaml` set to `true`. During training, the AVD and MVD will be logged in `./l2s_logs/RL2S/.../progress.csv`.

```
python3 run_l2s.py -e exp_specs/l2s_hopper.yaml --nosrun -c 0
```
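To inspect the logged metrics after (or during) training, the CSV log can be read with the standard library. A minimal sketch is below; the exact column names for AVD and MVD are an assumption, so check the header of your `progress.csv`.

```python
import csv


def read_metrics(csv_path, columns=("AVD", "MVD")):
    """Read selected metric columns from a training log CSV.

    The column names "AVD" and "MVD" are assumed -- inspect the actual
    progress.csv header for the names used by the logger.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    # One list of floats per requested column, in row order.
    return {c: [float(r[c]) for r in rows] for c in columns}
```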
For GAIL, just set `use_robust` to `false`.
Please use a command like this to get the performance of the policy in the learned simulator:

```
python3 utils_script.py -d hopper -t 0 -g 0
```
Please use a command like this to compute the Kendall rank correlation coefficient and nDCG:

```
python3 utils_script.py -d hopper -t 1
```
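For reference, the two ranking metrics can be sketched in plain Python. This is a generic illustration of the metrics themselves, not the project's exact implementation.

```python
import math


def kendall_tau(x, y):
    """Kendall rank correlation between two score lists (no tie handling)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1   # pair ordered the same way in both lists
            elif s < 0:
                discordant += 1   # pair ordered oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)


def ndcg(relevances):
    """nDCG of a ranking, given the relevance of each item in ranked order."""
    dcg = sum(r / math.log2(i + 2) for i, r in enumerate(relevances))
    ideal = sum(r / math.log2(i + 2)
                for i, r in enumerate(sorted(relevances, reverse=True)))
    return dcg / ideal
```

For example, `kendall_tau([1, 2, 3], [1, 2, 3])` is `1.0` (identical orderings) and `ndcg([3, 2, 1])` is `1.0` (the ranking already matches the ideal order).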
For policy improvement, run the command below:

```
python3 run_l2s_downstream.py -e exp_specs/l2s_downstream_hopper.yaml --nosrun -c 2
```