Official PyTorch implementation of the ICML 2022 paper DePO (Plan Your Target and Learn Your Skills: Transferable State-Only Imitation Learning via Decoupled Policy Optimization).
Important Notes
This repository is based on ILSwiss. The code covers the MuJoCo experiments; if you are looking for the NGSIM experiments, check here.
Implemented RL algorithms:
- Soft-Actor-Critic (SAC)
Implemented LfD algorithms:
- Adversarial methods for Inverse Reinforcement Learning
  - AIRL / GAIL / FAIRL / DAC
- BC
Implemented LfO algorithms:
- BCO
- GAIfO
- DPO
Before running, set the log and output paths in \rlkit\launchers\config.py.
A simple multiple-process scheduler is provided for hyperparameter grid search. We say "multiple processing" to distinguish it from multi-processing, since it only starts many independent sub-processes that do not communicate with each other.
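The scheduling idea can be sketched roughly as follows. This is a minimal illustration and not the repository's actual scheduler: the function name `run_independent` and the `max_parallel` parameter are hypothetical. It only shows how a parent process can start independent sub-processes and wait for them, with no communication between workers.

```python
import subprocess
import sys


def run_independent(commands, max_parallel=2):
    """Run each command as its own sub-process, at most max_parallel at a time.

    The workers never talk to each other; the parent only collects exit codes.
    (Hypothetical helper, not part of the repository.)
    """
    pending = list(commands)
    running = []
    exit_codes = []
    while pending or running:
        # Start new workers while there is a free slot.
        while pending and len(running) < max_parallel:
            running.append(subprocess.Popen(pending.pop(0)))
        # Wait for the oldest worker, then free its slot.
        proc = running.pop(0)
        exit_codes.append(proc.wait())
    return exit_codes


if __name__ == "__main__":
    # Three stand-in "experiments", each an independent Python process.
    cmds = [[sys.executable, "-c", f"print('setting {i}')"] for i in range(3)]
    print(run_independent(cmds))
```

Because the workers are plain sub-processes, a crash in one run does not affect the others; the exit codes are returned in the order the commands were given.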
The main entry point is run_experiment.py, which takes an experiment yaml file from \exp_specs:
python run_experiment.py -g 0 -e your_yaml_path
or
python run_experiment.py -e your_yaml_path
When you run run_experiment.py, it reads the yaml file and generates small yaml files, each containing a single hyperparameter setting. The yaml file also specifies a script file path (see \run_scripts\), and that script is run once for every small yaml file. See \exp_specs\sac\bc.yaml for an explanation of each parameter.
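The expansion step described above can be sketched as below. This is an illustration under stated assumptions, not the repository's code: the helper `expand_grid` and the key names `variables` and `constants` are hypothetical, and plain dictionaries stand in for the small yaml files so the example needs no yaml library.

```python
import itertools


def expand_grid(spec):
    """Split a spec into one dict per hyperparameter combination.

    spec["variables"] maps a name to the list of values to try;
    spec["constants"] holds values shared by every run.
    Both key names are hypothetical, chosen for this sketch.
    """
    variables = spec.get("variables", {})
    constants = spec.get("constants", {})
    names = list(variables)
    settings = []
    # The Cartesian product enumerates every combination of the listed values.
    for values in itertools.product(*(variables[n] for n in names)):
        setting = dict(constants)
        setting.update(zip(names, values))
        settings.append(setting)
    return settings


spec = {
    "constants": {"env": "hopper"},
    "variables": {"seed": [0, 1], "lr": [0.001, 0.0003]},
}
for setting in expand_grid(spec):
    print(setting)
```

Each resulting dictionary corresponds to one small yaml file, and the script named in the spec would then be launched once per dictionary.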
NOTE: all experiments, including the evaluation tasks (see \run_scripts\evaluate_policy.py and \exp_specs\evaluate_policy) and the render tasks, can be run under this framework in the same multiple-process style by specifying the corresponding yaml file.
Train an SAC agent and collect expert demos, or use the demo here. Then write the demo path in \demos_listing.yaml.
-e specifies the path to the yaml file, and -g specifies the GPU id. The existing specs are the ones used to produce the final results.
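For readers unfamiliar with this flag convention, a command-line interface of this shape could be parsed as sketched below. This is an assumption-laden illustration, not the parser used in run_experiment.py: the long option names `--experiment` and `--gpu` and the `None` default for the GPU id are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse -e (yaml path, required) and -g (GPU id, optional)."""
    parser = argparse.ArgumentParser(
        description="Launch experiments from a yaml spec."
    )
    # Long option names are hypothetical; only -e and -g come from the README.
    parser.add_argument("-e", "--experiment", required=True,
                        help="path to the experiment yaml file")
    parser.add_argument("-g", "--gpu", type=int, default=None,
                        help="GPU id; may be omitted")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["-g", "0", "-e", "your_yaml_path"])
    print(args.experiment, args.gpu)
```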
Baseline results are available here.
Config files are in exp_specs/dpo_exps. Example commands:
BCO
python run_experiment.py -e exp_specs/dpo_exps/bco_hopper_4.yaml
GAIfO
python run_experiment.py -e exp_specs/dpo_exps/gailfo_hopper_4.yaml
GAIfO-DP
python run_experiment.py -e exp_specs/dpo_exps/gailfo_dp_hopper_4.yaml
DPO (Supervised)
python run_experiment.py -e exp_specs/dpo_exps/sl_lfo_hopper_4.yaml
DPO
python run_experiment.py -e exp_specs/dpo_exps/dpo_hopper_4_weightedmle_qsa_weight.yaml
Config files are in exp_specs/ablation. Example commands:
python run_experiment.py -e exp_specs/ablation/dpo_hopper_4_weightedmle_qsa_static_lambdah.yaml
Config files are in exp_specs/transfer_exps and exp_specs/complex_transfer. Example commands (remember to change the loaded policy ckpt path in the yaml file):
python run_experiment.py -e exp_specs/transfer_exps/dpo_hopper_4_weightedmle_qsa_weight.yaml
Config files are in exp_specs/rl. Example commands:
python run_experiment.py -e exp_specs/rl/dpo_hopper.yaml
Config files are in exp_specs/rl_transfer. Example commands (remember to change the loaded policy ckpt path in the yaml file):
python run_experiment.py -e exp_specs/rl_transfer/dpo_hopper.yaml
Config files are in exp_specs/evaluation. Example commands (remember to change the loaded policy ckpt in evaluate_state_predictor.py):
python run_experiment.py -e exp_specs/evaluation/eval_sp.yaml