Official code for the NeurIPS 2024 paper: Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators.
conda create -n cbpr python=3.8
conda activate cbpr
pip install -r requirements.txt
Training of baseline algorithms and MTP agents. Note that ALL agents are implemented based on PPO.
Train SP agent in Cramped Room layout:
cd algorithms/baselines
sh train_sp.sh
FCP is introduced in Collaborating with Humans without Human Data. Train FCP agent in Cramped Room layout:
cd algorithms/baselines
sh train_fcp.sh
BCP is introduced in On the Utility of Learning about Humans for Human-AI Coordination. To train a BCP agent, first train the behavioral cloning model using ./algorithms/bc/bc.sh. Next, train the BCP agent using:
cd algorithms/baselines
sh train_bcp.sh
CBPR is built upon MTP. To train MTP agents by pairing them with rule-based agents, run:
cd algorithms
sh mtp_scriptedPolicy.sh
Pair BCP agent with agent that switches policies every 100 timesteps in Cramped Room layout.
python experiments/exp1/evaluate_scriptPolicy.py --layout cramped_room --num_episodes 50 --mode intra --switch_human_freq 100 --seed 1 --algorithm BCP
Pair FCP agent with agent that switches policies every 2 episodes in Cramped Room layout.
python experiments/exp1/evaluate_scriptPolicy.py --layout cramped_room --num_episodes 50 --mode inter --switch_human_freq 2 --seed 1 --algorithm FCP
Pair CBPR with agent that switches policies every 200 timesteps in Cramped Room layout.
python experiments/exp1/okr_scriptedPolicy.py --layout cramped_room --num_episodes 50 --mode intra --switch_human_freq 200 --seed 1 --Q_len 20 --rho 0.9
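The commands above describe two switching schedules for the scripted partner: with --mode intra the partner changes policy every --switch_human_freq timesteps, and with --mode inter it changes every --switch_human_freq episodes. The sketch below illustrates that schedule only. The function name partner_policy_index, the round-robin ordering of policies, and the assumption that the timestep counter restarts each episode are hypothetical and are not taken from the repository's evaluation scripts.

```python
def partner_policy_index(mode, switch_freq, episode, timestep, num_policies):
    """Return the index of the scripted policy active at (episode, timestep).

    Hypothetical helper for illustration; round-robin ordering is an assumption.
    mode="intra": the policy changes every `switch_freq` timesteps in an episode.
    mode="inter": the policy changes every `switch_freq` episodes.
    """
    if switch_freq <= 0:
        raise ValueError("switch_freq must be positive")
    if mode == "intra":
        # Timestep is assumed to be counted from 0 within the current episode.
        return (timestep // switch_freq) % num_policies
    if mode == "inter":
        # Episode is assumed to be counted from 0 across the evaluation run.
        return (episode // switch_freq) % num_policies
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # Intra-episode switching every 100 timesteps among 3 hypothetical policies.
    for t in (0, 99, 100, 250):
        print("intra", t, partner_policy_index("intra", 100, 0, t, 3))
    # Inter-episode switching every 2 episodes.
    for ep in (0, 1, 2, 3):
        print("inter", ep, partner_policy_index("inter", 2, ep, 0, 3))
```

Under these assumptions, an intra-episode partner keeps one policy for timesteps 0 to 99 and moves to the next at timestep 100, while an inter-episode partner keeps one policy for two full episodes before moving on.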
Pair BCP agent with agent using high skill level in Cramped Room layout.
python experiments/exp2/evaluate_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level high --algorithm BCP --use_wandb
Pair FCP agent with agent using low skill level in Cramped Room layout.
python experiments/exp2/evaluate_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level low --algorithm FCP --use_wandb
Pair CBPR with agent using medium skill level in Cramped Room layout.
python experiments/exp2/okr_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level medium --Q_len 20 --rho 0.9 --use_wandb
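To evaluate one baseline against every skill level shown above, the individual commands can be generated rather than typed by hand. The following is a minimal sketch that only builds and prints the command lines; it does not launch them. The helper name skill_level_commands is hypothetical, and the set of skill levels (low, medium, high) is assumed from the examples above rather than read from the script's argument parser.

```python
import shlex


def skill_level_commands(algorithm, layout="cramped_room", num_episodes=50,
                         skill_levels=("low", "medium", "high")):
    """Build one evaluate_skill_levels.py command line per skill level.

    Hypothetical helper: only flags that appear in the commands above are used.
    """
    commands = []
    for level in skill_levels:
        args = [
            "python", "experiments/exp2/evaluate_skill_levels.py",
            "--layout", layout,
            "--num_episodes", str(num_episodes),
            "--skill_level", level,
            "--algorithm", algorithm,
        ]
        # shlex.join quotes arguments only where the shell would need it.
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    for cmd in skill_level_commands("BCP"):
        print(cmd)
```

Each printed line can be pasted into a shell from the repository root; append --use_wandb to log results as in the commands above.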
python src/overcooked_demo/server/app.py
Once the server is running, the Overcooked game interface can be accessed at 127.0.0.1:5001.
Our code builds on several prior works:
- The settings of Overcooked environment are adapted from https://github.com/HumanCompatibleAI/overcooked_ai.
- The implementation of PPO is adapted from https://github.com/Lizhi-sjtu/DRL-code-pytorch.
- The implementations of the FCP agent and rule-based policies are adapted from https://github.com/samjia2000/HSP.
@inproceedings{wangbeyond,
title={Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators},
author={Wang, Haoming and Tian, Zhaoming and Song, Yunpeng and Zhang, Xiangliang and Cai, Zhongmin},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems}
}