[NeurIPS 2024] Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators

AlexWanghaoming/CBPR

🥘 CBPR

Official code for the NeurIPS 2024 paper: Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators.

Installation

conda create -n cbpr python=3.8
conda activate cbpr
pip install -r requirements.txt

Training

This section covers training of the baseline algorithms and the MTP agents. All agents are implemented on top of PPO.

Self-play (SP)

Train an SP agent in the Cramped Room layout:

cd algorithms/baselines
sh train_sp.sh

Fictitious Co-Play (FCP)

FCP was introduced in Collaborating with Humans without Human Data. Train an FCP agent in the Cramped Room layout:

cd algorithms/baselines
sh train_fcp.sh

Behavioral Cloning Play (BCP)

BCP was introduced in On the Utility of Learning about Humans for Human-AI Coordination. To train a BCP agent, first train the behavioral cloning model using ./algorithms/bc/bc.sh, then train the BCP agent:

cd algorithms/baselines
sh train_bcp.sh

Meta-Task Playing (MTP)

CBPR is built upon MTP. To train MTP agents by pairing them with rule-based agents, run:

cd algorithms
sh mtp_scriptedPolicy.sh

Evaluation

Collaborating with agents that switch policies

Pair a BCP agent with an agent that switches policies every 100 timesteps in the Cramped Room layout:

python experiments/exp1/evaluate_scriptPolicy.py --layout cramped_room --num_episodes 50 --mode intra --switch_human_freq 100 --seed 1 --algorithm BCP

Pair an FCP agent with an agent that switches policies every 2 episodes in the Cramped Room layout:

python experiments/exp1/evaluate_scriptPolicy.py --layout cramped_room --num_episodes 50 --mode inter --switch_human_freq 2 --seed 1 --algorithm FCP

Pair CBPR with an agent that switches policies every 200 timesteps in the Cramped Room layout:

python experiments/exp1/okr_scriptedPolicy.py --layout cramped_room --num_episodes 50 --mode intra --switch_human_freq 200 --seed 1 --Q_len 20 --rho 0.9
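
The commands above use two switching modes: with --mode intra the partner changes policy every --switch_human_freq timesteps within an episode, and with --mode inter it changes every --switch_human_freq episodes. The following is a minimal sketch of that schedule, not the repository's implementation. The function name active_policy_index, the round-robin ordering, and the reset of the timestep counter at the start of each episode are all assumptions made for illustration; the actual scripts may, for example, sample the next policy at random.

```python
def active_policy_index(mode, switch_freq, episode, timestep, num_policies):
    """Return the index of the partner policy that is active right now.

    Hypothetical helper: assumes policies are cycled in round-robin order
    and that `timestep` counts from 0 within the current episode.
    """
    if switch_freq <= 0 or num_policies <= 0:
        raise ValueError("switch_freq and num_policies must be positive")
    if mode == "intra":
        # Intra-episode switching: a new policy every `switch_freq` timesteps.
        switches_so_far = timestep // switch_freq
    elif mode == "inter":
        # Inter-episode switching: a new policy every `switch_freq` episodes.
        switches_so_far = episode // switch_freq
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return switches_so_far % num_policies


if __name__ == "__main__":
    # Intra mode, switching every 100 timesteps among 3 policies.
    steps = (0, 99, 100, 250, 300)
    print([active_policy_index("intra", 100, 0, t, 3) for t in steps])
    # Inter mode, switching every 2 episodes among 3 policies.
    print([active_policy_index("inter", 2, ep, 0, 3) for ep in range(6)])
```

Under these assumptions, an intra-mode partner with a frequency of 100 stays on one policy for timesteps 0 to 99 and moves to the next at timestep 100, while an inter-mode partner with a frequency of 2 keeps the same policy for two full episodes.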

Collaborating with agents using various skill levels

Pair a BCP agent with a high-skill agent in the Cramped Room layout:

python experiments/exp2/evaluate_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level high --algorithm BCP --use_wandb

Pair an FCP agent with a low-skill agent in the Cramped Room layout:

python experiments/exp2/evaluate_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level low --algorithm FCP --use_wandb

Pair CBPR with a medium-skill agent in the Cramped Room layout:

python experiments/exp2/okr_skill_levels.py --layout cramped_room --num_episodes 50 --skill_level medium --Q_len 20 --rho 0.9 --use_wandb
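
To run every combination of algorithm and skill level, the three commands above can be generated in a loop. The sketch below only builds the command lines from the script paths and flags shown in this README; the helper name skill_level_commands is hypothetical, and the values 20 and 0.9 for --Q_len and --rho are simply copied from the example above rather than recommended settings.

```python
def skill_level_commands(layout="cramped_room", num_episodes=50, use_wandb=False):
    """Build one command per (skill level, algorithm) pair.

    Hypothetical helper: it uses only the scripts and flags shown in this
    README and does not launch anything itself.
    """
    commands = []
    for skill in ("low", "medium", "high"):
        common = [
            "--layout", layout,
            "--num_episodes", str(num_episodes),
            "--skill_level", skill,
        ]
        # Baselines share one evaluation script, selected via --algorithm.
        for algorithm in ("BCP", "FCP"):
            commands.append(
                ["python", "experiments/exp2/evaluate_skill_levels.py"]
                + common
                + ["--algorithm", algorithm]
            )
        # CBPR has its own script and takes --Q_len and --rho instead.
        commands.append(
            ["python", "experiments/exp2/okr_skill_levels.py"]
            + common
            + ["--Q_len", "20", "--rho", "0.9"]
        )
    if use_wandb:
        commands = [cmd + ["--use_wandb"] for cmd in commands]
    return commands


if __name__ == "__main__":
    # Print the commands for inspection before running anything.
    for cmd in skill_level_commands(use_wandb=True):
        print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the evaluations one after another.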

To evaluate human-AI performance, start the Overcooked demo server:

python src/overcooked_demo/server/app.py

The Overcooked game interface can be accessed at 127.0.0.1:5001.

Acknowledgement

Our code is built upon some prior works.

Publication

@inproceedings{wangbeyond,
  title={Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators},
  author={Wang, Haoming and Tian, Zhaoming and Song, Yunpeng and Zhang, Xiangliang and Cai, Zhongmin},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024}
}
