Implementation of TWOSOME (True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning).
Arxiv link: https://arxiv.org/abs/2401.14151
1. Create a conda environment
conda create -n twosome python=3.9
conda activate twosome
2. Install pytorch
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
3. Install requirements
pip install setuptools==65.5.0
pip install -r requirements.txt
4. Install overcooked environment
cd gym-macro-overcooked
pip install -e .
cd ..
5. Install virtual home environment
cd virtualhome
pip install -e .
# 1. For Tomato Salad environment
sh scripts/tomato_salad_ppo_llm.sh # train TWOSOME in Tomato Salad environment
# 2. For Tomato Lettuce Salad environment
sh scripts/tomato_salad_lettuce_ppo_llm.sh # train TWOSOME in Tomato Lettuce Salad environment
# 3. For Food Preparation environment
sh scripts/food_preparation_ppo_llm.sh # train TWOSOME in Food Preparation environment
# 4. For Entertainment environment
sh scripts/entertainment_ppo_llm.sh # train TWOSOME in Entertainment environment
You can change the attribute, 'normalization-mode' in [sum, toekn, word], corresponding to TWOSOME without normalization, TWOSOME with token normalization and TWOSOME with word normalization
1. For food preparation environment
sh scripts/food_preparation_ppo_llm_inference.sh
2. For entertainment environment
sh scripts/entertainment_ppo_llm_inference.sh
Overcooked env is adapted from https://github.com/WeihaoTan/gym-macro-overcooked.
VirtualHome is adapted from https://github.com/xavierpuigf/virtualhome.
If you find our work useful, please consider citing us!
@article{tan2024true,
title={True Knowledge Comes from Practice: Aligning Large Language Models with Embodied Environments via Reinforcement Learning},
author={Weihao Tan and Wentao Zhang and Shanqi Liu and Longtao Zheng and Xinrun Wang and Bo An},
journal={arXiv preprint arXiv:2401.14151},
year={2024}
}