Code for the paper: "Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning"

NasikNafi/apdac

Attention-based Partially Decoupled Actor-Critic (APDAC)

This repository contains the code for the following paper presented at the Deep RL Workshop, NeurIPS 2021:
Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning.

Citation

If you use this code, please cite our paper:

Nafi, N.M., Glasscock, C. and Hsu, W. (2021). Attention-based Partial Decoupling of Policy and Value for Generalization in Reinforcement Learning. In Deep Reinforcement Learning Workshop, NeurIPS 2021.

Our code is largely based on this implementation, and the corresponding paper is available here. That implementation in turn uses an open-source PyTorch implementation of PPO.

Dependencies

Run the following to create the environment and install the required dependencies:

conda create -n apdac python=3.7
conda activate apdac

cd apdac
pip install -r requirements.txt

pip install procgen

pip install protobuf==3.20.0

git clone https://github.com/openai/baselines.git
cd baselines
python setup.py install
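After installation, it can help to confirm that the packages installed above are importable from the `apdac` environment. The following is a minimal sketch, not part of the repository: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the list covers only the packages installed explicitly above (`procgen`, `baselines`, `protobuf`), not the contents of `requirements.txt`.

```python
import importlib.util

# Import names of the packages installed explicitly in the steps above.
# Note that protobuf is imported as "google.protobuf".
# Packages pulled in by requirements.txt are not listed here.
REQUIRED_MODULES = ["procgen", "baselines", "google.protobuf"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package is absent or a spec is malformed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Run it with the `apdac` conda environment activated; any module reported as missing points to an installation step that did not complete.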

Instructions

To train APDAC on CoinRun:

python train.py --env_name coinrun --algo apdac

To train IDAAC on CoinRun:

python train.py --env_name coinrun --algo idaac

To train PPO on CoinRun:

python train.py --env_name coinrun --algo ppo --ppo_epoch 3
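To run the three commands above one after another, a small wrapper script may be convenient. The sketch below is not part of the repository: `build_command`, `run_all`, and `ALGO_FLAGS` are hypothetical names, and it uses only the flags shown above (`--env_name`, `--algo`, `--ppo_epoch`). It assumes it is started from the repository root, next to `train.py`, and that `train.py` accepts other Procgen game names through `--env_name`.

```python
import subprocess
import sys

# Extra flags per algorithm, copied from the commands above.
ALGO_FLAGS = {
    "apdac": [],
    "idaac": [],
    "ppo": ["--ppo_epoch", "3"],
}


def build_command(algo, env_name="coinrun"):
    """Build the argument list for one training run."""
    if algo not in ALGO_FLAGS:
        raise ValueError(f"unknown algorithm: {algo}")
    return [
        sys.executable, "train.py",
        "--env_name", env_name,
        "--algo", algo,
        *ALGO_FLAGS[algo],
    ]


def run_all(env_name="coinrun", dry_run=True):
    """Print each command and, unless dry_run is set, run it to completion."""
    for algo in ALGO_FLAGS:
        cmd = build_command(algo, env_name)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a run exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` launches the runs sequentially; the dry-run default only prints the commands so they can be checked first.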

APDAC uses the same set of hyperparameters for all environments. Please refer to the paper for details and experimental results. On Procgen, a challenging benchmark for generalization in RL, APDAC significantly outperforms the PPO baseline and performs comparably to IDAAC, a recent state-of-the-art method. APDAC thus offers generalization benefits similar to those of a fully decoupled approach while reducing the overall number of parameters and the computational cost.
