Tosatto, S.; Carvalho, J.; Abdulsamad, H.; Peters, J. (2020). A Nonparametric Off-Policy Policy Gradient, Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS). https://arxiv.org/abs/2001.02435
Nonparametric Off-Policy Policy Gradient (NOPG) is a Reinforcement Learning algorithm for off-policy datasets. The gradient estimate is computed in closed-form by modelling the transition probabilities with Kernel Density Estimation (KDE) and the reward function with Kernel Regression.
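As a rough illustration of the nonparametric reward model (a minimal sketch, not the repository's implementation; the Gaussian kernel, bandwidth, and toy data below are assumptions), Nadaraya-Watson kernel regression estimates the reward at a query state-action pair as a kernel-weighted average of the observed rewards:

```python
import numpy as np

def gaussian_kernel(query, samples, bandwidth):
    """Isotropic Gaussian kernel between one query point and all samples."""
    sq_dist = np.sum(((query - samples) / bandwidth) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist)

def kernel_reward(query_sa, data_sa, data_r, bandwidth=0.5):
    """Nadaraya-Watson estimate: r_hat(s,a) = sum_i K_i r_i / sum_i K_i."""
    weights = gaussian_kernel(query_sa, data_sa, bandwidth)
    return np.sum(weights * data_r) / (np.sum(weights) + 1e-12)

# Toy off-policy dataset: 100 concatenated (state, action) samples and rewards.
rng = np.random.default_rng(0)
data_sa = rng.normal(size=(100, 3))
data_r = -np.sum(data_sa ** 2, axis=-1)
print(kernel_reward(np.zeros(3), data_sa, data_r))
```

The transition density is modelled analogously with KDE over the sampled transitions, which is what allows the gradient estimate to be written in closed form.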
The current version of NOPG supports stochastic and deterministic policies, and works for continuous state and action spaces. An extension to discrete spaces will be made available in the near future.
It supports environments with OpenAI Gym-like interfaces.
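For instance, an off-policy dataset of transitions can be collected from any environment exposing the usual reset/step methods; the environment name and the uniform behavioural policy below are placeholders, and the classic 4-tuple gym step API is assumed (newer gymnasium releases differ):

```python
import gym

env = gym.make("Pendulum-v0")  # placeholder environment
dataset = []                   # (state, action, reward, next_state, done) tuples

state = env.reset()
for _ in range(1000):
    action = env.action_space.sample()             # uniform behavioural policy
    next_state, reward, done, _ = env.step(action)
    dataset.append((state, action, reward, next_state, done))
    state = env.reset() if done else next_state
```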
Link to CartPole video: https://www.youtube.com/watch?v=LKtnzc4TV98
The code was tested with Python 3.7.6 on a machine running Ubuntu 18.04 and uses PyTorch for automatic gradient computation. We recommend a GPU and ample RAM to speed up training.
We assume you have miniconda3 installed in /home/$USER/miniconda3.
Install all dependencies with
bash setup.sh
The easiest way to create an experiment is to follow the template in examples/template.py, or to look directly at the other examples in the examples directory.
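As a hypothetical, heavily simplified sketch of what such an experiment involves (an off-policy dataset, a differentiable kernel-based reward model, and gradient ascent on a policy with PyTorch), the snippet below uses placeholder data and a myopic one-step surrogate rather than the full NOPG objective; refer to examples/template.py for the actual interface:

```python
import torch

torch.manual_seed(0)

# Placeholder off-policy dataset: 2-d states, 1-d actions, toy rewards.
states  = torch.randn(200, 2)
actions = torch.randn(200, 1)
rewards = -(states.pow(2).sum(-1) + actions.pow(2).squeeze(-1))

# Deterministic policy network (hypothetical architecture).
policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-2)
bandwidth = 0.5

for step in range(200):
    acts = policy(states)                                    # pi_theta(s) on dataset states
    query = torch.cat([states, acts], dim=-1)                # (s, pi_theta(s))
    data  = torch.cat([states, actions], dim=-1)             # dataset (s_i, a_i)
    sq_dist = (query.unsqueeze(1) - data.unsqueeze(0)).pow(2).sum(-1) / bandwidth ** 2
    weights = torch.softmax(-0.5 * sq_dist, dim=-1)          # normalised Gaussian kernel weights
    surrogate = (weights * rewards.unsqueeze(0)).sum(-1).mean()  # kernel-regressed reward
    optimiser.zero_grad()
    (-surrogate).backward()                                  # ascend the surrogate objective
    optimiser.step()
```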
Swing-up Pendulum with Uniformly Sampled Dataset and Deterministic Policy
Activate the virtual environment first and run the code with
python examples/pendulum_nopg_d_uniform.py
You should obtain a non-discounted return of roughly -500.
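For reference, the non-discounted (undiscounted) return is simply the sum of per-step rewards over an evaluation rollout; a generic way to compute it, with a placeholder environment and policy and the classic 4-tuple gym step API, is:

```python
import gym

def undiscounted_return(env, policy_fn, max_steps=500):
    """Sum of rewards over one rollout, with no discount factor applied."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        state, reward, done, _ = env.step(policy_fn(state))
        total += reward
        if done:
            break
    return total

env = gym.make("Pendulum-v0")  # placeholder; the example uses its own swing-up pendulum
print(undiscounted_return(env, lambda s: env.action_space.sample()))  # random-policy baseline
```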