Modularized implementation of popular deep RL algorithms in PyTorch. Easy switch between classical control tasks (e.g., CartPole) and Atari games with raw pixel inputs.
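As an illustration of that switch, the sketch below uses plain Gym rather than this repo's own task wrappers: the interaction loop stays the same, and only the environment id (a standard Gym one) and, in a real agent, the network body differ between the two settings.

```python
# Illustration with plain Gym, not this repo's task wrappers: the interaction
# loop is the same whether observations are low-dimensional states (CartPole)
# or raw Atari pixels; only the environment id and the network body change.
import gym

env = gym.make('CartPole-v0')                # low-dimensional state input
# env = gym.make('BreakoutNoFrameskip-v4')   # raw-pixel Atari input

state = env.reset()
for _ in range(100):
    action = env.action_space.sample()       # stand-in for a learned policy
    state, reward, done, _ = env.step(action)
    if done:
        state = env.reset()
env.close()
```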
Implemented algorithms:
- (Double/Dueling) Deep Q-Learning (DQN)
- Categorical DQN (C51, Distributional DQN with KL Distance)
- Quantile Regression DQN
- (Continuous/Discrete) Synchronous Advantage Actor Critic (A2C)
- Synchronous N-Step Q-Learning
- Deep Deterministic Policy Gradient (DDPG, pixel & low-dim-state)
- (Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO, pixel & low-dim-state)
- The Option-Critic Architecture (OC)
- Action Conditional Video Prediction
Asynchronous algorithms (e.g., A3C) have been removed from the current version but can still be found in v0.1.
Dependencies:
- macOS 10.12 or Ubuntu 16.04
- PyTorch v0.4.0
- Python 3.6, 3.5
- Core dependencies: `pip install -e .`
- Optional: Roboschool, PyBullet
- There is a super fast DQN implementation with an async actor for data generation and an async replay buffer to transfer data to GPU. Enable it by setting `config.async_actor = True` and using `AsyncReplay`; a minimal sketch follows these remarks. However, with Atari games this fast implementation may not work on macOS. Use Ubuntu or Docker instead.
- Python 2 is not officially supported after v0.3. However, I expect most of the code to still work in Python 2.
- Although there is a `setup.py`, which means you can install the repo as a library, this repo was never designed to be a high-level library like Keras. Use it as your codebase instead.
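A minimal sketch of where the two switches from the first remark might go in a DQN configuration; `config.async_actor` and `AsyncReplay` are the names used above, while `Config`, `Task`, `DQNAgent`, `run_steps`, and all argument names are assumptions about this codebase. Check `examples.py` for the actual entry points and signatures.

```python
# Hedged sketch of enabling the fast async DQN path described above.
# config.async_actor and AsyncReplay are the switches named in the remark;
# Config, Task, DQNAgent, run_steps, and all argument names are assumptions
# about this codebase -- consult examples.py for the real entry points.
from deep_rl import *

def dqn_pixel_atari_async(game='BreakoutNoFrameskip-v4'):
    config = Config()
    config.task_fn = lambda: Task(game)            # assumed task factory
    config.async_actor = True                      # async actor generates data in a separate process
    config.replay_fn = lambda: AsyncReplay(        # async replay buffer transfers batches to GPU
        memory_size=int(1e6), batch_size=32)
    # ... other required fields (network_fn, optimizer_fn, exploration, etc.) omitted
    run_steps(DQNAgent(config))                    # assumed training-loop entry point
```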
`examples.py` contains examples for all the implemented algorithms.
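A hedged sketch of invoking one of those examples; `dqn_cart_pole` is an assumed function name used purely for illustration. Open `examples.py` to see which entry points actually exist in your checkout.

```python
# Hedged sketch: running a single entry from examples.py.
# dqn_cart_pole is an assumed example name, used only for illustration.
from examples import dqn_cart_pole  # hypothetical import

if __name__ == '__main__':
    dqn_cart_pole()
```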
`Dockerfile` contains an example environment (w/ PyBullet, w/ Roboschool, w/o GPU).
Please use this BibTeX entry if you want to cite this repo:
@misc{deeprl,
  author = {Zhang, Shangtong},
  title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}
- This is my synchronous option-critic implementation, not the original one.
- The curves are not directly comparable, as many hyper-parameters are different.
- The DDPG curve shows evaluation performance rather than online (training) performance.
- Left: one-step prediction. Right: ground truth.
- Prediction images are sampled after 110K training iterations; only one-step training is implemented for action-conditional video prediction.
References:
- Human Level Control through Deep Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Hybrid Reward Architecture for Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- A Distributional Perspective on Reinforcement Learning
- Distributional Reinforcement Learning with Quantile Regression
- The Option-Critic Architecture
- Some hyper-parameters are taken from the DeepMind Control Suite, OpenAI Baselines, and Ilya Kostrikov's implementations.