Reinforcement learning utilities and algorithm implementations using PyTorch.
Rainy has a `main` decorator that converts a function returning `rainy.Config` into a CLI app.
All of the function's arguments are reinterpreted as command-line arguments.

    import os

    from torch.optim import RMSprop

    import rainy
    from rainy import Config, net
    from rainy.agents import DQNAgent
    from rainy.envs import Atari
    from rainy.lib.explore import EpsGreedy, LinearCooler


    @rainy.main(DQNAgent, script_path=os.path.realpath(__file__))
    def main(
        envname: str = "Breakout",
        max_steps: int = int(2e7),
        replay_size: int = int(1e6),
        replay_batch_size: int = 32,
    ) -> Config:
        c = Config()
        c.set_env(lambda: Atari(envname))
        c.set_optimizer(
            lambda params: RMSprop(params, lr=0.00025, alpha=0.95, eps=0.01, centered=True)
        )
        c.set_explorer(lambda: EpsGreedy(1.0, LinearCooler(1.0, 0.1, int(1e6))))
        c.set_net_fn("dqn", net.value.dqn_conv())
        c.replay_size = replay_size
        c.replay_batch_size = replay_batch_size
        c.train_start = 50000
        c.sync_freq = 10000
        c.max_steps = max_steps
        c.eval_env = Atari(envname)
        c.eval_freq = None
        return c


    if __name__ == "__main__":
        main()

You can then run this script like so:

    python dqn.py --replay-batch-size=64 train --eval-render

See the examples directory for more.
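
To illustrate the idea of turning function arguments into command-line flags, here is a minimal sketch built on the standard library's `inspect` and `argparse`. It is not Rainy's implementation: the names `args_from_signature` and `make_config` are hypothetical, the config is a plain `dict` rather than `rainy.Config`, and subcommands such as `train` or options such as `--eval-render` are left out. It only assumes that underscores in argument names become hyphens, as `--replay-batch-size` above suggests.

```python
import argparse
import inspect
from typing import Any, Callable, Dict, List, Optional


def args_from_signature(
    fn: Callable[..., Any], argv: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Parse argv into keyword arguments for fn (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    for name, param in inspect.signature(fn).parameters.items():
        # replay_batch_size -> --replay-batch-size
        flag = "--" + name.replace("_", "-")
        default = param.default
        if default is inspect.Parameter.empty:
            # No default: the flag is mandatory and stays a string.
            parser.add_argument(flag, dest=name, required=True)
        elif type(default) in (int, float, str):
            # Reuse the default's type as the converter.
            parser.add_argument(flag, dest=name, type=type(default), default=default)
        else:
            # Other types (bool, None, ...) are passed through unconverted
            # in this sketch.
            parser.add_argument(flag, dest=name, default=default)
    return vars(parser.parse_args(argv))


def make_config(envname: str = "Breakout", replay_batch_size: int = 32) -> dict:
    # Stand-in for a function that would return rainy.Config.
    return {"envname": envname, "replay_batch_size": replay_batch_size}


kwargs = args_from_signature(make_config, ["--replay-batch-size=64"])
config = make_config(**kwargs)
```

Passing `argv=None` makes `argparse` read from `sys.argv`, so the same helper works both in a script and with an explicit list as above.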
COMING SOON
Python >= 3.7
Algorithm | Multi Worker (Sync) | Recurrent | Discrete Action | Continuous Action | MPI support |
---|---|---|---|---|---|
DQN/Double DQN | ✔️ | ❌ | ✔️ | ❌ | ❌ |
BootDQN/RPF | ❌ | ❌ | ✔️ | ❌ | ❌ |
DDPG | ✔️ | ❌ | ❌ | ✔️ | ❌ |
TD3 | ✔️ | ❌ | ❌ | ✔️ | ❌ |
SAC | ✔️ | ❌ | ❌ | ✔️ | ❌ |
PPO | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
A2C | ✔️ | 🔺(1) | ✔️ | ✔️ | ❌ |
ACKTR | ✔️ | ❌(2) | ✔️ | ✔️ | ❌ |
AOC | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
PPOC | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
ACTC(3) | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
(1): Very unstable
(2): Needs https://openreview.net/forum?id=HyMTkQZAb implemented
(3): Incomplete implementation. β is often too high.
- intrinsic-rewards
  - Contains an implementation of RND (Random Network Distillation)
- http://proceedings.mlr.press/v48/mniha16.pdf, https://arxiv.org/abs/1602.01783 (A3C, original version)
- https://blog.openai.com/baselines-acktr-a2c/ (A2C, synchronized version)
- https://arxiv.org/abs/1609.05140 (DQN-like option critic)
- https://arxiv.org/abs/1709.04571 (A3C-like option critic called A2OC)
Thank you!
https://github.com/openai/baselines
https://github.com/ikostrikov/pytorch-a2c-ppo-acktr
https://github.com/ShangtongZhang/DeepRL
https://github.com/chainer/chainerrl
https://github.com/Thrandis/EKFAC-pytorch (for ACKTR)
https://github.com/jeanharb/a2oc_delib (for AOC)
https://github.com/mklissa/PPOC (for PPOC)
https://github.com/sfujim/TD3 (for DDPG and TD3)
https://github.com/vitchyr/rlkit (for SAC)
This project is licensed under the Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).