d3rlpy is an offline deep reinforcement learning library for practitioners and researchers.
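```python
import d3rlpy

# download the D4RL Hopper dataset and the matching environment
dataset, env = d3rlpy.datasets.get_dataset("hopper-medium-v0")

# prepare algorithm
sac = d3rlpy.algos.SACConfig().create(device="cuda:0")

# train offline
sac.fit(dataset, n_steps=1000000)

# train online
sac.fit_online(env, n_steps=1000000)

# ready to control: x is a batch of observations, here taken from the dataset
x = dataset.episodes[0].observations[:1]
actions = sac.predict(x)
```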
- Documentation: https://d3rlpy.readthedocs.io
- Paper: https://arxiv.org/abs/2111.03788
Important: v2.x.x introduces breaking changes. If you still rely on v1.x.x, please explicitly install a previous version (e.g. `pip install d3rlpy==1.1.1`).
- offline RL: d3rlpy supports state-of-the-art offline RL algorithms. Offline RL is extremely powerful when online interaction is not feasible during training (e.g. robotics, medical applications).
- online RL: d3rlpy also supports conventional state-of-the-art online training algorithms without any compromise, which means that you can solve any kind of RL problem with d3rlpy alone.
- zero knowledge of DL libraries: d3rlpy provides many state-of-the-art algorithms through intuitive APIs. You can become an RL engineer even without knowing how to use deep learning libraries.
- extensive documentation: d3rlpy is fully documented and accompanied by tutorials and reproduction scripts of the original papers.
- distributional Q function: d3rlpy is the first library that supports distributional Q functions in all algorithms. Distributional Q functions are known to be a very powerful method for achieving state-of-the-art performance.
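A distributional Q function predicts a distribution over returns instead of a single number, yet acting still needs one scalar per action. The snippet below is a minimal sketch of that reduction, assuming a quantile representation with N equally weighted quantiles (as in Quantile Regression DQN); the helper names `q_values_from_quantiles` and `greedy_action` are hypothetical and not part of d3rlpy's API.

```python
from statistics import fmean


def q_values_from_quantiles(quantiles_per_action):
    """Collapse per-action quantile estimates into scalar Q values.

    quantiles_per_action[a] holds N quantile estimates of the return
    distribution Z(s, a). Each quantile carries weight 1/N, so the
    expected return Q(s, a) is their plain mean.
    """
    return [fmean(qs) for qs in quantiles_per_action]


def greedy_action(quantiles_per_action):
    """Pick the action whose expected return (mean of quantiles) is largest."""
    q = q_values_from_quantiles(quantiles_per_action)
    return max(range(len(q)), key=q.__getitem__)


# Hypothetical state with 2 actions and 4 quantiles each.
quantiles = [
    [-1.0, 0.0, 1.0, 4.0],  # mean 1.0, wide spread
    [0.5, 0.8, 1.0, 1.3],   # mean 0.9, narrow spread
]
print(q_values_from_quantiles(quantiles))  # [1.0, 0.9]
print(greedy_action(quantiles))            # 0
```

The full distribution is still useful during training (it is what the loss fits), while the mean keeps action selection identical to a standard Q function.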
d3rlpy supports Linux, macOS and Windows.
```
# PyPI
$ pip install d3rlpy

# Anaconda
$ conda install conda-forge/noarch::d3rlpy

# Docker
$ docker run -it --gpus all --name d3rlpy takuseno/d3rlpy:latest bash
```
algorithm | discrete control | continuous control |
---|---|---|
Behavior Cloning (supervised learning) | ✅ | ✅ |
Neural Fitted Q Iteration (NFQ) | ✅ | ⛔ |
Deep Q-Network (DQN) | ✅ | ⛔ |
Double DQN | ✅ | ⛔ |
Deep Deterministic Policy Gradients (DDPG) | ⛔ | ✅ |
Twin Delayed Deep Deterministic Policy Gradients (TD3) | ⛔ | ✅ |
Soft Actor-Critic (SAC) | ✅ | ✅ |
Batch Constrained Q-learning (BCQ) | ✅ | ✅ |
Bootstrapping Error Accumulation Reduction (BEAR) | ⛔ | ✅ |
Conservative Q-Learning (CQL) | ✅ | ✅ |
Advantage Weighted Actor-Critic (AWAC) | ⛔ | ✅ |
Critic Regularized Regression (CRR) | ⛔ | ✅ |
Policy in Latent Action Space (PLAS) | ⛔ | ✅ |
TD3+BC | ⛔ | ✅ |
Implicit Q-Learning (IQL) | ⛔ | ✅ |
Decision Transformer | ✅ | ✅ |
Supported Q functions:
- standard Q function
- Quantile Regression
- Implicit Quantile Network
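To illustrate what the Quantile Regression option optimizes, here is a minimal pure-Python sketch of the quantile Huber loss used by QR-DQN. It assumes κ = 1 and N predicted quantiles trained toward the midpoint levels τ_i = (2i + 1) / (2N); the function names are hypothetical, and d3rlpy's own implementation works on batched tensors rather than Python lists.

```python
def huber(u, kappa=1.0):
    """Huber loss L_kappa(u): quadratic near zero, linear in the tails."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """QR-DQN style loss between N predicted quantiles and target samples.

    Quantile i is trained toward level tau_i = (2i + 1) / (2N). Errors are
    weighted asymmetrically: |tau - 1{u < 0}|, so high quantiles are
    penalized more for sitting below the targets and vice versa.
    """
    n = len(pred_quantiles)
    total = 0.0
    for i, theta in enumerate(pred_quantiles):
        tau = (2 * i + 1) / (2 * n)
        per_target = []
        for y in target_samples:
            u = y - theta  # TD error for this (quantile, target) pair
            weight = abs(tau - (1.0 if u < 0 else 0.0))
            per_target.append(weight * huber(u, kappa) / kappa)
        # average over target samples, sum over predicted quantiles
        total += sum(per_target) / len(per_target)
    return total


pred = [1.0, 2.0]     # N = 2 predicted quantiles (tau = 0.25, 0.75)
targets = [1.0, 2.0]  # samples of the target return r + gamma * Z(s', a')
print(quantile_huber_loss(pred, targets))  # 0.125
```

Implicit Quantile Networks use the same loss but sample the τ values randomly instead of fixing them at the midpoints.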
d3rlpy is benchmarked to ensure the implementation quality. The benchmark scripts are available in the reproductions directory, and the benchmark results are available in the d3rlpy-benchmarks repository.
```python
import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQLConfig().create(device='cuda:0')

# train
cql.fit(
    dataset,
    n_steps=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```
See more datasets at d4rl.
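CQL keeps Q values conservative by adding a penalty to the usual TD loss: it pushes down a soft maximum of Q over all actions while pushing up Q at the action actually stored in the dataset. The sketch below shows that penalty for a single state with discrete actions (the CQL(H) form), assuming a weight `alpha`; the function names are hypothetical, and for continuous control CQL approximates the log-sum-exp with sampled actions instead of enumerating them.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, data_action, alpha=1.0):
    """Conservative penalty for one state with discrete actions.

    alpha * (logsumexp_a Q(s, a) - Q(s, a_data)). It is never negative and
    grows when actions absent from the data look better than the logged one.
    """
    return alpha * (logsumexp(q_values) - q_values[data_action])


q = [5.0, 0.0]  # Q(s, a) for two actions in one state
print(round(cql_penalty(q, data_action=0), 4))  # 0.0067
print(round(cql_penalty(q, data_action=1), 4))  # 5.0067
```

In training, this penalty (averaged over a mini-batch) is added to the TD error loss, so minimizing the total lowers overestimated values of out-of-distribution actions.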
```python
import d3rlpy

# prepare dataset (1% dataset)
dataset, env = d3rlpy.datasets.get_atari_transitions(
    'breakout',
    fraction=0.01,
    num_stack=4,
)

# prepare algorithm
cql = d3rlpy.algos.DiscreteCQLConfig(
    observation_scaler=d3rlpy.preprocessing.PixelObservationScaler(),
    reward_scaler=d3rlpy.preprocessing.ClipRewardScaler(-1.0, 1.0),
).create(device='cuda:0')

# start training
cql.fit(
    dataset,
    n_steps=1000000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env, epsilon=0.001)},
)
```
See more Atari datasets at d4rl-atari.
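The Atari example combines three small transformations: pixel scaling, reward clipping to [-1, 1], and epsilon-greedy action selection during evaluation (`epsilon=0.001`). The following is a pure-Python sketch of what each one does conceptually; the function names are hypothetical, and d3rlpy's `PixelObservationScaler`, `ClipRewardScaler` and `EnvironmentEvaluator` apply the corresponding logic internally on batches.

```python
import random


def scale_pixels(frame):
    """Map uint8 pixel values in [0, 255] to floats in [0.0, 1.0]."""
    return [p / 255.0 for p in frame]


def clip_reward(reward, low=-1.0, high=1.0):
    """Clip a raw game reward into [low, high] so games share one scale."""
    return max(low, min(high, reward))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


print(scale_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(clip_reward(400.0))          # 1.0
rng = random.Random(0)
print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.001, rng=rng))  # 1
```

A small nonzero epsilon during evaluation is a common Atari convention that keeps a deterministic policy from getting stuck in loops.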
```python
import d3rlpy
import gym

# prepare environment
env = gym.make('Hopper-v3')
eval_env = gym.make('Hopper-v3')

# prepare algorithm
sac = d3rlpy.algos.SACConfig().create(device='cuda:0')

# prepare replay buffer
buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=1000000, env=env)

# start training
sac.fit_online(env, buffer, n_steps=1000000, eval_env=eval_env)
```
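`create_fifo_replay_buffer(limit=1000000, ...)` bounds memory during online training: once the buffer is full, the oldest experience is discarded first. A minimal sketch of that first-in-first-out policy using `collections.deque`; the `FIFOReplayBuffer` class is hypothetical and only illustrates the eviction idea, since d3rlpy's buffer organizes its data differently internally.

```python
import random
from collections import deque


class FIFOReplayBuffer:
    """Fixed-capacity transition store that evicts the oldest entries first."""

    def __init__(self, limit):
        # deque with maxlen drops items from the left when full
        self._transitions = deque(maxlen=limit)

    def append(self, observation, action, reward, next_observation, terminal):
        self._transitions.append(
            (observation, action, reward, next_observation, terminal)
        )

    def __len__(self):
        return len(self._transitions)

    def __iter__(self):
        return iter(self._transitions)

    def sample(self, batch_size, rng=random):
        """Uniformly sample a mini-batch of transitions (with replacement)."""
        return [rng.choice(self._transitions) for _ in range(batch_size)]


buffer = FIFOReplayBuffer(limit=3)
for t in range(5):
    buffer.append(t, 0, 0.0, t + 1, False)
print(len(buffer))                  # 3
print([obs for obs, *_ in buffer])  # [2, 3, 4]
```

Evicting the oldest data keeps the buffer close to the current policy's state distribution, which is the usual motivation for FIFO over, e.g., reservoir sampling.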
Try cartpole examples on Google Colaboratory!
More tutorial documentations are available here.
Any kind of contribution to d3rlpy would be highly appreciated! Please check the contribution guide.
Channel | Link |
---|---|
Issues | GitHub Issues |
Project | Description |
---|---|
MINERVA | An out-of-the-box GUI tool for offline RL |
SCOPE-RL | An off-policy evaluation and selection library |
The roadmap to the future release is available in ROADMAP.md.
The paper is available here.
```
@article{d3rlpy,
  author  = {Takuma Seno and Michita Imai},
  title   = {d3rlpy: An Offline Deep Reinforcement Learning Library},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {315},
  pages   = {1--20},
  url     = {http://jmlr.org/papers/v23/22-0017.html}
}
```
This work started as a part of Takuma Seno's Ph.D. project at Keio University in 2020.
This work is supported by Information-technology Promotion Agency, Japan (IPA), Exploratory IT Human Resources Project (MITOU Program) in the fiscal year 2020.