This repository is the interface for the offline reinforcement learning benchmark NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.
The NeoRL repository contains datasets for training, tools for validation, and the corresponding environments for testing trained policies. The current datasets are collected from three open-source environments (CityLearn, FinRL, and the Industrial Benchmark, IB) and three Gym-MuJoCo tasks. We train SAC on each of these domains, then use policies that reach about 25%, 50%, and 75% of the highest episode return to generate datasets at three quality levels (low, medium, and high, respectively) for each task. Because the action spaces of these domains are continuous, the policy outputs the mean and standard deviation of a Gaussian distribution. During data collection, we take the mean of the Gaussian policy with 80% probability and sample from the policy with 20% probability, to reflect the mistakes of human operators in real-world systems. All datasets can be reproduced with this repo. We also provide a sales promotion task.
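The 80%/20% action-selection rule described above can be sketched as follows. This is a minimal illustration, not the repository's collection script: the function name `select_action`, the `greedy_prob` parameter, and the toy mean and standard deviation are all hypothetical.

```python
import numpy as np

def select_action(mean, std, rng, greedy_prob=0.8):
    """Return the Gaussian mean with probability greedy_prob, else a sample.

    Hypothetical helper: mean and std stand in for the outputs of a trained
    Gaussian policy at one step.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    if rng.random() < greedy_prob:
        return mean                   # deterministic action (policy mean)
    return rng.normal(mean, std)      # stochastic action (sampled from the policy)

rng = np.random.default_rng(0)
mean, std = np.zeros(3), np.ones(3)   # toy policy output for a 3-D action space
actions = np.stack([select_action(mean, std, rng) for _ in range(10_000)])

# Fraction of steps on which the deterministic mean action was used.
greedy_fraction = np.mean(np.all(actions == mean, axis=1))
```

Over many steps, `greedy_fraction` should be close to 0.8, with the remaining steps carrying sampled (noisier) actions.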
More about the NeoRL benchmark can be found at http://polixir.ai/research/neorl and in the following paper, which is accessible at https://openreview.net/forum?id=jNdLszxdtra:
Rong-Jun Qin, Songyi Gao, Xingyuan Zhang, Xiong-Hui Chen, Zewen Li, Weinan Zhang, Yang Yu. NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.
The benchmark is supported by two additional repos: OfflineRL for training offline RL algorithms and d3pe for offline evaluation. Details for reproducing the benchmark can be found here.
The NeoRL interface can be installed as follows:
git clone https://github.com/Polixir/neorl.git
cd neorl
pip install -e .
After installation, the CityLearn, Finance, Industrial Benchmark, and sales promotion environments will be available. If you want to use MuJoCo in your tasks, you need to obtain a license and follow the setup instructions, and then run:
pip install -e .[mujoco]
So far "HalfCheetah-v3", "Walker2d-v3", and "Hopper-v3" are supported within MuJoCo.
NeoRL uses the OpenAI Gym API. Tasks are created via the neorl.make function. A full list of all tasks is available here.
import neorl
# Create an environment
env = neorl.make("citylearn")
env.reset()
env.step(env.action_space.sample())
# Get 100 trajectories of low level policy collection on citylearn task
train_data, val_data = env.get_dataset(data_type = "low", train_num = 100)
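Because the environments follow the Gym API, a trained policy can be scored by its episode return with an ordinary rollout loop. The sketch below assumes the classic Gym convention in which `reset()` returns an observation and `step()` returns a 4-tuple; newer Gym releases return five values from `step()`, so adjust the unpacking if needed. `episode_return` and `CountdownEnv` are hypothetical names, and the toy environment only stands in for a real NeoRL task so that the example runs without any installation.

```python
import numpy as np

def episode_return(env, policy, max_steps=1000):
    """Run one episode and return the undiscounted sum of rewards."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total += float(reward)
        if done:
            break
    return total

class CountdownEnv:
    """Hypothetical stand-in env: reward 1 per step, ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(1)

    def step(self, action):
        self.t += 1
        return np.zeros(1), 1.0, self.t >= self.horizon, {}

toy_return = episode_return(CountdownEnv(horizon=5), policy=lambda obs: 0.0)
```

With a real task, `CountdownEnv(...)` would be replaced by an environment from `neorl.make`, and `policy` by the trained policy's action function.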
To facilitate setting different goals, users can pass a custom reward function to neorl.make() when creating an env. See the usage and examples of neorl.make() for more details.
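As a rough illustration of what a custom reward function might look like, the sketch below penalizes large actions. It rests on two assumptions that this README does not confirm: that the function receives a dict of batched arrays with keys such as `obs`, `action`, and `next_obs`, and that it is passed to `neorl.make()` through a keyword named `reward_func`. Check the neorl.make() documentation for the actual signature before relying on either.

```python
import numpy as np

def action_penalty_reward(data):
    """Hypothetical reward: negative squared action magnitude per transition.

    Assumes `data` is a dict of batched arrays containing an "action" entry.
    """
    action = np.asarray(data["action"], dtype=float)
    return -np.sum(action ** 2, axis=-1, keepdims=True)

# Synthetic batch of two transitions, used only to exercise the function.
batch = {
    "obs": np.zeros((2, 4)),
    "action": np.array([[1.0, 2.0], [0.0, 0.0]]),
    "next_obs": np.zeros((2, 4)),
}
rewards = action_penalty_reward(batch)

# Assumed usage (the keyword name is an assumption, not confirmed here):
# env = neorl.make("citylearn", reward_func=action_penalty_reward)
```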
As a benchmark, to let algorithms be tested conveniently and quickly, each task is associated
with a small training dataset and a validation dataset by default. They can be obtained with
env.get_dataset(). For flexibility, extra parameters can be passed to get_dataset() to get
multiple pairs of datasets for benchmarking. The data for each task is collected using a low-,
medium-, or high-level policy; for each task, we provide up to 10000 trajectories of training data.
See the usage of get_dataset() for more details about its parameters.
In NeoRL, the training data and validation data returned by get_dataset() are dicts with the same format:
- obs: An N by observation-dimension array of the current step's observations.
- next_obs: An N by observation-dimension array of the next step's observations.
- action: An N by action-dimension array of actions.
- reward: An N-dimensional array of rewards (one per step).
- done: An N-dimensional array of episode termination flags (one per step).
- index: An array with one entry per trajectory. Each entry is the position at which a trajectory begins.
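The index entry makes it possible to cut the flat arrays back into individual trajectories. The sketch below shows one way to do this on a small synthetic dict with the keys listed above; `split_trajectories` is a hypothetical helper rather than part of the NeoRL API, and the array values are made up for illustration.

```python
import numpy as np

def split_trajectories(data):
    """Yield one dict per trajectory, treating `index` as trajectory start offsets."""
    starts = np.asarray(data["index"], dtype=int)
    n_steps = len(data["obs"])
    ends = np.append(starts[1:], n_steps)   # each trajectory ends where the next begins
    keys = [k for k in data if k != "index"]
    for start, end in zip(starts, ends):
        yield {k: data[k][start:end] for k in keys}

# Synthetic dataset: 5 transitions forming two trajectories (lengths 3 and 2).
data = {
    "obs": np.arange(10, dtype=float).reshape(5, 2),
    "next_obs": np.arange(2, 12, dtype=float).reshape(5, 2),
    "action": np.zeros((5, 1)),
    "reward": np.array([1.0, 2.0, 3.0, 4.0, 5.0]),
    "done": np.array([False, False, True, False, True]),
    "index": np.array([0, 3]),
}

trajectories = list(split_trajectories(data))
lengths = [len(t["obs"]) for t in trajectories]
returns = [float(np.sum(t["reward"])) for t in trajectories]
```

The same helper should apply to the dicts returned by get_dataset(), provided index holds start offsets as described above.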
- CityLearn: Vázquez-Canteli J R, Kämpf J, Henze G, et al. "CityLearn v1.0: An OpenAI Gym Environment for Demand Response with Deep Reinforcement Learning." Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 356-357, 2019.
- FinRL: Liu X Y, Yang H, Chen Q, et al. "FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance." arXiv preprint arXiv:2011.09607, 2020.
- Industrial Benchmark: Hein D, Depeweg S, Tokic M, et al. "A Benchmark Environment Motivated by Industrial Control Problems." Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence, pp. 1-8, 2017.
- MuJoCo: Todorov E, Erez T, Tassa Y. "MuJoCo: A Physics Engine for Model-based Control." Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033, 2012.
All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.