This repository contains my final project for the Data Mining subject (Minería de Datos in Spanish, hence the name mdatos), taught in the Master's Degree in Systems and Control Engineering at UNED (Universidad Nacional de Educación a Distancia) and UCM (Universidad Complutense de Madrid), Spain.
It implements several tabular Reinforcement Learning (RL) algorithms and applies them to OpenAI Gym environments. The implemented algorithms and environments are the following:
Environment | Sarsa | Q-Learning | n-step Sarsa | Dyna-Q |
---|---|---|---|---|
NChain-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
FrozenLake-v0 | ✔️ | ✔️ | ✔️ | ✔️ |
CartPole-v0 | ✔️ | ✔️ | ✔️ | ✖️ |
MountainCar-v0 | ✔️ | ✔️ | ✔️ | ✖️ |
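For reference, the core update rules behind the four algorithms in the table can be sketched as follows, based on the tabular formulations in Sutton and Barto [1]. This is a minimal illustration, not the code in this repo. It assumes a NumPy Q-table `q` of shape (n_states, n_actions) indexed by integer states and a deterministic learned model for Dyna-Q. All function names are hypothetical.

```python
import random

import numpy as np


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily w.r.t. q."""
    if rng.random() < epsilon:
        return rng.randrange(q.shape[1])
    return int(np.argmax(q[state]))


def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done):
    """Sarsa (on-policy): bootstrap on the action actually chosen in s_next."""
    target = r if done else r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (target - q[s, a])


def q_learning_update(q, s, a, r, s_next, alpha, gamma, done):
    """Q-Learning (off-policy): bootstrap on the greedy action in s_next."""
    target = r if done else r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])


def n_step_return(rewards, gamma, bootstrap=0.0):
    """n-step Sarsa target: r_1 + gamma*r_2 + ... + gamma^(n-1)*r_n + gamma^n*bootstrap.

    bootstrap is Q(s_n, a_n), or 0.0 if the episode ended within the n steps.
    """
    g = bootstrap
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def dyna_q_planning(q, model, n_planning, alpha, gamma, rng):
    """Dyna-Q planning: replay simulated transitions from a learned model.

    model maps (s, a) -> (r, s_next, done) for every pair seen so far.
    """
    seen = list(model)
    for _ in range(n_planning):
        s, a = rng.choice(seen)
        r, s_next, done = model[(s, a)]
        q_learning_update(q, s, a, r, s_next, alpha, gamma, done)
```

In Dyna-Q, each real step would typically apply `q_learning_update` to the real transition, store `model[(s, a)] = (r, s_next, done)`, and then call `dyna_q_planning`. The only difference between Sarsa and Q-Learning here is the bootstrap term: the next action actually taken versus the greedy one.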
The goal of this repo is purely educational:
- For more elaborate and advanced RL algorithms, see cleanrl.
- For an intuitive, easy-to-use library widely used in research, see stable-baselines3 and rl-baselines3-zoo.
A Jupyter Notebook, written in Spanish, that uses this repo to explain basic RL concepts can be found here.
The bibliography I used, listed at the end of this README, is probably the most common entry point for learning Reinforcement Learning.
To train and evaluate the agents in this repo, follow these steps:
Create and activate a virtual environment:
$ cd rl-mdatos
$ virtualenv .venv
$ source .venv/bin/activate
Install the required packages:
$ (.venv) pip install -r requirements.txt
Install this very repo in editable mode:
$ (.venv) pip install -e .
Go to the directory of the desired environment. Each environment has one script per algorithm, which can train, run and/or record that agent:
$ (.venv) cd rl_mdatos/envs/desired_env
To train a Q-Learning agent in CartPole-v0:
$ (.venv) python cp_q_learning.py --train
To execute the trained agent:
$ (.venv) python cp_q_learning.py --run
To record the execution (this only works for CartPole-v0 and MountainCar-v0):
$ (.venv) python cp_q_learning.py --run --record
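CartPole-v0 and MountainCar-v0 have continuous observation spaces, so a tabular agent needs each observation mapped to a discrete state index. This README does not say how the scripts do it. One common approach, sketched below as an assumption, is uniform binning per observation dimension. The function names and bin counts are hypothetical. The bounds in the usage lines are MountainCar-v0's position range [-1.2, 0.6] and velocity range [-0.07, 0.07].

```python
import numpy as np


def make_bins(low, high, n_bins):
    """Interior bin edges for each observation dimension (n - 1 edges for n bins)."""
    return [np.linspace(lo, hi, n + 1)[1:-1] for lo, hi, n in zip(low, high, n_bins)]


def discretize(obs, bins, n_bins):
    """Map a continuous observation to a single integer state index.

    np.digitize returns 0..n-1 per dimension; values outside [low, high]
    fall into the first or last bin.
    """
    idx = tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, bins))
    return int(np.ravel_multi_index(idx, n_bins))


# Usage with MountainCar-v0 bounds: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
n_bins = (10, 10)
bins = make_bins([-1.2, -0.07], [0.6, 0.07], n_bins)
n_states = int(np.prod(n_bins))  # number of rows of the Q-table
state = discretize([-1.1, 0.001], bins, n_bins)
```

For CartPole-v0, whose velocity components are effectively unbounded in the observation space, finite bounds would have to be chosen by hand. Thanks to the clipping behaviour of `np.digitize`, out-of-range values still map to a valid state.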
Three types of files are stored in rl-mdatos/data:
- logs: data generated during training, which can be visualized with TensorBoard (tensorboard --logdir data/...).
- trained_agents: files with the final parameters of the trained agents, which are loaded at execution time.
- videos: videos of the recorded episodes.
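For a tabular agent, the final parameters are essentially its Q-table, which can be saved at the end of training and loaded before running the agent greedily. The sketch below illustrates this for the trained_agents files, assuming a NumPy Q-table stored in `.npy` format. The file name and helper functions are hypothetical, and the repo may use a different format.

```python
import os
import tempfile

import numpy as np


def save_agent(q, path):
    """Persist the Q-table in NumPy's binary .npy format."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    np.save(path, q)


def load_agent(path):
    """Load a Q-table previously written by save_agent."""
    return np.load(path)


def greedy_action(q, state):
    """At execution time, act greedily with respect to the loaded Q-table."""
    return int(np.argmax(q[state]))


# Usage in a temporary directory (the repo itself stores these under rl-mdatos/data).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trained_agents", "q_learning.npy")  # hypothetical name
    q_trained = np.array([[0.1, 0.9], [0.7, 0.2]])
    save_agent(q_trained, path)
    q_loaded = load_agent(path)
    actions = [greedy_action(q_loaded, s) for s in range(q_loaded.shape[0])]
```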
After successfully training the agents, you should obtain results like the following, logged when running a trained Q-Learning agent:
INFO:root:Running Q-Learning agent
INFO:root:Episode 1
INFO:root:Total reward: 9960
INFO:root:Mean reward: 9.96
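The INFO:root: prefix in this output is what Python's standard logging module prints after `logging.basicConfig(level=logging.INFO)`, whose default format is `%(levelname)s:%(name)s:%(message)s`. The sketch below reproduces that format for one evaluation episode. It assumes the mean is taken per step, which the logged numbers are consistent with (9960 / 9.96 = 1000). The helper name and the toy rewards are made up for illustration.

```python
import logging


def log_episode(episode, rewards):
    """Log the total and per-step mean reward of one episode."""
    total = sum(rewards)
    mean = total / len(rewards)
    logging.info("Episode %d", episode)
    logging.info("Total reward: %s", total)
    logging.info("Mean reward: %s", mean)
    return total, mean


# Default format is "%(levelname)s:%(name)s:%(message)s", e.g. "INFO:root:Episode 1".
logging.basicConfig(level=logging.INFO)
logging.info("Running Q-Learning agent")
total, mean = log_episode(1, [0, 2, 10, 10])  # toy per-step rewards, not real results
```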
[1] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018.
[2] David Silver. Lectures on Reinforcement Learning, 2015. URL: https://www.davidsilver.uk/teaching/
[3] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach, Third International Edition. Pearson Education, London, 2010.