This repository contains the Python files to reproduce the results from the paper titled "Two-Step Q-learning".
The first example is analogous to the classic maximization bias example in the book Reinforcement Learning: An Introduction (Sutton and Barto, Chapter 6, Example 6.7).
The directory `Max_Bias` contains all the files required to run the algorithms, along with scripts to compare the algorithms using plots.
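For orientation, here is a minimal sketch of that maximization-bias MDP, assuming the textbook setup; the state names, the number of actions available in state B, and the N(-0.1, 1) reward are assumptions taken from the book, not from the files in `Max_Bias`:

```python
import numpy as np

# Sketch of the textbook maximization-bias MDP. From state A, RIGHT
# terminates with reward 0 and LEFT moves to B; every action from B
# terminates with a N(-0.1, 1) reward, so RIGHT is optimal at A even
# though noisy estimates make B's actions look attractive.
A, B, TERMINAL = 0, 1, 2
LEFT, RIGHT = 0, 1
N_B_ACTIONS = 8  # assumed; the book leaves this count flexible

def step(state, action, rng):
    """Return (next_state, reward) for a single transition."""
    if state == A:
        if action == RIGHT:
            return TERMINAL, 0.0
        return B, 0.0
    return TERMINAL, rng.normal(-0.1, 1.0)  # any action from B

rng = np.random.default_rng(0)
print(step(A, LEFT, rng))   # -> (1, 0.0)
print(step(B, 3, rng))      # -> (2, <noisy reward>)
```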
The second example comprises generating a hundred random MDPs using [pymdptoolbox](https://github.com/sawcordwell/pymdptoolbox). More specifically, a hundred random MDPs are generated, both with and without restrictions on the structure of the MDPs. The algorithms are then compared in terms of average error using runfile.py (see the sketch after the setup steps below).
Step 1: Download and install Python MDP Toolbox from https://pymdptoolbox.readthedocs.io/en/latest/api/mdptoolbox.html.
Step 2: Replace `mdp.py` and `example.py` in the original MDP Toolbox with the updated `mdp.py` and `example.py` files provided here.
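Once the toolbox is patched, the experiment is conceptually similar to the following minimal sketch; the state/action counts and the discount factor are illustrative assumptions, and the actual comparison lives in runfile.py:

```python
import numpy as np
import mdptoolbox.example
import mdptoolbox.mdp

# Sketch: generate random MDPs and solve each with value iteration to
# obtain a ground-truth value function against which the Q-learning
# variants can be scored. Sizes and discount are assumed, not the
# paper's settings.
n_mdps, n_states, n_actions, discount = 100, 10, 3, 0.95

optimal_values = []
for seed in range(n_mdps):
    np.random.seed(seed)  # pymdptoolbox draws from numpy's global RNG
    P, R = mdptoolbox.example.rand(n_states, n_actions)  # unrestricted structure
    vi = mdptoolbox.mdp.ValueIteration(P, R, discount)
    vi.run()
    optimal_values.append(np.array(vi.V))  # optimal state values

print(optimal_values[0])
```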
The third example is the classic roulette problem posed as a single-state multi-armed bandit. Once again, the directory `roulette as a multi-armed bandit` contains all the files required to run the algorithms and plot the graphs.
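A hedged sketch of such a bandit follows, assuming the setup common in the Double Q-learning literature: many betting actions with negative expected value plus one zero-reward walk-away action. The payout model below covers only straight-up bets on an American wheel and is an assumption, not the repository's exact environment:

```python
import numpy as np

# Roulette as a single-state bandit (assumed setup): 170 betting actions
# plus one final "walk away" action that ends play with reward 0. Each
# bet here is a straight-up bet on an American wheel: win probability
# 1/38 with a 35:1 payout, else the one-unit stake is lost.
N_BETS = 170
WALK_AWAY = N_BETS  # last action index: stop playing

def pull(action, rng):
    """Sample the reward for one action (one spin, or walking away)."""
    if action == WALK_AWAY:
        return 0.0
    win = rng.random() < 1.0 / 38.0
    return 35.0 if win else -1.0

rng = np.random.default_rng(0)
rewards = [pull(a, rng) for a in range(N_BETS + 1)]
# Every betting action has expected reward (35 - 37) / 38 ~ -0.0526,
# so a maximization-biased learner tends to overvalue betting.
```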
One can refer to the paper for any further details regarding the parameters and convergence of the proposed two-step Q-learning algorithm.
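The update rule itself is defined in the paper; purely as a rough, hedged illustration of what a tabular two-step bootstrapped target looks like, here is a generic n = 2 update. This sketch is an assumption and may differ from the paper's actual two-step Q-learning rule:

```python
import numpy as np

# Hedged illustration only: a generic tabular two-step bootstrapped
# Q-update. The true two-step Q-learning rule, step sizes, and
# convergence conditions are specified in the paper and may differ.
def two_step_update(Q, s, a, r1, r2, s2, alpha=0.1, gamma=0.95):
    """Move Q[s, a] toward r1 + gamma*r2 + gamma^2 * max_a' Q[s2, a']."""
    target = r1 + gamma * r2 + gamma ** 2 * np.max(Q[s2])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 2))  # toy table: 5 states, 2 actions
Q = two_step_update(Q, s=0, a=1, r1=0.0, r2=1.0, s2=3)
```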
The `Max_Bias` and `roulette as a multi-armed bandit` examples are adapted from The-Mean-Squared-Error-of-Double-Q-Learning.
Many thanks to the original authors for their contributions.