Solution for OpenAI Gym Taxi problem v2 and v3 using temporal difference methods - SarsaMax and Expected Sarsa

This is a solution to the Gym Taxi problem, as discussed in the Reinforcement Learning course at Udacity.

main.py and monitor.py are slightly modified versions of the environment setup code from the course.

agent.py is my solution to the problem, and hyper_opt.py is a script that searches for the best set of hyperparameters for each algorithm.

Attention - Taxi-V2 vs Taxi-V3

Recent versions of Gym have deprecated Taxi-v2, which was mainly used for its leaderboard. So, by default, the local requirements.txt installs gym==0.14, which still includes Taxi-v2.

Taxi-V3 is a more difficult version of the problem. To run it, manually install a newer Gym version:

pip install gym==0.16

To run hyperparameter optimization on Taxi-V3, use:

python hyper_opt.py --taxi_version v3
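This README does not describe which search method hyper_opt.py uses. As a rough illustration only, a plain random search over the same hyperparameter names that appear in the optimal configurations in the Notes section could look like the sketch below. The sampling ranges, the `sample_config`/`random_search` helpers and the `evaluate` callback (which would train an agent and return its score) are all hypothetical.

```python
import random


def sample_config(rng, algorithm):
    """Draw one candidate config.

    Key names follow the optimal dicts in the Notes; the ranges are guesses.
    """
    return {
        "algorithm": algorithm,
        "alpha": rng.uniform(0.01, 0.5),         # learning rate
        "epsilon_cut": 0,                        # kept fixed, as in both optimal configs
        "epsilon_decay": rng.uniform(0.8, 1.0),
        "start_epsilon": rng.uniform(0.5, 1.0),
        "gamma": rng.uniform(0.5, 1.0),          # discount factor
    }


def random_search(evaluate, algorithm, n_iters, seed=0):
    """Evaluate n_iters random configs and keep the one with the highest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iters):
        config = sample_config(rng, algorithm)
        score = evaluate(config)  # hypothetical: train an agent, return its score
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Toy objective standing in for a real training run.
config, score = random_search(lambda c: -abs(c["gamma"] - 0.7), "sarsamax", n_iters=20)
print(config, score)
```

Random search is the simplest baseline; a tuner such as a Bayesian optimizer would replace `sample_config` with a model-guided proposal but keep the same evaluate-and-keep-best loop.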

Best score results

I obtained the following best scores (best out of 10 runs):

Environment	Sarsa Max	Expected Sarsa
Taxi-V2	9.49	9.44
Taxi-V3	9.07	8.8

On Taxi-V2, Sarsa Max outperformed Expected Sarsa in 30% of cases (3 of 10 runs), which is not enough to draw any conclusion.

On Taxi-V3, Sarsa Max outperformed Expected Sarsa in 60% of cases (6 of 10 runs), which is also not enough to draw any conclusion.

It would be nice if someone ran enough trials to test whether the difference between the two algorithms is statistically significant (for example, with a t-test and its p-value).
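To see why 10 runs are not enough, the win counts reported above (3 of 10 on Taxi-V2, 6 of 10 on Taxi-V3) can be plugged into an exact two-sided sign test, a simpler substitute for the t-test that only uses who won each run. This is a minimal sketch assuming the runs are independent and that no run was a tie (ties are normally dropped before a sign test). The helper name `sign_test_p_value` is hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, n: int) -> float:
    """Exact two-sided sign test of H0: each algorithm wins a run with p = 0.5.

    Under H0 the number of wins is Binomial(n, 0.5), which is symmetric,
    so the two-sided p-value is twice the smaller tail (capped at 1).
    """
    k = min(wins, n - wins)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p_value(3, 10))  # Taxi-V2: 0.34375
print(sign_test_p_value(6, 10))  # Taxi-V3: 0.75390625
```

Both p-values are far above the usual 0.05 threshold, which is consistent with the conclusion that 10 runs cannot separate the two algorithms.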

Online performance

[Figures: online performance during training on Taxi-V2 and Taxi-V3]

Sarsa Max shows worse online performance (reward collected while it is still learning) than Expected Sarsa, but still converges to the same policy.
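The README does not state which metric the scores and online-performance curves use. Assuming it is the average reward over 100 consecutive episodes, as tracked by the Udacity course's monitor, a rolling curve and its best value could be computed from per-episode rewards as in this sketch. `best_rolling_average` is a hypothetical helper, not a function from monitor.py.

```python
from collections import deque


def best_rolling_average(episode_rewards, window=100):
    """Return (best mean over `window` consecutive episodes, rolling-mean curve).

    The curve starts once `window` episodes are available; if there are fewer
    episodes than `window`, the best value is -inf and the curve is empty.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` rewards
    curve = []
    best = float("-inf")
    for reward in episode_rewards:
        recent.append(reward)
        if len(recent) == window:
            avg = sum(recent) / window
            curve.append(avg)
            best = max(best, avg)
    return best, curve


# Toy data: 50 poor episodes followed by 100 good ones.
best, curve = best_rolling_average([0] * 50 + [10] * 100)
print(best, curve[0], len(curve))
```

Plotting `curve` for both agents over training gives an online-performance comparison; a slower-rising curve with the same final plateau matches the behaviour described above.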

How to run

To see online training

python main.py

To tune hyperparameters

python hyper_opt.py --n_iters 5 --algo sarsamax --taxi_version v2
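The commands above pass three flags to hyper_opt.py: `--n_iters`, `--algo` and `--taxi_version`. A command-line parser accepting them might be sketched with the standard `argparse` module as below. The defaults, the `exp_sarsa` choice (borrowed from the `algorithm` key in the Notes) and the `ENV_IDS` version-to-environment mapping are assumptions, not necessarily what hyper_opt.py does.

```python
import argparse

# Hypothetical mapping from the --taxi_version flag to a Gym environment id.
ENV_IDS = {"v2": "Taxi-v2", "v3": "Taxi-v3"}


def build_parser():
    parser = argparse.ArgumentParser(description="Tune Sarsa Max / Expected Sarsa on Gym Taxi")
    parser.add_argument("--n_iters", type=int, default=10,
                        help="number of search iterations (default is a guess)")
    parser.add_argument("--algo", choices=["sarsamax", "exp_sarsa"], default="sarsamax",
                        help="TD control algorithm to tune")
    parser.add_argument("--taxi_version", choices=sorted(ENV_IDS), default="v2",
                        help="v2 needs gym==0.14, v3 needs a newer gym (see above)")
    return parser


# Parse the same arguments as the example command instead of sys.argv.
args = build_parser().parse_args(["--n_iters", "5", "--algo", "sarsamax", "--taxi_version", "v2"])
env_id = ENV_IDS[args.taxi_version]  # would then be passed to gym.make(env_id)
print(args.n_iters, args.algo, env_id)
```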

Verify results

Run in Jupyter: run_analysis_taxiV2.ipynb and run_analysis_taxiV3.ipynb

Notes

Used pseudocode
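The two methods differ only in the bootstrap target of the temporal-difference update. Sarsa Max (Q-learning) uses the greedy value $\max_{a'} Q(s', a')$ of the next state, while Expected Sarsa uses the expected value $\sum_{a'} \pi(a' \mid s')\, Q(s', a')$ under the epsilon-greedy behaviour policy. Both then apply $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\text{target} - Q(s,a)]$. The sketch below shows these textbook updates on a plain list-of-lists Q-table; it is not the code in agent.py, and the function names are hypothetical.

```python
def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_row)
    best = max(range(n), key=lambda a: q_row[a])  # first greedy action on ties
    probs = [epsilon / n] * n                      # exploration mass spread uniformly
    probs[best] += 1.0 - epsilon                   # remaining mass on the greedy action
    return probs


def sarsamax_update(Q, s, a, r, s_next, alpha, gamma, done):
    """Sarsa Max (Q-learning): bootstrap from the greedy value of s_next."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon, done):
    """Expected Sarsa: bootstrap from the epsilon-greedy expectation over s_next."""
    if done:
        target = r
    else:
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        target = r + gamma * sum(p * q for p, q in zip(probs, Q[s_next]))
    Q[s][a] += alpha * (target - Q[s][a])


# Taxi has 500 discrete states and 6 actions.
Q = [[0.0] * 6 for _ in range(500)]
Q[1][0] = 1.0
sarsamax_update(Q, s=0, a=2, r=-1.0, s_next=1, alpha=0.5, gamma=0.9, done=False)
print(Q[0][2])  # 0.5 * (-1 + 0.9 * 1.0) = -0.05
```

Because Expected Sarsa averages over the exploratory actions it will actually take, its targets are less optimistic than the max used by Sarsa Max, which is one common explanation for differences in online reward during training.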

SarsaMax/ExpectedSarsa optimal hyperparameters

optimal_sarsa_max = {'algorithm': 'sarsamax', 'alpha': 0.2512238484351891, 'epsilon_cut': 0, 'epsilon_decay': 0.8888782926665223, 'start_epsilon': 0.9957089031634627, 'gamma': 0.7749915552696941}

optimal_exp_sarsa = {'algorithm': 'exp_sarsa', 'alpha': 0.2946281065178629, 'epsilon_cut': 0, 'epsilon_decay': 0.8978159313202051, 'start_epsilon': 0.9803552534195048, 'gamma': 0.6673937505783256}
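The README does not say how `start_epsilon`, `epsilon_decay` and `epsilon_cut` are combined. One plausible reading, shown here purely as a hypothetical sketch, is a geometric decay of epsilon per episode floored at `epsilon_cut`, feeding an epsilon-greedy action choice. The actual schedule in agent.py may differ (for example, decaying once per block of episodes), and `epsilon_at`/`select_action` are illustrative names.

```python
import random


def epsilon_at(episode, start_epsilon, epsilon_decay, epsilon_cut):
    """Hypothetical schedule: geometric decay per episode, never below epsilon_cut."""
    return max(epsilon_cut, start_epsilon * epsilon_decay ** episode)


def select_action(q_row, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


optimal_exp_sarsa = {'algorithm': 'exp_sarsa', 'alpha': 0.2946281065178629, 'epsilon_cut': 0,
                     'epsilon_decay': 0.8978159313202051, 'start_epsilon': 0.9803552534195048,
                     'gamma': 0.6673937505783256}
schedule = {k: optimal_exp_sarsa[k] for k in ("start_epsilon", "epsilon_decay", "epsilon_cut")}

rng = random.Random(0)
for episode in (0, 10, 50):
    eps = epsilon_at(episode, **schedule)
    print(episode, round(eps, 4), select_action([0.0, 1.0, 0.5], eps, rng))
```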

Repository: crazyleg/gym-taxi-v2-v3-solution