openai-gym-taxi-v3-udacity

Attempts to solve OpenAI Gym's Taxi-v3 RL environment

The skeleton of this code is from Udacity. Their version uses Taxi-v2, but this version uses Taxi-v3, since v2 is deprecated. (I also did a Taxi-v2 version here.)

The environment is from here.

To run the simple demo on Linux or Mac with Docker installed, clone the repository, make taxi.sh executable, and run it:

git clone https://github.com/andyharless/openai-gym-taxi-v3-udacity.git
cd openai-gym-taxi-v3-udacity
chmod u+x taxi.sh
./taxi.sh

It should produce a score (best average reward over 100 consecutive episodes) of 9.26. (The output.txt file shows a sample run.)

This version uses a variation on standard Q-learning. The policy is epsilon-greedy, but when the non-greedy action is chosen, instead of being sampled from a uniform distribution, it is sampled from a distribution that reflects two things:

  • a preference for actions with higher Q values (i.e. "greedy but flexible")
  • a preference for novel actions (those that have recently been chosen less often in the current state)

The latter is tracked via a "path memory" table (the same shape as the Q-table), which counts how often each action has been taken in each state. At the end of each episode, path memories from earlier episodes decay geometrically.
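Here is a minimal sketch of that path-memory bookkeeping, assuming hypothetical function names and a decay factor of 0.9 (the repo's actual values may differ):

import numpy as np

N_STATES, N_ACTIONS = 500, 6      # Taxi-v3 has 500 states and 6 actions

path_memory = np.zeros((N_STATES, N_ACTIONS))   # same shape as the Q-table

def record_choice(path_memory, state, action):
    # count how often each action is taken in each state
    path_memory[state, action] += 1

def decay_path_memory(path_memory, decay=0.9):
    # called at the end of each episode: older memories fade geometrically
    path_memory *= decay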

The sampling distribution for stochastic actions is the softmax of a linear combination of the Q values (with a positive coefficient) and the path memory values (with a negative coefficient).
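A minimal sketch of that sampling step (the function name and the coefficients a and b are illustrative, not the repo's actual settings):

import numpy as np

def choose_action(Q, path_memory, state, epsilon, a=1.0, b=0.5, rng=np.random):
    # Epsilon-greedy, but the exploratory action is drawn from a softmax of
    # a*Q - b*path_memory instead of a uniform distribution.
    if rng.random() > epsilon:
        return int(np.argmax(Q[state]))          # greedy action
    logits = a * Q[state] - b * path_memory[state]
    logits -= logits.max()                       # for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))  # "greedy but flexible" exploration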

As of 2020-09-13, this solution is 1st on the Leaderboard for the v3 Taxi environment at OpenAI Gym (but I cheated by using a good seed).
