Skip to content

noanabeshima/visual-value-iteration

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code for https://www.youtube.com/watch?v=ZNbIKv9gCOg

A maze Markov Decision Process solved with value iteration. Maze walls are also available to render in the code.

The maze is a 2D grid of cells / grid boxes. Each cell is a state. There are five actions for each cell: up, down, left, right, and staying still. Moving into a wall is -100 reward and bumps the player back into the cell he/she was in, moving into another cell (or staying still) is -1 reward and reaching the end cell (in the bottom right corner) is +10000 reward.

A policy is just a way of making decisions. It assigns each grid cell (state) to an action (up, down, left, right, still). For example, my policy could be always moving right whenever I'm in the top left square/grid cell and left for every other grid cell.

The whiter a square, the greater the estimated future reward of the optimal policy.

Watch how the estimated future reward of cells propagates from the bottom right all the way to the top left!

About

Visual value iteration for a toy maze problem.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages