GitHub - noanabeshima/visual-value-iteration: Visual value iteration for a toy maze problem.

Code for https://www.youtube.com/watch?v=ZNbIKv9gCOg

A maze Markov Decision Process (MDP) solved with value iteration. The code can also render the maze walls.

The maze is a 2D grid of cells, and each cell is a state. Each cell has five actions: up, down, left, right, and staying still. Moving into a wall gives a reward of -100 and bumps the player back into the cell they were in. Moving into another cell (or staying still) gives a reward of -1, and reaching the end cell (in the bottom right corner) gives a reward of +10000.
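As a rough illustration of these rules, here is a minimal sketch of a single move. It is not taken from main.py: the names `ACTIONS` and `step`, the `(row, col)` state encoding, and the `walls` representation (a set of cell pairs with a wall between them) are all assumptions made for this example. It also assumes the grid border acts as a wall and that the episode ends at the end cell, so `step` is never called from there.

```python
# Actions as (row, col) offsets; names are illustrative, not taken from main.py.
ACTIONS = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
    "still": (0, 0),
}


def step(state, action, n_rows, n_cols, walls=frozenset()):
    """Return (next_state, reward) for one move in the maze.

    `walls` is a hypothetical set of frozenset({cell_a, cell_b}) pairs marking
    a wall between two adjacent cells; the grid border always acts as a wall.
    """
    row, col = state
    d_row, d_col = ACTIONS[action]
    target = (row + d_row, col + d_col)
    goal = (n_rows - 1, n_cols - 1)  # end cell in the bottom right corner

    off_grid = not (0 <= target[0] < n_rows and 0 <= target[1] < n_cols)
    if off_grid or frozenset((state, target)) in walls:
        return state, -100  # bumped back into the current cell
    if target == goal:
        return target, 10000  # reached the end cell
    return target, -1  # ordinary move, or staying still
```

For example, `step((0, 0), "up", 3, 3)` leaves the player in `(0, 0)` with a reward of -100, because the top-left cell has the border above it.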

A policy is just a way of making decisions: it maps each grid cell (state) to an action (up, down, left, right, still). For example, my policy could be to always move right whenever I'm in the top left grid cell and to move left in every other grid cell.
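In code, a policy like this can be as simple as a lookup table. The sketch below writes the example policy as a Python dict keyed by `(row, col)`. The 3x3 grid size and the string action names are assumptions for illustration, not the representation used in main.py.

```python
n_rows, n_cols = 3, 3  # illustrative grid size (an assumption)

# Start with "left" in every cell, then override the top-left cell.
policy = {
    (row, col): "left"
    for row in range(n_rows)
    for col in range(n_cols)
}
policy[(0, 0)] = "right"

# Looking up a decision is a plain dict access.
print(policy[(0, 0)])
print(policy[(1, 2)])
```

Any assignment of one action to every cell is a valid policy. Value iteration is about finding the assignment with the highest expected future reward.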

The whiter a square, the greater the estimated future reward from that square under the optimal policy.

Watch how the estimated future reward of cells propagates from the bottom right all the way to the top left!
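To see why the values spread outward from the end cell, here is a minimal sketch of value iteration on a wall-free grid. It is not the code from main.py. The discount factor `gamma`, the synchronous sweeps, the function name `value_iteration_sweeps`, and treating the end cell as terminal (with value 0) are all assumptions made for this example. Only the rewards (-100, -1, +10000) come from the description above.

```python
# up, down, left, right, still as (row, col) offsets
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1), (0, 0)]


def value_iteration_sweeps(n_rows, n_cols, gamma=0.9, n_sweeps=10):
    """Yield the value grid after each synchronous sweep (wall-free maze)."""
    goal = (n_rows - 1, n_cols - 1)
    values = [[0.0] * n_cols for _ in range(n_rows)]
    for _ in range(n_sweeps):
        new_values = [[0.0] * n_cols for _ in range(n_rows)]
        for row in range(n_rows):
            for col in range(n_cols):
                if (row, col) == goal:
                    continue  # assumed terminal: its value stays 0
                best = float("-inf")
                for d_row, d_col in ACTIONS:
                    nxt = (row + d_row, col + d_col)
                    if not (0 <= nxt[0] < n_rows and 0 <= nxt[1] < n_cols):
                        nxt, reward = (row, col), -100  # hit the border
                    elif nxt == goal:
                        reward = 10000
                    else:
                        reward = -1
                    # Bellman backup: best reward plus discounted next value
                    best = max(best, reward + gamma * values[nxt[0]][nxt[1]])
                new_values[row][col] = best
        values = new_values
        yield values


# Print the rounded grid after each of the first few sweeps.
for sweep, grid in enumerate(value_iteration_sweeps(3, 3, n_sweeps=4), start=1):
    print(sweep, [[round(v) for v in row] for row in grid])
```

Each sweep can only pass information one cell further, since a cell's new value depends on its neighbours' values from the previous sweep. On a 3x3 grid under these assumptions, the top-left cell only gets a large positive value on the fourth sweep.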

Repository files (32 commits): README.md, main.py, no_walls.py, video.py, vis.gif
