Imagine we have an agent that knows how to walk and where it wants to go, but doesn't know how to get there. How can we teach it to find an optimal path to its destination? In this repository I use the Q-Learning algorithm to solve this problem. Q-Learning is a reinforcement learning technique (in simple words: learning from mistakes). The goal of Q-Learning is to learn a policy that tells the agent which action to take in each state.
Imagine our agent is in a 2D environment like the one below.
Each block represents a different state:
- Red blocks represent obstacles, which the agent can't walk through.
- The green block represents the final state; the goal of the agent is to find an optimal path to reach this state.
Each state has its own reward value, which is negative for obstacle states and positive for non-obstacle states. At each block our agent can move with one of four actions:
- Go left
- Go right
- Go up
- Go down
After each action it takes, the agent receives a reward from the environment. At first the agent chooses actions at random, moving between states, but eventually it learns from its mistakes and finds the optimal path to the final state.
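For concreteness, here is a minimal sketch of how such a 3x3 grid environment could be represented. The names and values (`WIDTH`, `OBSTACLE_STATES`, `FINAL_STATE`, the reward constants) are illustrative assumptions, not the actual fields of this repository:

```java
// Hypothetical constants for a 3x3 grid (9 states, indexed 0..8).
// Names and values are assumptions for illustration only.
public class GridWorld {
    static final int WIDTH = 3;                  // grid is WIDTH x WIDTH
    static final int STATE_COUNT = WIDTH * WIDTH;

    static final int FINAL_STATE = 8;            // green block (goal)
    static final int[] OBSTACLE_STATES = {4};    // red blocks

    static final double OBSTACLE_REWARD = -100;  // penalty for stepping on an obstacle
    static final double GOAL_REWARD = 100;       // reward for reaching the goal
    static final double STEP_REWARD = 1;         // ordinary non-obstacle move

    // A state index i maps to grid coordinates as:
    //   row    = i / WIDTH
    //   column = i % WIDTH
}
```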
- Set a starting value for the Reward Table:
The Reward Table is an n*n table that contains the reward value of each action at each state. For example, if we have an environment that contains 9 states (like the one above), then we need a 9*9 reward table in which each value represents the reward of moving from one state to another. For instance, reward[0][3] represents the reward value of moving from state 0 to state 3.
private void initRewards() {
    // Each state index maps to grid coordinates: row k = index / width, column r = index % width.
    for (int i = 0; i < rewards.length; i++) {
        int r1 = i % width;
        int k1 = i / width;
        for (int j = 0; j < rewards.length; j++) {
            int r2 = j % width;
            int k2 = j / width;
            // If state i and state j are not neighbors on the grid,
            // the move is impossible, so set their reward value to -1.
            if (!((r1 == r2 && Math.abs(k1 - k2) == 1) || (k1 == k2 && Math.abs(r1 - r2) == 1)))
                rewards[i][j] = -1;
        }
    }
}
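To see how the index arithmetic above works, here is a small, self-contained sketch that checks whether two states of a 3x3 grid are neighbors. It mirrors the logic in `initRewards()`; the class and method names are illustrative assumptions:

```java
public class NeighborDemo {
    static final int WIDTH = 3; // 3x3 grid, states 0..8

    // Two states are neighbors if they sit next to each other
    // horizontally or vertically on the grid.
    static boolean areNeighbors(int a, int b) {
        int rowA = a / WIDTH, colA = a % WIDTH;
        int rowB = b / WIDTH, colB = b % WIDTH;
        return (colA == colB && Math.abs(rowA - rowB) == 1)
            || (rowA == rowB && Math.abs(colA - colB) == 1);
    }

    public static void main(String[] args) {
        System.out.println(areNeighbors(0, 3)); // true  -> state 3 is directly below state 0
        System.out.println(areNeighbors(0, 1)); // true  -> state 1 is to the right of state 0
        System.out.println(areNeighbors(0, 4)); // false -> diagonal, not reachable in one move
        System.out.println(areNeighbors(2, 3)); // false -> ends of different rows
    }
}
```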
- Select a non-obstacle state as the current state
- Get the current state's neighbors
- Select one of the neighbors as the next state
- Update the Q-Table with the Q-Function
The Q-Table is an n*n table that holds the maximum expected future reward for each action at each state. Basically, this table will guide our agent to the best action at each state.
Q-Function: the Q-Learning algorithm uses the Bellman equation to update the Q value: Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)), where s' is the next state, alpha is the learning rate, and gamma is the discount factor.
private void QLearning() {
    List<Integer> possibleActions;
    Random rand = new Random();
    for (int i = 0; i < TRAIN_NUMBER; i++) {
        int currentState = randomState();
        // get all possible actions (neighbor states) for the current state
        while (!(possibleActions = getNeighbors(currentState)).isEmpty()) {
            // determine the next state by picking a random neighbor
            int idxNextState = rand.nextInt(possibleActions.size());
            int nextState = possibleActions.get(idxNextState);
            // calculate Q
            double r = getReward(currentState, nextState);
            double maxQ = maxQ(nextState);
            double q = Q[currentState][nextState];
            // q-function (Bellman update)
            // q:     current Q value of taking this action in the current state
            // r:     reward for moving from the current state to the next state
            // maxQ:  maximum Q value over all actions available from the next state
            // alpha: learning rate
            // gamma: discount factor
            double value = q + ALPHA * (r + GAMMA * maxQ - q);
            Q[currentState][nextState] = value;
            // move to the next state
            currentState = nextState;
        }
    }
}
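The helpers `getNeighbors(...)` and `maxQ(...)` used above are not shown here. Below is a minimal sketch of how they could look, assuming grid adjacency, an `isObstacle(...)` check, a `FINAL_STATE` constant, and `java.util` imports (all of these are assumptions, not the repository's actual code). Once training is done, the learned Q-Table can be followed greedily to extract the optimal path, as in the `findPath` sketch:

```java
// Hedged sketch only: assumes a square grid of width * width states,
// a Q matrix Q[state][state], an isObstacle(state) helper and a FINAL_STATE constant.
private List<Integer> getNeighbors(int state) {
    List<Integer> neighbors = new ArrayList<>();
    if (state == FINAL_STATE || isObstacle(state))
        return neighbors; // the episode ends at the goal; obstacles have no outgoing moves
    int row = state / width;
    int col = state % width;
    if (row > 0)         neighbors.add(state - width); // up
    if (row < width - 1) neighbors.add(state + width); // down
    if (col > 0)         neighbors.add(state - 1);     // left
    if (col < width - 1) neighbors.add(state + 1);     // right
    return neighbors;
}

// Maximum Q value over all actions (neighbor states) available from `state`.
private double maxQ(int state) {
    double max = 0; // the goal state has no outgoing actions, so its future value is 0
    for (int next : getNeighbors(state))
        max = Math.max(max, Q[state][next]);
    return max;
}

// After training, follow the highest Q value from the start state to the goal.
private List<Integer> findPath(int startState) {
    List<Integer> path = new ArrayList<>();
    int current = startState;
    path.add(current);
    while (current != FINAL_STATE) {
        int best = -1;
        for (int next : getNeighbors(current))
            if (best == -1 || Q[current][next] > Q[current][best])
                best = next;
        if (best == -1) break; // no move available
        path.add(best);
        current = best;
    }
    return path;
}
```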
If you want to contribute to this repository, feel free to fork the repo; hopefully we can learn from each other. Your help is very welcome.