In this project, you can create levels for my agent and watch it solve them; you can also play the levels yourself and compare your score to the agent's!
The map has 4 types of objects.
- Victory points, two of them; the agents must simultaneously each stand on a distinct victory point to solve the puzzle
- Blockers, which prevent an agent from moving onto a square that contains one
You can move in all four directions (⬅️⬆️⬇️➡️). The agents' movements are synchronized: if you decide to go right, both agents will try to go right.
The walls act like teleporters: if you are on the bottom edge of the map and move down, you are teleported to the top edge (if there is no blocker there). The same applies from right to left, and vice versa.
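The two rules above (wrap-around edges, blockers, synchronized movement) can be sketched as follows. This is my own illustrative code, not the project's actual implementation; the function and parameter names are assumptions.

```python
def step(pos, delta, width, height, blockers):
    """Move one agent by `delta`, wrapping around the map edges.

    If the destination square holds a blocker, the agent stays put.
    """
    x, y = pos
    dx, dy = delta
    # Modulo arithmetic implements the teleport: leaving the bottom
    # edge re-enters from the top, leaving the right edge re-enters
    # from the left, and vice versa.
    nx, ny = (x + dx) % width, (y + dy) % height
    return pos if (nx, ny) in blockers else (nx, ny)

def step_both(a1, a2, delta, width, height, blockers):
    # Both agents receive the same action, so a synchronized move
    # just applies the same delta to each agent independently.
    return (step(a1, delta, width, height, blockers),
            step(a2, delta, width, height, blockers))
```

Note that each agent checks blockers independently, so one agent can be stopped while the other moves.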
To start creating your own levels and trying them out, or having the agent try them, go to this link:
To create a level, first choose the size of the grid in width and height (between 3 and 10), then place your obstacles / agents / victory points.
After validating your map, you have the choice between two buttons:
🔘 Your Run
🔘 Agent Run
The reward is straightforward: at each step, the agent receives a reward of 0, and when it solves the puzzle, it receives 1.
I didn't want to go for a more complex reward, such as one that would give a higher reward when the agents' pattern resembles the pattern of the victory points, because you can switch from a very different pattern to the solution in a single move, mainly thanks to teleportation. Moreover, I wanted to see how the agent would manage to solve these puzzles without any additional hints.
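The sparse reward described above can be written as a one-liner. Again a sketch with names of my own choosing, not the project's code:

```python
def reward(agents, victory_points):
    """Return 1.0 only when both agents stand on distinct victory points,
    otherwise 0.0 (the sparse reward used for every intermediate step)."""
    a1, a2 = agents
    solved = (a1 != a2
              and a1 in victory_points
              and a2 in victory_points)
    return 1.0 if solved else 0.0
```

The `a1 != a2` check enforces that the victory points occupied are distinct, matching the win condition stated earlier.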
I faced the problem that the observation space must keep the same size, while I want the agent to play on different terrains. I saw three methods available to me:
- 👀 Use the sensor perceptions 👀
- 📷 Use the camera sensor 📷
- 📝 Give an array of fixed size N, padding the unused entries with -1 📝
I chose the array, because it seemed the most appropriate for the game.
The observations are:
- Map size -> X and Y
- Positions of Agent1 and Agent2 -> X1, Y1 and X2, Y2
- Positions of Victory1 and Victory2 -> X1, Y1 and X2, Y2
- Positions of the N blockers (N ∈ [0, 6]) -> Xn and Yn
- -1 values to fill the array ((6 - N) * 2 of them)

Size of the observation = 16
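The padded layout above can be sketched like this. The function and its argument names are my own; note that the stated observation size of 16 corresponds to three padded blocker slots (2 + 4 + 4 + 3*2 = 16), so `max_blockers=3` is used in the example below as an assumption:

```python
def build_observation(map_size, a1, a2, v1, v2, blockers, max_blockers=3):
    """Flatten the level state into a fixed-length list of numbers."""
    # Map size, then the two agents, then the two victory points.
    obs = [*map_size, *a1, *a2, *v1, *v2]
    # Each blocker contributes its (x, y) coordinates.
    for bx, by in blockers:
        obs += [bx, by]
    # Pad the unused blocker slots with -1 so the length is constant
    # regardless of how many blockers the level contains.
    obs += [-1] * ((max_blockers - len(blockers)) * 2)
    return obs
```

Padding with -1 is an out-of-range sentinel, since real coordinates are never negative.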
The actions are very simple: 4 discrete values, which define the 4 movements.
0 -> ⬆️ 1 -> ⬇️ 2 -> ➡️ 3 -> ⬅️
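As a sketch, the mapping above can be stored as a lookup table of grid deltas. The `(dx, dy)` convention with y growing downward is an assumption on my part:

```python
# Discrete action index -> (dx, dy) grid delta (y grows downward).
ACTIONS = {
    0: (0, -1),  # ⬆️ up
    1: (0, 1),   # ⬇️ down
    2: (1, 0),   # ➡️ right
    3: (-1, 0),  # ⬅️ left
}
```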
Currently, my agent is much stronger than a human at solving these puzzles. However, in some situations where a human would immediately see that the puzzle can be solved in one move, the agent does not see it and sometimes takes suboptimal paths.