Comparing DP (Dynamic Programming) and QLearning implementations.
As expected, the QLearning solution wins (fewer than 500 episodes needed to reach the goal consistently).
- Number of nodes in all layers
- Learning rate parameters
- Advanced activation functions
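
If the items above are read as the tunable settings of a Q-network, they could be grouped into one configuration object, as in the minimal sketch below. Everything in it is an assumption made for illustration: the names `QNetConfig`, `build_layers`, `q_values`, and `ACTIVATIONS`, the default sizes, and the choice of activations are hypothetical and not taken from the original implementation. The sketch is forward-only, so the learning rate is stored but never applied.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical activation table; the original code may use different functions.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda x: max(0.0, x),
    "leaky_relu": lambda x: x if x > 0 else 0.01 * x,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "tanh": math.tanh,
}


@dataclass
class QNetConfig:
    """Illustrative container for the three tunables listed above."""
    state_size: int = 4                       # assumed input dimension
    action_size: int = 2                      # assumed number of actions
    hidden_sizes: Tuple[int, ...] = (24, 24)  # number of nodes in each hidden layer
    learning_rate: float = 1e-3               # optimizer step size (unused in this sketch)
    activation: str = "relu"                  # key into ACTIVATIONS


# One layer is a (weights, biases) pair; weights has one row per output node.
Layer = Tuple[List[List[float]], List[float]]


def build_layers(cfg: QNetConfig, seed: int = 0) -> List[Layer]:
    """Create randomly initialised dense layers sized according to cfg."""
    rng = random.Random(seed)
    sizes = [cfg.state_size, *cfg.hidden_sizes, cfg.action_size]
    layers: List[Layer] = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def q_values(layers: List[Layer], state: Sequence[float], cfg: QNetConfig) -> List[float]:
    """Forward pass: map a state vector to one Q-value per action."""
    act = ACTIVATIONS[cfg.activation]
    x = list(state)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # the output layer stays linear
            x = [act(v) for v in x]
    return x
```

For example, `QNetConfig(hidden_sizes=(8, 16), activation="elu")` followed by `build_layers(cfg)` yields three layers with 8, 16, and 2 output nodes, so each experiment changes only the config rather than the network code.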