Based on the RLC implementation by Arjan Groen: https://github.com/arjangroen/RLC
RLC works in three chess environments:
1. Move Chess
- Goal: Learn to find the shortest path between two squares on a chessboard
- Motivation: Move Chess has a small state space, which allows us to tackle it with simple RL algorithms.
- Concepts: Dynamic Programming, Policy Evaluation, Policy Improvement, Policy Iteration, Value Iteration, Synchronous & Asynchronous back-ups, Monte Carlo (MC) Prediction, MC Control, Temporal Difference (TD) Learning, TD Control, TD-lambda, SARSA(-max)
2. Capture Chess
- Goal: Capture as many of the opponent's pieces as possible within n full moves
- Motivation: Piece captures happen more frequently than win-lose-draw events. This gives the algorithm more information to learn from.
- Concepts: Q-learning, value function approximation, experience replay, fixed Q-targets, policy gradients, REINFORCE, Actor-Critic
3. Real Chess
- Goal: Play chess competitively against a human beginner
- Motivation: An actual RL chess AI, how cool is that?
- Concepts: Deep Q-learning, Monte Carlo Tree Search
pipenv install
# Move Chess: policy iteration for a rook
from RLC.move_chess.environment import Board
from RLC.move_chess.agent import Piece
from RLC.move_chess.learn import Reinforce
env = Board()
p = Piece(piece='rook')
r = Reinforce(p, env)
# Synchronous back-ups, no discounting (gamma=1)
r.policy_iteration(k=1, gamma=1, synchronous=True)
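To make the dynamic-programming idea behind the call above concrete, here is a minimal, self-contained sketch of synchronous value iteration for a rook on an empty 8x8 board. It is not RLC's code: the target square `TERMINAL`, the reward of -1 per move, and the helper names `rook_moves` and `value_iteration` are assumptions chosen for illustration.

```python
import numpy as np

SIZE = 8
TERMINAL = (7, 5)  # hypothetical target square (row, col), chosen for illustration


def rook_moves(state):
    """All squares a rook can reach in one move on an empty board."""
    r, c = state
    moves = [(r, j) for j in range(SIZE) if j != c]
    moves += [(i, c) for i in range(SIZE) if i != r]
    return moves


def value_iteration(gamma=1.0, tol=1e-9):
    """Synchronous value iteration with an assumed reward of -1 per move."""
    values = np.zeros((SIZE, SIZE))
    while True:
        # Synchronous back-up: every new value is computed from the old table.
        new_values = np.zeros_like(values)
        for r in range(SIZE):
            for c in range(SIZE):
                if (r, c) == TERMINAL:
                    continue  # the terminal square keeps value 0
                new_values[r, c] = max(
                    -1.0 + gamma * values[nxt] for nxt in rook_moves((r, c))
                )
        if np.max(np.abs(new_values - values)) < tol:
            return new_values
        values = new_values


values = value_iteration()
print(values.astype(int))
```

With these assumptions the value of a square is minus the number of rook moves needed to reach the target: 0 on the target, -1 on its row and column, and -2 everywhere else.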
# Move Chess: Q-learning for a king
from RLC.move_chess.environment import Board
from RLC.move_chess.agent import Piece
from RLC.move_chess.learn import Reinforce
from functools import partial
p = Piece(piece='king')
env = Board()
r = Reinforce(p, env)
# Q-learning run under a GLIE epsilon-greedy schedule for 1000 episodes
r.glie_eps_greedy(partial(r.q_learning, alpha=0.2, gamma=0.2), n_episodes=1000)
r.visualize_policy()
# Greedy state values: maximum of the action-value table over axis 2
r.agent.action_function.max(axis=2).round().astype(int)
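The example above can be mirrored by a small stand-alone sketch of tabular Q-learning with a decaying epsilon-greedy policy. This is an illustration under stated assumptions, not RLC's implementation: the start square (0, 0), the target `TERMINAL`, the reward of -1 per move, the `1 / episode` epsilon schedule, and the clipping of off-board moves are all hypothetical choices.

```python
import numpy as np

SIZE = 8
TERMINAL = (7, 5)  # hypothetical target square
KING_MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def step(state, action):
    """Apply a king move; moves that would leave the board are clipped to its edge."""
    dr, dc = KING_MOVES[action]
    r = min(max(state[0] + dr, 0), SIZE - 1)
    c = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (r, c)
    return nxt, -1.0, nxt == TERMINAL


def q_learning(n_episodes=500, alpha=0.2, gamma=0.9, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((SIZE, SIZE, len(KING_MOVES)))
    for episode in range(1, n_episodes + 1):
        epsilon = 1.0 / episode  # exploration decays towards a greedy policy
        state = (0, 0)
        for _ in range(max_steps):
            r, c = state
            if rng.random() < epsilon:
                action = int(rng.integers(len(KING_MOVES)))
            else:
                action = int(np.argmax(q[r, c]))
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state.
            bootstrap = 0.0 if done else gamma * q[nxt[0], nxt[1]].max()
            q[r, c, action] += alpha * (reward + bootstrap - q[r, c, action])
            state = nxt
            if done:
                break
    return q


q = q_learning()
print(q.max(axis=2).round().astype(int))  # greedy state values, as in the call above
```

Because every reward is negative and the table starts at zero, untried actions look attractive at first, which gives the agent some exploration even when epsilon is small.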
# Capture Chess: Q-learning with a convolutional Q-network
from RLC.capture_chess.environment import Board
from RLC.capture_chess.learn import Q_learning
from RLC.capture_chess.agent import Agent
board = Board()
agent = Agent(network='conv', gamma=0.1, lr=0.07)
R = Q_learning(agent, board)
pgn = R.learn(iters=750)
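Two of the concepts listed for Capture Chess, experience replay and fixed Q-targets, can be sketched without a neural network. The snippet below is a hedged illustration, not RLC's code: `ReplayBuffer` and `td_targets` are hypothetical names, and the Q-values of the fixed network are passed in as plain arrays.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Keeps the most recent transitions and samples them uniformly at random."""

    def __init__(self, capacity=1000):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size):
        batch_size = min(batch_size, len(self.memory))
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_targets(rewards, next_q_fixed, done, gamma=0.1):
    """Q-learning targets r + gamma * max_a Q_fixed(s', a), without bootstrap at episode end.

    next_q_fixed holds the Q-values that a frozen copy of the network assigns
    to the next states, one row per transition.
    """
    rewards = np.asarray(rewards, dtype=float)
    next_q_fixed = np.asarray(next_q_fixed, dtype=float)
    done = np.asarray(done, dtype=bool)
    bootstrap = next_q_fixed.max(axis=1)
    return rewards + gamma * np.where(done, 0.0, bootstrap)


targets = td_targets(
    rewards=[1, 0, 3],
    next_q_fixed=[[0, 2], [5, 1], [9, 9]],
    done=[False, False, True],
)
print(targets)
```

Sampling from the buffer breaks the correlation between consecutive positions, and computing targets from a frozen network keeps them from shifting on every gradient step.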
import chess
board = chess.Board()
# Capture Chess: REINFORCE with a policy-gradient network
from RLC.capture_chess.environment import Board
from RLC.capture_chess.learn import Reinforce
from RLC.capture_chess.agent import Agent, policy_gradient_loss
board = Board()
agent = Agent(network='conv_pg', lr=0.3)
R = Reinforce(agent, board)
pgn = R.learn(iters=3000)
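The quantity that REINFORCE optimizes can be written down in a few lines of NumPy. This is a generic sketch of the Monte Carlo policy-gradient objective, not the `policy_gradient_loss` imported above; the function names `discounted_returns` and `reinforce_loss` and the default `gamma` are assumptions.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(action_probs, returns):
    """Negative return-weighted log-likelihood of the actions that were taken.

    action_probs[t] is the probability the policy assigned to the action
    actually chosen at step t. Minimizing this loss raises the probability
    of actions that were followed by high returns.
    """
    action_probs = np.asarray(action_probs, dtype=float)
    returns = np.asarray(returns, dtype=float)
    return -np.sum(returns * np.log(action_probs))


returns = discounted_returns([1, 0, 2], gamma=0.5)
print(returns)
print(reinforce_loss([0.5, 0.25, 0.5], returns))
```

In an autodiff framework the same expression would be built from the network's output tensor so that its gradient with respect to the policy parameters is the REINFORCE update.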
import chess
from chess.pgn import Game
import RLC
# Capture Chess: actor-critic with a Q-network critic and a policy-gradient actor
from RLC.capture_chess.environment import Board
from RLC.capture_chess.learn import ActorCritic
from RLC.capture_chess.agent import Agent
board = Board()
critic = Agent(network='conv', lr=0.1)
# Set up the critic's fixed model for bootstrapped targets
critic.fix_model()
actor = Agent(network='conv_pg', lr=0.3)
R = ActorCritic(actor, critic, board)
pgn = R.learn(iters=1000)
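In an actor-critic setup the critic's estimates replace the raw Monte Carlo return as the weight on the actor's log-probabilities. One common choice is the advantage A(s, a) = Q(s, a) - V(s), with V(s) taken as the policy-weighted average of the critic's Q-values. The sketch below illustrates that choice only; it is an assumption, not necessarily how RLC's `ActorCritic` combines the two networks, and `advantages` is a hypothetical helper.

```python
import numpy as np


def advantages(q_values, policy_probs, actions):
    """A(s, a) = Q(s, a) - sum_a' pi(a'|s) * Q(s, a') for the actions taken.

    q_values and policy_probs have one row per state and one column per action;
    actions holds the index of the action chosen in each state.
    """
    q_values = np.asarray(q_values, dtype=float)
    policy_probs = np.asarray(policy_probs, dtype=float)
    actions = np.asarray(actions, dtype=int)
    baseline = (policy_probs * q_values).sum(axis=1)      # V(s) under the actor
    chosen = q_values[np.arange(len(actions)), actions]   # Q(s, a) of the taken action
    return chosen - baseline


adv = advantages(
    q_values=[[1, 3], [2, 2]],
    policy_probs=[[0.5, 0.5], [0.25, 0.75]],
    actions=[1, 0],
)
print(adv)
```

A positive advantage means the chosen action was better than the actor's average behaviour in that state, so its probability is pushed up; subtracting the baseline reduces the variance of the gradient compared with plain REINFORCE.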
https://www.kaggle.com/arjanso/reinforcement-learning-chess-1-policy-iteration
https://www.kaggle.com/arjanso/reinforcement-learning-chess-2-model-free-methods
https://www.kaggle.com/arjanso/reinforcement-learning-chess-3-q-networks
https://www.kaggle.com/arjanso/reinforcement-learning-chess-4-policy-gradients
- Reinforcement Learning: An Introduction
Richard S. Sutton and Andrew G. Barto
1st Edition
MIT Press, March 1998
- RL Course by David Silver: Lecture playlist
https://www.youtube.com/watch?v=2pWv7GOvuf0&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
- Notes on Policy Gradients in autodiff frameworks
Aleksis Pirinen
https://aleksispi.github.io/assets/pg_autodiff.pdf, May 2018