Current idea: Start with single-agent RL, then refresher on Game Theory, then multi-agent RL and the comparison of GT methods vs RL methods. Use Python / TensorFlow for computing.
Complete course (Denny Britz): http://www.wildml.com/2016/10/learning-reinforcement-learning/
Based on the Sutton & Barto book and David Silver’s Reinforcement Learning course. Pro: YouTube videos of the lectures.
Book examples in Python:
Book Exercises, also for Python.
- Bierman & Fernandez: Game Theory with Economic Applications (but out of print)
- Hans Peters, Speltheorie voor economen (Game Theory for Economists, in Dutch)
http://researchers-sbe.unimaas.nl/hanspeters/wp-content/uploads/sites/21/2014/02/Speltheorie.pdf
The Axelrod library is an open source Python package that allows for reproducible game theoretic research into the Iterated Prisoner’s Dilemma.
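To make the Iterated Prisoner's Dilemma concrete, here is a minimal plain-Python sketch of a repeated match. It does not use the Axelrod library's API. The payoff values (3, 0, 5, 1) are the conventional textbook choice and are an assumption here, and the names `tit_for_tat`, `always_defect` and `play_match` are hypothetical helpers for illustration only.

```python
# Payoffs (player A, player B) for one round. The values 3/0/5/1 are the
# conventional textbook choice and are an assumption, not taken from the notes.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(own_history, opponent_history):
    """Defect in every round, regardless of history."""
    return "D"


def play_match(strategy_a, strategy_b, turns=10):
    """Play a repeated game and return the total scores of both players."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(turns):
        # Both players choose simultaneously, seeing only past moves.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_match(tit_for_tat, always_defect, turns=10))
    print(play_match(tit_for_tat, tit_for_tat, turns=10))
```

Tit-for-tat loses only the first round against a pure defector, while two tit-for-tat players cooperate throughout. That contrast is a useful starting point for comparing fixed game-theoretic strategies with learned ones.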
- Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (OpenAI)
- A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning (DeepMind)
Abstract: To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL). The simplest form is independent reinforcement learning (InRL), where each agent treats its experience as part of its (non-stationary) environment. In this paper, we first observe that policies learned using InRL can overfit to the other agents' policies during training, failing to sufficiently generalize during execution. We introduce a new metric, joint-policy correlation, to quantify this effect. We describe an algorithm for general MARL, based on approximate best responses to mixtures of policies generated using deep reinforcement learning, and empirical game-theoretic analysis to compute meta-strategies for policy selection. The algorithm generalizes previous ones such as InRL, iterated best response, double oracle, and fictitious play. Then, we present a scalable implementation which reduces the memory requirement using decoupled meta-solvers. Finally, we demonstrate the generality of the resulting policies in two partially observable settings: gridworld coordination games and poker.
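The abstract lists fictitious play among the algorithms that the proposed method generalizes. As a rough illustration of that one ingredient only (not of the paper's algorithm), the sketch below runs classical fictitious play on rock-paper-scissors: each player best-responds to the opponent's empirical action counts. The choice of game, the deterministic tie-breaking rule and all function names are assumptions made for this example.

```python
# Row player's payoff in rock-paper-scissors; the column player gets the negative.
# The game and all names below are illustrative assumptions.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response_row(col_counts):
    """Row action with the highest payoff against the column player's counts."""
    values = [sum(PAYOFF[i][j] * col_counts[j] for j in range(3)) for i in range(3)]
    return values.index(max(values))  # ties go to the lowest index


def best_response_col(row_counts):
    """Column action with the highest payoff against the row player's counts."""
    values = [sum(-PAYOFF[i][j] * row_counts[i] for i in range(3)) for j in range(3)]
    return values.index(max(values))  # ties go to the lowest index


def fictitious_play(iterations=10_000):
    """Return the empirical action frequencies of both players."""
    row_counts = [0, 0, 0]
    col_counts = [0, 0, 0]
    for _ in range(iterations):
        # Both players respond simultaneously to the history observed so far.
        a = best_response_row(col_counts)
        b = best_response_col(row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    row_freq = [c / iterations for c in row_counts]
    col_freq = [c / iterations for c in col_counts]
    return row_freq, col_freq


if __name__ == "__main__":
    print(fictitious_play())
```

In a two-player zero-sum game such as this one, the empirical frequencies approach a Nash equilibrium (here the uniform mix), even though the actions played in each round keep cycling.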
- Multi-armed bandits
- Q-learning
- TD-learning
- Policy gradient
- Deep Q-learning
- Tree search, Monte Carlo tree search
- REINFORCE: Monte Carlo policy gradient
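As a possible starting point for the single-agent part of the list above, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the hyperparameters and the function names are all hypothetical choices for illustration; none of them come from the resources listed.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1 only when the terminal state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit, breaking ties between equal Q-values at random.
                action = max(ACTIONS, key=lambda a: (q[state][a], rng.random()))
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # with no bootstrap term once the episode has ended.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

The update bootstraps from the maximum over next actions rather than from the action actually taken, which is what makes Q-learning an off-policy TD method.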
A Comparison of RL frameworks: Dopamine, RLlib, Keras-RL, Coach, TRFL, Tensorforce and more
- OpenAI Gym
https://arxiv.org/pdf/1606.01540.pdf
OpenAI Baselines is a set of high-quality implementations of reinforcement learning algorithms.
- RLlib
https://bair.berkeley.edu/blog/2018/12/12/rllib/
RLlib is an open-source library for reinforcement learning. It offers dedicated support for multi-agent reinforcement learning problems and scales computationally.
- TF-Agents
- Unity ML-Agents
https://github.com/Unity-Technologies/ml-agents
Juliani, A., Berges, V., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., Lange, D. (2020). Unity: A General Platform for Intelligent Agents. arXiv preprint arXiv:1809.02627. https://github.com/Unity-Technologies/ml-agents.
- Mastering the Game of Sungka from Random Play (Mancala variant)
https://github.com/baudm/sungka-ai
- OpenAI Gym environment for Sungka
- Reward formulation which penalizes actions resulting in high opponent scores
- Fast-converging and stable training algorithm
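The reward idea in the list above (penalizing moves that let the opponent score highly) could look roughly like the sketch below. This is not the formula from the Sungka project, which the notes do not give. The function name, its parameters and the linear penalty are assumptions chosen only to illustrate the general shape of such a reward.

```python
def shaped_reward(own_gain, opponent_gain, penalty_weight=1.0):
    """Hypothetical shaped reward for a single move.

    own_gain: points the agent scores with this move.
    opponent_gain: points the opponent scores on the turn that follows.
    penalty_weight: how strongly the opponent's gain is penalized (assumed).
    """
    return own_gain - penalty_weight * opponent_gain


if __name__ == "__main__":
    # A move that scores 3 but allows the opponent to score 5 on the reply.
    print(shaped_reward(3, 5, penalty_weight=0.5))
```

With `penalty_weight=0` this reduces to a purely greedy score reward. Larger values push the agent toward defensive moves.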
Pommerman is a variant of the famous Bomberman. There are four agents, power-ups, and bombs in three modes. In FFA (free-for-all) mode, you enter one agent and try to be the last one standing.
- Fantasy Football AI (FFAI, OpenAI Gym environment)
Blood Bowl: A New Board Game Challenge and Competition for AI.
Deep Reinforcement Learning in Strategic Multi-Agent Games: the case of No-Press Diplomacy
This appears to be an example of a multi-agent game in OpenAI Gym.