A mixed-policy version of the Asynchronous 1-step Q-learning algorithm, based on WoLF-PHC, GIGA-WoLF, WPL, EMA-QL and PGA-APP, with several Game Theoretic test scenarios.
The Multi-agent Double DQN algorithm is in the asyncdqn folder. You will need Python 3.3+, matplotlib, python-tk and TensorFlow 0.13+. To run some threads locally, adjust the configuration in asyncdqn/DQN-LocalThreads.py, and just run:
export PYTHONPATH=$(pwd)
python3 asyncdqn/DQN-LocalThreads.py
To run 15 distributed processes, adjust the configuration in asyncdqn/DQN-Distributed.py, and run:
./start-dqn-mixed.sh 15
If you want to test the table-based mixed-policy algorithms, you can also adjust the configuration in mixedQ/run_mixed_algorithms.py and run the following (a sketch of the kind of update these algorithms perform is given after the commands):
export PYTHONPATH=$(pwd)
python3 mixedQ/run_mixed_algorithms.py
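These table-based algorithms maintain a Q-table together with an explicit mixed (probabilistic) policy. As a rough illustration, below is a minimal single-state WoLF-PHC-style sketch; the action count, learning rates and reward handling are assumptions made for the example, not values taken from mixedQ/run_mixed_algorithms.py.

```python
import numpy as np

# Illustrative single-state WoLF-PHC update.  The action count, learning
# rates and reward source are assumptions for this sketch, not values
# taken from mixedQ/run_mixed_algorithms.py.
n_actions = 2
alpha = 0.1                            # Q-value learning rate
delta_win, delta_lose = 0.01, 0.04     # "Win or Learn Fast": move faster when losing

Q = np.zeros(n_actions)                        # action-value estimates
pi = np.full(n_actions, 1.0 / n_actions)       # current mixed policy
pi_avg = np.full(n_actions, 1.0 / n_actions)   # running average policy
visits = 0

def wolf_phc_update(action, reward):
    """Update Q, the average policy and the mixed policy after one play."""
    global visits
    # 1-step value update (single state, so there is no bootstrap term)
    Q[action] += alpha * (reward - Q[action])

    # Incrementally track the average policy
    visits += 1
    pi_avg[:] += (pi - pi_avg) / visits

    # Winning means the current policy scores better against Q than the average policy
    delta = delta_win if np.dot(pi, Q) > np.dot(pi_avg, Q) else delta_lose

    # Hill-climb: shift probability mass from non-greedy actions to the greedy one
    greedy = int(np.argmax(Q))
    for a in range(n_actions):
        if a != greedy:
            step = min(pi[a], delta / (n_actions - 1))
            pi[a] -= step
            pi[greedy] += step
```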
For specific tests of WPL in a multi-state environment, configure mixedQ/wpl_nrps.py and run the following (a short sketch of the WPL update is given after the commands):
export PYTHONPATH=$(pwd)
python3 mixedQ/wpl_nrps.py
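WPL (the Weighted Policy Learner) scales the policy gradient by the distance to the probability-simplex boundary, so updates slow down as the policy approaches a pure strategy. A minimal single-state sketch follows; the action count, learning rate and the simple clip-and-renormalise projection are assumptions for the example and need not match mixedQ/wpl_nrps.py.

```python
import numpy as np

# Illustrative single-state WPL policy update.  Sizes, the learning rate
# and the crude projection step are assumptions for this sketch only.
n_actions = 3                              # e.g. a rock-paper-scissors-like game
eta = 0.01                                 # policy learning rate
Q = np.zeros(n_actions)                    # action-value estimates (learned elsewhere)
pi = np.full(n_actions, 1.0 / n_actions)   # current mixed policy

def wpl_step():
    """One WPL update of the mixed policy from the current value estimates."""
    value = np.dot(pi, Q)                  # expected value of the current policy
    grad = Q - value                       # per-action policy gradient
    # Weight the gradient by the distance to the simplex boundary:
    # decreasing probabilities are scaled by pi, increasing ones by (1 - pi)
    weight = np.where(grad < 0.0, pi, 1.0 - pi)
    new_pi = pi + eta * grad * weight
    # Crude projection back onto the simplex (clip and renormalise)
    new_pi = np.clip(new_pi, 1e-6, 1.0)
    pi[:] = new_pi / new_pi.sum()
```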
The algorithm works out of the box with all scenarios. Its core structure is asynchronous 1-step Q-learning, with actions selected from a learned mixed (stochastic) policy that is updated alongside the Q-values.
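As a rough, runnable illustration of that structure (not the repository's exact pseudo-code), the sketch below runs asynchronous 1-step Q-learning with a shared mixed policy on a toy matrix game; the payoff matrix, thread count, update period and the simple EMA-style policy step are assumptions made for the example.

```python
import threading
import numpy as np

# Toy sketch of asynchronous 1-step Q-learning with a shared mixed policy.
# The payoffs, hyper-parameters and the EMA-style policy step are assumptions;
# the repository uses TensorFlow networks rather than this shared table.
PAYOFF = np.array([[1.0, 0.0],            # row player's payoff matrix
                   [0.0, 1.0]])
N_ACTIONS, N_THREADS, T_MAX, I_ASYNC = 2, 4, 5000, 5
alpha, eta_pi = 0.1, 0.05                 # value and policy step sizes

Q_shared = np.zeros(N_ACTIONS)                     # shared action-value estimates
pi_shared = np.full(N_ACTIONS, 1.0 / N_ACTIONS)    # shared mixed policy
lock = threading.Lock()

def worker(seed):
    rng = np.random.default_rng(seed)
    dQ = np.zeros(N_ACTIONS)                       # locally accumulated value updates
    for t in range(1, T_MAX + 1):
        with lock:
            pi = pi_shared.copy()
        a = int(rng.choice(N_ACTIONS, p=pi))       # sample from the mixed policy
        b = int(rng.integers(N_ACTIONS))           # randomly acting opponent
        r = PAYOFF[a, b]
        dQ[a] += alpha * (r - Q_shared[a])         # 1-step Q-learning increment
        if t % I_ASYNC == 0:                       # apply accumulated updates asynchronously
            with lock:
                Q_shared[:] += dQ
                # Stand-in for the mixed-policy rules (WoLF-PHC, GIGA-WoLF, WPL,
                # EMA-QL, PGA-APP): an EMA step toward the current greedy action.
                g = int(np.argmax(Q_shared))
                pi_shared[:] *= (1.0 - eta_pi)
                pi_shared[g] += eta_pi
            dQ[:] = 0.0

threads = [threading.Thread(target=worker, args=(s,)) for s in range(N_THREADS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print("learned mixed policy:", pi_shared)
```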
We test on 5 famous Game Theory challenges.
We use a neural network with 2 hidden layers of 150 nodes each and ELU activations. We share network weights to speed up learning, as shown below.
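The exact weight-sharing scheme is not spelled out here; as one possible reading, the sketch below builds the described 2x150 ELU trunk and shares it between a Q-value head and a policy head. It is written in current Keras syntax, which differs from the TensorFlow version the repository targets, and the input/output sizes and head names are placeholders.

```python
import tensorflow as tf

# One possible reading of the shared-weights setup, written in current Keras
# syntax (the repository targets an older TensorFlow API).  Input and output
# sizes and the two head names are placeholders, not the repository's values.
def build_shared_network(n_inputs=4, n_actions=2):
    inputs = tf.keras.Input(shape=(n_inputs,))
    h = tf.keras.layers.Dense(150, activation="elu")(inputs)   # hidden layer 1
    h = tf.keras.layers.Dense(150, activation="elu")(h)        # hidden layer 2
    # Both heads reuse the same 2x150 ELU trunk, so its weights are shared
    q_values = tf.keras.layers.Dense(n_actions, name="q_values")(h)
    policy = tf.keras.layers.Dense(n_actions, activation="softmax", name="policy")(h)
    return tf.keras.Model(inputs, [q_values, policy])
```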
Below we can see the evolution of the policies of 2 agents in self-play using the WoLF-PHC, GIGA-WoLF, WPL, and EMA-QL algorithms, over 1000 epochs of 10000 trials. The games shown are the Tricky Game (solid) and the Biased Game (dotted), both shown in Figure 2. Each plot shows the probability with which each player plays its first action.
Below we can see the evolution of the policies of 2 agents in self-play using the deep learning implementations of WoLF-PHC, GIGA-WoLF, WPL, and EMA-QL, over 400 epochs of 250 iterations. The games shown are the Tricky Game (solid) and the Biased Game (dotted), both shown in Figure 2. Each plot shows the probability with which each player plays its first action.