This repo provides a simple, distributed, and asynchronous multi-agent reinforcement learning framework for the Google Research Football environment. Currently, it is dedicated to Google Research Football, with the cooperative part implemented via IPPO/MAPPO and the competitive part via PSRO/Simple League. In the future, we will also release code for other related algorithms and environments.
Our code is based on Light-MALib, a simplified version of MALib with a restricted set of algorithms and environments but several enhancements, such as distributed asynchronous training, league-like multi-population training, and detailed TensorBoard logging. If you are also interested in other multi-agent learning algorithms and environments, please refer to MALib for more details.
Citation
Song, Y., Jiang, H., Tian, Z. et al. An Empirical Study on Google Research Football Multi-agent Scenarios. Mach. Intell. Res. (2024). https://doi.org/10.1007/s11633-023-1426-8
@article{song2024empirical,
title={An Empirical Study on Google Research Football Multi-agent Scenarios},
author={Song, Yan and Jiang, He and Tian, Zheng and Zhang, Haifeng and Zhang, Yingping and Zhu, Jiangcheng and Dai, Zonghong and Zhang, Weinan and Wang, Jun},
journal={Machine Intelligence Research},
pages={1--22},
year={2024},
publisher={Springer}
}
For experiments on academy scenarios, please see our new repository: GRF_MARL
- Install
- Run Experiments
- Benchmark 11_vs_11 1.0 hard bot
- GRF toolkits
- Benchmark policy
- Tensorboard tags
- Documentation
- Contact
- Join Us
Install

You can use any tool to manage your Python environment. Here, we use conda as an example.

- Install conda/miniconda.
- Run `conda create -n light-malib python==3.9` to create a new conda env.
- Activate the env with `conda activate light-malib` when you want to use it, or add this line to your `.bashrc` file to enable it every time you log into the shell.
- In the root folder of this repo (with the `setup.py` file), run `pip install -r requirement.txt` to install the dependencies of Light-MALib.
- In the root folder of this repo (with the `setup.py` file), run `pip install .` or `pip install -e .` to install Light-MALib.
- Follow the instructions on the official website https://pytorch.org/get-started/locally/ to install PyTorch (for example, version 1.13.0+cu116).
- Follow the instructions in the official repo https://github.com/google-research/football to install the Google Research Football environment.
- You may use `python -c "import gfootball;print(gfootball.__file__)"` or other methods to locate where the `gfootball` package is.
- Go to the directory of the `gfootball` package, for example, `/home/username/miniconda3/envs/light-malib/lib/python3.8/site-packages/gfootball/`.
- Copy the `.py` files under the `scenarios` folder in our repo to the `scenarios` folder in the `gfootball` package (one way to script this is sketched below).
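If you prefer to script the last step, here is a minimal sketch, assuming you run it from the root of this repo and that `gfootball` is already importable:

```python
# Copy this repo's scenario files into the installed gfootball package.
# Assumes the current working directory is the repo root; paths follow the
# example layout described above.
import os
import shutil

import gfootball

gf_scenarios = os.path.join(os.path.dirname(gfootball.__file__), "scenarios")
repo_scenarios = os.path.join(os.getcwd(), "scenarios")

for name in os.listdir(repo_scenarios):
    if name.endswith(".py"):
        shutil.copy(os.path.join(repo_scenarios, name), gf_scenarios)
        print(f"copied {name} -> {gf_scenarios}")
```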
Run Experiments

- If you want to run experiments on a small cluster, please follow Ray's official instructions to start a cluster. For example, use `ray start --head` on the master machine, then connect the other machines to the master following the hints from the command-line output.
- Run `python light_malib/main_pbt.py --config <config_file_path>` to start a training experiment. An example is given in `train_light_malib.sh`.
- Run `python light_malib/scripts/play_gr_football.py` to run a competition between two models.
Benchmark 11_vs_11 1.0 hard bot

Our IPPO agent beats the 1.0 hard bot in the multi-agent 11-vs-11 full-game scenario within 10 hours of training, partly by taking advantage of glitches in the built-in game logic.
GRF toolkits

Currently, we provide the following tools to support research on football AI.

- Google Football Game Graph: a data structure representing a game as a tree, with branches indicating important events like goals or interceptions (see the illustrative sketch below).
- Google Football Game Debugger: a single-step graphical debugger showing both 3D and 2D frames with detailed frame data, such as the movements of the players and the ball.
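The exact interface of the Game Graph lives in the toolkit's code; purely to illustrate the idea of a tree of game segments whose branches mark key events, a node might look like the hypothetical sketch below (all names are illustrative, not the toolkit's actual API):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a game-graph node (not the toolkit's actual API):
# a game is a tree of segments, and a new branch starts at an important
# event such as a goal or an interception.
@dataclass
class GameNode:
    event: str                                        # e.g. "kickoff", "goal", "interception"
    step: int                                         # simulator step at which the event occurred
    frames: List[dict] = field(default_factory=list)  # raw frame data of this segment
    children: List["GameNode"] = field(default_factory=list)

    def add_child(self, child: "GameNode") -> "GameNode":
        self.children.append(child)
        return child
```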
Benchmark policy

At this stage, we release some of our trained models for use as initializations or as opponents. Model files are available on Google Drive and Baidu Wangpan.
Tensorboard tags

DataServer:
- `alive_usage_mean/std`: mean/std usage of data samples in the buffer;
- `mean_wait_time`: total read waiting time divided by the number of reads;
- `sample_per_minute_read`: number of samples read per minute;
- `sample_per_minute_write`: number of samples written per minute.
PSRO:
- `Elo`: Elo rating during PBT (the standard update rule is sketched below);
- `Payoff Table`: plot of the payoff table.
Rollout:
- `bad_pass`, `bad_shot`, `get_intercepted`, `get_tackled`, `good_pass`, `good_shot`, `interception`, `num_pass`, `num_shot`, `tackle`, `total_move`, `total_pass`, `total_possession`, `total_shot`: detailed football statistics;
- `goal_diff`: goal difference of the training agent (positive indicates more goals scored than conceded);
- `lose/win`: expected lose/win rate during rollout;
- `score`: expected score during rollout; the score of a single game is 0 for a loss, 1 for a win, and 0.5 for a draw (see the sketch after this list).
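Concretely, the per-game quantities behind `goal_diff` and `score`, as described above, can be written as:

```python
def game_outcome(my_goals: int, opponent_goals: int):
    """Per-game statistics as described above: goal_diff is goals scored by the
    training agent minus goals conceded; score is 1 for a win, 0.5 for a draw,
    0 for a loss. The reported tags are these values averaged over rollouts."""
    goal_diff = my_goals - opponent_goals
    if goal_diff > 0:
        score = 1.0
    elif goal_diff < 0:
        score = 0.0
    else:
        score = 0.5
    return goal_diff, score
```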
RolloutTimer (a minimal pattern for collecting such timings is sketched after this list):
- `batch`: timer for getting a rollout batch;
- `env_core_step`: timer for the simulator stepping time;
- `env_step`: total timer for an environment step;
- `feature`: timer for feature encoding;
- `inference`: timer for policy inference;
- `policy_update`: timer for pulling policies from the remote;
- `reward`: timer for reward calculation;
- `rollout`: total timer for one rollout;
- `sample`: timer for policy sampling;
- `stats`: timer for collecting statistics.
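These tags are wall-clock timings of the corresponding rollout stages. One minimal way to collect such timings is a context manager; this is an illustration only, not the framework's actual implementation:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)

@contextmanager
def timer(tag: str):
    """Accumulate wall-clock time spent under `tag` (illustrative only)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[tag] += time.perf_counter() - start

# Usage inside a rollout loop, mirroring the tags above
# (encode_features and policy.compute_action are hypothetical helpers):
# with timer("env_step"):
#     with timer("feature"):
#         obs = encode_features(raw_obs)
#     with timer("inference"):
#         action = policy.compute_action(obs)
```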
Training (several of these are standard PPO-style diagnostics; an illustrative computation is sketched after this list):
- `Old_V_max/min/mean/std`: value estimates at rollout time;
- `V_max/min/mean/std`: current value estimates;
- `advantage_max/min/mean/std`: advantage values;
- `approx_kl`: approximate KL divergence between the old and new action distributions;
- `clip_ratio`: proportion of clipped entries;
- `delta_max/min/mean/std`: TD errors;
- `entropy`: entropy value;
- `imp_weights_max/min/mean/std`: importance weights;
- `kl_diff`: variation of `approx_kl`;
- `lower_clip_ratio`: proportion of entries clipped up to the lower bound;
- `upper_clip_ratio`: proportion of entries clipped down to the upper bound;
- `policy_loss`: policy loss;
- `training_epoch`: number of training epochs at each iteration;
- `value_loss`: value loss.
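The sketch below shows one common way to compute a few of these diagnostics from the old and new action log-probabilities; the clip range of 0.2 is an assumed value, not necessarily the one used in the configs:

```python
import torch

def ppo_diagnostics(new_logp: torch.Tensor, old_logp: torch.Tensor, eps: float = 0.2):
    """Illustrative PPO diagnostics (not the repo's exact implementation).
    new_logp / old_logp are log-probabilities of the taken actions under the
    current policy and the rollout-time policy, respectively."""
    imp_weights = torch.exp(new_logp - old_logp)          # imp_weights_* tags
    approx_kl = (old_logp - new_logp).mean()              # approx_kl
    below = (imp_weights < 1.0 - eps).float().mean()      # fraction clipped at the lower bound
    above = (imp_weights > 1.0 + eps).float().mean()      # fraction clipped at the upper bound
    clip_ratio = below + above                            # clip_ratio: total clipped fraction
    return imp_weights, approx_kl, clip_ratio, below, above
```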
TrainingTimer:
- `compute_return`: timer for GAE computation (see the GAE sketch below);
- `data_copy`: timer for data copies during data processing;
- `data_generator`: timer for generating data;
- `loss`: total timer for loss computation;
- `move_to_gpu`: timer for sending data to the GPU;
- `optimize`: total timer for an optimization step;
- `push_policy`: timer for pushing trained policies to the remote;
- `train_step`: total timer for a training step;
- `trainer_data`: timer for getting data from `local_queue`;
- `trainer_optimize`: timer for an optimization step in the trainer.
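`compute_return` times the GAE pass. For reference, a plain single-episode version of Generalized Advantage Estimation (without termination masks, and with illustrative gamma/lambda values rather than the repo's configured ones) looks like this:

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma: float = 0.99, lam: float = 0.95):
    """Simplified Generalized Advantage Estimation, the quantity timed by
    `compute_return`. rewards: [T]; values: [T] value estimates; last_value:
    bootstrap value V(s_T). gamma/lam are illustrative defaults."""
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards), dtype=np.float32)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD error (delta_* tags)
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```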
Documentation

Under construction, stay tuned :)
Contact

If you have any questions about this repo, feel free to open an issue. You can also contact the current maintainers, YanSong97 and DiligentPanda, by email.
Join Us

Interested in our project? Or have a great passion for:
- Multi-Agent Learning and Game AI
- Operation Research and Optimization
- Robotics and Control
- Visual and Graphic Intelligence
- Data Mining and so on
Welcome! Why not take a look at https://digitalbrain.cn/talents?
With leading scientists, engineers, and field experts, we aim to provide Better Decisions for a Better World!
Digital Brain Laboratory, Shanghai, is co-founded by Mr. Ruigang Li, founding partner and chairman of CMC Capital, and Prof. Jun Wang, a world-renowned scientist in the field of decision intelligence.