This code runs on Python 3.* with the libraries and versions listed in requirements.txt. Install them with the following command:
pip install -r requirements.txt
This is a Udacity Nanodegree project. The dataset I chose contains statistics for players in the top 5 soccer leagues across Europe from the 2014-15 season to the 2019-20 season. Apart from attributes like goals, assists, and yellow and red cards, it includes some other interesting statistics such as expected goals, expected assists, and key passes.
With the data at hand, I tried to answer the following questions:
- Which league has had the most attacking defenders across Europe's top 5 leagues in the last 6 seasons?
- Which teams outperform their expected goals measure and which ones underperform (and in which season)? How is this related to their league performance in that year? Further, which teams consistently outperform their expected goals measure over the last 5 seasons?
- In a particular season and league, which teams were most dependent on a single player for scoring goals? (A Gini coefficient analysis of the expected goals chain statistic.)
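To illustrate the idea behind the last question, here is a minimal sketch of a Gini coefficient calculation. It is not the code used in this project: the function name `gini` and the sample values are hypothetical, and it assumes each team is represented by a list of non-negative per-player expected goals chain values. A coefficient close to 0 means contributions are spread evenly across players, while a value close to 1 means a single player dominates.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even shares)."""
    xs = sorted(float(v) for v in values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        # No players or no contribution at all: treat as perfectly even.
        return 0.0
    # Rank-weighted sum over the ascending-sorted values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical per-player values for two made-up teams (not real data).
balanced_team = [5.0, 5.0, 5.0, 5.0]
dependent_team = [0.0, 0.0, 0.0, 10.0]

print(gini(balanced_team))   # every player contributes equally
print(gini(dependent_team))  # one player accounts for everything
```

With this formula, a team of n players in which one player accounts for everything scores (n - 1) / n, so comparisons are most meaningful between squads of similar size.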
The data folder consists of CSV files containing statistics for each player in Europe's top 5 soccer league teams from 2014-15 to 2019-20.
The main findings of this work can be found on my Medium blog.
The data was obtained from Kaggle. The Gini coefficient analysis is based on ideas presented in this article. I would also like to acknowledge the Udacity Data Scientist Nanodegree instructors for providing the opportunity to create a blog post for a data science problem.