Our project aims to predict the outcomes of the Paris 2024 Olympics using data from the Tokyo 2020 Olympics (held in 2021) and historical Olympic data from 1960 onward. We use machine learning models to forecast medal counts by country and to examine the factors that influence those outcomes.
The data folder contains various datasets essential for our analysis and model building:
- 2021 Olympics Data: Link to Kaggle dataset
- World Populations and GDP per Capita: Contains data on global populations and GDP per capita.
- Host Countries: Information on countries that have hosted the Olympics.
- 120 Years of Olympic History: Link to Kaggle dataset. This dataset includes athlete events and NOC (National Olympic Committee) regions.
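Because the historical dataset is recorded per athlete and event, country-level medal counts have to be derived from it, and team events should count once per country rather than once per team member. The following is a minimal sketch of that aggregation, not the project's actual cleaning code. The field names (`NOC`, `Year`, `Event`, `Medal`), the toy rows, and the helper `country_medal_counts` are all assumptions made for illustration.

```python
from collections import Counter

# Toy athlete-event rows (hypothetical values). The field names are assumed
# to resemble the athlete-events file and may differ in the real data.
rows = [
    {"NOC": "AAA", "Year": 2016, "Event": "Team Event X", "Medal": "Gold"},
    {"NOC": "AAA", "Year": 2016, "Event": "Team Event X", "Medal": "Gold"},
    {"NOC": "AAA", "Year": 2016, "Event": "Individual Event Y", "Medal": "Silver"},
    {"NOC": "BBB", "Year": 2016, "Event": "Individual Event Y", "Medal": "NA"},
]


def country_medal_counts(rows):
    """Count medals per (NOC, Year, Medal), counting each event medal once.

    Simplification: if one country legitimately wins the same medal twice in
    the same event (for example through a tie), this still counts it once.
    """
    seen = set()
    counts = Counter()
    for row in rows:
        medal = row["Medal"]
        if not medal or medal == "NA":
            continue  # the athlete did not win a medal in this event
        key = (row["NOC"], row["Year"], row["Event"], medal)
        if key in seen:
            continue  # another member of the same medal-winning team
        seen.add(key)
        counts[(row["NOC"], row["Year"], medal)] += 1
    return counts


counts = country_medal_counts(rows)
for (noc, year, medal), n in sorted(counts.items()):
    print(noc, year, medal, n)
```

The two gold rows for the team event collapse into a single gold for that country, which is the behaviour a country-level medal table needs.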
The model folder comprises two approaches to predicting medal outcomes:
- Time Series Model: A multivariate time series forecasting model that uses historical data to forecast, for each country, the counts of gold, silver, and bronze medals, the numbers of male and female athletes, and the numbers of sports and events.
- Random Forest Classifier: For each data point, this model outputs class probabilities indicating whether an athlete will win a gold, silver, or bronze medal.
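To illustrate how a random forest classifier produces per-class probabilities, here is a minimal sketch using scikit-learn's `RandomForestClassifier` and its `predict_proba` method. It is not the project's model: the features (age, height, prior medals), the `"None"` label for no medal, and the randomly generated data are all assumptions, so the printed probabilities carry no real meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic athlete-level features (hypothetical): age, height in cm,
# and number of prior medals.
n = 200
X = np.column_stack([
    rng.integers(16, 40, size=n),
    rng.normal(175, 10, size=n),
    rng.integers(0, 5, size=n),
])

# Synthetic labels drawn at random; "None" means no medal (assumed encoding).
y = rng.choice(["None", "Bronze", "Silver", "Gold"], size=n,
               p=[0.7, 0.1, 0.1, 0.1])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# One row per data point, one column per class; columns follow clf.classes_.
proba = clf.predict_proba(X[:5])
for label, p in zip(clf.classes_, proba[0]):
    print(f"{label}: {p:.2f}")
```

Each row of `proba` sums to 1, so the medal probabilities for a data point can be compared directly or aggregated by country.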
The notebooks folder includes:
- Data Cleaning: Scripts and notebooks used to clean and preprocess the raw data.
- Exploration: Detailed exploratory data analysis (EDA) on the datasets.
- Analysis on 2024 Result Data: Analysis and insights derived from the 2024 prediction results.
For detailed insights and visualizations from our analysis, visit our project page.
- Shreeya Singh
- Aayusha Shrestha
- Angel Venegas