This project implements the George pipeline for mitigating spurious correlations, using the SpuCo package. We train a deep learning model on the SpuCoMNIST dataset, infer groups from its outputs, and retrain with balanced groups to improve robustness against spurious correlations. The pipeline has four steps:
- We train a Convolutional Neural Network (CNN) model (LeNet) using Empirical Risk Minimization (ERM) on the SpuCoMNIST dataset.
- After ERM training, we cluster the model's outputs using the Cluster class from SpuCo's group_inference module to infer a group for each training example.
- We then perform group-balanced training on the inferred groups using the GroupBalanceBatchERM method, so that each group is equally represented during training.
- Finally, we evaluate the model's predictions on the MNIST digits and output the accuracy.
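The group-balancing idea behind the third step can be illustrated without SpuCo. The sketch below is not SpuCo's GroupBalanceBatchERM implementation; it uses only the Python standard library, and the function name `group_balanced_batch`, the toy `group_partition`, and the choice to sample with replacement are all assumptions made for illustration.

```python
import random
from collections import Counter


def group_balanced_batch(group_partition, batch_size, rng):
    """Draw a batch of sample indices with equal representation of each group.

    group_partition maps a group id to the list of sample indices in that group.
    Sampling is done with replacement so that small groups can fill their share.
    """
    groups = sorted(group_partition)
    if batch_size % len(groups) != 0:
        raise ValueError("batch_size must be divisible by the number of groups")
    per_group = batch_size // len(groups)

    batch = []
    for g in groups:
        # Each group contributes the same number of indices, regardless of its size.
        batch.extend(rng.choices(group_partition[g], k=per_group))
    rng.shuffle(batch)
    return batch


# Hypothetical toy partition: group 0 is the majority, group 1 the minority.
group_partition = {0: list(range(0, 95)), 1: list(range(95, 100))}
index_to_group = {i: g for g, idxs in group_partition.items() for i in idxs}

batch = group_balanced_batch(group_partition, batch_size=8, rng=random.Random(0))
counts = Counter(index_to_group[i] for i in batch)
print(counts)
```

Although the minority group holds only 5% of the examples in this toy partition, it supplies half of every batch, which is the property that group-balanced training relies on.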
Before running the project, ensure you have the required dependencies. Follow the steps below to set up your environment:
Ensure that you have Python 3.x installed. You can download it from the official Python website (python.org).
Run the following command to install the necessary libraries:
pip install torch spuco tqdm pandas
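After installation, a quick check along the following lines can confirm that the packages are visible to your Python interpreter. This is a minimal sketch, not part of the project; the helper name `missing_packages` is hypothetical, and it assumes each package's import name matches its pip name, as is the case for the four listed above.

```python
import importlib.util

# Import names of the packages installed by the pip command above.
REQUIRED = ["torch", "spuco", "tqdm", "pandas"]


def missing_packages(names):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```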