Human-Guided Fair Classification for NLP

This repository accompanies our ICLR 2023 (Spotlight) paper Human-Guided Fair Classification for Natural Language Processing.

Overview

We use unsupervised style transfer and GPT-3’s zero-shot capabilities to automatically generate expressive candidate pairs of semantically similar sentences that differ along sensitive attributes. We then use human feedback to validate whether the generated pairs represent valid (individual) fairness constraints, i.e., whether the two sentences in a pair should be treated the same by a classifier. Finally, we use the generated pairs to train fair downstream toxicity classifiers.
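For intuition, here is a minimal sketch of the simplest of these generation strategies, word replacement along a sensitive attribute. The word list and function below are illustrative placeholders, not the actual pipeline used in the paper or this repository:

REPLACEMENTS = {  # hypothetical word list for a single sensitive attribute (religion)
    "christian": "muslim",
    "christians": "muslims",
    "church": "mosque",
}

def make_candidate_pair(comment):
    """Return (original, modified) if a sensitive-attribute word was replaced, else None."""
    tokens = comment.split()
    modified = [REPLACEMENTS.get(tok.lower(), tok) for tok in tokens]
    if all(m == t for m, t in zip(modified, tokens)):
        return None  # no sensitive-attribute word found, so no candidate pair
    return comment, " ".join(modified)

# The resulting pair is only a *candidate* fairness constraint; whether the two
# sentences should really be treated the same is decided via human feedback.
print(make_candidate_pair("The church down the street is beautiful."))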

Dataset

The data pools and human feedback used for active learning in our paper, as well as for evaluation, can be found in Data/train.csv and Data/test.csv, respectively. The datasets are described in detail in Data/datasheet.md.

Warning: As our datasets are derived from the Civil Comments dataset for toxicity classification, they contain offensive content.
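For a first look at the data, the following sketch (assuming pandas is installed) loads both files; the exact column layout is documented in Data/datasheet.md:

import pandas as pd

# Data pools and human feedback used for active learning, plus the evaluation set.
train = pd.read_csv("Data/train.csv")
test = pd.read_csv("Data/test.csv")

# Inspect sizes and column names; see Data/datasheet.md for what each column means.
print(train.shape, test.shape)
print(train.columns.tolist())
print(train.head())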

Generating modified comments (sensitive attribute transfer)

Make sure to download the train.csv file from the Civil Comments dataset to Code/Datasets/Kaggle_Toxicity and set up a virtual environment with the required packages:

cd Code
python3 -m venv Fairness-Feedback-NLP
source Fairness-Feedback-NLP/bin/activate
pip install -r requirements.txt

Then, run

cd Code
chmod +x Generation.sh
./Generation.sh 

to train the style transfer pipeline and generate pairs using style transfer and word replacement. The resulting generated pairs can be found in the corresponding folders in Code/generations. Generating modified comments for all original comments in the dataset can take a long time; if you prefer to only train the generator and quickly test it, replace Generation.sh with Generation_Quick.sh. If you also wish to use GPT-3 generation, run Generation_GPT.sh instead. This requires the OpenAI API Python integration to be set up properly and Code/key.txt to be replaced with the appropriate API key. Please be aware that the latter script makes calls to the OpenAI API, incurring real monetary costs.
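For reference, a minimal sketch of what a zero-shot GPT-3 rewrite call could look like, assuming the legacy openai Python package (pre-1.0 interface); the actual prompts, model, and batching used by Generation_GPT.sh may differ:

import openai

# Read the API key from Code/key.txt, as expected by the repository.
with open("Code/key.txt") as f:
    openai.api_key = f.read().strip()

comment = "Example comment mentioning a sensitive attribute."
# Hypothetical zero-shot rewrite prompt.
prompt = (
    "Rewrite the following comment so that it refers to a different religious group, "
    "changing nothing else:\n\n" + comment + "\n\nRewrite:"
)

response = openai.Completion.create(
    model="text-davinci-002",  # assumption: any GPT-3 completion model
    prompt=prompt,
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["text"].strip())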

After Generation.sh has finished, the results from Table 1 in the paper can be (partially) replicated via

cd Code
chmod +x Tables.sh
./Tables.sh

If you used Generation_GPT.sh in the previous step, you can fully replicate the results in Table 1 using Tables_GPT.sh instead of Tables.sh.

Citing this work

@inproceedings{
  dorner2023humanguided,
  title={Human-Guided Fair Classification for Natural Language Processing},
  author={Florian E. Dorner and Momchil Peychev and Nikola Konstantinov and Naman Goel and Elliott Ash and Martin Vechev},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=N_g8TT9Cy7f}
}