A collection of experiments to assess the quality of explanations for detectors of machine-generated text.
```bash
git clone --recurse-submodules repo_url
cd repo_dir
```
## Models
```bash
cd repo_dir
mkdir -p "models/radford et al"
wget -O "models/radford et al/detector-base.pt" https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-base.pt
```
## Cache
Unzip `explanation_cache.zip` to `explanation_cache/`. The filenames contain the SHA256 hash of the input string; see `fi_explainer.py`.
## Python
Create a `.venv` and activate it.
Then install the dependencies and spaCy models:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
```
You may want to install PyTorch manually, e.g. to match your CUDA version.
Make sure to select the `.venv` kernel in all notebooks.
## Dataset
Run `dataset_sampling.ipynb` to obtain the base dataset from Guo et al. 2023 (CC-BY-SA).
The individual notebooks derive additional synthetic datasets (CC-BY-SA).
## Detectors

All detectors are extended to support masked input (see the sketch after this list):
- In-domain fine-tuned RoBERTa from Guo et al. 2023: `detector_guo.py`
- Out-of-domain fine-tuned RoBERTa from Solaiman et al. 2019 / Radford et al. 2019: `detector_radford.py`
- Zero-shot method of Mitchell et al. 2023 (DetectGPT) with the surrogate model proposed by Mireshghallah et al. 2023: `detector_detectgpt.py`
## Explainers

SHAP is used as-is.
Forks of LIME and Anchor are provided as submodules, with the following changes:
- A budget limiting the number of samples used during search, to cap runtime (200 samples per candidate during search, unlimited samples in the final "best of each size" round); see the sketch after this list
- DistilBERT was replaced with DistilRoBERTa, and the mask probability was adjusted to increase the coherence of perturbations
- Changes to the rendering functions for the user study (to share JS and CSS scope with LIME)
- Cosmetic changes to the bar charts for the user study
The explanations are provided as a zip file. All experiments are designed so that any subset of the dataset can be processed in parallel by executing the notebooks with different offsets.