A collection of experiments to assess the quality of explanations for detectors of machine-generated text.
```bash
git clone --recurse-submodules repo_url
cd repo_dir
```
## Models
```bash
cd repo_dir
mkdir -p "models/radford et al"
wget -O "models/radford et al/detector-base.pt" https://openaipublic.azureedge.net/gpt-2/detector-models/v1/detector-base.pt
```
## Cache
Unzip `explanation_cache.zip` to `explanation_cache/`. The filenames contain the SHA256 hash of the input string; see `fi_explainer.py`.
## Python
Create a `.venv` and activate it.
Then install the dependencies and spaCy models:

```bash
pip install -r requirements.txt
python -m spacy download en_core_web_sm
python -m spacy download en_core_web_lg
```
You may want to install PyTorch manually, e.g. to match your CUDA version.
Make sure to select the `.venv` kernel in all notebooks.
## Dataset
Run `dataset_sampling.ipynb` to obtain the base dataset from Guo et al. 2023 (CC-BY-SA).
The individual notebooks derive additional synthetic datasets (CC-BY-SA).
## Detectors

All detectors are extended to support masked input (see the sketch after this list):
- In-domain fine-tuned RoBERTa from Guo et al. 2023: `detector_guo.py`
- Out-of-domain fine-tuned RoBERTa from Solaiman et al. 2019 / Radford et al. 2019: `detector_radford.py`
- Zero-shot method of Mitchell et al. 2023 (DetectGPT) with the surrogate model proposed by Mireshghallah et al. 2023: `detector_detectgpt.py`
## Explainers

SHAP is used as-is.
Forks of LIME and Anchor are provided as submodules, with the following changes:
- A budget limiting the number of samples used during search, to cap runtime (200 samples per candidate during search, unlimited samples in the final "best of each size" round); see the sketch after this list
- DistilBERT was replaced with DistilRoBERTa, and the mask probability was adjusted to increase the coherence of perturbations
- Changes to the rendering functions for the user study (to share JS and CSS scope with LIME)
- Cosmetic changes to the bar charts for the user study
The explanations are provided as a zip file. All experiments are designed so that any subset of the dataset can be processed in parallel by executing the notebooks with different offsets.