This is the official repository for our NeurIPS 2023 paper, "Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense".
- (Oct 2023) We have added a new setting to utilize the ShareGPT corpus in retrieval-based detection experiments. In
dipper_paraphrases/detect_retrieval.py
, use the flag--retrieval_corpus sharegpt
. - (May 2023) The training data for DIPPER is now available here!
- (May 2023) The non-contextual ablated version of DIPPER (Section 6.1 of paper) is now available on the HuggingFace model hub! (link)
- (April 2023) We have now released our paraphraser DIPPER on the HuggingFace model hub (link)!
- (April 2023) Benchmark data, preprocessed paraphrases and scripts to reproduce the paper's experiments are now available!
Requirements
Since DIPPER is a 11B parameter model, please use a GPU with at least 40GB of memory to reproduce the experiments in the paper. Lower precision approximations or DeepSpeed optimizations may also be fine on lower memory GPUs, but we have not tested them in our experiments.
# required (for paraphrasing)
pip install torch transformers sklearn nltk
pip install --editable .
# optional (needed for some detection experiments)
pip install openai rankgen retriv sentencepiece
Model Download
DIPPER from HuggingFace
HuggingFace Model Hub links -
full model: https://huggingface.co/kalpeshk2011/dipper-paraphraser-xxl
ablated model without context: https://huggingface.co/kalpeshk2011/dipper-paraphraser-xxl-no-context
Script: dipper_paraphrases/paraphrase_minimal.py
DIPPER manual download
Checkpoint: https://drive.google.com/file/d/1LJJ1P5X2An0kMn8WAAAJBmxBuNS-5GiK/view?usp=sharing
To run this downloaded model, in dipper_paraphrases/paraphrase_minimal.py
, uncomment the line dp = DipperParaphraser(model="...")
and specify your model checkpoint path.
SIM model
You could optionally download the SIM model from Wieting et al. 2021 for calculating semantic similarity of the paraphrased outputs. Download the two files in this link and place them in dipper_paraphrases/sim
.
T5X versions
Please see our official Google Research release here: https://github.com/google-research/google-research/tree/master/dipper
Verify DIPPER is working
Please run the script dipper_paraphrases/paraphrase_minimal.py
and compare the outputs with sample_outputs.md
. The greedy decoded outputs should exactly match, while the top_p samples will have some differences from the sample outputs but have higher diversity.
(IMPORTANT) paraphraser differences from paper
There are two minor differences between the actual model and the paper's description:
-
Our model uses
<sent> ... </sent>
tags instead of<p> ... </p>
tags. -
The lexical and order diversity codes used by the actual model correspond to "similarity" rather than "diversity". For a diversity of X, please use the control code value
100 - X
. In other words, L60-O60 in the paper corresponds tolex = 40, order = 40
as the control code input to the model.
This is all documented in our minimal sample script to run DIPPER, dipper_paraphrases/paraphrase_minimal.py
, and also in footnote 6 in our paper.
Inference / Detection dataset: Download the folders open-generation-data
and lfqa-data
from this Google Drive link. Place them in your root folder. Reproducing the experiments in the paper has three steps. We have already done Step 1 and Step 2 and added preprocessed data to Google Drive link.
Training dataset for DIPPER (not needed for inference/detection experiments): Download the ZIP file from this link. The specific files used for training were par3/translator_pair/sents_[123]/train_ctrl_ctx_*
as well as par3/gt_translator/sents_[123]/train_ctrl_ctx_*
.
Step 1: Generating text from large language models
Use the scripts dipper_paraphrases/generate_gpt2.py
, or dipper_paraphrases/generate_gpt3.py
, or dipper_paraphrases/generate_opt.py
as shown below,
# for no watermarking
python dipper_paraphrases/generate_gpt2.py --strength 0.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data
# for including watermarking
python dipper_paraphrases/generate_gpt2.py --strength 2.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data
You can speed this up by parallelizing it across multiple GPUs on SLURM using the code below. Please read the script before using parallelization, it will likely need modifications depending on your specific SLURM setup.
python dipper_paraphrases/parallel/schedule.py --command "python dipper_paraphrases/generate_gpt2.py --strength 0.0 --dataset lfqa-data/inputs.jsonl --output_dir lfqa-data" --partition gpu-preempt --num_shards 8
# after completion
python dipper_paraphrases/parallel/merge.py --input_pattern "lfqa-data/gpt2_xl_strength_2.0_frac_0.5_300_len_top_p_0.9.jsonl.shard_*"
Step 2: Paraphrasing text generated by large language models
Use the scripts dipper_paraphrases/paraphrase.py
as shown below,
python dipper_paraphrases/paraphrase.py --output_file lfqa-data/gpt2_xl_strength_2.0_frac_0.5_300_len_top_p_0.9.jsonl --model kalpeshk2011/dipper-paraphraser-xxl
You can also parallelize this in a manner identical to Stage 1.
Step 3: Run AI-text detectors
Use any of the scripts to run various detectors: dipper_paraphrases/detect_*.py
as follows. Each script caches the processed data (such as API calls) and will run a lot quicker the next time. Note that the GPTZero and OpenAI experiments need access to API keys, see dipper_paraphrases/utils.py
for details.
python dipper_paraphrases/detect_watermark.py --output_file lfqa-data/gpt2_xl_strength_2.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/watermark_cache.json
python dipper_paraphrases/detect_openai.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/openai_cache.json
python dipper_paraphrases/detect_gptzero.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/gptzero_cache.json
python dipper_paraphrases/detect_detectgpt.py --base_model "facebook/opt-13b" --output_file lfqa-data/opt_13b_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/detectgpt_cache_opt.json
python dipper_paraphrases/detect_rankgen.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --detector_cache lfqa-data/rankgen_cache.json
python dipper_paraphrases/detect_retrieval.py --output_file lfqa-data/gpt2_xl_strength_0.0_frac_0.5_300_len_top_p_0.9.jsonl_pp --retrieval_corpus pooled --technique bm25
We recommend reporting true positive rates at a false positive rate of 1% instead of ROC curves, as discussed in the paper. This will be printed by the script. Nevertheless, the full ROC curves will be stored in roc_plots
, use dipper_paraphrases/plot_roc.py
to plot them.
Since DetectGPT takes a while to run, it may be helpful to shard the DetectGPT experiments using the parallel scripts of the previous two steps. Use dipper_paraphrases/parallel/merge_json.py
to merge the cache. Set --base_model none
to ignore loading the LLM and just rely on cached results. Also, don't forget the --base_model
flag in DetectGPT runs, see the code for more details.
For the scaled retrieval experiments, please see dipper_paraphrases/detect_retrieval_scale_*.py
. Please contact me if you want the raw data accompanying this experiment (email me at [email protected]).
If you found the code, model or paper useful please cite:
@article{krishna2023paraphrasing,
title={Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense},
author={Krishna, Kalpesh and Song, Yixiao and Karpinska, Marzena and Wieting, John and Iyyer, Mohit},
journal={arXiv preprint arXiv:2303.13408},
year={2023}
}