generalized_paraphrase_identification

Implementation of the paper 'GAPX: Generalized Autoregressive Paraphrase-identification X'

NeurIPS 2022

An ensemble model for paraphrase identification robust to distribution shift.

Requirements

Please download the following paraphrase identification datasets:

To train and evaluate a paraphrase identification model, run:

python run.py --source_dataset [QQP, PIT, PAWS] --option [naive, robust]
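
The bracketed values in the command above are the accepted choices for each flag. As a minimal sketch of how such an interface could be parsed, the snippet below uses `argparse`. Only the flag names and their values come from the command above. The `build_parser` helper and the decision to make both flags required are assumptions for illustration, and the actual `run.py` may be organized differently.

```python
import argparse

# Values taken from the command line shown above.
SOURCE_DATASETS = ["QQP", "PIT", "PAWS"]
OPTIONS = ["naive", "robust"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two documented flags."""
    parser = argparse.ArgumentParser(
        description="Train and evaluate a paraphrase identification model."
    )
    # Dataset the model is trained on.
    parser.add_argument("--source_dataset", choices=SOURCE_DATASETS, required=True)
    # Training/evaluation mode; making it required is an assumption of this sketch.
    parser.add_argument("--option", choices=OPTIONS, required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--source_dataset", "QQP", "--option", "robust"])
    print(args.source_dataset, args.option)
```

With `choices` set, `argparse` rejects any value outside the listed ones and exits with a usage message, so a typo such as `--option robus` fails before any training starts.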

This repository implements a simplified version of the method described in the paper: the discriminative model uses BART instead of RoBERTa.

You should expect results similar to the following. Each cell reports F1 / accuracy / AUC, and each column is a source->target dataset pair:

| Command | QQP->QQP | QQP->WMT | QQP->PAWS | QQP->PIT |
| --- | --- | --- | --- | --- |
| `python run.py --source_dataset QQP --option naive` | 83.4/83.5/91.2 | 66.7/66.8/74.2 | 44.7/49.8/57.1 | 63.6/66.5/82.0 |
| `python run.py --source_dataset QQP --option robust` | 83.1/83.2/88.4 | 74.4/74.7/79.3 | 56.6/56.9/59.5 | 62.3/63.6/73.5 |
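
To make the three numbers in each cell concrete, here is a dependency-free sketch of how F1, accuracy, and AUC can be computed for binary paraphrase labels and formatted in the same `f1/acc/auc` style. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration. They are not taken from this repository, whose evaluation code may use a library implementation or different settings.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive (paraphrase) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def format_metrics(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> str:
    """Return 'f1/acc/auc' as percentages with one decimal place."""
    y_pred = [1 if s >= threshold else 0 for s in scores]  # threshold is an assumption
    values = (f1_score(y_true, y_pred), accuracy(y_true, y_pred), roc_auc(y_true, scores))
    return "/".join(f"{100 * v:.1f}" for v in values)


if __name__ == "__main__":
    labels = [1, 0, 1, 1, 0, 0]                # toy data, not from any dataset
    scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
    print(format_metrics(labels, scores))      # prints 66.7/66.7/88.9
```

F1 and accuracy depend on the chosen threshold, while AUC depends only on how the scores rank positives against negatives. That difference is one reason the three numbers in a cell can move independently between the `naive` and `robust` rows.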
