This paper has been accepted at EMNLP 2023.
- python==3.7
- pytorch==1.11.0
- transformers==4.28.1
- scipy==1.7.3
- scikit-learn==1.0.2
- numpy==1.21.5
Download the benchmark datasets and place them under the ./data directory. If necessary, modify the corresponding paths in load_dataset.py.
| Setting | Dataset | Val | Test | Source | Link |
|---|---|---|---|---|---|
| Inconsistency Detection (SUMMAC Benchmark) | CoGenSumm | 1281 | 400 | C | https://github.com/tingofurro/summac |
| | SummEval | 850 | 850 | C | |
| | FRANK | 671 | 1575 | C+X | |
| | Polytope | 634 | 634 | C | |
| | FactCC | 931 | 503 | C | |
| | XSumFaith | 1250 | 1250 | C | |
| Faithfulness Rating | FRANKCNN | - | 1250 | C | https://github.com/NJUNLP/CoP |
| | QAGSCNN | - | 235 | C | |
| | SummEval | - | 1600 | C | https://github.com/Yale-LILY/SummEval |
| | FRANKXSUM | - | 996 | X | https://github.com/NJUNLP/CoP |
| | QAGSXSUM | - | 239 | X | |
Calculate the probabilities with a foundation language model by running:
CUDA_VISIBLE_DEVICES=0 python3 main.py
The results will be saved under the directory ./output, or can be downloaded with this link.
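
For readers unfamiliar with how probabilities are read off a language model, here is a minimal, self-contained sketch. It is not the code in main.py: the function name `token_log_probs` and the toy logits are assumptions made for illustration, and a real run would take the logits from the foundation model instead. It only shows the general step of turning per-position logits into the log-probability of each next token.

```python
import math


def token_log_probs(logits, token_ids):
    """Return log p(token_ids[t + 1] | token_ids[: t + 1]) for each t.

    `logits[t]` is assumed to hold the unnormalized scores over the
    vocabulary for the token that follows position t (hypothetical input;
    in practice these would come from a causal language model).
    """
    log_probs = []
    for t in range(len(token_ids) - 1):
        row = logits[t]
        # Log-softmax via log-sum-exp, shifted by the max for stability.
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        log_probs.append(row[token_ids[t + 1]] - log_z)
    return log_probs


# Toy example: a vocabulary of 3 tokens and a sequence of 3 token ids.
toy_logits = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
toy_ids = [1, 2, 0]
probs = [math.exp(lp) for lp in token_log_probs(toy_logits, toy_ids)]
```

The first token has no prefix to condition on, so a sequence of three ids yields two probabilities.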
Then, the summary-level and system-level performances of FFLM can be calculated as follows:
python3 summary-level-evaluation.py --file_path xxx
python3 system-level-evaluation.py --file_path xxx
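
The difference between the two evaluation levels can be illustrated with a small sketch. It follows one common convention, which may differ from what the scripts above do: summary-level correlation is computed over all individual summaries, while system-level correlation is computed over per-system averages. The record fields (`system`, `metric`, `human`) and the toy values are hypothetical and do not reflect the repository's file format. Pearson correlation is written out by hand to keep the example dependency-free; `scipy.stats.pearsonr` would be the usual choice.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (non-zero variance assumed)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def summary_level(records):
    """Correlate metric and human scores over every individual summary."""
    return pearson([r["metric"] for r in records],
                   [r["human"] for r in records])


def system_level(records):
    """Average scores per system first, then correlate the averages."""
    by_system = defaultdict(list)
    for r in records:
        by_system[r["system"]].append(r)
    metric_means, human_means = [], []
    for rows in by_system.values():
        metric_means.append(sum(r["metric"] for r in rows) / len(rows))
        human_means.append(sum(r["human"] for r in rows) / len(rows))
    return pearson(metric_means, human_means)


# Hypothetical toy data: three summarization systems, two summaries each.
records = [
    {"system": "A", "metric": 0.9, "human": 5.0},
    {"system": "A", "metric": 0.7, "human": 4.0},
    {"system": "B", "metric": 0.4, "human": 3.0},
    {"system": "B", "metric": 0.6, "human": 2.0},
    {"system": "C", "metric": 0.1, "human": 1.0},
    {"system": "C", "metric": 0.3, "human": 2.0},
]
summary_corr = summary_level(records)
system_corr = system_level(records)
```

Averaging per system smooths out per-summary noise, so the two numbers can differ noticeably on the same data.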
@inproceedings{jia2023fflm,
title={Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model},
author={Qi Jia and Siyu Ren and Yizhu Liu and Kenny Q. Zhu},
booktitle={Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing},
year={2023}
}