This repository was built by team Athene for the FEVER shared task 1. The system ranked third in the overall results and first on the evidence recall sub-task.
This repository builds upon the baseline system repository developed by the FEVER shared task organizers:
This is an accompanying repository for our FEVER Workshop paper at EMNLP 2018. For more information see the paper: UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification
Please use the following citation:
title={UKP-Athene: Multi-Sentence Textual Entailment for Claim Verification},
author={Hanselowski, Andreas and Zhang, Hao and Li, Zile and Sorokin, Daniil and Schiller, Benjamin and Schulz, Claudia and Gurevych, Iryna},
journal={arXiv preprint arXiv:1809.01479},
This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
- Python 3.6
- AllenNLP
- TensorFlow
- Download and install Anaconda (
- Create a Python Environment and activate it:
conda create -n fever python=3.6
source activate fever
- Install the required dependencies
pip install -r requirements.txt
- Download NLTK Punkt Tokenizer
python -c "import nltk; nltk.download('punkt')"
- Proceed with downloading the data set, the embeddings, the models and the evidence data
Download the FEVER dataset from the website of the FEVER shared task into the data directory
mkdir data
mkdir data/fever-data
#To replicate the paper, download paper_dev and paper_test files. These are concatenated for the shared task
wget -O data/fever-data/train.jsonl
wget -O data/fever-data/dev.jsonl
wget -O data/fever-data/test.jsonl
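Once the files are in place, a quick sanity check can confirm that they were downloaded completely. The following is a minimal sketch, not part of this repository: it assumes each line of the `.jsonl` files is one JSON object and that labelled records carry a `label` field. The helper names `read_jsonl` and `label_distribution` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def label_distribution(path):
    """Count the values of the (assumed) "label" field across all records."""
    # Records without the field are grouped under a placeholder key.
    return Counter(record.get("label", "<missing>") for record in read_jsonl(path))
```

For example, `label_distribution("data/fever-data/train.jsonl")` would return a counter of the labels found in the training file, which makes a truncated or empty download easy to spot.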
Download pretrained GloVe Vectors
unzip -d data/glove
gzip data/glove/*.txt
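Since the vectors may be stored either as plain text or gzip-compressed, a reader has to handle both. The sketch below is an illustration under the assumption that the file uses the standard GloVe text layout (a word followed by space-separated floats on each line). The function names `open_text` and `load_glove` are hypothetical and do not refer to code in this repository.

```python
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed text file for reading as UTF-8."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("r", encoding="utf-8")


def load_glove(path, limit=None):
    """Parse 'word v1 v2 ...' lines into a dict mapping word -> list of floats."""
    vectors = {}
    with open_text(path) as handle:
        for index, line in enumerate(handle):
            if limit is not None and index >= limit:
                break  # stop early, useful for a quick inspection
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip empty or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors
```

Calling `load_glove("data/glove/glove.6B.300d.txt.gz", limit=1000)` would read only the first 1000 lines, which is enough to check the dimensionality without loading the whole file.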
Download pretrained Wiki FastText Vectors
mkdir -p data/fasttext
unzip -d data/fasttext
The data preparation consists of three steps: (1) downloading the articles from Wikipedia, (2) indexing these for the evidence retrieval and (3) performing the negative sampling for training.
Download the pre-processed Wikipedia articles and unzip them into the data folder.
unzip -d data
Construct an SQLite Database (go grab a coffee while this runs)
PYTHONPATH=src python src/scripts/ data/wiki-pages data/fever/fever.db
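After the database has been built, it can be inspected with Python's built-in `sqlite3` module. The sketch below assumes a table named `documents` with the columns `id`, `text` and `lines`, modelled on the FEVER baseline. This schema is an assumption, so check the real one (for example with `.schema` in the `sqlite3` shell) before relying on it. The helper `fetch_page_lines` is hypothetical, and an in-memory stand-in is used so the example runs without the real file.

```python
import sqlite3


def fetch_page_lines(connection, page_id):
    """Return the stored sentence lines for one page, or None if the page is absent."""
    row = connection.execute(
        "SELECT lines FROM documents WHERE id = ?", (page_id,)
    ).fetchone()
    return None if row is None else row[0]


# Tiny in-memory stand-in with the assumed schema; not the real FEVER database.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE documents (id PRIMARY KEY, text, lines)")
demo.execute(
    "INSERT INTO documents VALUES (?, ?, ?)",
    (
        "Example_Page",
        "First sentence. Second sentence.",
        "0\tFirst sentence.\n1\tSecond sentence.",
    ),
)
print(fetch_page_lines(demo, "Example_Page"))
```

Against the real database, the connection would instead be opened with `sqlite3.connect("data/fever/fever.db")`.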
Download the datasets already processed through document retrieval, the pre-trained sentence selection ESIM model and the pre-trained claim verification ESIM models here. Download the files as follows:
mkdir -p model/no_attention_glove/rte_checkpoints/
mkdir -p model/esim_0/rte_checkpoints/
mkdir -p model/esim_0/sentence_retrieval_ensemble/
unzip -d model/no_attention_glove/rte_checkpoints/
unzip -d model/esim_0/rte_checkpoints/
unzip -d model/esim_0/sentence_retrieval_ensemble/
unzip -d data/fever/
PYTHONPATH=src python src/scripts/athene/
Launch the pipeline with optional mode arguments:
PYTHONPATH=src python src/scripts/athene/ [--mode <mode>]
The available modes are as follows:
Modes | Description |
Default option. Run the complete pipeline with both the training and prediction phases. |
Skip the document retrieval sub-task and run both the training and prediction phases. This requires the datasets already processed by document retrieval. |
Train and predict for the RTE sub-task only. This requires the datasets already processed by sentence retrieval. |
Run all 3 sub-tasks, but only the prediction phase of sentence retrieval and RTE. Only the test set is processed by the document retrieval and sentence retrieval sub-tasks. |
Skip the document retrieval sub-task and run only the prediction phase. This requires the test set already processed by document retrieval; only the test set is processed by the sentence retrieval sub-task. |
Predict for the RTE sub-task only. |
Run all 3 sub-tasks, but only the prediction phase of sentence retrieval and RTE. All 3 datasets are processed by the document retrieval and sentence retrieval sub-tasks. |
Skip the document retrieval sub-task and run only the prediction phase. This requires all 3 datasets already processed by document retrieval; all 3 datasets are processed by the sentence retrieval sub-task. |
Another variation of the ESIM model is configured through the config file in the conf folder.
To run the models:
PYTHONPATH=src python src/scripts/athene/ --config conf/<config_file> [--mode <mode>]
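The two optional arguments shown in the commands above, `--config` and `--mode`, could be parsed along the following lines. This is a hedged sketch using the standard `argparse` module, not the repository's actual launcher: the function `build_parser` and the path `conf/example.json` are hypothetical, and the mode is accepted as a free string rather than validated against the real mode names.

```python
import argparse


def build_parser():
    """Build a parser for the two optional arguments used in the commands above."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline launcher")
    parser.add_argument("--config", default=None, help="path to a config file")
    parser.add_argument(
        "--mode",
        default=None,
        help="pipeline mode; None stands for the default full run",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(["--config", "conf/example.json"])
print(args.config, args.mode)
```

Leaving out `--mode` yields `None`, which a launcher could map to the default behaviour of running the complete pipeline.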
The config file for the file paths and the hyperparameters is src/athene/utils/ The fields are described as follows:
Field | Description |
model_name | Name of the RTE model. Used as part of the path to save the trained RTE model. |
glove_path | Path to the pre-trained GloVe word embedding. Either point to the glove.6B.300d.txt.gz or the glove.6B.300d.txt file. |
fasttext_path | Path to the pre-trained FastText word embedding. Should point to the wiki.en.bin file. |
ckpt_folder | Path to the checkpoint folder for the trained RTE model. Defaults to model/<model_name>/rte_checkpoints. |
db_path | Path to the FEVER database file. |
dataset_folder | Path to the dataset folder. |
raw_training_set | Path to the original training set file. |
raw_dev_set | Path to the original development set file. |
raw_test_set | Path to the original test set file. |
training_doc_file | Path to the training set with predicted pages, i.e. the output of the training set through document retrieval sub-task. |
dev_doc_file | Path to the development set with predicted pages, i.e. the output of the development set through document retrieval sub-task. |
test_doc_file | Path to the test set with predicted pages, i.e. the output of the test set through document retrieval sub-task. |
training_set_file | Path to the training set with predicted evidences, i.e. the output of the training set through sentence retrieval sub-task. |
dev_set_file | Path to the development set with predicted evidences, i.e. the output of the development set through sentence retrieval sub-task. |
test_set_file | Path to the test set with predicted evidences, i.e. the output of the test set through sentence retrieval sub-task. |
document_k_wiki | The maximal number of candidate pages for each claim in the document retrieval sub-task. |
document_parallel | Whether to run the document retrieval sub-task in parallel. True or False. |
document_add_claim | Whether to append the original claim to the query sent to the MediaWiki API in the document retrieval sub-task. True or False. |
submission_file | Path to the final submission file. |
estimator_name | The name of the RTE estimator referring to src/athene/rte/utils/ |
max_sentences | The maximal number of predicted evidences for each claim. |
max_sentence_size | The maximal length of each predicted evidence. The words that exceed the maximal length are truncated. |
max_claim_size | The maximal length of each claim. The words that exceed the maximal length are truncated. |
seed | Random seed of the RTE sub-task. |
name | The prefix of the checkpoint files for the RTE sub-task. The checkpoint files will be saved in the <ckpt_folder>. |
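The interaction of `max_sentences`, `max_sentence_size` and `max_claim_size` can be illustrated with a small sketch. It assumes simple whitespace tokenization, which is not necessarily what the pipeline uses, and the helper names `truncate_tokens` and `prepare_example` are hypothetical.

```python
def truncate_tokens(tokens, max_size):
    """Keep at most max_size tokens; tokens beyond the limit are dropped."""
    return tokens[:max_size]


def prepare_example(claim, evidence_sentences, max_claim_size, max_sentence_size, max_sentences):
    """Apply the three limits to one claim and its candidate evidence sentences."""
    # Whitespace tokenization is an assumption made for this illustration.
    claim_tokens = truncate_tokens(claim.split(), max_claim_size)
    evidence_tokens = [
        truncate_tokens(sentence.split(), max_sentence_size)
        for sentence in evidence_sentences[:max_sentences]  # cap the number of sentences
    ]
    return claim_tokens, evidence_tokens


claim_tokens, evidence_tokens = prepare_example(
    "a b c d", ["x y z", "p q", "r"],
    max_claim_size=2, max_sentence_size=2, max_sentences=2,
)
print(claim_tokens, evidence_tokens)
```

Here the claim is cut to two tokens, the third evidence sentence is discarded, and each remaining sentence is cut to two tokens.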
The 'esim_hyper_param' field contains the hyperparameters of the ESIM-based model in the RTE sub-task. Several special parameters are described as follows:
Field | Description |
num_neurons | The number of neurons for each layer in the model. The first 2 numbers refer to the numbers of neurons of the two bidirectional RNNs in the ESIM model. |
pos_weight | The positive weights of the 3 classes for the weighted loss. The order is Supported, Refuted, Not Enough Info. |
max_checks_no_progress | Early stopping policy. Stop training if no improvement in the last x epochs. |
trainable | Whether to fine tune the word embeddings. True or False. |
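The early stopping policy behind `max_checks_no_progress` can be sketched as follows. This is one plausible reading of the description above (stop when the best validation score has not been beaten for the given number of checks) and not necessarily how the repository implements it. The function `should_stop` is hypothetical.

```python
def should_stop(scores, max_checks_no_progress):
    """Return True once the best score is at least max_checks_no_progress checks old."""
    if not scores:
        return False
    # Index of the first occurrence of the best score; ties count as no progress.
    best_index = max(range(len(scores)), key=scores.__getitem__)
    checks_without_progress = len(scores) - 1 - best_index
    return checks_without_progress >= max_checks_no_progress


# Simulated validation scores, one per epoch.
history = []
for epoch_score in [0.50, 0.60, 0.60, 0.59, 0.58]:
    history.append(epoch_score)
    if should_stop(history, max_checks_no_progress=2):
        break
print(len(history))
```

Because a tie with the best score is not treated as an improvement, the simulated run stops after the fourth epoch.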
The 'sentence_retrieval_ensemble_param' field contains the hyperparameters of the ESIM-based model in the sentence retrieval sub-task. Several special parameters are described as follows:
Field | Description |
num_model | The number of models to ensemble. |
tf_random_state | The random seeds for the models to ensemble. |
num_negatives | The number of negative samples, i.e. false evidence sentences, drawn for each claim in the training phase. |
c_max_length | The maximal length of each claim. The words that exceed the maximal length are truncated. |
s_max_length | The maximal length of each candidate evidence sentence. The words that exceed the maximal length are truncated. |
reserve_embed | Whether to reserve slots in the word embeddings for unseen words. True or False. |
model_path | Path to the folder for the checkpoint files of the ensemble models. |
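To make the role of `num_negatives` concrete, the following sketch draws false evidence sentences for one claim. It uses uniform random sampling over the non-gold candidates, which is an assumption and may differ from the sampling strategy in this repository. The function `sample_negatives` and its `seed` parameter are hypothetical.

```python
import random


def sample_negatives(candidates, gold, num_negatives, seed=0):
    """Pick up to num_negatives candidate sentences that are not gold evidence."""
    gold_set = set(gold)
    pool = [sentence for sentence in candidates if sentence not in gold_set]
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return rng.sample(pool, min(num_negatives, len(pool)))


candidates = ["s0", "s1", "s2", "s3", "s4"]
negatives = sample_negatives(candidates, gold=["s1"], num_negatives=3, seed=42)
print(negatives)
```

If fewer non-gold candidates exist than requested, the function returns all of them instead of raising an error.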
Configurations can be exported to JSON files. To export the current configuration, run the script:
PYTHONPATH=src python src/scripts/athene/ <path/to/output/json>
To use exported configurations, launch the pipeline with argument:
PYTHONPATH=src python src/scripts/athene/ --config <path/to/output/json>
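The export and reload round trip can be pictured with Python's `json` module. This is only an illustration of the idea and not the repository's export script: the functions `export_config` and `load_config` are hypothetical, and the example values are made up, although the field names come from the tables above.

```python
import json
import os
import tempfile


def export_config(config, path):
    """Write a configuration dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2, sort_keys=True)


def load_config(path, defaults):
    """Overlay the values from a JSON file on top of a dict of defaults."""
    with open(path, encoding="utf-8") as handle:
        overrides = json.load(handle)
    merged = dict(defaults)  # copy so the defaults stay untouched
    merged.update(overrides)
    return merged


# Illustrative values only; the real defaults live in the config file.
defaults = {"model_name": "esim_0", "max_sentences": 5, "seed": 1}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    export_config({"max_sentences": 3}, path)
    loaded = load_config(path, defaults)
print(loaded)
```

Fields that are absent from the JSON file keep their default values, so an exported file only needs to list the settings that differ.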
If you have any questions regarding the code, please don't hesitate to contact the authors or report an issue.
- <lastname>
- Apache License Version 2.0