This repository is the official implementation of the ACL 2020 paper [Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading](https://arxiv.org/abs/2005.12484). EMT+entailment achieves new state-of-the-art results on the ShARC conversational machine reading benchmark (March 2020).
If you find our code useful, please cite our paper as follows:
```bibtex
@article{gao-etal-2020-explicit,
  title={Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading},
  author={Yifan Gao and Chien-Sheng Wu and Shafiq R. Joty and Caiming Xiong and Richard Socher and Irwin King and Michael R. Lyu and Steven C. H. Hoi},
  journal={ArXiv},
  year={2020},
  volume={abs/2005.12484}
}
```
## Main environment (`PYT_EMT`)
```bash
conda create -n emt python=3.6
conda activate emt  # activate before installing into the environment
conda install pytorch==1.0.1 cudatoolkit=10.0 -c pytorch
conda install spacy==2.0.16 scikit-learn
python -m spacy download en_core_web_lg && python -m spacy download en_core_web_md
pip install pytorch-pretrained-bert==0.4.0 editdistance==0.5.2
```
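A minimal sanity check that the environment resolves, assuming `emt` is active (CUDA availability depends on your machine):

```bash
python -c "import torch, spacy; print(torch.__version__, torch.cuda.is_available(), spacy.__version__)"
```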
## UniLM question generation environment (`PYT_QG`)
```bash
# create conda environment
conda create -n qg python=3.6
conda activate qg  # activate before installing into the environment
conda install pytorch==1.1 cudatoolkit=10.0 -c pytorch
conda install spacy==2.0.16 scikit-learn
python -m spacy download en_core_web_lg && python -m spacy download en_core_web_md
pip install editdistance==0.5.2

# install apex
git clone -q https://github.com/NVIDIA/apex.git
cd apex
git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8
python setup.py install --cuda_ext --cpp_ext
cd ..

# set up unilm
cd qg
pip install --editable .
cd ..
```
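A quick check that apex compiled against this PyTorch build, again assuming the `qg` environment is active:

```bash
python -c "import torch, apex; print('torch', torch.__version__, '- apex OK')"
```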
## Download ShARC data
```bash
mkdir data
cd data
wget --quiet https://sharc-data.github.io/data/sharc1-official.zip
unzip sharc1-official.zip
rm sharc1-official.zip
mv sharc1-official sharc
cd ..
```
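To confirm the extraction, list the directory; the preprocessing scripts expect the official release under `data/sharc`:

```bash
ls -R data/sharc | head
```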
## Download BERT and UniLM
```bash
mkdir pretrained_models

# BERT
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz -O pretrained_models/bert-base-uncased.tar.gz
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt -O pretrained_models/bert-base-uncased-vocab.txt
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt -O pretrained_models/bert-large-cased-vocab.txt
wget --quiet https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased.tar.gz -O pretrained_models/bert-large-cased.tar.gz

# UniLM
wget --quiet https://unilm.blob.core.windows.net/ckpt/unilm1-large-cased.bin -O pretrained_models/unilmv1-large-cased.bin

cd pretrained_models
tar -zxvf bert-large-cased.tar.gz
rm bert-large-cased.tar.gz
cd ..
```
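At this point `pretrained_models/` should hold the two vocab files, the `bert-base-uncased` archive, the UniLM checkpoint, and the extracted `bert-large-cased` weights:

```bash
ls pretrained_models
```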
You can also download our pretrained models and our dev set predictions:

- Decision Making Model: `dm.pt`
- Question Generation Model: `qg.bin`
- Predictions on the dev set: `dev.preds.json`
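If you use them, move the downloaded files into `pretrained_models/` so the inference commands below work unchanged:

```bash
mv dm.pt qg.bin dev.preds.json pretrained_models/
```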
After these steps, your directory structure should look like this:
```
.
├── dm
│   └── ...
├── qg
│   └── ...
├── README.md
├── preprocess_dm.py
├── preprocess_qg.py
├── train_dm.sh
├── train_qg.sh
├── inference_e2e.sh
├── inference_oracle_qg.sh
├── data
│   └── sharc ...
└── pretrained_models
    ├── bert/unilm ...
    ├── dm.pt
    ├── qg.bin
    └── dev.preds.json
```
## Preprocess decision making

Run the preprocessing script with the `emt` environment's Python interpreter (`PYT_EMT`):

```bash
PYT_EMT preprocess_dm.py
```

## Preprocess question generation

Run the preprocessing script with the `qg` environment's Python interpreter (`PYT_QG`):

```bash
PYT_QG preprocess_qg.py
```
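Here `PYT_EMT`/`PYT_QG` stand for the Python binaries of the two conda environments; for example, assuming conda lives under `$HOME/miniconda3` (adjust to your installation):

```bash
# substitute your own conda location
$HOME/miniconda3/envs/emt/bin/python preprocess_dm.py
$HOME/miniconda3/envs/qg/bin/python preprocess_qg.py
```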
## Decision Making + Underspecified Span Extraction
Configure `PYT_EMT` in `train_dm.sh` first, then run:

```bash
mkdir -p saved_models
./train_dm.sh <GPU_ID>
```
By default, the trained decision making model is saved to `saved_models/lew_10_lsw_0.6/seed_28/best.pt` and its dev-set predictions to `saved_models/lew_10_lsw_0.6/seed_28/dev.preds.json`.
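To eyeball the output, you can load the predictions file (this assumes it serializes as a JSON array; adjust if the format differs):

```bash
python -c "import json; p = json.load(open('saved_models/lew_10_lsw_0.6/seed_28/dev.preds.json')); print(len(p), p[0])"
```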
## Question Generation
Configure `PYT_QG` in `train_qg.sh` first, then run:

```bash
mkdir -p saved_models
./train_qg.sh <GPU_ID>
```
By default, the trained question generation model is saved to `saved_models/unilm_16_0.00002_20/model.20.bin`.
**Note:** Because the dataset is relatively small (~20k examples), results depend heavily on your environment and the random seed. To replicate the numbers reported in the paper, use our pretrained models.
To evaluate EMT on the end-to-end task, configure `PYT_QG` in `inference_e2e.sh`, then run:

```bash
./inference_e2e.sh <GPU_ID> pretrained_models/dm.pt pretrained_models/qg.bin path/to/bert/base/uncased/pt
```
Our model achieves the following performance on the development set using our pre-trained models:
| Micro Acc. | Macro Acc. | BLEU1 | BLEU4 |
|---|---|---|---|
| 73.22 | 78.28 | 67.48 | 53.2 |
You can replace `dm.pt` and `qg.bin` with your own trained models to get your own results.
To evaluate EMT on the oracle QG task, configure `PYT_QG` in `inference_oracle_qg.sh`, then run:

```bash
./inference_oracle_qg.sh <GPU_ID> <path-to-dev.preds.json> <path-to-trained-qg-model>
# OR, use our pretrained QG model and our dev predictions
./inference_oracle_qg.sh <GPU_ID> pretrained_models pretrained_models/qg.bin
```
Oracle question generation results on the dev set:

| BLEU1 | BLEU4 |
|---|---|
| 63.50 | 48.65 |
Portions of the source code are based on the E3 project.
The work was done when the first author was an intern at Salesforce Research.
If you have any questions, please open an issue or contact [email protected].