Human reading comprehension is studied in cognitive psychology. Roughly, there are three types of comprehension: literal comprehension, inference comprehension, and critical (evaluative) comprehension.
For machine reading comprehension (MRC), Deep Read: A Reading Comprehension System (ACL 1999) is the first study, and Towards the Machine Comprehension of Text: An Essay by Microsoft gives a review. The EMNLP 2014 best paper, Modeling Biological Processes for Reading Comprehension, proposes symbolic models based on feature engineering. After that, many deep learning models appeared: Tencent AI Part 1 illustrates the building blocks of deep learning MRC models, and Tencent AI Part 2 proposes their new Dual Ask-Answer Network. The bAbI datasets from Facebook introduce the AI-complete concept. Neural Machine Reading Comprehension: Methods and Trends presents a recent survey of MRC.
- Passage
- Single or multiple
- Question
- Cloze or query
- Candidate
- Multiple choice (e.g., Co-Matching, HMA) and opinion questions (DuReader)
- Answer
- Extraction or generation
- BiDAF from AllenNLP, baseline for MS-MARCO
- Attention flow layer, based on a similarity matrix: context-to-query attention (i.e., which query words are most relevant to each context word; softmax over each row) and query-to-context attention (i.e., which context words are most similar to some query word; softmax over the column-wise max); a minimal sketch follows this block
- Similarity function
- Model structure
- Official implementation
- Model illustration
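A minimal PyTorch sketch of the attention flow layer, assuming the paper's trilinear similarity function alpha(h, u) = w^T[h; u; h * u]; names and shapes are illustrative, not AllenNLP's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDAFAttention(nn.Module):
    """Sketch of BiDAF's attention flow layer with the paper's trilinear
    similarity alpha(h, u) = w^T [h; u; h * u]."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, H, U):
        # H: (batch, T, d) context encodings; U: (batch, J, d) query encodings
        T, J = H.size(1), U.size(1)
        h = H.unsqueeze(2).expand(-1, -1, J, -1)            # (batch, T, J, d)
        u = U.unsqueeze(1).expand(-1, T, -1, -1)            # (batch, T, J, d)
        S = self.w(torch.cat([h, u, h * u], dim=-1)).squeeze(-1)  # (batch, T, J)
        # Context-to-query: softmax over each row (the query axis)
        U_tilde = torch.bmm(F.softmax(S, dim=2), U)         # (batch, T, d)
        # Query-to-context: softmax over the per-context-word column max
        b = F.softmax(S.max(dim=2).values, dim=1)           # (batch, T)
        H_tilde = torch.bmm(b.unsqueeze(1), H).expand_as(H) # tiled over T
        # G: query-aware context representation
        return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=-1)
```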
- BiDAF + Self attention + ELMo
- R-Net for MS-MARCO
- Core layer 1: a gated attention-based recurrent network (the gate is applied to the concatenation of the passage word and the attention-pooling of the question) matches passage and question to obtain a question-aware passage representation; see the gate sketch below
- Core layer 2: a self-matching layer aggregates information from the whole passage
- Model structure
- Implementation: PyTorch, TensorFlow
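A hedged sketch of the gating idea in core layer 1: per the R-Net paper, the input to the matching RNN, [passage word; question attention-pooling], is scaled element-wise by a sigmoid gate; the module below is illustrative, not the official implementation:

```python
import torch
import torch.nn as nn

class GatedInput(nn.Module):
    """Sketch of R-Net's input gate: the matching RNN's input [u_t; c_t]
    (passage word + question attention-pooling) is scaled element-wise by
    a learned sigmoid gate, letting the model down-weight irrelevant words."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)

    def forward(self, u_t, c_t):
        x = torch.cat([u_t, c_t], dim=-1)   # [u_t; c_t]
        g = torch.sigmoid(self.gate(x))     # gate in (0, 1), same shape as x
        return g * x                        # gated input fed to the RNN cell
```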
- S-Net for MS-MARCO
- Step 1: extract evidence snippets by matching the question and passage via a pointer network; passage ranking is added as an additional task for multi-task learning
- Step 2: generate the answer by synthesizing the passage, question, and evidence snippets via seq2seq; the evidence snippets are marked as features
- QANet
- Separable convolution + self-attention (each position acts as a query to match all positions as keys); a depthwise separable convolution sketch follows this block
- Data augmentation via back-translation
- Model structure
- Implementation
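A small sketch of the depthwise separable 1D convolution used inside QANet's encoder blocks (a per-channel depthwise convolution followed by a pointwise 1x1 convolution); hyperparameters are illustrative:

```python
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depthwise separable 1D convolution: a per-channel (depthwise) conv
    followed by a pointwise 1x1 conv, as in QANet's encoder blocks."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):   # x: (batch, channels, length)
        return self.pointwise(self.depthwise(x))
```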
-
- Three losses for multiple answer spans; a combined-loss sketch follows this block
- Average loss
- Weighted average loss
- Minimum value of the loss
- Combines passage ranking as multi-task learning
- Since an answer span can occur in multiple passages, a pointwise sigmoid is used instead of a softmax
- Minimum risk training
- Directly optimizes the evaluation metric instead of maximizing the likelihood (MLE)
- The prediction is still a single answer span
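A sketch of the three ways to combine per-span losses for one example; the softmax-based weighting is an assumption, since the exact weighting scheme is not spelled out here:

```python
import torch
import torch.nn.functional as F

def multi_span_loss(start_logits, end_logits, spans, mode="min"):
    """Combine cross-entropy losses over several gold spans for one example.
    start_logits, end_logits: (T,) logits over passage positions.
    spans: list of (start_idx, end_idx) gold answer spans."""
    losses = []
    for s, e in spans:
        loss = (F.cross_entropy(start_logits.unsqueeze(0), torch.tensor([s]))
                + F.cross_entropy(end_logits.unsqueeze(0), torch.tensor([e])))
        losses.append(loss)
    losses = torch.stack(losses)
    if mode == "average":
        return losses.mean()
    if mode == "min":        # only the best-matching span drives the gradient
        return losses.min()
    # "weighted": softmax over negative losses is an assumed weighting scheme
    weights = F.softmax(-losses.detach(), dim=0)
    return (weights * losses).sum()
```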
- V-Net from Baidu NLP for MS-MARCO
- Embedding layer
- Character-level embedding
- 1D CNN (BiDAF)
- Last hidden states of BiRNN (R-Net)
- Useful for OOV tokens
- Word-level embedding
- GloVe pre-trained embedding is used frequently
- Features
- Binary and weighted word-in-question features (excluding stop words) (FastQA)
- POS tag
- Query type
- Encoding layer
- Concatenation of forward and backward hidden states of a BiRNN (BiDAF)
- [convolution layer × N + self-attention layer + feed-forward layer] (QANet)
- Context-query attention layer
- Context and query similarity matrix (BiDAF, QANet)
- Model layer
- BiRNN (BiDAF)
- Gated attention-based recurrent network (R-Net)
- Passage self-matching
- [convolution layer × N + self-attention layer + feed-forward layer] (QANet)
- Output layer
- Direct output (BiDAF, QANet)
- Pointer network (R-Net)
- Simplifies the seq2seq mechanism
- It outputs a probability distribution over input elements, yielding a permutation of the inputs
- Not all pointers are necessary; for MRC and summarization, for example, only two pointers (answer start and end) are needed; see the sketch after this list
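As referenced above, a minimal boundary pointer network with exactly two pointer steps (start and end); the GRU-based decoder state update follows the pointer network recipe, but the details are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPointer(nn.Module):
    """Two-step pointer network for MRC: step 1 points at the answer start,
    step 2 at the end. Each step attends over the passage states; the
    attended vector updates the decoder state via a GRU cell."""
    def __init__(self, dim):
        super().__init__()
        self.w_h = nn.Linear(dim, dim)
        self.w_s = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1)
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, H, s):
        # H: (batch, T, dim) passage states; s: (batch, dim) initial state
        dists = []
        for _ in range(2):  # only two pointers are needed: start and end
            scores = self.v(torch.tanh(self.w_h(H) + self.w_s(s).unsqueeze(1)))
            p = F.softmax(scores.squeeze(-1), dim=1)         # (batch, T)
            dists.append(p)
            context = torch.bmm(p.unsqueeze(1), H).squeeze(1)
            s = self.cell(context, s)
        return dists[0], dists[1]  # P(start), P(end)
```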
- Multiple choice
- Cloze
- English
- Chinese
- Question answering
- English
- Extractive
- Single-hop (i.e., single-document) reasoning
- SQuAD, extractive dataset
- SQuAD 2.0, which adds unanswerable questions
- Google Natural Questions contains short and long answers
- Multi-hop reasoning
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Data source: English Wikipedia dump
- Provides supporting facts to support explainable reasoning
- Novel question type: comparison questions, which include yes/no questions
- Baseline model: Simple and Effective Multi-Paragraph Reading Comprehension, official code and code by HotpotQA
- TriviaQA
- Has much longer contexts (2,895 tokens per context on average), which may contain several paragraphs
- Much noisier than SQuAD due to the lack of human labeling
- The context may be entirely unrelated to the answer, since it is collected via keyword search
- SearchQA
- CoQA: A Conversational Question Answering Challenge
- QuAC: Question Answering in Context
- AI2 Reasoning Challenge
- Generative
- Multi-hop reasoning
- MS-MARCO
- Data source: Bing queries, with 10 passages per query; download links for v1 and v2
- Answers are human-generated
- Gives passage candidates and annotates which passage is correct
- Baseline model: BiDAF
- Chinese
- DuReader
- Data source: real anonymized user queries; contains more than 300K questions, 1.4M evidence documents, and human-generated answers
- Baseline model: BiDAF and Match-LSTM
- Advanced model: V-Net
- CMRC 2018
- English
- Exact Match
- Clean the text (lowercase; remove the articles a/an/the, punctuation, and extra whitespace); see the sketch after this list
- Implementation
- F1
- Measures the token overlap between the predicted answer and the ground truth
- Implementation
- BLEU
- ROUGE-L
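A sketch of Exact Match and F1 following the SQuAD evaluation script's normalization (lowercase, strip punctuation, articles, and extra whitespace):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase; remove punctuation, articles (a, an, the), extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```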
- Naturali video version, text version
- Data preprocess, implementation
- Model
- Zhuiyi video, text 1 and text 2
- Data preprocess
- Filter out examples whose query or answer is None
- Context normalization, i.e., lowercase, punctuation
- Answer length limit, context length limit (threshold is determined by statistics)
- Data augmentation, i.e., back-translation or similar QA data
- Training data quality, e.g., the same query type can have answers in different formats ("1963" vs. "1990, when its use stopped")
- Feature engineering
- Query type
- Who, when, where, how, number, why, how long
- ELMo
- Word level
- Model (based on R-Net)
- Training
- Born-Again Neural Network (self-distillation where the student has the same architecture as the teacher); see the loss sketch below
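A hedged sketch of a born-again (self-distillation) loss, where the student shares the frozen teacher's architecture; `alpha` and the temperature `T` are assumed hyperparameters:

```python
import torch.nn.functional as F

def born_again_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    """Mix hard-target cross-entropy with a KL term toward the frozen
    teacher's soft predictions. alpha and T are assumed knobs."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft
```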
- Learning to ask (i.e., neural questioner)
- Open domain QA
- DrQA; see Danqi Chen's PhD thesis for details
- DS-QA
- R^3
- Search
- Summarization -> KBQA -> MRC
- Context
- Model
- Syntax integration
- Syntax features such as POS tags, NER results, and linearized PCFG tree tags give no additional benefit (from the R-Net discussion)
- Transfer learning
- Word embedding: GloVe is better than word2vec
- Language model: CoVe, ELMo
- Unanswerable question type
- Add a padding (no-answer) position
- With a trainable bias; see the sketch below
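A sketch of the padding-position idea for unanswerable questions: append one extra position whose score is a trainable bias, and predict no-answer when that position wins; the function and parameter names are assumptions:

```python
import torch

def add_no_answer_position(start_logits, end_logits, na_bias):
    """Append one padding position whose score is a trainable bias
    (e.g., na_bias = nn.Parameter(torch.zeros(1))); predict 'no answer'
    when that position wins both the start and end distributions."""
    batch = start_logits.size(0)
    pad = na_bias.expand(batch, 1)                  # broadcast bias per example
    start = torch.cat([start_logits, pad], dim=1)   # (batch, T + 1)
    end = torch.cat([end_logits, pad], dim=1)
    return start, end
```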
- Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- Based on SQuAD, train an answer and question generation network `gen`
- For a new dataset, use `gen` to generate new (question, answer) datasets, then train a new MRC model on these datasets; a sketch follows
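A schematic of the two-stage recipe; `train_gen` and `train_mrc` are hypothetical helpers standing in for full training loops:

```python
def synnet_transfer(squad_data, target_passages, train_gen, train_mrc):
    """Two-stage recipe: train a generator on SQuAD, synthesize (q, a) pairs
    for the new domain, then train the MRC model on the synthetic data."""
    gen = train_gen(squad_data)                    # stage 1: answer + question generator
    synthetic = [gen(p) for p in target_passages]  # stage 2: synthetic (q, a) pairs
    return train_mrc(synthetic)                    # train the target-domain MRC model
```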