Human reading comprehension is studied in cognitive psychology. Roughly, there are three types of comprehension: literal comprehension, inference comprehension, and critical (evaluative) comprehension.
For machine reading comprehension (MRC), Deep Read: A Reading Comprehension System (ACL 1999) is the first study, and Towards the Machine Comprehension of Text: An Essay by Microsoft gives a review. The EMNLP 2014 best paper, Modeling Biological Processes for Reading Comprehension, proposes symbolic models based on feature engineering. After that, many deep learning models appeared: Tencent AI Part 1 illustrates the building blocks of deep learning MRC models, and Tencent AI Part 2 proposes their new Dual Ask-Answer Network. The bAbI datasets from Facebook introduce the AI-complete concept. Neural Machine Reading Comprehension: Methods and Trends presents a recent survey of MRC.
- Passage
- Single or multiple
- Question
- Cloze or query
- Candidate
- Multiple choice (e.g., Co-Matching, HMA) and opinion questions (DuReader)
- Answer
- Extraction or generation
- BiDAF from AllenNLP, baseline for MS-MARCO
- Attention flow layer, based on a similarity matrix: context-to-query attention (i.e., which query words are most relevant to each context word; softmax over each row) and query-to-context attention (i.e., which context words are most similar to some query word; softmax over the column-wise max); a minimal sketch follows this block
- Similarity function
- Model structure
- Official implementation
- Model illustration
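A minimal PyTorch sketch of the attention flow layer, assuming the paper's trilinear similarity function alpha(h, u) = w^T[h; u; h * u]; names and shapes are illustrative, not AllenNLP's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiDAFAttention(nn.Module):
    """Sketch of BiDAF's attention flow layer with the paper's trilinear
    similarity alpha(h, u) = w^T [h; u; h * u]."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, H, U):
        # H: (batch, T, d) context encodings; U: (batch, J, d) query encodings
        T, J = H.size(1), U.size(1)
        h = H.unsqueeze(2).expand(-1, -1, J, -1)            # (batch, T, J, d)
        u = U.unsqueeze(1).expand(-1, T, -1, -1)            # (batch, T, J, d)
        S = self.w(torch.cat([h, u, h * u], dim=-1)).squeeze(-1)  # (batch, T, J)
        # Context-to-query: softmax over each row (the query axis)
        U_tilde = torch.bmm(F.softmax(S, dim=2), U)         # (batch, T, d)
        # Query-to-context: softmax over the per-context-word column max
        b = F.softmax(S.max(dim=2).values, dim=1)           # (batch, T)
        H_tilde = torch.bmm(b.unsqueeze(1), H).expand_as(H) # tiled over T
        # G: query-aware context representation
        return torch.cat([H, U_tilde, H * U_tilde, H * H_tilde], dim=-1)
```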
- BiDAF + Self attention + ELMo
- R-Net for MS-MARCO
- Core layer 1: a gated attention-based recurrent network (the gate is applied to the concatenation of the passage word and the attention-pooling of the question) matches passage and question to obtain a question-aware passage representation; see the gate sketch below
- Core layer 2: a self-matching layer aggregates information from the whole passage
- Model structure
- Implementation: PyTorch, TensorFlow
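A hedged sketch of the gating idea in core layer 1: per the R-Net paper, the input to the matching RNN, [passage word; question attention-pooling], is scaled element-wise by a sigmoid gate; the module below is illustrative, not the official implementation:

```python
import torch
import torch.nn as nn

class GatedInput(nn.Module):
    """Sketch of R-Net's input gate: the matching RNN's input [u_t; c_t]
    (passage word + question attention-pooling) is scaled element-wise by
    a learned sigmoid gate, letting the model down-weight irrelevant words."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)

    def forward(self, u_t, c_t):
        x = torch.cat([u_t, c_t], dim=-1)   # [u_t; c_t]
        g = torch.sigmoid(self.gate(x))     # gate in (0, 1), same shape as x
        return g * x                        # gated input fed to the RNN cell
```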
- S-Net for MS-MARCO
- Step 1: extract evidence snippets by matching the question and passage via a pointer network; passage ranking is added as an additional task for multi-task learning
- Step 2: generate the answer by synthesizing the passage, question, and evidence snippets via seq2seq; the evidence snippets are marked as features
- QANet
- Separable convolution + self-attention (each position acts as a query to match all positions as keys); a depthwise separable convolution sketch follows this block
- Data augmentation via back-translation
- Model structure
- Implementation
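A small sketch of the depthwise separable 1D convolution used inside QANet's encoder blocks (a per-channel depthwise convolution followed by a pointwise 1x1 convolution); hyperparameters are illustrative:

```python
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depthwise separable 1D convolution: a per-channel (depthwise) conv
    followed by a pointwise 1x1 conv, as in QANet's encoder blocks."""
    def __init__(self, channels, kernel_size=7):
        super().__init__()
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):   # x: (batch, channels, length)
        return self.pointwise(self.depthwise(x))
```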
-
- Three losses for multiple answer spans; a combined-loss sketch follows this block
- Average loss
- Weighted average loss
- Minimum value of the loss
- Combines passage ranking as multi-task learning
- Since an answer span can occur in multiple passages, a pointwise sigmoid is used instead of a softmax
- Minimum risk training
- Directly optimizes the evaluation metric instead of maximizing the likelihood (MLE)
- The prediction is still a single answer span
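A sketch of the three ways to combine per-span losses for one example; the softmax-based weighting is an assumption, since the exact weighting scheme is not spelled out here:

```python
import torch
import torch.nn.functional as F

def multi_span_loss(start_logits, end_logits, spans, mode="min"):
    """Combine cross-entropy losses over several gold spans for one example.
    start_logits, end_logits: (T,) logits over passage positions.
    spans: list of (start_idx, end_idx) gold answer spans."""
    losses = []
    for s, e in spans:
        loss = (F.cross_entropy(start_logits.unsqueeze(0), torch.tensor([s]))
                + F.cross_entropy(end_logits.unsqueeze(0), torch.tensor([e])))
        losses.append(loss)
    losses = torch.stack(losses)
    if mode == "average":
        return losses.mean()
    if mode == "min":        # only the best-matching span drives the gradient
        return losses.min()
    # "weighted": softmax over negative losses is an assumed weighting scheme
    weights = F.softmax(-losses.detach(), dim=0)
    return (weights * losses).sum()
```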
- V-Net from Baidu NLP for MS-MARCO
- Embedding layer
- Character-level embedding
- 1D CNN (BiDAF)
- Last hidden states of BiRNN (R-Net)
- Useful for OOV tokens
- Word-level embedding
- GloVe pre-trained embedding is used frequently
- Features
- Binary and weighted word-in-question features (excluding stop words) (FastQA)
- POS tag
- Query type
- Encoding layer
- Concatenation of forward and backward hidden states of a BiRNN (BiDAF)
- [convolution layer × N + self-attention layer + feed-forward layer] (QANet)
- Context-query attention layer
- Context and query similarity matrix (BiDAF, QANet)
- Model layer
- BiRNN (BiDAF)
- Gated attention-based recurrent network (R-Net)
- Passage self-matching
- [convolution layer × N + self-attention layer + feed-forward layer] (QANet)
- Output layer
- Direct output (BiDAF, QANet)
- Pointer network (R-Net)
- Simplifies the seq2seq mechanism
- It outputs a probability distribution over input elements, yielding a permutation of the inputs
- Not all pointers are necessary; for MRC and summarization, for example, only two pointers (answer start and end) are needed; see the sketch after this list
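As referenced above, a minimal boundary pointer network with exactly two pointer steps (start and end); the GRU-based decoder state update follows the pointer network recipe, but the details are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPointer(nn.Module):
    """Two-step pointer network for MRC: step 1 points at the answer start,
    step 2 at the end. Each step attends over the passage states; the
    attended vector updates the decoder state via a GRU cell."""
    def __init__(self, dim):
        super().__init__()
        self.w_h = nn.Linear(dim, dim)
        self.w_s = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1)
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, H, s):
        # H: (batch, T, dim) passage states; s: (batch, dim) initial state
        dists = []
        for _ in range(2):  # only two pointers are needed: start and end
            scores = self.v(torch.tanh(self.w_h(H) + self.w_s(s).unsqueeze(1)))
            p = F.softmax(scores.squeeze(-1), dim=1)         # (batch, T)
            dists.append(p)
            context = torch.bmm(p.unsqueeze(1), H).squeeze(1)
            s = self.cell(context, s)
        return dists[0], dists[1]  # P(start), P(end)
```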
- Multiple choice
- Cloze
- English
- Chinese
- Question answering
- English
- Extractive
- Single-hop (i.e., single-document) reasoning
- SQuAD, extractive dataset
- SQuAD 2.0, which adds unanswerable questions
- Google Natural Questions contains short and long answers
- Multi-hop reasoning
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
- Data source: English Wikipedia dump
- Provides supporting facts to support explainable reasoning
- Novel question type: comparison questions, which include yes/no questions
- Baseline model: Simple and Effective Multi-Paragraph Reading Comprehension, official code and code by HotpotQA
- TriviaQA
- Has much longer contexts (2,895 tokens per context on average), which may contain several paragraphs
- Much noisier than SQuAD due to the lack of human labeling
- The context may be entirely unrelated to the answer, since it is collected via keyword search
- SearchQA
- CoQA: A Conversational Question Answering Challenge
- QuAC: Question Answering in Context
- AI2 Reasoning Challenge
- Generative
- Multi-hop reasoning
- MS-MARCO
- Data source: Bing queries, with 10 passages per query; download links for v1 and v2
- Answers are human-generated
- Gives passage candidates and annotates which passage is correct
- Baseline model: BiDAF
- Chinese
- DuReader
- Data source: real anonymized user queries; contains more than 300K questions, 1.4M evidence documents, and human-generated answers
- Baseline model: BiDAF and Match-LSTM
- Advanced model: V-Net
- CMRC 2018
- English
- Exact Match
- Clean the text (lowercase; remove the articles a/an/the, punctuation, and extra whitespace); see the sketch after this list
- Implementation
- F1
- Measures the token overlap between the predicted answer and the ground truth
- Implementation
- BLEU
- ROUGE-L
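A sketch of Exact Match and F1 following the SQuAD evaluation script's normalization (lowercase, strip punctuation, articles, and extra whitespace):

```python
import re
import string
from collections import Counter

def normalize_answer(s):
    """Lowercase; remove punctuation, articles (a, an, the), extra whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in set(string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, ground_truth):
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))

def f1_score(prediction, ground_truth):
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```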
- Naturali video version, text version
- Data preprocess, implementation
- Model
- Zhuiyi video, text 1 and text 2
- Data preprocess
- Filter out examples whose query or answer is None
- Context normalization, i.e., lowercase, punctuation
- Answer length limit, context length limit (threshold is determined by statistics)
- Data augmentation, i.e., back-translation or similar QA data
- Training data quality, e.g., the same query type can have answers in different formats ("1963" vs. "1990, when its use stopped")
- Feature engineering
- Query type
- Who, when, where, how, number, why, how long
- ELMo
- Word level
- Model (based on R-Net)
- Training
- Born-Again Neural Network (self-distillation where the student has the same architecture as the teacher); see the loss sketch below
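A hedged sketch of a born-again (self-distillation) loss, where the student shares the frozen teacher's architecture; `alpha` and the temperature `T` are assumed hyperparameters:

```python
import torch.nn.functional as F

def born_again_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    """Mix hard-target cross-entropy with a KL term toward the frozen
    teacher's soft predictions. alpha and T are assumed knobs."""
    hard = F.cross_entropy(student_logits, targets)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft
```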
- Learning to ask (i.e., neural questioner)
- Open domain QA
- DrQA; see Danqi Chen's PhD thesis for details
- DS-QA
- R^3
- Search
- Summarization -> KBQA -> MRC
- Context
- Model
- Syntax integration
- Syntax features such as POS tags, NER results, and linearized PCFG tree tags give no additional benefit (from the R-Net discussion)
- Transfer learning
- Word embedding: GloVe is better than word2vec
- Language model: CoVe, ELMo
- Unanswerable question type
- Add a padding (no-answer) position
- With a trainable bias; see the sketch below
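A sketch of the padding-position idea for unanswerable questions: append one extra position whose score is a trainable bias, and predict no-answer when that position wins; the function and parameter names are assumptions:

```python
import torch

def add_no_answer_position(start_logits, end_logits, na_bias):
    """Append one padding position whose score is a trainable bias
    (e.g., na_bias = nn.Parameter(torch.zeros(1))); predict 'no answer'
    when that position wins both the start and end distributions."""
    batch = start_logits.size(0)
    pad = na_bias.expand(batch, 1)                  # broadcast bias per example
    start = torch.cat([start_logits, pad], dim=1)   # (batch, T + 1)
    end = torch.cat([end_logits, pad], dim=1)
    return start, end
```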
- Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension
- Based on SQuAD, train an answer and question generation network `gen`
- For a new dataset, use `gen` to generate new (question, answer) datasets, then train a new MRC model on these datasets; a sketch follows
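A schematic of the two-stage recipe; `train_gen` and `train_mrc` are hypothetical helpers standing in for full training loops:

```python
def synnet_transfer(squad_data, target_passages, train_gen, train_mrc):
    """Two-stage recipe: train a generator on SQuAD, synthesize (q, a) pairs
    for the new domain, then train the MRC model on the synthetic data."""
    gen = train_gen(squad_data)                    # stage 1: answer + question generator
    synthetic = [gen(p) for p in target_passages]  # stage 2: synthetic (q, a) pairs
    return train_mrc(synthetic)                    # train the target-domain MRC model
```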