
ACM Research 2020 | Analyzing text summarization with five different extractive and abstractive models.


ACM-Research/NLP-Summarizer


Natural Language Processing for Text Summarization

Poster

[Image: project poster]

Objective

As Natural Language Processing (NLP) models and tools become more powerful and precise, automated text summarization, which extracts the critical information from a body of text, has become both more practical and more important in an information-saturated world. Researchers in NLP constantly refine their tools and models to reach the state of the art, but summarization models are often tailored to a single genre or domain of text. We seek to analyze how different models summarize varying corpora of text and to understand the comparative strengths and weaknesses that affect summarization performance.

Models

Summarization models can be broken into two types depending on how they generate summaries.

  • Extractive summarization selects existing text (typically whole sentences) from the source and combines it, verbatim, into the summary. This is similar to highlighting the important information while reading a paper and then stitching the highlighted passages together.

  • Abstractive summarization models learn an internal representation of the text's meaning and generate new, human-like summaries that may use words and phrasing not found in the source. This is similar to reading a piece of text and then writing a summary in your own words.
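To make the extractive idea concrete, here is a minimal sketch of a classic frequency-based extractive summarizer in plain Python with no third-party libraries. It is not the code used in this project and does not reproduce BERT or spaCy. The naive sentence splitter, the tiny `STOPWORDS` list, and the function names are illustrative assumptions. Each sentence is scored by the normalized frequencies of its non-stopword terms, and the top-scoring sentences are returned verbatim in their original order.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list for illustration only.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "were", "it", "this", "that", "with", "as", "by",
}


def split_sentences(text: str) -> list[str]:
    """Naively split on '.', '!' or '?' followed by whitespace (an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens made of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extractive_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-scoring sentences verbatim, in document order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    # Word frequencies over the whole document, ignoring stopwords.
    freqs = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    max_freq = max(freqs.values(), default=1)

    # Score each sentence by the sum of its normalized word frequencies.
    scores = []
    for idx, sent in enumerate(sentences):
        score = sum(freqs[w] / max_freq for w in tokenize(sent) if w in freqs)
        scores.append((score, idx))

    # Keep the top-k sentences (ties go to the earlier one), then restore order.
    top = sorted(scores, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    chosen = sorted(idx for _, idx in top)
    return " ".join(sentences[i] for i in chosen)
```

Because scores are summed rather than averaged, this sketch favors longer sentences; dividing each score by the sentence's token count is a common alternative.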

The models we will analyze include:

  • BERT (extractive)
  • spaCy (extractive)
  • T5 (abstractive)
  • ERNIE (abstractive)
  • Pegasus (abstractive)

Tools/Resources

Text Evaluation

ROUGE - Recall-Oriented Understudy for Gisting Evaluation, a set of metrics that evaluate a candidate summary by measuring its n-gram and word-sequence overlap with one or more human-written reference summaries.

pyRouge - Python wrapper for the ROUGE summarization evaluation package.
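As a rough illustration of what ROUGE measures, the sketch below computes ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) precision, recall, and F1 in plain Python. It is not pyRouge and will not exactly match the official ROUGE package, which applies its own tokenization and optional stemming and stopword removal. Whitespace tokenization, lowercasing, and a balanced F1 are assumptions made here for simplicity.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, candidate_total, reference_total):
    """Precision, recall, and balanced F1 from overlap counts."""
    precision = overlap / candidate_total if candidate_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def rouge_n(candidate, reference, n=1):
    """ROUGE-N with naive lowercase whitespace tokenization (an assumption)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter '&' keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: LCS length relative to candidate and reference lengths."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))
```

For the reference `the cat sat on the mat` and the candidate `the cat lay on the mat`, this sketch gives a ROUGE-1 recall of 5/6, a ROUGE-2 recall of 3/5, and a ROUGE-L recall of 5/6.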

Text Statistical Analysis

textstat - Python package that calculates text statistics to estimate the readability, complexity, and grade level of a particular corpus.
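To show the kind of statistic textstat reports, here is a sketch of two standard readability formulas, Flesch Reading Ease and Flesch-Kincaid Grade Level, computed from sentence, word, and syllable counts. The regex-based sentence splitter and the vowel-group syllable heuristic are simplifying assumptions; textstat uses its own counting, so its scores will generally differ from these.

```python
import re


def count_syllables(word: str) -> int:
    """Heuristic syllable count (an assumption, not textstat's method):
    count vowel groups, dropping a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)

    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        # Higher scores mean easier text.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Approximate U.S. school grade level.
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```

Both formulas reward short sentences and short words, so a very simple sentence such as `The cat sat on the mat.` scores above 100 on Reading Ease and below zero on the grade scale.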

Natural Language Processing

spaCy - Open-source library used to perform information extraction and to build natural language understanding systems.
