This project attempts to reproduce the results of the paper *Generating News Headlines with Recurrent Neural Networks*.
- The code runs as Jupyter notebooks.
- Install Keras.
- `pip install python-Levenshtein`
It is assumed that you already have training and test data. The data is made up of many examples (I'm using 684K examples); each example consists of the text from the start of the article, which I call the description (or `desc`), and the text of the original headline (or `head`). The texts should already be tokenized, with the tokens separated by spaces.

Once you have the data ready, save it in a Python pickle file as a tuple: `(heads, descs, keywords)`, where `heads` is a list of all the headline strings and `descs` is a list of all the article strings, in the same order and of the same length as `heads`. The `keywords` information is ignored, so you can put `None` in its place.
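For concreteness, here is a minimal sketch of how such a pickle file could be written; the file name `data.pkl` and the toy examples are assumptions for illustration, not part of the project:

```python
import pickle

# Toy, already-tokenized examples (assumed for illustration only).
heads = ["example headline one", "example headline two"]
descs = ["first sentence of article one .", "first sentence of article two ."]
keywords = None  # the keywords field is ignored, so None is fine

# Save the tuple in the order expected by the notebooks.
with open("data.pkl", "wb") as f:
    pickle.dump((heads, descs, keywords), f)
```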
The `vocabulary-embedding` notebook describes how a dictionary is built for the tokens and how an initial embedding matrix is built from GloVe.
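As a rough illustration of that step (not the notebook's exact code), the sketch below fills an embedding matrix from a GloVe file for a given word-to-index dictionary; the file name `glove.6B.100d.txt`, the toy `word2idx` dictionary, and the dimensions are assumptions:

```python
import numpy as np

embedding_dim = 100
word2idx = {"<empty>": 0, "<eos>": 1, "the": 2, "headline": 3}  # assumed toy dictionary

# Start from small random values, then overwrite rows for words found in GloVe.
rng = np.random.default_rng(0)
embedding = rng.uniform(-0.05, 0.05, (len(word2idx), embedding_dim))

with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word, vector = parts[0], np.asarray(parts[1:], dtype="float32")
        if word in word2idx:
            embedding[word2idx[word]] = vector
```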
The `train` notebook describes how a model is trained on the data using Keras.
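The actual model is an encoder-decoder RNN with attention; as a much-simplified sketch of how a GloVe-initialized embedding can be wired into a Keras language model (current Keras API, not the notebook's architecture, with assumed sizes), it might look like this:

```python
import numpy as np
from tensorflow.keras import layers, models

# Assumed sizes; in the notebooks these come from the vocabulary-embedding step.
vocab_size, embedding_dim = 40000, 100
embedding = np.random.uniform(-0.05, 0.05, (vocab_size, embedding_dim))

emb = layers.Embedding(vocab_size, embedding_dim, mask_zero=True)
model = models.Sequential([
    layers.Input(shape=(None,)),             # variable-length sequences of token ids
    emb,
    layers.LSTM(512, return_sequences=True),
    layers.TimeDistributed(layers.Dense(vocab_size, activation="softmax")),
])
emb.set_weights([embedding])                 # start from the GloVe-initialized matrix
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```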
The `predict` notebook generates headlines with the trained model and shows the attention weights used to pick words from the description. The text generation includes a feature that was not described in the original paper: it allows words that are outside the training vocabulary to be copied from the description to the generated headline.
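One way to picture that copy mechanism (a hedged sketch of the idea, not the notebook's code): whenever the decoder emits an out-of-vocabulary placeholder, replace it with the description word that received the highest attention weight at that step.

```python
import numpy as np

UNK = "<unk>"  # assumed placeholder for out-of-vocabulary tokens

def copy_oov_words(headline_tokens, attention, desc_tokens):
    """Replace each UNK in the generated headline with the description token
    that got the most attention at that decoding step.

    attention: array of shape (len(headline_tokens), len(desc_tokens))."""
    out = []
    for i, tok in enumerate(headline_tokens):
        if tok == UNK:
            out.append(desc_tokens[int(np.argmax(attention[i]))])
        else:
            out.append(tok)
    return out

# Toy example (attention values are made up for illustration):
desc = "president visits berlin to discuss trade".split()
head = [UNK, "visits", "berlin"]
attn = np.array([[0.70, 0.10, 0.10, 0.05, 0.03, 0.02],
                 [0.10, 0.60, 0.10, 0.10, 0.05, 0.05],
                 [0.10, 0.10, 0.70, 0.05, 0.03, 0.02]])
print(copy_oov_words(head, attn, desc))  # ['president', 'visits', 'berlin']
```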