Overview and benchmark of traditional and deep learning models in text classification

Original post: https://ahmedbesbes.com/overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification.html

This article is an extension of a previous one I wrote when I was experimenting with sentiment analysis on Twitter data. Back then, I explored a simple model: a two-layer feed-forward neural network trained with Keras. The input tweets were represented as document vectors, each computed as a weighted average of the embeddings of the words composing the tweet.
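To make the document-vector idea concrete, here is a minimal sketch of a weighted average of word embeddings in plain Python. The function name `document_vector`, the toy embeddings, and the toy weights are all made up for illustration. The weighting scheme and the normalization by the sum of weights are assumptions, not necessarily what the previous article used.

```python
from typing import Dict, List, Sequence


def document_vector(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    weights: Dict[str, float],
    dim: int,
) -> List[float]:
    """Weighted average of the embeddings of the tokens in one tweet."""
    total = [0.0] * dim
    weight_sum = 0.0
    for token in tokens:
        # Skip tokens missing from either lookup table (out-of-vocabulary).
        if token not in embeddings or token not in weights:
            continue
        w = weights[token]
        for i, value in enumerate(embeddings[token]):
            total[i] += w * value
        weight_sum += w
    if weight_sum == 0.0:
        # No known token: fall back to the zero vector.
        return total
    # Assumption: normalize by the sum of the weights.
    return [value / weight_sum for value in total]


# Toy 2-dimensional embeddings and weights, invented for this example.
toy_embeddings = {"good": [1.0, 0.0], "movie": [0.0, 1.0]}
toy_weights = {"good": 3.0, "movie": 1.0}

doc_vec = document_vector(
    ["good", "movie", "unknown"], toy_embeddings, toy_weights, dim=2
)
print(doc_vec)
```

In a real setting the embeddings would come from the trained word2vec model and the weights from some per-word importance score. Here the heavier weight on "good" pulls the document vector toward that word's embedding.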

The embedding I used was a word2vec model I trained from scratch on the corpus using gensim. The task was binary classification, and with this setup I was able to achieve 79% accuracy.

The goal of this post is to explore other NLP models trained on the same dataset and then benchmark their respective performance on a given test set.

We'll go through different models, from simple ones relying on a bag-of-words representation to heavier machinery deploying convolutional and recurrent networks. We'll see if we can score more than 79% accuracy!

Here are the models that have been tested:

Logistic regression with word ngrams
Logistic regression with character ngrams
Logistic regression with word and character ngrams
Recurrent neural network (bidirectional GRU) without pre-trained embeddings
Recurrent neural network (bidirectional GRU) with GloVe pre-trained embeddings
Multi-channel convolutional neural network
RNN (bidirectional GRU) + CNN model
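As a rough illustration of what "word n-grams" and "character n-grams" mean for the first three models in the list, the sketch below extracts both kinds of features from a string using only the standard library. The helper names (`word_ngrams`, `char_ngrams`, `ngram_counts`), the whitespace tokenization, and the chosen n-gram sizes are assumptions made for this example. In practice a vectorizer from a machine learning library would typically do this work before the counts are fed to a logistic regression.

```python
from collections import Counter
from typing import List, Sequence


def word_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n lowercase, whitespace-separated tokens."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> List[str]:
    """Contiguous sequences of n characters, spaces included."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(
    text: str,
    word_n: Sequence[int] = (1, 2),
    char_n: Sequence[int] = (3,),
) -> Counter:
    """Bag of word and character n-grams, with prefixes to keep them apart."""
    counts: Counter = Counter()
    for n in word_n:
        counts.update(f"w:{gram}" for gram in word_ngrams(text, n))
    for n in char_n:
        counts.update(f"c:{gram}" for gram in char_ngrams(text, n))
    return counts


features = ngram_counts("good movie")
for name, count in sorted(features.items()):
    print(repr(name), count)
```

Combining both feature families in one bag, as `ngram_counts` does, mirrors the "word and character ngrams" variant: word n-grams capture short phrases, while character n-grams stay informative for misspellings and unusual tokens, which are common in tweets.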

By the end of this post, you will have boilerplate code for each of these NLP techniques. It'll help you kickstart your NLP project and eventually achieve state-of-the-art results (some of these models are really powerful).

Here's a sneak peek of the final result:
