Twitter-sentiment-analysis

A sentiment analysis model trained on a Kaggle GPU using the Sentiment140 dataset (1.6 million tweets).

Deployed on my personal Docker Hub repository: Click here

Kaggle notebook link: Kaggle notebook

Dataset (Sentiment140+GloVe)

  • Train/test split : 90% / 10%
  • Size : 1.6M samples
  • Link : Dataset
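The README lists scikit-learn for the train/test split. The sketch below shows the same 90% / 10% shuffle-and-split with only the standard library, plus the label mapping Sentiment140 needs, since its polarity column uses 0 for negative and 4 for positive. The function names and the seed are hypothetical; with scikit-learn the equivalent call is `train_test_split(X, y, test_size=0.1)`.

```python
import random

# Sentiment140 polarity values: 0 = negative, 4 = positive.
# Mapping them to 0/1 for a binary-cross-entropy model is an assumption of this sketch.
POLARITY_TO_LABEL = {0: 0, 4: 1}


def to_binary_label(polarity: int) -> int:
    """Map a Sentiment140 polarity value to a binary label (0 = negative, 1 = positive)."""
    return POLARITY_TO_LABEL[polarity]


def train_test_split_90_10(samples, seed=42):
    """Shuffle a copy of `samples` and split it 90% / 10% into (train, test).

    The seed value is hypothetical; it only makes the split reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

Applied to the 1.6M samples, this split yields 1,440,000 training and 160,000 test tweets.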

Model

  • Model type : Sequential, RNN, Binary classification
  • Optimizer : Adam
  • Loss function : Binary cross entropy
  • Output : sentiment score in [0, 1]
  • Decision threshold (fine-tuned) : score >= 0.625 → "Positive", score < 0.625 → "Negative"
  • Best validation accuracy : 83%
  • F1-score : 0.8340
  • Version : 4

| Metric    | Negative | Positive |
|-----------|----------|----------|
| Precision | 0.84     | 0.82     |
| Recall    | 0.82     | 0.84     |
| F1-score  | 0.83     | 0.83     |
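The model emits a single score in [0, 1], so a threshold turns it into a label, and each per-class F1-score is the harmonic mean of that class's precision and recall. A minimal sketch of both, assuming the 0.625 threshold listed above (the function names are hypothetical):

```python
THRESHOLD = 0.625  # fine-tuned decision threshold reported in the model summary


def label_from_score(score: float, threshold: float = THRESHOLD) -> str:
    """Turn the model's sigmoid output in [0, 1] into a sentiment label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "Positive" if score >= threshold else "Negative"


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute the F1 row of the table from its precision and recall rows.
for cls, p, r in [("Negative", 0.84, 0.82), ("Positive", 0.82, 0.84)]:
    print(cls, round(f1_score(p, r), 2))  # prints "Negative 0.83", then "Positive 0.83"
```

Because the threshold sits above 0.5, scores between 0.5 and 0.625 are labeled "Negative".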

Training

  • Training epochs : up to 50; early stopping (patience = 10) ended training after 22 epochs
  • Training environment : Kaggle GPU
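In Keras, early stopping is usually configured with `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)`. The README does not say which metric was monitored, so validation loss is an assumption here. The plain-Python sketch below mirrors the basic rule (with `min_delta=0`): stop once `patience` consecutive epochs fail to beat the best value seen so far. Under this rule, stopping after 22 epochs with a patience of 10 means the last improvement happened at epoch 12.

```python
def stopping_epoch(val_losses, patience=10, max_epochs=50):
    """Return how many epochs run before early stopping triggers.

    Assumes the monitored quantity is validation loss (lower is better) and
    min_delta = 0, i.e. any strict decrease counts as an improvement.
    """
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


# Hypothetical loss curve: improves for 12 epochs, then plateaus.
losses = [1.0 - 0.05 * i for i in range(12)] + [0.5] * 38
print(stopping_epoch(losses))
```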

Architecture

[Figure: model architecture]
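The architecture diagram is not reproduced in this text. Because the dataset heading mentions GloVe, one common way pretrained GloVe vectors feed an RNN is through an embedding matrix whose row i is the vector of the word with token index i. The sketch below builds such a matrix from GloVe's text format (`word v1 v2 ...`); the function names, the reserved padding row 0, and the vocabulary are assumptions, not the project's code.

```python
def load_glove(lines):
    """Parse GloVe text-format lines ("word v1 v2 ...") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the GloVe vector of the word with index i.

    Assumes indices run from 1 to len(word_index), with row 0 reserved for
    padding. Words missing from GloVe keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


glove = load_glove(["good 0.1 0.2", "bad -0.1 -0.3"])
print(build_embedding_matrix({"good": 1, "bad": 2, "meh": 3}, glove, dim=2))
```

In Keras, such a matrix is typically passed to an `Embedding` layer through `embeddings_initializer=tf.keras.initializers.Constant(matrix)`. The README does not say whether the embeddings were frozen or fine-tuned.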

Inference (TensorFlow Serving REST API)

[Figure: inference example]
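TensorFlow Serving exposes predictions at `POST http://<host>:8501/v1/models/<model_name>:predict`, with a JSON body `{"instances": [...]}` and a response of the form `{"predictions": [...]}` (8501 is TF Serving's default REST port). The sketch below builds and sends such a request with the standard library. The model name `sentiment`, and the assumption that each instance is an already tokenized and padded sequence of word indices, are hypothetical.

```python
import json
import urllib.request


def build_predict_request(instances, model_name="sentiment", host="localhost", port=8501):
    """Build a TF Serving REST predict request in row ("instances") format.

    `model_name` and the shape of `instances` are assumptions of this sketch.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(instances, **kwargs):
    """Send the request and return the list stored under "predictions"."""
    request = build_predict_request(instances, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read())["predictions"]
```

With a running server, `predict([[12, 845, 3, 0, 0]])` would return one prediction per instance, each holding the sigmoid score for that tweet.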

Results visualized with Power BI and Python

Positive tweets

[Figure: positive tweets]

Negative tweets

[Figure: negative tweets]

Data by country (when available)

[Figure: sentiment by country]
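The Python side of the Power BI report is not shown in this README. As an illustration only, the per-country view could be fed by an aggregation like the one below. It assumes each tweet has been reduced to a hypothetical `(country, score)` pair and reuses the 0.625 threshold; tweets without a country are skipped, matching "when available". With pandas, a `groupby(["country", "label"]).size()` gives the same counts.

```python
from collections import Counter, defaultdict


def sentiment_by_country(records, threshold=0.625):
    """Count positive and negative tweets per country.

    `records` is an iterable of (country, score) pairs (a hypothetical format);
    tweets whose country is None or an empty string are skipped.
    """
    counts = defaultdict(Counter)
    for country, score in records:
        if not country:
            continue
        label = "Positive" if score >= threshold else "Negative"
        counts[country][label] += 1
    return {country: dict(c) for country, c in counts.items()}


print(sentiment_by_country([("France", 0.9), ("France", 0.1), (None, 0.8), ("Spain", 0.7)]))
```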

Useful scripts and notebooks

Notebooks

Training notebook

How inferences were made on our dataset

Data cleaning notebook

Data exploration notebook
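The data cleaning notebook itself is not reproduced here. Given that the project lists `re` and NLTK among its libraries, the sketch below shows a typical regex-based tweet cleaner. The exact rules (lowercasing, dropping URLs and @mentions, keeping hashtag words without the `#`) are assumptions, and NLTK steps such as stopword removal are omitted to keep it self-contained.

```python
import re

# Hypothetical cleaning rules; the project's notebook may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, '#', digits and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # drops '#' but keeps the hashtag word
    return SPACES_RE.sub(" ", text).strip()


print(clean_tweet("@user Loving the #WorldCup!! https://t.co/xyz 100%"))
```

The `[^a-z\s]` rule also removes accented letters and splits contractions ("don't" becomes "don t"), which may matter for tweets about players with non-ASCII names.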

Scripts

Link to the TensorFlow Serving script

There is also a useful command-line script that converts .h5 models to the TensorFlow SavedModel format: here (Args)
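The converter's actual arguments are not recoverable from this text, so the sketch below is a hypothetical command-line runner for the same task. It loads a Keras `.h5` file with `tf.keras.models.load_model` and writes it with `tf.saved_model.save` into a numeric version subdirectory, which is the `<base>/<version>/` layout TensorFlow Serving loads from. The argument names are assumptions, and TensorFlow is imported only inside `convert` so that the argument parsing can be inspected without it.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the (hypothetical) command-line arguments of the converter."""
    parser = argparse.ArgumentParser(
        description="Convert a Keras .h5 model to the TensorFlow SavedModel format."
    )
    parser.add_argument("h5_path", help="path to the .h5 model file")
    parser.add_argument("export_dir", help="base directory for the exported model")
    parser.add_argument("--version", type=int, default=1,
                        help="numeric version subdirectory expected by TF Serving")
    return parser.parse_args(argv)


def versioned_export_path(export_dir: str, version: int) -> str:
    """TF Serving loads each model version from <export_dir>/<version>/."""
    return os.path.join(export_dir, str(version))


def convert(h5_path: str, export_path: str) -> None:
    """Load the .h5 model and save it as a SavedModel (requires TensorFlow)."""
    import tensorflow as tf  # lazy import: only needed for the conversion itself

    model = tf.keras.models.load_model(h5_path)
    tf.saved_model.save(model, export_path)


def main(argv=None):
    args = parse_args(argv)
    convert(args.h5_path, versioned_export_path(args.export_dir, args.version))
```

Called as `main()` from an `if __name__ == "__main__":` guard, a run such as `python convert.py model.h5 export/sentiment --version 1` would write the SavedModel to `export/sentiment/1`.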

Data collection (tweets about Messi and Ronaldo)

NOTE: Running these scripts requires a Twitter developer account and a bearer token, either stored in a text file whose path is set manually in the code or exported as an environment variable.
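A minimal sketch of the two token sources described in the note, plus a prepared (not sent) request to the Twitter API v2 recent-search endpoint with the token in an `Authorization: Bearer` header. The environment variable name `BEARER_TOKEN` and the function names are hypothetical, and API access tiers may have changed since these scripts were written.

```python
import os
import urllib.parse
import urllib.request

RECENT_SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def load_bearer_token(path=None, env_var="BEARER_TOKEN"):
    """Read the token from a text file if `path` is given, else from an env var.

    The env var name is an assumption of this sketch.
    """
    if path is not None:
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"No bearer token: pass a file path or set {env_var}")
    return token


def build_search_request(query, token, max_results=10):
    """Prepare (but do not send) a v2 recent-search request."""
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    return urllib.request.Request(
        f"{RECENT_SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Sending the request with `urllib.request.urlopen(build_search_request("messi", load_bearer_token()))` returns matching tweets as JSON; `max_results` accepts values from 10 to 100 on this endpoint.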

Libraries

  • Deep learning framework : TensorFlow 2.6 or higher
  • Data manipulation and visualization : Pandas, Seaborn, Matplotlib
  • Regular expressions : re
  • NLP library : NLTK
  • Train/test splitting, classification_report : Scikit-learn