Twitter Sentiment Analysis

This project performs sentiment analysis on Twitter data using machine learning techniques. It includes data preprocessing, feature extraction, and classification to predict whether a tweet has a positive or negative sentiment.

Introduction

Sentiment analysis, also known as opinion mining, is the process of determining the sentiment or emotional tone expressed in a piece of text, such as a tweet. In this project, we analyze tweets from Twitter and classify each one as positive or negative.

Data

We used a Twitter dataset containing 1.6 million tweets with labels indicating their sentiment (0 for negative and 1 for positive). The dataset includes various features, including tweet text, timestamp, and user information.

Preprocessing

Removed URLs from tweet text.
Removed punctuation marks from tweet text.
Removed stopwords (common words like "the," "and," "is") from tweet text.
Cleaned repeated characters (e.g., "loooove" becomes "love").
Removed numbers from tweet text.
Removed rare words that do not contribute significantly to sentiment analysis.
Removed special characters and extra white spaces.
Tokenized the text and performed stemming and lemmatization.
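
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the tiny stopword set, the lowercasing step, and the min_count threshold are all assumptions made for the example. Stemming and lemmatization are left out so the sketch runs with the standard library alone.

```python
import re
from collections import Counter
from typing import List

# Hypothetical, deliberately tiny stopword set so the sketch needs no downloads;
# a real run would likely use a fuller list such as NLTK's English stopwords.
STOPWORDS = {"the", "and", "is", "a", "an", "to", "of", "in", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")      # punctuation, numbers, special characters
REPEAT_RE = re.compile(r"([a-z])\1{2,}")    # a letter repeated three or more times


def clean_tweet(text: str) -> List[str]:
    """Return the cleaned tokens of a single tweet."""
    text = text.lower()                     # assumption: case is not kept
    text = URL_RE.sub(" ", text)            # remove URLs
    text = NON_ALPHA_RE.sub(" ", text)      # remove punctuation, numbers, special characters
    text = REPEAT_RE.sub(r"\1", text)       # collapse long runs: "loooove" -> "love"
    tokens = text.split()                   # whitespace split also drops extra spaces
    return [t for t in tokens if t not in STOPWORDS]


def drop_rare_words(docs: List[List[str]], min_count: int = 2) -> List[List[str]]:
    """Remove tokens that occur fewer than min_count times across all tweets."""
    counts = Counter(tok for doc in docs for tok in doc)
    return [[t for t in doc if counts[t] >= min_count] for doc in docs]


tweets = ["I loooove this!!! http://t.co/abc 123", "Love is in the air"]
cleaned = drop_rare_words([clean_tweet(t) for t in tweets])
print(cleaned)
```

Collapsing repeated letters to a single letter is a heuristic: it also shortens legitimate doubled letters when they are stretched (for example "cooool" becomes "col"), so a real pipeline may prefer to collapse to two letters instead.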

Feature Extraction

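We used TF-IDF (Term Frequency-Inverse Document Frequency) vectorization to convert the preprocessed text into numerical features. Each tweet becomes a vector in which a term's weight grows with its frequency in that tweet and shrinks as the term becomes more common across the whole dataset, so distinctive words count for more than ubiquitous ones.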
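
As a small illustration of this step, the sketch below runs scikit-learn's TfidfVectorizer on three made-up, already-cleaned tweets. The sample texts and the default settings are assumptions for the example; the notebook may configure the vectorizer differently (for instance with n-grams or a vocabulary size limit).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up, already preprocessed tweets used only for illustration.
docs = ["love sunny day", "hate rainy day", "love love weekend"]

vectorizer = TfidfVectorizer()          # default settings; an assumption
X = vectorizer.fit_transform(docs)      # sparse matrix: one row per tweet

vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)                       # the terms that became columns
print(X.shape)                          # (number of tweets, number of terms)

# With the default norm="l2", every non-empty row has unit Euclidean length.
row_norms = np.sqrt(X.multiply(X).sum(axis=1))
print(row_norms)
```

With the defaults, scikit-learn uses a smoothed inverse document frequency, ln((1 + n) / (1 + df)) + 1, and then normalizes each row, so the raw weights differ slightly from the textbook tf × log(n / df) formula.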

Model Training and Evaluation

We trained a Bernoulli Naive Bayes (BNB) classifier using the TF-IDF vectors as features. The model was evaluated using classification metrics and a ROC curve. The confusion matrix and ROC curve are visualized in the README for performance assessment.
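
A rough sketch of the training and evaluation flow is shown below, using a dozen made-up tweets in place of the real dataset. The toy texts, the split size, and the random seed are assumptions for the example, and the scores it prints say nothing about the project's actual results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Made-up tweets; 1 = positive, 0 = negative (same label convention as the dataset).
texts = [
    "love this phone", "great day with friends", "happy and excited today",
    "best movie ever", "wonderful sunny morning", "really enjoyed the concert",
    "hate this traffic", "worst service ever", "sad and tired today",
    "terrible rainy morning", "awful movie waste of time", "really annoyed by delays",
]
labels = [1] * 6 + [0] * 6

# Hold out 4 tweets, stratified so both classes appear in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=4, stratify=labels, random_state=42
)

# TF-IDF features feeding a Bernoulli Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), BernoulliNB())
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]   # probability of the positive class

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
auc = roc_auc_score(y_test, y_score)

print(classification_report(y_test, y_pred, zero_division=0))
print(cm)
print(auc)
```

One design point worth knowing: BernoulliNB binarizes its input by default (binarize=0.0), so any non-zero TF-IDF weight is treated simply as "term present". The classifier therefore sees word presence or absence rather than the TF-IDF magnitudes.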

Usage

Clone this repository:

git clone https://github.com/supriya811106/Twitter-Sentiment-Analysis.git

Install the required dependencies (see the Dependencies section).
Run the Jupyter Notebook or Python script to preprocess the data, extract features, train the model, and perform sentiment analysis on new tweets.
Modify the code as needed for your specific use case.

Dependencies

Python 3.x
Jupyter Notebook (optional, for running the project interactively)
Pandas
NumPy
Scikit-learn
Matplotlib
Seaborn
NLTK (Natural Language Toolkit)
WordCloud
RegexpTokenizer (included in NLTK)
TfidfVectorizer (included in scikit-learn)
BernoulliNB (included in scikit-learn)

You can install these dependencies using pip:

pip install pandas numpy scikit-learn matplotlib seaborn nltk wordcloud
