Social Media Analytics

The training data from Kaggle's “Influencers in Social Networks” dataset was used to identify key predictors of social influence on Twitter. The dataset is available at: https://www.kaggle.com/c/predict-who-is-more-influential-in-a-social-network/overview. For each person, it contains variables based on their Twitter activity, such as retweets sent, following count, mentions sent and posts. The model with the highest accuracy was used to identify the top 5 predictors of social influence on Twitter.

Analysis

I first performed data wrangling on ~5k records to remove N/A and missing values. I then computed the correlations between the variables to better understand the strength of the relationships among them.
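The cleaning and correlation step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: it uses a small synthetic DataFrame in place of train.csv, and the column names and injected missing values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for train.csv; the column names are illustrative assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A_follower_count": rng.integers(0, 10_000, size=200).astype(float),
    "A_listed_count": rng.integers(0, 500, size=200).astype(float),
    "Choice": rng.integers(0, 2, size=200),
})
df.loc[[3, 17, 42], "A_listed_count"] = np.nan  # inject a few missing values

# Drop every row that contains a missing value.
clean = df.dropna().reset_index(drop=True)

# Pairwise Pearson correlation between all remaining numeric columns.
corr = clean.corr(method="pearson")
print(corr["Choice"].sort_values(ascending=False))
```

With the real data, `pd.read_csv("train.csv")` would replace the synthetic frame, and the correlation matrix could be inspected for the target column or plotted as a heatmap.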
Using the training set, I built machine learning models in Python. I applied four models: Logistic Regression, Random Forest, K-Nearest Neighbours and XGBoost. Among these, XGBoost gave the highest accuracy of 81%, along with the highest precision, recall and F-measure of 79%.
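A comparison like this one might be set up as in the sketch below. It is only an illustration under stated assumptions: the data is synthetic (generated with `make_classification`), the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so the example depends on scikit-learn alone. The scores it prints are unrelated to the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data standing in for the Kaggle training set.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Gradient boosting is a stand-in here; it is not the XGBoost library itself.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Gradient Boosting (stand-in for XGBoost)": GradientBoostingClassifier(random_state=0),
}

# Fit each model and score it on the held-out validation split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    results[name] = {
        "accuracy": accuracy_score(y_val, pred),
        "precision": precision_score(y_val, pred),
        "recall": recall_score(y_val, pred),
        "f1": f1_score(y_val, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring all models on the same held-out split keeps the accuracy, precision, recall and F-measure figures directly comparable.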
According to the XGBoost model, the key predictors of social influence are: A/B_listed_count, A/B_follower_count, A/B_network_feature_3, A/B_retweets_received and A/B_network_feature_2.
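One common way to obtain such a ranking is to sort a tree ensemble's feature importances and keep the top five. The sketch below assumes that approach, which may differ from what the notebook does. It again uses synthetic data with placeholder feature names and scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder feature names; the real columns come from train.csv.
feature_names = [f"feature_{i}" for i in range(10)]
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

# Stand-in boosted-tree model, fitted on the synthetic data.
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Rank features by the model's impurity-based importance and keep the top five.
importances = pd.Series(model.feature_importances_, index=feature_names)
top5 = importances.sort_values(ascending=False).head(5)
print(top5)
```

Impurity-based importances depend on the fitted model and the data split, so a ranking obtained this way is best read as indicative rather than definitive.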

shimonyagrawal/Social-Media-Analytics-Twitter
