smdp2000/Projects_On_Data_Analytics_And_Visualizations_With_R

Project 1 - EDA and visualization on the Annual Vital Statistics Report-CRS for the years 2011 through 2016.

Report Link : http://www.ejanma.karnataka.gov.in/frmVitalSat.aspx

  • Packages Used : tabulizer, reshape, ggplot2
  • Key Tasks
    • Extracted data from the given PDF files semi-automatically, i.e., without having to re-type the data.
    • Loaded the data into RStudio.
    • Computed basic statistics for the data using R (i.e., min, max, mean, median, mode, variance, std deviation, IQR, etc.)
    • Detected outliers in the chosen data.
    • Produced different plots using R - simple scatter plots, bar graphs, line graphs, histograms etc.
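
The statistics and outlier steps above can be sketched in base R. This is a minimal illustration, not the project's actual script: the `births` vector holds made-up numbers rather than figures from the CRS report, `stat_mode` is a hypothetical helper (R has no built-in statistical mode), and the 1.5 * IQR rule is assumed as the outlier criterion since the README does not name one.

```r
# Hypothetical yearly counts; made up for illustration, NOT taken from the report.
births <- c(1020, 1100, 1085, 1150, 1120, 2400)

# Hypothetical helper: most frequent value (first one wins on ties).
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

summary_stats <- c(
  min      = min(births),
  max      = max(births),
  mean     = mean(births),
  median   = median(births),
  mode     = stat_mode(births),
  variance = var(births),
  sd       = sd(births),
  iqr      = IQR(births)
)

# Assumed outlier rule: values beyond 1.5 * IQR from the quartiles.
q <- quantile(births, c(0.25, 0.75))
fence <- 1.5 * IQR(births)
outliers <- births[births < q[1] - fence | births > q[2] + fence]

print(summary_stats)
print(outliers)
```

With real data, `births` would be replaced by a column of the data frame extracted from the PDF tables.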

Project 2 - IPL 2019 Data Analysis.

Dataset Link : https://www.kaggle.com/nowke9/ipldata, plus batsman and bowler rankings extracted from Cricbuzz

  • Packages Used : tabulizer, dplyr, ggplot2, reshape2, magrittr, tidyr
  • Key Tasks
    • Extracted each player's runs in every match.
    • Computed descriptive statistics and the coefficient of variation for the top 10 players.
    • Computed descriptive and inferential statistics for IPL 2019, with plots.
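
As a rough sketch of the coefficient of variation step, the snippet below computes it per player in base R. The `runs` data frame and its column names are hypothetical stand-ins for the per-match data, and `cv` is an illustrative helper, not a function from any of the listed packages.

```r
# Hypothetical per-match runs; player names and values are made up.
runs <- data.frame(
  player = c("A", "A", "A", "B", "B", "B"),
  runs   = c(10, 20, 30, 40, 40, 40)
)

# Coefficient of variation as a percentage: sd relative to the mean.
cv <- function(x) sd(x) / mean(x) * 100

# One value per player; a lower value means more consistent scoring.
cv_by_player <- tapply(runs$runs, runs$player, cv)
print(cv_by_player)
```

The same grouping could be written with `dplyr::group_by` and `summarise`, which may be closer to what the project used.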

Project 3 - Time Series Analysis of UK Driver Death Dataset

Dataset - built-in UK driver deaths time series dataset in R

  • Packages Used : ggplot2, Metrics, forecast, reshape
  • Key Tasks
    • Built a time series object from the data.
    • Plotted the yearly mean values.
    • Decomposed the time series using the stl function.
    • Obtained the residuals after removing trend and seasonality.
    • Built a Holt-Winters model on roughly the first 75% of the data.
    • Predicted the values for the remaining 25% of the period.
    • Built an ARIMA model for the period up to about 75% of the data.
    • Plotted time series plots.
    • Found that ARIMA performed better than Holt-Winters.
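
The workflow above might look roughly like the following. Several things here are assumptions rather than the project's code: the dataset is taken to be `UKDriverDeaths` from the `datasets` package, the split at the end of 1980 is one way to get 75% of the 192 months, base `arima` with a fixed seasonal order stands in for whatever the forecast package selected, and `rmse` is a hypothetical helper in place of the Metrics package.

```r
# Assumption: the built-in dataset meant is UKDriverDeaths (monthly, 1969-1984).
y <- UKDriverDeaths

# Yearly mean values, ready for plotting with plot(yearly_mean).
yearly_mean <- aggregate(y, nfrequency = 1, FUN = mean)

# STL decomposition; the remainder is what is left after trend and seasonality.
fit_stl <- stl(y, s.window = "periodic")
remainder <- fit_stl$time.series[, "remainder"]

# 75% / 25% split: 144 training months, 48 test months.
train <- window(y, end = c(1980, 12))
test  <- window(y, start = c(1981, 1))

# Holt-Winters fit and forecast over the test period.
hw <- HoltWinters(train)
hw_pred <- predict(hw, n.ahead = length(test))

# Seasonal ARIMA; the (0,1,1)(0,1,1) order is an assumed choice for illustration.
fit_arima <- arima(train, order = c(0, 1, 1), seasonal = c(0, 1, 1))
arima_pred <- predict(fit_arima, n.ahead = length(test))$pred

# Hypothetical helper: root mean squared error on the held-out months.
rmse <- function(actual, predicted) {
  sqrt(mean((as.numeric(actual) - as.numeric(predicted))^2))
}
rmse_hw <- rmse(test, hw_pred)
rmse_arima <- rmse(test, arima_pred)
print(c(holt_winters = rmse_hw, arima = rmse_arima))
```

Comparing the two RMSE values on the held-out 25% is one way to judge which model forecasts better; the result depends on the model orders and the split chosen.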

Project 4 - Text Analytics on Demonetization Twitter Data

Dataset - https://www.kaggle.com/arathee2/demonetization-in-india-twitter-data

  • Packages Used : ggplot2, readr, tm, wordcloud, plyr, lubridate, syuzhet
  • Key Tasks
    • Preprocessed 15000 tweets: tokenized, lemmatized, and counted word frequencies.
    • Found the most frequently occurring words.
    • Performed sentiment analysis on the tweet data.
    • Clustered the tweets into groups of related messages.
    • Created a word cloud and other visualizations.
    • Conducted hypothesis tests.
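
The tokenizing and word-frequency steps can be illustrated with base R alone. This is a simplified sketch under stated assumptions: the three `tweets` are invented examples rather than rows from the Kaggle dataset, the `stopwords` vector is a tiny hand-written list, and lemmatization is skipped; the project itself lists tm and syuzhet for this work.

```r
# Hypothetical tweets; invented for illustration, not from the dataset.
tweets <- c(
  "Demonetization is a bold move!",
  "Long queues at the ATM after demonetization",
  "Bold move, but the demonetization queues are long"
)

# Normalise: lower-case, then replace anything that is not a letter or space.
clean <- gsub("[^a-z ]", " ", tolower(tweets))

# Tokenise on whitespace and drop empty strings and a small stop-word list.
tokens <- unlist(strsplit(clean, "\\s+"))
stopwords <- c("is", "a", "at", "the", "after", "but", "are")
tokens <- tokens[nzchar(tokens) & !(tokens %in% stopwords)]

# Word frequencies, most common first.
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq, 5))
```

A frequency table like `freq` is also the usual input for a word cloud, with `names(freq)` as the words and the counts as their weights.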
