Project 1 - EDA and visualization on the Annual Vital Statistics Report-CRS for the years 2011 through 2016.
Report Link :
- Packages Used : tabulizer, reshape, ggplot2
- Key Tasks
- Extracted data from the given PDF files semi-automatically, i.e., without having to re-type the data.
- Loaded the data into RStudio.
- Computed basic statistics for the data using R (min, max, mean, median, mode, variance, standard deviation, IQR, etc.).
- Detected outliers in the chosen data.
- Produced different plots using R: simple scatter plots, bar graphs, line graphs, histograms, etc.
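
The statistics and outlier steps above can be sketched in base R as follows. This is only an illustration: the vector `x` holds made-up values standing in for one extracted column, `stat_mode` is a hypothetical helper (base R has no statistical mode function), and the 1.5 * IQR rule is an assumed outlier criterion, since the report does not say which one was used.

```r
# Hypothetical values standing in for one extracted column (not real CRS data).
x <- c(12, 15, 15, 18, 21, 22, 24, 95)

# Base R has no statistical mode function, so define a small helper.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

basic_stats <- c(
  min = min(x), max = max(x), mean = mean(x), median = median(x),
  variance = var(x), sd = sd(x), IQR = IQR(x)
)

# Assumed criterion (Tukey's rule): flag points lying more than
# 1.5 * IQR below the first quartile or above the third quartile.
q <- unname(quantile(x, c(0.25, 0.75)))
fence <- 1.5 * IQR(x)
outliers <- x[x < q[1] - fence | x > q[2] + fence]

print(basic_stats)
print(stat_mode(x))
print(outliers)
```

With these illustrative values only the point 95 falls outside the fences.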
Dataset Link : (batsman and bowler rankings were additionally extracted from Cricbuzz)
- Packages Used : tabulizer, dplyr, ggplot2, reshape2, magrittr, tidyr
- Key Tasks
- Extracted the runs scored by each individual player in each match.
- Computed descriptive statistics and the coefficient of variation for the top 10 players.
- Computed descriptive and inferential statistics for IPL 2019, with plots.
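
As a rough illustration of the coefficient of variation step, the sketch below uses base R rather than dplyr so that it runs without extra packages. The data frame `runs`, the player names, and the scores are all hypothetical, and `cv` is an illustrative helper, not a function from any of the listed packages.

```r
# Hypothetical per-match runs (illustrative values, not real IPL 2019 data).
runs <- data.frame(
  player = rep(c("Player A", "Player B"), each = 4),
  runs   = c(40, 50, 60, 50, 5, 90, 10, 95)
)

# Coefficient of variation: sample standard deviation as a percentage of the mean.
cv <- function(v) 100 * sd(v) / mean(v)

# One CV value per player; a lower value means more consistent scoring.
cv_by_player <- tapply(runs$runs, runs$player, cv)
print(round(cv_by_player, 1))
```

Both hypothetical players average 50 runs, but Player A's lower coefficient of variation marks the more consistent scorer, which is the kind of comparison the statistic is used for.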
Dataset - built-in time series dataset of UK driver deaths in R
- Packages Used : ggplot2, Metrics, forecast, reshape
- Key Tasks
- Built a time series object with the data.
- Plotted the yearly mean values.
- Decomposed the time series using the stl function.
- Obtained the residuals after removing trend and seasonality.
- Built a Holt-Winters model on about the first 75% of the data.
- Predicted the values for the remaining 25% of the period.
- Built an ARIMA model for the period up to about 75% of the data.
- Plotted time series plots.
- Found that ARIMA performed better than Holt-Winters.
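
The workflow above can be sketched with base R alone. Several things here are assumptions: the dataset is taken to be `UKDriverDeaths` from the datasets package, the split falls at the end of 1980 (144 of 192 months, exactly 75%), the ARIMA orders are the generic seasonal "airline" specification rather than the orders actually used in the project, and `rmse` is a hand-written helper standing in for the Metrics package.

```r
# Assumed dataset: monthly UK driver deaths, Jan 1969 to Dec 1984 (192 values).
deaths <- UKDriverDeaths

# Yearly mean values.
yearly_mean <- aggregate(deaths, nfrequency = 1, FUN = mean)

# Hold out the last 25%: train on 1969-1980, test on 1981-1984.
train <- window(deaths, end = c(1980, 12))
test  <- window(deaths, start = c(1981, 1))

# STL decomposition; the remainder is what is left after removing
# the trend and seasonal components.
decomp   <- stl(train, s.window = "periodic")
residual <- decomp$time.series[, "remainder"]

# Holt-Winters forecast over the held-out period.
hw_fit  <- HoltWinters(train)
hw_pred <- as.numeric(predict(hw_fit, n.ahead = length(test)))

# Seasonal ARIMA forecast (orders are an assumption, not the project's choice).
arima_fit  <- arima(train, order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))
arima_pred <- as.numeric(predict(arima_fit, n.ahead = length(test))$pred)

# Hypothetical helper: root mean squared error on the held-out data.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
print(c(HoltWinters = rmse(as.numeric(test), hw_pred),
        ARIMA       = rmse(as.numeric(test), arima_pred)))
```

Comparing the two errors on the held-out period is one way to decide which model forecasts better; the outcome depends on the model settings chosen.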
Dataset -
- Packages Used : ggplot2, readr, tm, wordcloud, plyr, lubridate, syuzhet
- Key Tasks
- Preprocessed 15,000 tweets: tokenized, lemmatized, and counted word frequencies.
- Found the most frequently occurring words.
- Performed sentiment analysis on the tweet data.
- Clustered related messages into groups.
- Created word cloud and other visualizations
- Conducted hypothesis tests.
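
A minimal base R sketch of the tokenizing and word-frequency steps is shown below. It does not use tm or syuzhet, skips lemmatization and sentiment scoring, and relies on three made-up tweets and a tiny hand-picked `stop_words` list, all of which are assumptions for illustration only.

```r
# Hypothetical tweets (not taken from the project's dataset).
tweets <- c(
  "Loving the new phone, great camera!",
  "The camera is great but battery is bad",
  "Bad battery, bad support"
)

# Lowercase and strip everything except letters and whitespace.
clean <- tolower(gsub("[^[:alpha:][:space:]]", "", tweets))

# Tokenize on whitespace and drop a small, hand-picked stop word list.
tokens <- unlist(strsplit(clean, "[[:space:]]+"))
stop_words <- c("the", "is", "but", "new")
tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]

# Word frequencies, most common first.
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq, 3))
```

In this toy input "bad" is the most frequent token with three occurrences; the same frequency table is the usual input for a word cloud.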