This repo contains my projects on data cleaning and manipulation. Each project covers a different set of topics, and you can find a description of each one below.
1)A New Era of Data Analysis in Baseball
In this notebook, we wrangle, analyze, and visualize Statcast data to compare two baseball players, Aaron Judge and Giancarlo Stanton. We use visualizations such as scatter plots, KDE plots, and 2D histograms, and we use Python's 'def' keyword to write functions that build the 2D histograms.
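As a rough illustration of the idea behind a 2D histogram function, here is a minimal sketch in plain Python. The function name `hist2d`, the variable names, and the coordinates are all hypothetical and made up for this example; the notebook itself may well rely on a plotting library instead.

```python
def hist2d(xs, ys, bins, x_range, y_range):
    """Count (x, y) points into a bins-by-bins grid.

    The result is indexed as counts[row][col], where row is the y bin
    and col is the x bin.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = [[0] * bins for _ in range(bins)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # ignore points outside the requested area
        # Scale each coordinate to a bin index; the upper edge falls in the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[row][col] += 1
    return counts


# Made-up coordinates, for illustration only (not real Statcast data).
sample_x = [-0.5, 0.1, 0.2, 0.9, 0.95]
sample_y = [1.6, 2.4, 2.5, 3.4, 3.45]
grid = hist2d(sample_x, sample_y, bins=2, x_range=(-1.0, 1.0), y_range=(1.0, 4.0))
```

Wrapping the binning in a function makes it easy to run the same comparison for each player with identical bin edges.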
2)A Visual History of Nobel Prize Winners
A very interesting project I worked on. Here I analyzed past Nobel Prize winners and tried to answer questions like 'How many males and females won the prize?', 'How many of the winners were from the USA?', 'How many won the prize more than once?' and 'How dominant were the winners when it came to country / gender?'. To answer these questions I used data manipulation techniques such as group by and value counts, along with data visualization techniques such as line plots and lmplot.
In this project we performed A/B testing on a popular game called Cookie Cats. During the game, players occasionally encounter gates that force them to wait a non-trivial amount of time or make an in-app purchase to progress. We used A/B testing to analyze whether it is better to keep the gate at level 30 or level 40, comparing various performance factors.
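One common way to compare two variants in an A/B test is to bootstrap the difference in a metric between the groups. The sketch below is only an illustration under assumptions: the choice of retention as the metric, the bootstrap approach, the names `gate_30` and `gate_40`, and the data are all made up here and are not taken from the notebook.

```python
import random


def retention_rate(flags):
    """Share of players who came back (flags are 1 = returned, 0 = did not)."""
    return sum(flags) / len(flags)


def bootstrap_diff(group_a, group_b, n_iter=1000, seed=0):
    """Resample both groups with replacement and collect rate differences."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    diffs = []
    for _ in range(n_iter):
        sample_a = rng.choices(group_a, k=len(group_a))
        sample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(retention_rate(sample_a) - retention_rate(sample_b))
    return diffs


# Synthetic example data, NOT the real Cookie Cats numbers.
gate_30 = [1] * 45 + [0] * 55
gate_40 = [1] * 40 + [0] * 60

diffs = bootstrap_diff(gate_30, gate_40)
# Fraction of resamples in which the level-30 group retained better.
share_favoring_30 = sum(d > 0 for d in diffs) / len(diffs)
```

The closer `share_favoring_30` is to 1, the more consistently the resamples favor the gate at level 30; a value near 0.5 suggests the data does not separate the two variants.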
4)Dr. Semmelweis and the Discovery of Handwashing
In this project we analyze Dr. Semmelweis's well-known discovery of handwashing and how it helped reduce the number of deaths among infants and pregnant women. We used visualizations and various mathematical functions to determine how much the discovery helped reduce the death rate.
One of the first and most basic projects I did. I used simple data manipulation techniques to analyze the colors of Lego blocks and the different Lego sets built over the years.
6)Exploring the Bitcoin Cryptocurrency Market
In this project we explore Bitcoin market data. We look at the market capitalization of the top companies and analyze how volatile the Bitcoin market is. We also examine the price at which the companies started out and how quickly their rates plunged.
7)Exploring the evolution of Linux
In this notebook, we analyzed the evolution of a very famous open-source project: the Linux kernel. The Linux kernel is the heart of Linux distributions such as Debian, Ubuntu, or CentOS.
We gained some first insights into the development effort by
- Identifying the top 10 contributors and
- Visualizing the commits over the years.
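The two steps above can be sketched with the standard library alone. This is a minimal illustration, assuming the Git log has already been parsed into (timestamp, author) pairs; the record layout, the author names, and the helper names `top_contributors` and `commits_per_year` are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical commit records, not taken from the real Linux Git log.
commits = [
    ("2005-04-16 15:20:36", "author_a"),
    ("2005-06-01 09:00:00", "author_b"),
    ("2006-01-10 12:30:00", "author_a"),
    ("2006-03-22 18:45:10", "author_a"),
    ("2006-11-05 08:15:00", "author_c"),
    ("2007-02-14 21:05:59", "author_b"),
]


def top_contributors(commits, n=10):
    """Return the n authors with the most commits as (author, count) pairs."""
    return Counter(author for _, author in commits).most_common(n)


def commits_per_year(commits):
    """Return a dict mapping each year to its number of commits, sorted by year."""
    years = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").year for ts, _ in commits
    )
    return dict(sorted(years.items()))


top = top_contributors(commits, n=2)
per_year = commits_per_year(commits)
```

The per-year counts are the values one would pass to a bar or line chart to visualize activity over time.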
8)Exploring the Ames Iowa dataset
In this project we explored breath alcohol test data from Ames, Iowa. We analyzed at what time of day and in which month the tests were most often conducted, and we looked for any pattern in when the tests took place.
9)Which Debts Are Worth the Bank's Effort
In this project I analyzed the different recovery procedures taken by the bank for various loan categories. Using statistical tests and exploratory graphical analysis, I tried to find out whether the money invested in running these procedures actually pays off for the bank.
10)TV, Halftime Shows, and the Big Game
A very interesting project. I analyzed all the performers who have appeared at the Super Bowl and looked at parameters such as bands or singers who performed more than once, the number of songs performed during halftime, how Super Bowl viewership has evolved, and how likely viewers are to stay until the end of the match.
11)The GitHub History of the Scala Language
Scala is an open source project. Open source projects have the advantage that their entire development histories (who made changes, what was changed, code reviews, etc.) are publicly available.
In this project I read, cleaned, and visualized the real-world project repository of Scala, with data spanning a version control system (Git) as well as a project hosting site (GitHub). I found out who had the most influence on its development and who the experts were.
In this project I used data from the Government of India website on suicide rates and analyzed different parameters, which I have explained in my blog here. I used web scraping as well as data manipulation to bring out the insights for this project.
One of the projects I thoroughly enjoyed working on. I used web scraping to get raw data from a cricket stats website. I then formatted and analyzed that data and gathered some really cool insights, which you can read here.