
A 60+ day streak of daily learning of ML/DL/Maths concepts through projects


mandliya/ml


Current Status Stats
Total Machine Learning Projects 35
Current Daily Streak 76
Last Streak Dates 06/23/2019 - 07/02/2019
Current Streak Dates 04/13/2020 - 06/27/2020
Daily Log Progress daily_log.md

On break till 07/06. I will re-start the new streak then.

Machine Learning and Deep Learning Projects

Hands on Machine Learning

No. Project Description Notebook Notes
1. The Machine Learning Landscape The basics of machine learning terminology, types and challenges To be updated Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
2. End to End Machine Learning Project In this project we will go through an example project end to end, pretending to be a recently hired data scientist at a real estate company. Here are the main steps you will go through:
  1. Look at the big picture.
  2. Get the data.
  3. Discover and visualize the data to gain insights.
  4. Prepare the data for Machine Learning algorithms.
  5. Select a model and train it.
  6. Fine-tune your model.
  7. Present your solution.
  8. Launch, monitor, and maintain your system
End_to_end_machine_learning_project.ipynb Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
3. Classification In this project we will go through the concepts of classification by building a digit classifier using the MNIST dataset. We will learn how to measure classification performance (e.g. confusion matrix, precision and recall, the ROC curve, etc.) Classification.ipynb Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
3.4 Classification Exercise Building a spam classifier spam_classifier.ipynb Source: Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow
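
The performance measures named in the Classification project (confusion matrix, precision, recall) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not code taken from the notebooks; the function names `confusion_counts` and `precision_recall` are hypothetical, and the notebooks themselves may rely on scikit-learn instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted positive, actually positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred, positive=1):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up labels, purely for illustration.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))
    print(precision_recall(y_true, y_pred))
```

Returning 0.0 when a denominator is zero is one convention among several; some libraries emit a warning in that case instead.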

Data Wrangling

No. Project Description Notebook Notes
1. Data Wrangling using Quandl API We retrieve financial data of a stock using the Quandl API and do basic data analysis using plain vanilla Python. data_wrangling_using_api.ipynb -
2. Pandas from scratch This notebook takes an in-depth look at pandas, the Swiss Army knife for data analysis.
  • Exploring pandas data-structures (Series and DataFrame) in detail
  • We fetch Google's stock data and analyze it: reading data from various sources, filtering, visualizing, and applying statistics on top of it.
pandas_handson.ipynb -
3. Never on a Friday or Turning Tuesday A simple exploratory data analysis of stock market data to determine if Tuesdays are Turning Tuesdays. never_on_a_friday.ipynb -
4. Handling missing values in pandas This notebook provides a good overview of how pandas handles missing values and explores the functions it provides to handle missing data. handling_missing_data.ipynb
5. Mini-Project: Data Wrangling and Transformation with Pandas In this mini-project we explore multiple datasets (movie, cast, release) to do extensive data exploration, analysis and visualization. data_wrangling_transformations_movie.ipynb -
6. Data Wrangling with JSON This notebook helps in understanding pandas' JSON functionality. It also has some challenges which require some fun data wrangling (e.g. missing values etc.). Mini_Project_Wrangling_Json_Exercise.ipynb -
7. 67 years of Lego An exploratory data analysis of a fun dataset on every single Lego block that has ever been built. Lots of good pandas aggregation. lego_analysis.ipynb Source: Datacamp
8. Explore the crypto-currency Bitcoin market In this notebook we do an in-depth analysis of crypto-currency market capitalization, and visualize top gainers and losers in a fun way! This analysis tells you how risky or profitable this market is currently. cryptocurrency_analysis.ipynb -
9. Discovery of Handwashing This notebook tells the story of the discovery of handwashing, and how Dr. Ignaz Semmelweis brought down deaths from childbed fever among women who had just given birth. discovery_of_handwashing.ipynb -
10. Exploring evolution of Linux This notebook does exploratory data analysis on the Linux git commit history. A good lot of pandas! exploring_the_evaluation_of_linux.ipynb -
11. The GitHub history of Scala Language This notebook explores the pull requests of the Scala language project on GitHub and does an interesting analysis of pull requests based on authors, years, months etc. EDA_scala_history.ipynb Source: Datacamp
12. Who is drunk and when in Ames, Iowa Ames, Iowa is home to Iowa State University. Ames has had its fair share of alcohol-related incidents. (For example, Google 'VEISHEA riots 2014'). In this notebook, we analyze and visualize some breath alcohol test data from Ames that is published by the State of Iowa. EDA_Ames_iowa_drinking.ipynb Source: Datacamp
13. Working with strings in Pandas In this notebook, we explore string manipulation and simple string analysis operations in pandas. Working_With_Strings.ipynb Source: Python Data Science Handbook
14. A New Era of Data Analysis in Baseball In this notebook, we will use Statcast data to compare the home runs of two of baseball's brightest (and largest) stars, Aaron Judge (6'7") and Giancarlo Stanton (6'6"), both of whom now play for the New York Yankees. data_analysis_in_pandas.ipynb source: Datacamp
15. Name Game: Gender prediction using sound A fun analysis of NYT's authors dataset of Children’s Picture Books. We analyze the gender distribution of authors to see if it has changed over the years, based on authors' names and how they sound, using the NYSIIS algorithm. name_game.ipynb source: Datacamp
16. Exploring the Titanic Dataset using Pandas An exploratory analysis of the Titanic dataset from Kaggle, with a few tips to get summary statistics. Exploring_titanic_dataset_using_pandas.ipynb source: Pandas Docs
17. Risk and Returns: The Sharpe Ratio In this notebook we learn about the Sharpe Ratio by calculating it for the stocks of Amazon and Facebook, to figure out which one is the better investment. We use the S&P 500 as the benchmark, which measures the performance of the 500 largest stocks in the US. risk_and_returns_the_sharpe_ratio.ipynb source: Datacamp
18. Generating Keywords for Google Ads A quick project to learn pandas by generating keywords for a Google Ads campaign. generating_keywords_for_google_ads source: Datacamp
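
For the Sharpe Ratio project listed above, the core calculation can be sketched as follows. This is a minimal sketch under stated assumptions, not the notebook's code: it uses one common formulation (mean of the excess returns over the benchmark, divided by their sample standard deviation, scaled by the square root of the number of periods per year), the default of 252 trading days is an assumption, and the helper names `pct_returns` and `sharpe_ratio` are hypothetical.

```python
import math
import statistics


def pct_returns(prices):
    """Simple period-over-period returns from a list of prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


def sharpe_ratio(asset_returns, benchmark_returns, periods_per_year=252):
    """Annualized Sharpe ratio of an asset relative to a benchmark.

    Assumes both return series are aligned and contain at least two
    observations, since the sample standard deviation needs two points.
    """
    excess = [a - b for a, b in zip(asset_returns, benchmark_returns)]
    mean_excess = statistics.mean(excess)
    std_excess = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / std_excess * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Made-up prices, purely for illustration.
    stock = pct_returns([100.0, 102.0, 101.0, 105.0, 107.0])
    index = pct_returns([50.0, 50.5, 50.4, 51.0, 51.2])
    print(sharpe_ratio(stock, index))
```

A higher ratio means more excess return per unit of variability in that excess return, which is the basis for comparing two stocks against the same benchmark.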

SQL

No. Project Description Notebook Notes
1. SQL Spark at scale In this notebook, we work through a series of exercises using Spark SQL and familiarize ourselves with how SQL works with Spark. Mini_Project_SQL_with_Spark.ipynb Since this notebook needs a PySpark environment, one way to run it is to sign up for a Domino trial, create a PySpark workspace and launch the notebook there.

Mathematics

No. Project Description Notebook Notes
1. Linear Algebra Basics In this notebook, we explore basic concepts of Linear Algebra. Linear_Algebra_Basics.ipynb Source: Introduction to Linear Algebra for Applied Machine Learning with Python.
2. Probability and Random Processes A list of basic probability concepts. Probability and Random Processes.ipynb
3. Counting: A primer A list of basic counting concepts. Counting.ipynb

Time Series

No. Project Description Notebook Notes
1. Working with time series in Python This notebook teaches the basics of time series analysis. We take a fun dataset of Seattle's Fremont Bridge bicycle data and Google's stock data to visualize, understand and work through dates and times in Python. Time_series_basics.ipynb Data is fetched directly from the web.

Kaggle

No. Project Description Notebook Notes
1. Titanic: Machine Learning from Disaster This notebook is a walk-through of Kaggle's iconic Titanic problem, learning from the best kernels there. It is also a solution to exercise 2 of chapter 3 of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. titanic_competition.ipynb This notebook is downloaded from Kaggle's kernel.

Mini Projects

No. Project Description Notebook Notes
1. Predicting Credit Card Approvals In this notebook, we will build an automatic credit card approval predictor using machine learning techniques, just like the real banks do. We explore the data, clean it, impute it, and then apply logistic regression to predict the credit card approval. Predicting Credit Card Approvals Source: Datacamp
2. Find Movie Similarity From Plot Summaries In this notebook, we will quantify the similarity of movies based on their plot summaries available on IMDb and Wikipedia, then separate them into groups, also known as clusters. We'll create a dendrogram to represent how closely the movies are related to each other. Find Movie Similarity From Plot Summaries Source: Datacamp
3. Reducing traffic mortality in the USA In this notebook, we do deep data analysis, data wrangling, plotting, dimensionality reduction, and unsupervised clustering on the data collected by the National Highway Traffic Safety Administration and the National Association of Insurance Commissioners. Reducing traffic mortality in the USA Source: Datacamp
4. Word Frequency in Moby Dick A fun mini-project in which we perform basic NLP tasks using the requests, BeautifulSoup and nltk libraries. Word Frequency in Moby Dick Source: Datacamp
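
The word-frequency idea behind the Moby Dick mini-project can be sketched with the standard library alone. Note that this is a simplified stand-in, not the notebook's approach: the project uses requests, BeautifulSoup and nltk, whereas this sketch tokenizes with a regular expression and filters with a tiny hand-written stop-word set. `STOP_WORDS` and `word_frequencies` are hypothetical names chosen for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word set (an assumption); a real analysis
# would use a fuller list such as the one shipped with nltk.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "me"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word tokens in text, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tok for tok in tokens if tok not in stop_words)


if __name__ == "__main__":
    sample = "The whale, the white whale! Call me Ishmael."
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```

`Counter.most_common` then gives the most frequent words directly, which is the quantity the mini-project plots.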

Courses

No. Project Description Notebook Notes
1. Statistical Thinking in Python This course covers the principles of statistical inference. In this course, you will start building the foundation you need to think statistically, speak the language of your data, and understand what your data is telling you. The foundations of statistical thinking took decades to build, but can be grasped much faster today with the help of computers. With the power of Python-based tools, you will rapidly get up-to-speed and begin thinking statistically by the end of this course. Statistical Thinking Part 1 Datacamp
2. CS 224N NLP with Deep Learning The Stanford course of Natural Language Processing using Deep Learning Lecture-1-Introduction-and-Word-Vectors.ipynb , Word2Vec_from_scratch -

Algorithms from Scratch

No. Project Description Notebook Notes
1. Linear Regression This notebook covers the concepts behind Linear Regression and implements it from scratch. Linear Regression -
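
As a rough idea of what "from scratch" means for Linear Regression, here is a minimal sketch of ordinary least squares for a single feature using the closed-form solution. It is not taken from the notebook, which may well use gradient descent or a matrix formulation; the function names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up points that lie exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept, predict(5, slope, intercept))
```

The closed form works here because there is only one feature; with several features the same idea generalizes to solving the normal equations.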