AI/ML Training

A carefully curated repository of Artificial Intelligence (AI), Machine Learning (ML), and Data Science (DS) literature with practical implementations using Jupyter Notebook "skill-builders".

Tip of the Day

Check out our new Reading Rack, where we review the important literature in each sub-field of ML. We tackle 10 papers a week so that we quickly learn the current state of the art. We will also add tutorials that apply the papers to real problems using Python or MATLAB code.

1. Project Overview

The Data Science Boot Camp (DSBC) team has spent years developing tools and training materials for Applied Math and Data Science. In this project you will find the following tools.

"5 Questions a Data Scientist Should Ask Their Customer" is a one-page list of questions any data scientist should have with them to define the problem they are trying to solve and set expectations with their customer.
"Notes on ..." are cheat sheets on various math topics for Machine Learning, e.g. algebra, probability theory, etc., as well as Machine Learning topics, e.g. linear regression, hypothesis testing, etc.
"Machine Learning flow-charts" will help you navigate the enormous number of algorithms. These will help you select what algorithm or plot you need to complete your task, based on the data that you have and the desired outcome or story you are trying to tell.
"Cheat-Sheets" are for various Python toolboxes, e.g. NumPy, Matplotlib, scikit-learn, Keras, etc.
"10 steps to Data Science" is a series of notebooks to teach you the most common tools used in Data Science.
"10 steps to Machine Learning" is a series of notebooks to teach you some advanced tools used in Machine Learning.
"Python in 2 days" is a series of notebooks to get you started in Jupyter notebooks and Python.
"Machine Learning in 1 day" is a series of notebooks focused on a basic toolbox for Machine Learning.
"Deep Learning in 1 day" is a series of notebooks focused on advanced CNNs, RNNs, GANs and Transformers.
"Examples" is a series of notebooks that most folks will find useful for a variety of real-life applications of Data Science.

2. The DSBC Approach

The DSBC team teaches AI/ML, and more broadly Data Science, with two approaches:

2.1 "Top-Down" Approach

In the Top-Down approach, you don't need to know the math, or be a deep expert in Python. We will teach you the tools, using industry best practices and rules-of-thumb, so that you will be a solid contributing member of a Data Science team.

To get started, we recommend the following self-paced tutorials performed in order:

2.2 "Bottom-Up" Approach

In the Bottom-Up approach, we assume that you already have a solid foundation in math and/or computer science. We will teach you the algorithms and how to both manipulate and optimize them for your application. With these powerful skills, you will be a technical leader enabling the full potential of a Data Science team.

To get started, we recommend that you review the following:

3. New to Python and Jupyter Notebooks?

Have you heard about Jupyter Notebooks, but don't know how to get started? Here is a quick tutorial.
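
If you just want something to type into your first notebook cell, here is a minimal sketch. It is not taken from the tutorial: the function name `summarize` and the sample data are made up for illustration, and it uses only the Python standard library so it should run in any fresh notebook.

```python
# A first cell to try in a fresh Jupyter notebook.
# Illustrative only: `summarize` and `heights_cm` are hypothetical names,
# not part of the DSBC tutorial.
import statistics


def summarize(values):
    """Return the mean and sample standard deviation of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


heights_cm = [150, 160, 170, 180, 190]  # made-up sample data
mean, stdev = summarize(heights_cm)
print(f"mean={mean}, stdev={stdev:.2f}")
```

Run the cell with Shift+Enter, then change the numbers in `heights_cm` and run it again to see how the notebook keeps your code and its output side by side.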

4. Additional Resources

AML links

Google Resources

colab.research.google.com - Google Colab: free Jupyter Notebooks and GPUs, great for sharing and training. Spun-up resources only last for 12 hours, though.

https://research.google.com/seedbank/seeds - Collection of Interactive Machine Learning Examples and Open Sourced Research
https://developers.google.com/machine-learning/guides/ - ML Rules Guide for ML Projects

Research Paper Sites

http://www.arxiv-sanity.com/ - Developed by Andrej Karpathy (Tesla Director of AI) - Nice, concise display of the latest research papers in ML and AI. Much better than arXiv.
https://openai.com/research/#publications - Research done by OpenAI
https://www.microsoft.com/en-us/research/research-area/artificial-intelligence/publications/ - Microsoft Research Publications

Datasets

The OG (UCI Machine Learning Repository) - https://archive.ics.uci.edu/ml/index.php
Kaggle - https://www.kaggle.com/datasets
Great List of datasets - https://medium.com/datadriveninvestor/the-50-best-public-datasets-for-machine-learning-d80e9f030279

Conference Videos

http://scaledml.org/2019/ - Scaled ML - talks from all previous years are available. Great speakers like Andrej Karpathy, Jeff Dean, Yangqing Jia, Francois Chollet, etc.
https://www.ieee-security.org/TC/SPW2018/DLS/# - Deep Learning and Security Workshop

Training Curriculum - All Practical based

fast.ai - free, and very good.
machinelearningmastery.com - Jason Brownlee does a great job breaking down the latest algorithms and model implementations. His guides are excellent as well.

https://aws.amazon.com/training/learning-paths/machine-learning/
https://www.deeplearning.ai/ - Andrew Ng is amazing.
https://www.udacity.com/school-of-ai - great but the most expensive of this list.
https://www.coursera.org/learn/machine-learning - also consider the Deep Learning Specialization.
https://www.dataquest.io/ - $33.25/month
https://www.datacamp.com/ - $29/month

GitHub Pages

Facebook Research - https://github.com/facebookresearch
OpenAI - https://github.com/openai
OpenAI Gym is great for reinforcement learning.
UPenn Research Lab - https://github.com/EpistasisLab
ML Cheatsheet - https://github.com/bfortuner/ml-cheatsheet
Udacity Deep Learning Nanodegree Code - https://github.com/udacity/deep-learning
Good Character Level Language Model in Tensorflow - https://github.com/sherjilozair/char-rnn-tensorflow

Performance Benchmarks

Other links

https://www.thinkful.com/blog/what-is-data-science/

5. License

This project is licensed under the MIT License - see the LICENSE.md file for details.

6. Acknowledgments

This repo was built using material from our private industry and academic experience, as well as material borrowed from:

Kao - UCLA ECE 239AS.
Eaton - UPenn CIS 419/519.
Ungar - UPenn CIS 520.
Ng - Stanford CS 229.
Andrew Ng - Machine Learning Yearning
Robert Tibshirani and Trevor Hastie - An Introduction to Statistical Learning with Applications in R.
Jake VanderPlas - Python Data Science Handbook.
Jason Brownlee - Machine Learning Mastery.
Towards Data Science (various articles).
Medium (various articles).
Randy Olson's data analysis and machine learning projects.
Many thanks to Andreas Mueller for some of his examples in the Machine Learning section. We drew inspiration from several of his excellent examples.
Many thanks to Kaggle for the datasets.
Numerous others that we cannot remember.

strates-git/ml_training