This repository contains materials for Practical Machine Learning with R (CSX460) at the University of California, Berkeley. The most recent class is Spring 2016.
This course provides an introduction to machine learning using R, the open-source statistical programming language. Once a niche set of tools for statisticians, programmers, and quants, machine learning (sometimes also called data mining or statistical learning) has spread in popularity to a wide variety of applications and disciplines. This course teaches the fundamentals of machine learning without delving into the theory. It emphasizes practical aspects of machine learning so that students can apply what they learn to solve problems in their own fields.
Students of this class will learn:
- Fundamental concepts in ML
- The difference between supervised, unsupervised, semi-supervised, and adaptive/reinforcement learning
- The three prerequisites of ML algorithms/models:
  - Loss function
  - Restricted class of functions
  - Search methodology for training
- How to evaluate and compare ML model performance
- How to pre-process data and build features
- How to train ML models for prediction, categorization and recommendations
- How to apply ML models on new data
- How to use resampling techniques to calculate model performance
- What the bootstrap is and how it works
- What Bagging is and how and why it improves model performance
- What Boosting is and how and why it improves model performance
- How to implement/deploy ML models for use by a wider audience
- How to frame questions to be answered using ML techniques
- How to collaborate in a group using tools for collaborative/social programming
- How to generate high-quality graphical and textual results
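As a small taste of the resampling topics above, the bootstrap can be sketched in a few lines of base R. This is a toy illustration on made-up data, not course code:

```r
# Toy illustration of the bootstrap: estimate the standard error of a
# sample mean by resampling the data with replacement.
set.seed(42)
x <- rnorm(100, mean = 5)  # a made-up sample of 100 observations

# Draw 1000 bootstrap resamples; recompute the mean on each
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))

sd(boot_means)  # bootstrap estimate of the standard error of mean(x)
```

The spread of the resampled means approximates the sampling variability of the original statistic, which is the idea that bagging later builds on.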
Who should take this course:
- Anyone who wishes to learn the fundamentals of machine learning
- Anyone who wants to learn about using R to build, evaluate, or deploy machine learning models
- Scientists, engineers, business analysts, and researchers who explore and analyze data and wish to present their findings in well-formatted textual and graphical forms
- Anyone wishing to get hands-on experience building machine learning models
Prerequisites:
- Experience programming in at least one high-level programming language such as BASIC, PASCAL, C, Java, Python, Perl, or Ruby.
- Familiarity with R such as that gained through the Programming with R course.
- Basic knowledge of statistics as covered in a first-semester undergraduate statistics course. Some basic statistical techniques will be reviewed as part of covering core elements of machine learning.
- A personal laptop for completing in-class assignments.
Reading Requirements for the Course
**Applied Predictive Modeling**
ISBN-13: 978-1461468486 ISBN-10: 1461468485
Kuhn, Max and Johnson, Kjell
Springer Science+Business
2013
There is a Google group for this class: CSX460
Current Term: Spring 2016
This provides a session by session overview of CS-X460 (Practical Machine Learning).
- Welcome
- Class Book, Materials, etc.
- Setting up your environment
- Installing R/R Studio
- Installing git and using Github
- Installing packages from CRAN and Github
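A minimal sketch of the two install paths, assuming the devtools package for GitHub installs; the package and repository names are illustrative, not a course requirement:

```r
# Install a package from CRAN
install.packages("caret")

# Installing from GitHub requires devtools (itself on CRAN)
install.packages("devtools")
devtools::install_github("tidyverse/magrittr")  # "owner/repo" form; repo is illustrative

library(caret)  # load an installed package into the session
```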
- Overview of Machine Learning
Reading:
- Chapters 1-2 of Applied Predictive Modeling
Exercise(s):
- Finish in-class exercises
- Building First Models
- Supervised, unsupervised, and semi-supervised
- Regression and classification
- Measuring model error(s)
- Machine learning prerequisites
- Algorithm types
- Data processing
Reading:
- APM Chapters 3.2-3.7, skim 3.8, Chapters 6.2 and 6.3
- Optional:
R Packages:
- General awesomeness: magrittr
- Reading data: readr, data.table::fread
- From the web: httr, rvest
- Changing data orientation: tidyr
- Data Manipulation: dplyr, data.table
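As a taste of how these packages fit together, here is a minimal sketch piping the built-in mtcars data through a dplyr chain with magrittr's `%>%`; the dataset and the particular summary are illustrative, not from the course:

```r
library(magrittr)  # provides the %>% pipe operator
library(dplyr)     # filter / group_by / summarise verbs

# Keep the 6-cylinder cars in the built-in mtcars data,
# then compute the average mpg for each gear count
six_cyl_mpg <- mtcars %>%
  filter(cyl == 6) %>%
  group_by(gear) %>%
  summarise(avg_mpg = mean(mpg))

six_cyl_mpg
```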
Exercise(s):
- Finish in-class exercises
Reading:
- APM Chapter 4 "Over Fitting and Model Tuning"
- APM Chapter 12.2 "Logistic Regression"
Exercise(s):
- Finish in-class exercises
Reading:
- APM Chapter 5 "Measuring Performance in Regression Models" (esp. 5.2 "The Variance-Bias Trade-off") (5 pages)
- APM Chapter 11 "Measuring Performance in Classification Models" (~20 pages)
- APM Chapter 7.4 "K-Nearest Neighbors" (regression) (2 pages)
- APM Chapter 13.5 "K-Nearest Neighbors" (classification) (3 pages)
Exercise(s):
- Finish in-class exercise(s)
- K Nearest Neighbors
- Decision Trees/Recursive Partitioning
Reading:
- APM Chapter 8.1-8.5 "Regression Trees and Rule-Based Models" (25 pages)
- APM Chapter 14.1-14.5 "Classification Trees and Rule-Based Models"
Exercise(s):
- Finish in-class exercises.
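As a preview of recursive partitioning, here is a minimal sketch using the rpart package on the built-in iris data; the dataset and default settings are illustrative, not course assignments:

```r
library(rpart)  # recursive partitioning (CART-style decision trees)

# Fit a classification tree predicting iris species from the four measurements
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict classes back on the training data; note that resubstitution
# accuracy is optimistic -- resampling (covered above) gives honest estimates
pred <- predict(fit, iris, type = "class")
accuracy <- mean(pred == iris$Species)
accuracy
```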
- Bagging
- Bagged Trees / Random Forests
- Exercises
Reading:
Exercise(s):
- Boosting
- Neural Networks
- Support Vector Machines
- Exercises
Reading:
Exercise(s):
- Diving into the data lake
- Optimization
- Delivery and Production