By Brad Boehmke 🚀
This repository contains additional resources for the UC BANA 4080 Data Mining course. The following is a truncated syllabus; for the full syllabus along with complete course content please visit the online course content in Canvas.
Welcome to Data Mining! This course provides an intensive, hands-on introduction to data mining and analysis techniques. You will learn the fundamental skills required to extract informative attributes, relationships, and patterns from datasets. You will gain hands-on experience with exploratory data analysis, data visualization, unsupervised learning techniques such as clustering and dimension reduction, and supervised learning techniques such as linear regression, regularized regression, decision trees, random forests, and more! You will also be exposed to some more advanced topics such as ensembling techniques, deep learning, model stacking, and model interpretation. Together, these will give you a solid foundation in the tools and techniques that organizations apply to support modern-day, data-driven decision making.
Upon successfully completing this course, you will be able to:
- Apply data wrangling techniques to manipulate and prepare data for analysis.
- Use exploratory data analysis and visualization to provide descriptive insights into data.
- Apply common unsupervised learning algorithms to identify groupings of observations and features in a given dataset.
- Describe and apply a sound analytic modeling process.
- Apply, compare, and contrast various predictive modeling techniques.
- Continue advancing your data mining and analysis capabilities using the resources and understanding gained in this course.
This course is split into two main sections: Data Wrangling and Machine Learning. The data wrangling section will provide you with the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility. The primary course material for this section is provided via this free online book.
The second section, focused on machine learning, will expose you to several algorithms for identifying hidden patterns and relationships within your data. The primary course material for this part of the course is provided via this free online book. There will also be recorded lectures and additional supplementary resources provided via Canvas.
| Module | Description |
|---|---|
| DATA WRANGLING | |
| 1 | Introduction |
| | R fundamentals & the RStudio IDE |
| | Deeper understanding of vectors |
| 2 | Reproducible Documents and Importing Data |
| | Managing your workflow and reproducibility |
| | Data structures & importing data |
| 3 | Tidy Data and Data Manipulation |
| | Data manipulation & summarization |
| | Tidy data |
| 4 | Relational Data and More Tidyverse Packages |
| | Relational data |
| | Leveraging the Tidyverse for text & date-time data |
| 5 | Data Visualization & Exploration |
| | Data visualization |
| | Exploratory data analysis |
| 6 | Creating Efficient Code in R |
| | Control statements & iteration |
| | Writing functions |
| 7 | Mid-term Project |
| MACHINE LEARNING | |
| 8 | Introduction to Applied Modeling |
| | Introduction to machine learning |
| | First model with Tidymodels |
| 9 | First Regression Models |
| | Simple linear regression |
| | Multiple linear regression |
| 10 | More Modeling Processes |
| | Feature engineering |
| | Resampling |
| 11 | Classification & Regularization |
| | Logistic regression |
| | Regularized regression |
| 12 | Hyperparameter Tuning & Non-linearity |
| | Hyperparameter tuning |
| | Multivariate adaptive regression splines |
| 13 | Tree-based Models |
| | Decision trees |
| | Bagging |
| | Random forests |
| 14 | Unsupervised Learning |
| | Clustering |
| | Dimension reduction |
| 15 | Final Project |