bradleyboehmke/uc-bana-4080

UC BANA 4080 Data Mining

By Brad Boehmke 🚀

This repository contains additional resources for the UC BANA 4080 Data Mining course. The following is a truncated syllabus; for the full syllabus and complete course content, please visit the course site in Canvas.

Course Description

Welcome to Data Mining! This course provides an intensive, hands-on introduction to data mining and analysis techniques. You will learn the fundamental skills required to extract informative attributes, relationships, and patterns from datasets. You will gain practical experience with exploratory data analysis, data visualization, unsupervised learning techniques such as clustering and dimension reduction, and supervised learning techniques such as linear regression, regularized regression, decision trees, random forests, and more! You will also be exposed to more advanced topics such as ensembling techniques, deep learning, model stacking, and model interpretation. Together, these will give you a solid foundation in the tools and techniques organizations apply to support modern-day, data-driven decision making.

Learning Objectives

Upon successfully completing this course, you will be able to:

  • Apply data wrangling techniques to manipulate and prepare data for analysis.
  • Use exploratory data analysis and visualization to provide descriptive insights into data.
  • Apply common unsupervised learning algorithms to find groupings of observations and features in a given dataset.
  • Describe and apply a sound analytic modeling process.
  • Apply, compare, and contrast various predictive modeling techniques.
  • Have the resources and understanding to continue advancing your data mining and analysis capabilities.

Material

This course is split into two main sections: Data Wrangling and Machine Learning. The data wrangling section will provide you with the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility. The primary course material for this section is provided via this free online book.

The second section, focused on machine learning, will expose you to several algorithms for identifying hidden patterns and relationships within your data. The primary course material for this part of the course is provided via this free online book. There will also be recorded lectures and additional supplementary resources provided via Canvas.

Content Covered

Module  Description

DATA WRANGLING

1  Introduction
  • R fundamentals & the RStudio IDE
  • Deeper understanding of vectors
2  Reproducible Documents and Importing Data
  • Managing your workflow and reproducibility
  • Data structures & importing data
3  Tidy Data and Data Manipulation
  • Data manipulation & summarization
  • Tidy data
4  Relational Data and More Tidyverse Packages
  • Relational data
  • Leveraging the Tidyverse for text & date-time data
5  Data Visualization & Exploration
  • Data visualization
  • Exploratory data analysis
6  Creating Efficient Code in R
  • Control statements & iteration
  • Writing functions
7  Mid-term Project

MACHINE LEARNING

8  Introduction to Applied Modeling
  • Introduction to machine learning
  • First model with Tidymodels
9  First Regression Models
  • Simple linear regression
  • Multiple linear regression
10  More Modeling Processes
  • Feature engineering
  • Resampling
11  Classification & Regularization
  • Logistic regression
  • Regularized regression
12  Hyperparameter Tuning & Non-linearity
  • Hyperparameter tuning
  • Multivariate adaptive regression splines
13  Tree-based Models
  • Decision trees
  • Bagging
  • Random forests
14  Unsupervised Learning
  • Clustering
  • Dimension reduction
15  Final Project

Getting Started

This course is split into two main sections: Data Wrangling and Machine Learning. The data wrangling section will provide you with the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility. The primary course material for this section is provided via this Bookdown resource 📕. The second section, focused on machine learning, will expose you to several algorithms for identifying hidden patterns and relationships within your data. The primary course material for this part of the course is provided via this Bookdown resource 📕.
