Foundations and Applications of Data Mining

Created by Min Zhang for the course assignments of INF553 at the University of Southern California.

Introduction

Data mining is a foundational piece of the data analytics skill set. At a high level, it allows the analyst to discover patterns in data and transform them into a usable product. This repository contains code that applies data mining and machine learning algorithms to the very large real-world datasets used in the course, such as the Yelp dataset and real-time Twitter data. The emphasis is on MapReduce.
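As a rough illustration of the MapReduce pattern mentioned above, here is a minimal pure-Python sketch of a word count split into map, shuffle, and reduce steps. It is not taken from the course code: the function names and the toy records are hypothetical, and a real assignment would run the same idea on Spark rather than in a single process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence over an iterable of text records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical toy records, not part of the course data.
    reviews = ["great food great service", "slow service"]
    print(word_count(reviews))
```

The map and reduce functions only ever see one record or one key at a time, which is what lets a framework distribute them across machines.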

Installation

The code has been tested with Python 2.7 and 3.5. You may also need Java 8 and the pyspark package.

To check your Java version:

java -version

To install pyspark for Python:

sudo pip install pyspark
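If it helps, the prerequisites above can be checked from Python with a small script like the one below. This is only a sketch: `check_environment` is a hypothetical helper, it assumes Python 3.3 or later (for `shutil.which`), and it confirms only that a `java` executable is on the PATH, not which Java version it is.

```python
import importlib
import shutil
import sys


def check_environment():
    """Return a dict describing whether each prerequisite looks available."""
    report = {}
    # Interpreter version as "major.minor", e.g. "3.5".
    report["python"] = "%d.%d" % sys.version_info[:2]
    # True if a `java` executable is found on PATH (the version is not checked).
    report["java_on_path"] = shutil.which("java") is not None
    # True if the pyspark package can be imported in this interpreter.
    try:
        importlib.import_module("pyspark")
        report["pyspark_importable"] = True
    except ImportError:
        report["pyspark_importable"] = False
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(name, value)
```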

Data

The data was collected or created by the instructor of the course INF553.

Content

Topic	Algorithm	Content
Data Exploration	-	Preprocess and explore the Yelp dataset to become familiar with Spark
Frequent Itemsets	Apriori, SON	Find frequent itemsets in the Yelp dataset
Recommendation System	Min-Hash, LSH	Build item-based and user-based collaborative filtering and content-based recommendation systems
Community Detection	Girvan-Newman	Detect communities in a graph
Clustering	K-Means, BFR	Cluster datasets using various distance measures
Streaming	Bloom Filtering, Flajolet-Martin, Reservoir Sampling	Filter, count, and sample streaming data such as the Twitter stream
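To give a flavor of the streaming topic in the table above, here is a minimal sketch of reservoir sampling, which keeps a uniform random sample of fixed size from a stream whose length is not known in advance. It is an illustrative version under stated assumptions, not the code in the streaming folder: the function name, the `rng` parameter, and the integer stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of up to k items from an iterable stream."""
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw j uniformly from 0..n-1; j < k happens with probability k/n,
            # in which case the new item replaces the item in slot j.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical stream of integers standing in for incoming tweets.
    print(reservoir_sample(range(1000), 10, random.Random(0)))
```

Only the reservoir itself is held in memory, so the cost is independent of how long the stream runs.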
