Final Project of ML Techniques

Problem Description

In this project, we work with data from the Book-Crossing dataset, which contains user IDs, book descriptions, and book ratings (integer values from 1 to 10). The instructor also provides implicit book rating data (value 0), which you are encouraged to make use of where possible. The goal is to predict the book rating; note that the two tracks use different evaluation criteria. Besides the original Book-Crossing dataset, we additionally provide external data crawled by the TAs: descriptions of some (not all) of the books. Per course policy, you are NOT allowed to use any other data.

Data Description

There are several files which contain all the information about the task:

user.csv: The user IDs and corresponding demographic data
- User-ID: user IDs which have been anonymized
- Location: demographic data (may contain NULL values)
- Age: demographic data (may contain NULL values)

book_ratings_train.csv / book_ratings_test.csv: The book rating information
- User_ID: user IDs which have been anonymized
- ISBN: International Standard Book Number, which serves as the book ID and can be used to look up the book's description
- Book-Rating: the book rating, ranging from 1 to 10 (note that the test data does not have this value)

implicit ratings.csv: The implicit book rating information
- User_ID: user IDs which have been anonymized
- ISBN: International Standard Book Number, which serves as the book ID and can be used to look up the book's description
- Book-Rating: all the book ratings are implicit, that is, 0

books.csv: Content information about the books
- ISBN: International Standard Book Number
- Book-Title: content-based information
- Book-Author: content-based information
- Year-Of-Publication: content-based information
- Publisher: content-based information
- Image-URL-S: URLs linking to cover images (small size)
- Image-URL-M: URLs linking to cover images (medium size)
- Image-URL-L: URLs linking to cover images (large size)
- Book-Description: TA-crawled descriptions of books

submission.csv: Our book-rating predictions for the test samples
- Book-Rating: the predicted book rating
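
Since the files are linked by user ID and ISBN, a typical first step is to join each rating with its book and user record. The sketch below is only an illustration, not code from the project notebook: it uses tiny made-up inline rows instead of the real files, takes the column names from the list above, and assumes missing values appear as the literal string NULL. The helper names (`read_rows`, `parse_age`, `join_features`) are hypothetical.

```python
import csv
import io

# Made-up stand-ins for the real CSV files (illustration only).
RATINGS_CSV = """User_ID,ISBN,Book-Rating
u1,111,8
u2,111,5
u2,222,10
"""
BOOKS_CSV = """ISBN,Book-Title,Book-Author
111,Example Title,Example Author
"""
USERS_CSV = """User-ID,Location,Age
u1,"taipei, taiwan",NULL
u2,NULL,34
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_age(value):
    """Return the age as an int, or None for missing values such as 'NULL'."""
    try:
        return int(float(value))
    except (TypeError, ValueError):
        return None


def join_features(ratings, books, users):
    """Attach book and user fields to each rating row (left join on the ratings)."""
    books_by_isbn = {b["ISBN"]: b for b in books}
    users_by_id = {u["User-ID"]: u for u in users}
    joined = []
    for r in ratings:
        book = books_by_isbn.get(r["ISBN"], {})   # some ISBNs may have no book record
        user = users_by_id.get(r["User_ID"], {})
        joined.append({
            "user_id": r["User_ID"],
            "isbn": r["ISBN"],
            "rating": int(r["Book-Rating"]),
            "author": book.get("Book-Author"),
            "age": parse_age(user.get("Age")),
        })
    return joined


rows = join_features(read_rows(RATINGS_CSV), read_rows(BOOKS_CSV), read_rows(USERS_CSV))
```

Rows whose ISBN or user has no matching record keep `None` in the joined fields, so the model code has to decide how to handle missing values.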

Evaluation

The submission website has two competition tracks, each evaluated with a different goodness measure:

Track 1: Mean Absolute Error (MAE)
Track 2: Mean Absolute Percentage Error (MAPE)
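
The two measures can be sketched in a few lines of plain Python. This is a minimal illustration using the standard textbook definitions; whether the submission site reports MAPE as a fraction or multiplied by 100 is not stated here, so the fraction form below is an assumption. The example ratings are made up.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average of |true - predicted|."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE as a fraction: average of |true - predicted| / |true|.

    Explicit ratings lie in 1..10, so the denominator is never zero.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ratings for illustration.
y_true = [8, 5, 10]
y_pred = [7, 7, 10]
mae = mean_absolute_error(y_true, y_pred)              # (1 + 2 + 0) / 3
mape = mean_absolute_percentage_error(y_true, y_pred)  # (1/8 + 2/5 + 0) / 3
```

Because MAPE divides each error by the true rating, the same absolute error costs more on a low rating than on a high one, which is why a model tuned for one track is not necessarily the best choice for the other.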

Model

Decision Tree
Random Forest
Adaptive Boosting (AdaBoost)
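
All three models are built on decision trees: a random forest averages many trees, and AdaBoost combines weak learners, which are commonly shallow trees. As a rough illustration of the shared building block, here is a pure-Python depth-1 regression tree (a "stump") that picks the single threshold minimizing squared error. It is not taken from the project notebook; the function names and the age/rating values are made up for this sketch.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one numeric feature.

    Returns (threshold, left_mean, right_mean); the prediction is
    left_mean when x <= threshold and right_mean otherwise.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict_stump(stump, x):
    """Predict with a fitted stump."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Made-up data: user age as the feature, book rating as the target.
ages = [15, 18, 40, 45]
ratings = [9, 8, 5, 6]
stump = fit_stump(ages, ratings)
```

A full decision tree applies this split search recursively to each side and over all features.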

Submission

Code: Final Project- Book Rating Prediction0626.ipynb
File: submission.csv
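
A minimal sketch of writing the prediction file, assuming one predicted rating per test row in test order. Whether the submission site expects a header row or integer values is not stated here, so the `include_header` flag and the clipping to the 1 to 10 rating range are assumptions of this sketch, and the function names are hypothetical.

```python
import csv
import io


def clip_rating(value, low=1, high=10):
    """Keep a prediction inside the valid explicit-rating range."""
    return max(low, min(high, value))


def write_submission(predictions, out, include_header=False):
    """Write one predicted rating per line to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    if include_header:
        writer.writerow(["Book-Rating"])  # column name from the data description
    for p in predictions:
        writer.writerow([clip_rating(p)])


# Made-up predictions; an in-memory buffer stands in for submission.csv.
buffer = io.StringIO()
write_submission([7.4, 11.2, 0.3], buffer)
```

To write the real file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.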

Result

Track 1: Rank 5/21
Track 2: Rank 8/21


For other details, please refer to the file project.pdf

pcchencode/Final-Project-of-ML-Techniques
