Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add product recommendation for automl tables notebook #2257

Merged
merged 4 commits into from
Sep 18, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions tables/automl/notebooks/music_recommendation/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Product Recommendation with AutoML Tables
[AutoML Tables](https://cloud.google.com/automl-tables/) is a service for automating data proprocessing, model selection and training, and prediction for structured data. This tutorial demonstrates how AutoML Tables can be used to create product recommendations for users given a history of past user-product interactions.

## Problem
For online retailers, one key problem to solve is how to get the right products in front of customers to lead to a conversion. Often, these retailers will have huge product catalogs and a diverse pool of users. Additionally, it's typical for there to be plenty of noisy implicit feedback, and comparitively little explicit feedback. For example, in this notebook we will demonstrate how recommendations can be made to thousands of users from a catalog containing millions of songs. Although there is no information about users explicitly liking songs, the dataset does log every time a user listens to a song.

## Approach
A very common approach to solving product recommendation problems is to use matrix factorization (MF) as seen [in this solution](https://cloud.google.com/solutions/machine-learning/recommendation-system-tensorflow-overview). At a high level, MF is generally accomplished by creating a user-by-item matrix where each value is some sort of signal for similarity, such as a rating or view count, between the user and item if the pairing exists in the dataset. Depending on the approach, a number of matrices are then learned such that their product has similar values to the original matrix where pairs exist, and the values of unseen user-item pairs can be interpretted as predicted similarity scores. Although MF as it has been described cannot be done using AutoML tables, there is [literature](https://arxiv.org/abs/1708.05031) that argues that an equivalent does exist for deep learning. Better yet, this deep learning approach allows user and item features to be included in model training!

In this notebook, we use AutoML Tables to train a binary classification model that takes user features and item features from a `(user, item)` pair as input, and outputs a predicted label and similarity score. The label for a sample is 1 if a user has listened to the song more than twice. Once this model is trained, we show how it can be used to construct a lookup table for user-item similarity by predicting a score for every possible pair, and how this table can be used to make recommendations for a user.

### Alternative Approaches
As the number of `(user, item)` pairs grows exponentially with the number of unique users and items, this lookup table approach may not be optimal for extremely large datasets. One workaround would be to train a model that learns to embed users and songs in the same embedding space, and use a nearest-neighbors algorithm to get recommendations for users. Unfortunately, AutoML Tables does not expose any feature for training and using embeddings, so a [custom ML model](https://github.com/GoogleCloudPlatform/professional-services/tree/master/examples/cloudml-collaborative-filtering) would need to be used instead.

Another recommendation approach that is worth mentioning is [using extreme multiclass classification](https://ai.google/research/pubs/pub45530), as that also circumvents storing every possible pair of users and songs. Unfortunately, AutoML Tables does not support the multiclass classification of more than [100 classes](https://cloud.google.com/automl-tables/docs/prepare#target-requirements).

Loading