PetFinder 6000

Table of Contents

  1. Description
  2. Recommender System
  3. Cloud Architecture & MLOps
  4. File Structure
  5. Team

Description

A multi-modal recommender system, formerly hosted on Amazon Web Services (AWS), that recommends cats from the Cat Welfare Society that users are most likely to adopt.

Recommender System

This section outlines the insights gleaned from the dataset, the methodologies employed, model performance, and some sample results.

The Application

App preview: The app was first designed in Amplify Studio and subsequently deployed on AWS Amplify. We had 404 cat profiles that had to be manually scraped and cropped (this was especially painful).

Back to top

Exploratory Data Analysis

Adopter attributes: Boxplots of the attributes of adopters who registered on the app

Cat attributes: Boxplots of the attributes of cats scraped from Cat Welfare Society

Power law at play: The power law is at play here! You can see the sharp drop-off in interactions, which makes the data sparse (the usual RecSys shenanigans).

Back to top

Methodologies

Metrics used: In addition to the usual F1, NDCG, and NCRR, 'distributional coverage' and 'serendipity' were also implemented, following https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/
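As a rough illustration of one of these beyond-accuracy metrics, here is a minimal sketch of distributional coverage, computed as the entropy of how often each cat appears across all users' top-k lists. The `recommendations` structure and function name are illustrative assumptions, not our notebook code:

```python
import math
from collections import Counter

def distributional_coverage(recommendations):
    """Entropy (in bits) of how often each item appears across all users'
    top-k lists. Higher means recommendations are spread more evenly over
    the catalogue instead of piling onto a few popular cats.

    `recommendations` maps user_id -> list of recommended item_ids
    (an illustrative structure, not the project's actual schema).
    """
    counts = Counter(item for items in recommendations.values() for item in items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy usage: two users, three distinct cats recommended overall.
recs = {"user_a": ["cat_1", "cat_2"], "user_b": ["cat_1", "cat_3"]}
print(distributional_coverage(recs))
```

A higher entropy complements the power-law observation above: it rewards models that surface more of the long tail.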

Cold-start: To combat the cold-start problem, where we could not recommend effectively to new users who had not yet rated any cats, we retrieved user embeddings, found the existing user with the highest cosine similarity to the cold-start user, and showed the new user that user's recommendations until they had rated some cats.
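A minimal sketch of that fallback, assuming user embeddings sit in a NumPy matrix; `user_embeddings` and `new_user_vec` are illustrative names rather than the deployed code:

```python
import numpy as np

def most_similar_user(new_user_vec, user_embeddings):
    """Return the index of the existing user whose embedding has the
    highest cosine similarity with the cold-start user's embedding."""
    # Normalise rows so a plain dot product equals cosine similarity.
    norms = np.linalg.norm(user_embeddings, axis=1, keepdims=True)
    unit = user_embeddings / np.clip(norms, 1e-12, None)
    query = new_user_vec / max(np.linalg.norm(new_user_vec), 1e-12)
    sims = unit @ query
    return int(np.argmax(sims))

# The cold-start user then inherits that neighbour's cached
# recommendations until they have rated enough cats of their own.
```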

Back to top

Results

Model performance: Across the models, 'vanilla' collaborative filtering models such as WMF and BPR worked very well. Multi-modal models such as VBPR also performed well on our combined HarmonicMean metric (a sketch of the combination follows below), which was expected, as people likely have strong visual preferences when it comes to cats. Text models did not work as well, which may indicate that people pay less heed to descriptions than to visuals.

Sample results: A sample result for a random adopter.
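For reference, a harmonic mean over per-model metric scores can be combined as in the sketch below; the exact metrics and weighting behind our HarmonicMean may differ, so treat this as an assumption-laden illustration:

```python
def harmonic_mean(scores):
    """Harmonic mean of metric scores. Unlike the arithmetic mean, it
    punishes a model that does badly on any single metric, so a model
    must be decent on both accuracy and beyond-accuracy metrics to rank well."""
    if any(s <= 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)

# e.g. combining illustrative F1, NDCG, NCRR, coverage and serendipity
# scores for one model:
print(harmonic_mean([0.41, 0.38, 0.35, 0.52, 0.29]))
```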

Back to top

Cloud Architecture & MLOps

This section outlines the cloud architecture and pipelines that were deployed on AWS.

Back to top

ML Lifecycle and pipelines (Zoomed out)

General architecture of the ML lifecycle

Pipelines overview: Overview of the data pipelines

Back to top

Pipelines (Granular)

Data collection & preparation

Model training

Rank generation: This is where the magic happens.

Back to top

File Structure

| Folder | Details |
| ------ | ------- |
| `model` | Notebooks for model training, hyperparameter tuning, evaluation and inference. Includes Dockerfiles for custom training and inference images. |
| `pre-processing` | Notebooks for data pre-processing. |
| `pipelines` | Scripts and notebooks for creating processing, training and deployment pipelines. |
| `process_new_user` | Scripts and a notebook for creating the Lambda function that pulls generated rankings from S3 (a sketch follows below). |
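As an illustration of the `process_new_user` Lambda described above, here is a minimal handler sketch; the bucket name, key layout, and event shape are placeholder assumptions, not the actual resources:

```python
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Pull the pre-generated ranking for a user from S3.

    The bucket name and key layout below are placeholder assumptions;
    the real pipeline writes rankings under its own scheme.
    """
    user_id = event["user_id"]
    obj = s3.get_object(
        Bucket="petfinder-rankings",          # placeholder bucket
        Key=f"rankings/{user_id}.json",       # placeholder key layout
    )
    rankings = json.loads(obj["Body"].read())
    return {"statusCode": 200, "body": json.dumps(rankings)}
```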

Back to top

Team

This project was done with my teammates Ruo Xi, Shu Xian, Jun Yi, and Adrian in fulfilment of our MITB Programme (Artificial Intelligence), and I could never have done it without them! Notable libraries used: Cornac and recommenders.

Back to top