A multi-modal recommender system, formerly hosted on Amazon Web Services (AWS), that recommends cats from the Cat Welfare Society that users would be most likely to adopt.
This portion outlines the insights gleaned from the dataset, methodologies employed, model performance and some sample results.
The app was first designed in Amplify Studio and subsequently deployed on AWS Amplify. We had 404 cat profiles that had to be manually scraped and cropped (this was especially painful).
Attributes of the adopters that registered on the app, as boxplots
Attributes of the cats that were scraped from Cat Welfare Society, as boxplots
That's the power law at play here! You can see the sharp drop-off in interactions, which makes the data sparse (the usual RecSys shenanigans).
Other than the usual F1, NDCG and NCRR, 'Distributional coverage' and 'Serendipity' were also implemented, with understanding from: https://eugeneyan.com/writing/serendipity-and-accuracy-in-recommender-systems/
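For reference, here is a minimal sketch of how these two beyond-accuracy metrics can be computed from the top-k lists. The serendipity formulation (unexpectedness times relevance, with unexpectedness measured against the user's history in an item-embedding space) follows the linked write-up; all function and argument names below are illustrative, not the exact code used in the project.

```python
import numpy as np

def distributional_coverage(top_k_lists):
    """Shannon entropy (in bits) of how often each cat appears across all
    users' top-k lists; higher means recommendations are spread more evenly
    over the catalogue instead of concentrating on a few popular cats."""
    counts = {}
    for recs in top_k_lists.values():
        for item in recs:
            counts[item] = counts.get(item, 0) + 1
    total = sum(counts.values())
    p = np.array(list(counts.values()), dtype=float) / total
    return float(-(p * np.log2(p)).sum())

def serendipity(top_k_lists, user_histories, relevant_items, item_vecs):
    """Mean over users of unexpectedness * relevance, where unexpectedness of
    a recommended cat is its cosine distance from the cats the user has
    already interacted with, and relevance is whether the cat is a held-out
    positive for that user."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    per_user = []
    for user, recs in top_k_lists.items():
        hist = user_histories.get(user, [])
        if not hist or not recs:
            continue
        scores = []
        for item in recs:
            # Unexpectedness: 1 - max similarity to anything in the user's history
            unexpected = 1.0 - max(cos(item_vecs[item], item_vecs[h]) for h in hist)
            relevant = 1.0 if item in relevant_items.get(user, set()) else 0.0
            scores.append(unexpected * relevant)
        per_user.append(np.mean(scores))
    return float(np.mean(per_user)) if per_user else 0.0
```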
To combat cold-start issues, where we could not recommend effectively to new users who had not rated any cats, we retrieved the new user's embedding, searched for the existing user with the highest cosine similarity to it, and showed the new user that existing user's recommendations until they had rated some cats.
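A minimal sketch of that fallback is below, assuming the cold-start user's embedding is derived from their registration attributes and that per-user rankings are already precomputed; the names `user_embeddings` and `user_rankings` are placeholders, not the actual variables used.

```python
import numpy as np

def recommendations_for_cold_start(new_user_vec, user_embeddings, user_rankings):
    """Find the existing user whose embedding is most similar (by cosine
    similarity) to the cold-start user and reuse that user's ranked list.

    user_embeddings: dict of user_id -> embedding vector (np.ndarray)
    user_rankings:   dict of user_id -> precomputed ranked list of cat ids
    """
    best_user, best_sim = None, -1.0
    new_norm = np.linalg.norm(new_user_vec) + 1e-12
    for user_id, vec in user_embeddings.items():
        sim = float(new_user_vec @ vec) / (new_norm * (np.linalg.norm(vec) + 1e-12))
        if sim > best_sim:
            best_user, best_sim = user_id, sim
    # Serve the nearest neighbour's recommendations until the new user
    # has rated enough cats to get rankings of their own.
    return user_rankings[best_user]
```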
Across the models, 'vanilla' collaborative-filtering models such as WMF and BPR worked really well. Multi-modal models such as VBPR also performed relatively well on our combined HarmonicMean metric, which was expected, as people likely have strong visual preferences when it comes to cats. Text-based models did not work as well, which may indicate that people pay less heed to descriptions than to visuals.

A sample result for a random adopter.
This portion outlines the cloud architecture and pipelines that were deployed on AWS.
General architecture of the ML Lifecycle
Rank generation: this is where the magic happens. Ranked lists of cats are generated for each registered adopter and written to S3, where a Lambda function retrieves them to serve in the app.
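Roughly, that step looks like the sketch below: score every cat for every adopter with the trained model, keep the top-k, and drop the result in S3 for the serving side to pick up. The bucket name, key layout and Cornac-style `score()` call are assumptions, not the deployed pipeline code.

```python
import json
import boto3
import numpy as np

BUCKET = "cat-recsys-rankings"   # hypothetical bucket name
TOP_K = 10

def generate_rankings(model, user_ids, item_ids):
    """Produce a top-k ranked list of cat ids for every adopter."""
    rankings = {}
    for u_idx, user in enumerate(user_ids):
        scores = model.score(u_idx)              # Cornac-style: scores for all items
        top = np.argsort(scores)[::-1][:TOP_K]   # indices of the k highest-scoring cats
        rankings[user] = [item_ids[i] for i in top]
    return rankings

def upload_rankings(rankings):
    """Write the generated rankings to S3 as a single JSON object."""
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=BUCKET,
        Key="rankings/latest.json",
        Body=json.dumps(rankings),
    )
```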
Folder | Details |
---|---|
model | Notebooks for model training, hyperparameter tuning, evaluation and inference. Includes Dockerfiles for custom training and inference images. |
pre-processing | Notebooks for data pre-processing |
pipelines | Scripts and notebooks for creating processing, training and deployment pipelines |
process_new_user | Scripts and notebook for creating the Lambda function that pulls generated rankings from S3 (see the sketch after this table) |
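For illustration, a Lambda handler along those lines could look like this; the bucket, key and API Gateway event shape are assumptions rather than the actual function in `process_new_user`.

```python
import json
import boto3

s3 = boto3.client("s3")

BUCKET = "cat-recsys-rankings"   # hypothetical bucket name
KEY = "rankings/latest.json"     # hypothetical key written by the rank-generation job

def lambda_handler(event, context):
    """Return the precomputed ranked cats for the requesting user."""
    user_id = event["queryStringParameters"]["user_id"]
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    rankings = json.loads(obj["Body"].read())
    return {
        "statusCode": 200,
        "body": json.dumps({
            "user_id": user_id,
            "recommended_cats": rankings.get(user_id, []),
        }),
    }
```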
This project was done with my teammates Ruo Xi, Shu Xian, Jun Yi and Adrian in fulfilment of our MITB Programme (Artificial Intelligence), and I could never have done it without them! Notable libraries used: Cornac and recommenders.