
kmalik9/Teamu_RecSys


Teamu Recommendation System: Two-Tower + DLRM

Welcome! This repository showcases the Teamu Recommendation System, designed to deliver highly personalized content suggestions on a social productivity platform. Built with a Two-Tower + DLRM (Deep Learning Recommendation Model) approach, this system combines efficient candidate generation and advanced ranking to optimize user engagement.




Project Overview

The Teamu Recommendation System powers personalized content suggestions on a social productivity app, enabling efficient and scalable recommendations across millions of users. Built on Google Cloud Platform, the system leverages Vertex AI, BigQuery, and GCS for rapid candidate retrieval and accurate ranking. With a dual-model approach, Teamu is designed to maximize engagement, improve content relevance, and scale effectively.

System Objectives

  1. Maximize Personalized Content Delivery: Provide tailored recommendations to users based on their interests, activity history, and preferences.
  2. Generate AI-Driven Post Ideas: Leverage embeddings for AI-generated content ideas, providing users with high-ranking post suggestions that align with community needs.
  3. Deliver Deep Learning Recommendations: Match user and post embeddings to serve relevant, engaging content across various interaction types.
  4. Ensure System Scalability: Scale efficiently to handle new users and posts, even in cold-start scenarios.

System Architecture

Cloud Architecture Diagram


Model Architecture Diagram


Two-Tower Model

The Two-Tower model handles candidate generation, creating separate embeddings for users and posts:

  • User Tower: Encodes user data like passions, project titles, and bio.
  • Post Tower: Encodes post data, excluding comments, focusing on titles and descriptions.
  • Similarity Measure: Utilizes dot product to compute similarity scores between user and post embeddings.
  • Loss Function: Optimizes with softmax loss for relevant candidate generation.
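The retrieval objective above can be sketched in a few lines of NumPy: dot-product similarity between user and post embeddings, scored with an in-batch softmax loss where each user's matched post is the positive and the other posts in the batch act as negatives. This is an illustrative sketch, not the repo's implementation; all names and shapes here are assumptions (the 32-dim size matches the stored embeddings described below).

```python
import numpy as np

def in_batch_softmax_loss(user_emb, post_emb):
    """In-batch softmax loss: each user's matched post is the positive;
    other posts in the batch serve as negatives (illustrative sketch)."""
    logits = user_emb @ post_emb.T               # (B, B) dot-product similarities
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # mean NLL of the matched pairs

rng = np.random.default_rng(0)
users = rng.normal(size=(4, 32))                 # toy 32-dim user embeddings
posts = users + 0.1 * rng.normal(size=(4, 32))   # posts close to their users -> low loss
loss = in_batch_softmax_loss(users, posts)
```

A real training loop would compute the same quantity with TensorFlow Recommenders' retrieval task, which handles in-batch negatives automatically.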

DLRM (Deep Learning Recommendation Model)

The DLRM ranks candidates generated by the Two-Tower model using both sparse and dense features:

  • Sparse Features: Metrics like view counts, login frequency, and CTR.
  • Dense Features: Embeddings from the Two-Tower model, providing in-depth ranking signals.
  • Wide and Deep Learning: Uses linear models for low-order features and deep learning for high-order interactions, creating a balance between model complexity and interpretability.
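The wide-and-deep split described above can be sketched as a linear term over the sparse interaction counts plus a small ReLU MLP over the dense embeddings, combined into a sigmoid click probability. This is a minimal NumPy sketch under assumed shapes, not the repo's model; all weights and feature names are illustrative.

```python
import numpy as np

def wide_and_deep_score(sparse_x, dense_x, w_wide, deep_layers):
    """Wide & Deep scoring sketch: a linear model on sparse interaction
    counts plus a ReLU MLP on dense embedding features."""
    wide = sparse_x @ w_wide                       # low-order linear term
    h = dense_x
    for W, b in deep_layers:                       # deep tower for high-order interactions
        h = np.maximum(h @ W + b, 0.0)
    deep = h.sum(axis=1)                           # collapse to a scalar logit
    return 1.0 / (1.0 + np.exp(-(wide + deep)))    # sigmoid click probability

rng = np.random.default_rng(1)
sparse = rng.poisson(3.0, size=(5, 3)).astype(float)  # e.g. views, logins, clicks
dense = rng.normal(size=(5, 32))                      # Two-Tower embeddings
w = rng.normal(size=3) * 0.1
layers = [(rng.normal(size=(32, 16)) * 0.1, np.zeros(16)),
          (rng.normal(size=(16, 1)) * 0.1, np.zeros(1))]
scores = wide_and_deep_score(sparse, dense, w, layers)
```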

Features and Design Decisions

  • Scalable Deployment: Supports high-volume traffic with Vertex AI pipelines and GCS for storage, alongside BigQuery for fast feature retrieval.
  • Real-Time Data Ingestion: Pub/Sub monitors data deliveries scheduled by Supabase pg_cron, enabling timely feature updates.
  • Efficient Storage and Retrieval: Embeddings are stored as 32-dimensional vectors in GCS (vector_bucket/user_embeddings and vector_bucket/post_embeddings).
  • Cold-Start Handling: Ensures recommendations are available for new users and posts, maintaining relevancy with minimal historical data.

Tools and Technologies

  • Languages: Python, SQL
  • Libraries: TensorFlow, TF Recommenders, Pandas, NumPy, Transformers, and others
  • Databases: Supabase (PostgreSQL), Google Cloud Storage (TFRecords, embedding storage), BigQuery (for scalable feature engineering)
  • Cloud Services: Vertex AI (pipelines, training, deployment), Pub/Sub, Kubeflow, pg_cron

Data Pipeline

Overview

A scalable data pipeline powers the recommendation system, managing high-volume data to keep recommendations relevant and personalized.

Key Steps

  1. Data Ingestion: Interaction data (e.g., views, votes, comments) flows into BigQuery.
  2. Feature Engineering:
    • Sparse Features: Interaction counts and login frequencies.
    • Dense Features: Embeddings generated by the Two-Tower model, along with additional metrics for ranking.
  3. Storage: Embeddings and interaction data are stored in GCS and BigQuery for rapid retrieval.
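The sparse-feature step above amounts to counting interactions per user by event type. A toy pandas version of that aggregation (the real pipeline runs in BigQuery; the schema here is an illustrative assumption, not the repo's):

```python
import pandas as pd

# Toy interaction log standing in for the BigQuery table (illustrative schema).
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u2"],
    "post_id": ["p1", "p2", "p1", "p1", "p3"],
    "event":   ["view", "vote", "view", "comment", "view"],
})

# Sparse features: per-user interaction counts, one column per event type.
counts = (events.pivot_table(index="user_id", columns="event",
                             aggfunc="size", fill_value=0)
                .rename_axis(columns=None)
                .reset_index())
```

The equivalent BigQuery SQL is a `GROUP BY user_id` with conditional counts per event type.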

Model Training and Evaluation

Two-Tower Model

  • Objective: Minimizes softmax loss over dot-product similarities to retrieve relevant candidates.
  • Training Framework: TensorFlow
  • Evaluation Metric: Candidate relevance is measured with recall.
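Recall@k for a retrieval model can be computed directly from the embeddings: the fraction of users whose held-out post lands in the top-k posts by dot-product score. A hedged NumPy sketch (the data and names are illustrative, not from the repo):

```python
import numpy as np

def recall_at_k(user_emb, post_emb, true_post, k=10):
    """Recall@k: fraction of users whose held-out post appears among the
    top-k posts ranked by dot-product similarity (illustrative)."""
    scores = user_emb @ post_emb.T                    # (num_users, num_posts)
    topk = np.argsort(-scores, axis=1)[:, :k]         # indices of top-k posts per user
    hits = (topk == np.asarray(true_post)[:, None]).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(2)
posts = rng.normal(size=(100, 32))
true = np.arange(10)
users = posts[true] + 0.05 * rng.normal(size=(10, 32))  # users near their held-out post
```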

DLRM

  • Objective: Minimizes cross-entropy loss to maximize ranking accuracy.
  • Features: Uses both sparse (interaction counts) and dense (embeddings) features.
  • Evaluation Metrics: Tracks AUC and CTR to evaluate ranking effectiveness on interaction data.
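AUC has a simple pairwise interpretation worth keeping in mind when reading these metrics: the probability that a randomly chosen positive (clicked) example is scored above a randomly chosen negative. A small self-contained sketch (fine for small evaluation sets; production evaluation would use a vectorized implementation):

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win (pairwise formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
result = auc(labels, scores)   # 3 of 4 positive/negative pairs ranked correctly
```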

Offline and Online Testing

The system is validated with offline recall and AUC metrics, alongside plans for A/B testing to measure real-world impact.

Deployment

The model is deployed via Vertex AI, served through a REST API with both batch and real-time recommendations. TensorFlow Serving is used to manage models, and a pipeline orchestrates real-time data ingestion.

  • Model Artifacts: Stored in model_bucket on GCS for streamlined deployment and version management.
  • Scalability: Managed with Kubernetes for auto-scaling and load balancing, adapting to changing traffic demands.
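A Vertex AI online-prediction call takes a JSON body with an `instances` list. The sketch below only builds that request body; the feature names and values are hypothetical, not taken from this repo's serving schema:

```python
import json

# Illustrative request body for a Vertex AI endpoint :predict call.
# The instance fields (user_id, user_embedding) are assumptions, not the
# repo's actual serving schema.
payload = {
    "instances": [
        {"user_id": "u123", "user_embedding": [0.12] * 32},
    ]
}
body = json.dumps(payload)
```

In practice this body would be POSTed to the endpoint's `:predict` URL (or passed to the `google-cloud-aiplatform` client), with batch recommendations going through a batch prediction job instead.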

Future Enhancements

  1. Cloud Composer (Airflow): For advanced DAG orchestration, automating ETL processes.
  2. Dataflow (Apache Beam): To handle massive, real-time, streaming data pipelines.
  3. Vertex AI Feature Store: Centralized feature management across training, testing, and serving environments.
  4. Mini-Batch Clustering and ScaNN: Enhanced candidate retrieval for massive datasets, using mini-batch clustering to diversify content and ScaNN to prune candidates before DLRM ranking.
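The search that ScaNN would approximate is exact dot-product top-k over the full post corpus. A brute-force NumPy baseline makes the target behavior concrete (illustrative data; ScaNN trades a small amount of this exactness for much lower latency at scale):

```python
import numpy as np

def exact_top_k(query, corpus, k=5):
    """Exact dot-product top-k over the full corpus; this is the search
    that an ANN library like ScaNN approximates (illustrative baseline)."""
    scores = corpus @ query
    idx = np.argpartition(-scores, k)[:k]     # unordered top-k candidates
    return idx[np.argsort(-scores[idx])]      # top-k indices, best first

rng = np.random.default_rng(3)
corpus = rng.normal(size=(1000, 32))          # toy 32-dim post embeddings
query = corpus[42]                            # query identical to item 42
top = exact_top_k(query, corpus, k=5)
```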
