# Teamu Recommendation System

Welcome! This repository showcases the Teamu Recommendation System, which delivers personalized content suggestions on a social productivity platform. Built with a Two-Tower + DLRM (Deep Learning Recommendation Model) architecture, the system combines efficient candidate generation with learned ranking to optimize user engagement.
## Table of Contents

- Project Overview
- System Objectives
- System Architecture
- Features and Design Decisions
- Tools and Technologies
- Data Pipeline
- Model Training and Evaluation
- Deployment
- Future Enhancements
## Project Overview

The Teamu Recommendation System powers personalized content suggestions on a social productivity app, enabling efficient and scalable recommendations across millions of users. Built on Google Cloud Platform, the system leverages Vertex AI, BigQuery, and GCS for rapid candidate retrieval and accurate ranking. With this dual-model approach, Teamu is designed to maximize engagement, improve content relevance, and scale effectively.
## System Objectives

- Maximize Personalized Content Delivery: Provide tailored recommendations to users based on their interests, activity history, and preferences.
- Generate AI-Driven Post Ideas: Leverage embeddings for AI-generated content ideas, providing users with high-ranking post suggestions that align with community needs.
- Leverage AI for Deep Learning Recommendations: Match user and post embeddings to deliver relevant, engaging content across various interaction types.
- Ensure System Scalability: Scale efficiently to handle new users and posts, even in cold-start scenarios.
## System Architecture

### Two-Tower Model (Candidate Generation)

The Two-Tower model handles candidate generation, creating separate embeddings for users and posts (a minimal sketch follows this list):
- User Tower: Encodes user data like passions, project titles, and bio.
- Post Tower: Encodes post data, excluding comments, focusing on titles and descriptions.
- Similarity Measure: Utilizes dot product to compute similarity scores between user and post embeddings.
- Loss Function: Optimizes with softmax loss for relevant candidate generation.
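The sketch below illustrates this design with TensorFlow Recommenders: two 32-dimensional embedding towers scored by dot product under an in-batch softmax retrieval loss. The feature names (`user_id`, `post_id`) and vocabularies are illustrative stand-ins for the richer text features (passions, project titles, bios, post titles and descriptions) described above.

```python
# A minimal Two-Tower sketch, assuming illustrative ID features in place of
# the real text features; embedding_dim=32 matches the stored vector size.
import tensorflow as tf
import tensorflow_recommenders as tfrs

class TwoTowerModel(tfrs.Model):
    def __init__(self, user_vocab, post_vocab, posts_ds, embedding_dim=32):
        super().__init__()
        # User tower: encodes user features into a 32-dim embedding.
        self.user_tower = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=user_vocab),
            tf.keras.layers.Embedding(len(user_vocab) + 1, embedding_dim),
        ])
        # Post tower: same output dimension so dot products are well defined.
        self.post_tower = tf.keras.Sequential([
            tf.keras.layers.StringLookup(vocabulary=post_vocab),
            tf.keras.layers.Embedding(len(post_vocab) + 1, embedding_dim),
        ])
        # Retrieval task: in-batch softmax loss over dot-product scores,
        # with recall@k tracked via FactorizedTopK.
        self.task = tfrs.tasks.Retrieval(
            metrics=tfrs.metrics.FactorizedTopK(
                candidates=posts_ds.batch(128).map(self.post_tower)
            )
        )

    def compute_loss(self, features, training=False):
        user_embeddings = self.user_tower(features["user_id"])
        post_embeddings = self.post_tower(features["post_id"])
        return self.task(user_embeddings, post_embeddings)
```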
### DLRM (Ranking)

The DLRM ranks candidates generated by the Two-Tower model using both sparse and dense features (see the sketch after this list):
- Sparse Features: Metrics like view counts, login frequency, and CTR.
- Dense Features: Embeddings from the Two-Tower model, providing in-depth ranking signals.
- Wide and Deep Learning: Uses linear models for low-order features and deep learning for high-order interactions, creating a balance between model complexity and interpretability.
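A hedged sketch of this wide-and-deep structure is below. The input names (`sparse_features` standing in for view counts, login frequency, CTR) and layer sizes are assumptions; the dense inputs are the 32-dim Two-Tower embeddings.

```python
# Wide-and-deep ranking sketch: a linear path over low-order sparse features
# plus an MLP over embeddings for high-order interactions.
import tensorflow as tf

def build_ranker(embedding_dim=32, num_sparse_features=3):
    # Dense inputs: user and post embeddings from the Two-Tower model.
    user_emb = tf.keras.Input(shape=(embedding_dim,), name="user_embedding")
    post_emb = tf.keras.Input(shape=(embedding_dim,), name="post_embedding")
    # Sparse inputs: engagement counters such as views, logins, and CTR.
    sparse = tf.keras.Input(shape=(num_sparse_features,), name="sparse_features")

    # Wide path: a simple linear model over the sparse features.
    wide = tf.keras.layers.Dense(1)(sparse)

    # Deep path: an MLP over all features for high-order interactions.
    deep = tf.keras.layers.Concatenate()([user_emb, post_emb, sparse])
    for units in (256, 64):
        deep = tf.keras.layers.Dense(units, activation="relu")(deep)
    deep = tf.keras.layers.Dense(1)(deep)

    # Combine both paths and squash to an engagement probability.
    logit = tf.keras.layers.Add()([wide, deep])
    output = tf.keras.layers.Activation("sigmoid")(logit)
    return tf.keras.Model([user_emb, post_emb, sparse], output)
```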
## Features and Design Decisions

- Scalable Deployment: Supports high-volume traffic with Vertex AI pipelines and GCS for storage, alongside BigQuery for fast feature retrieval.
- Real-Time Data Ingestion: Uses Pub/Sub to ingest data delivered on a pg_cron schedule from Supabase, enabling efficient feature updates.
- Efficient Storage and Retrieval: Embeddings are stored as 32-dimensional vectors in GCS (`vector_bucket/user_embeddings` and `vector_bucket/post_embeddings`); a sketch of reading them back follows this list.
- Cold-Start Handling: Ensures recommendations are available for new users and posts, maintaining relevance with minimal historical data.
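The snippet below shows how these embeddings might be read back from GCS as TFRecords. The bucket paths come from the description above; the file pattern and record field names are assumptions.

```python
# Illustrative only: reading the 32-dim embeddings back from the buckets
# named above; the glob pattern and TFRecord field names are assumptions.
import tensorflow as tf

feature_spec = {
    "id": tf.io.FixedLenFeature([], tf.string),
    "embedding": tf.io.FixedLenFeature([32], tf.float32),
}

def load_embeddings(pattern):
    # Glob TFRecord shards directly from GCS via tf.io.gfile.
    files = tf.io.gfile.glob(pattern)
    dataset = tf.data.TFRecordDataset(files)
    return dataset.map(lambda r: tf.io.parse_single_example(r, feature_spec))

user_embeddings = load_embeddings("gs://vector_bucket/user_embeddings/*.tfrecord")
```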
## Tools and Technologies

- Languages: Python, SQL
- Libraries: TensorFlow, TF Recommenders, Pandas, NumPy, Transformers, and others
- Storage and Databases: Supabase (PostgreSQL), Google Cloud Storage (TFRecords, embedding storage), BigQuery (scalable feature engineering)
- Cloud Services: Vertex AI (pipelines, training, deployment), Pub/Sub, Kubeflow; pg_cron for Postgres-side scheduling
## Data Pipeline

A scalable data pipeline powers the recommendation system, managing high-volume data to keep recommendations relevant and personalized.
- Data Ingestion: Interaction data (e.g., views, votes, comments) flows into BigQuery.
- Feature Engineering (a BigQuery sketch follows this list):
- Sparse Features: Interaction counts and login frequencies.
- Dense Features: Embeddings generated by the Two-Tower model, along with additional metrics for ranking.
- Storage: Embeddings and interaction data are stored in GCS and BigQuery for rapid retrieval.
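As one example of the feature-engineering step, sparse interaction counts could be aggregated in BigQuery as sketched below. The project, dataset, table, and column names are hypothetical.

```python
# A sketch of pulling sparse features from BigQuery; the table and column
# names (interactions, user_id, event_type) are illustrative assumptions.
from google.cloud import bigquery

client = bigquery.Client(project="your-gcp-project")  # placeholder project id
query = """
    SELECT
      user_id,
      COUNTIF(event_type = 'view')   AS view_count,
      COUNTIF(event_type = 'vote')   AS vote_count,
      COUNT(DISTINCT DATE(event_ts)) AS active_days
    FROM `your-gcp-project.analytics.interactions`
    GROUP BY user_id
"""
# Materialize the aggregates as a DataFrame for downstream feature tables.
sparse_features = client.query(query).to_dataframe()
```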
## Model Training and Evaluation

### Two-Tower Model

- Objective: Minimizes softmax loss over dot-product similarity scores, improving candidate relevance.
- Training Framework: TensorFlow
- Evaluation Metrics: Candidate relevance is measured with recall@k (see the training sketch below).
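A hypothetical training and evaluation run for the Two-Tower sketch shown earlier; the dataset names and hyperparameters are illustrative, not the project's actual settings.

```python
# Train with in-batch softmax loss, then report recall@100 via the
# FactorizedTopK metric configured in the retrieval task.
model = TwoTowerModel(user_vocab, post_vocab, posts_ds)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))
model.fit(interactions_ds.batch(4096), epochs=3)

metrics = model.evaluate(test_ds.batch(4096), return_dict=True)
print(metrics["factorized_top_k/top_100_categorical_accuracy"])
```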
### DLRM

- Objective: Minimizes cross-entropy loss to maximize ranking accuracy.
- Features: Uses both sparse (interaction counts) and dense (embeddings) features.
- Evaluation Metrics: Tracks AUC and CTR to evaluate ranking effectiveness on interaction data (a compile/fit sketch follows this list).
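A hypothetical compile-and-fit step for the wide-and-deep ranker sketched earlier: binary cross-entropy against engagement labels, with AUC tracked during training. The dataset names are assumptions.

```python
# Binary cross-entropy ranking objective with AUC monitoring.
ranker = build_ranker()
ranker.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.AUC(name="auc")],
)
ranker.fit(train_ds, validation_data=val_ds, epochs=5)  # datasets assumed
```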
The system is validated with offline recall and AUC metrics, alongside plans for A/B testing to measure real-world impact.
## Deployment

The model is deployed via Vertex AI and served through a REST API supporting both batch and real-time recommendations. TensorFlow Serving manages the models, and a pipeline orchestrates real-time data ingestion.
- Model Artifacts: Stored in `model_bucket` on GCS for streamlined deployment and version management (see the deployment sketch after this list).
- Scalability: Managed with Kubernetes for auto-scaling and load balancing, adapting to changing traffic demands.
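The sketch below shows how a SavedModel in `model_bucket` might be registered and deployed to a Vertex AI endpoint. The project, region, container image version, and machine type are placeholders.

```python
# A sketch of Vertex AI deployment; identifiers are illustrative.
from google.cloud import aiplatform

aiplatform.init(project="your-gcp-project", location="us-central1")

# Register the SavedModel artifact with a prebuilt TF serving container
# (container version shown is illustrative).
model = aiplatform.Model.upload(
    display_name="teamu-dlrm-ranker",
    artifact_uri="gs://model_bucket/dlrm/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Deploy behind a managed endpoint for online (REST) prediction.
endpoint = model.deploy(machine_type="n1-standard-4")

# Online requests would mirror the ranker's named inputs, e.g.:
# endpoint.predict(instances=[{"user_embedding": [...], "post_embedding": [...],
#                              "sparse_features": [...]}])
```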
## Future Enhancements

- Cloud Composer (Airflow): For advanced DAG orchestration, automating ETL processes.
- Dataflow (Apache Beam): To handle massive, real-time, streaming data pipelines.
- Vertex AI Feature Store: Centralized feature management across training, testing, and serving environments.
- Mini-Batch Clustering and ScaNN: Enhance candidate retrieval on massive datasets, using clustering to diversify content and ScaNN to prune candidates before DLRM ranking (a retrieval sketch follows).
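A sketch of what the proposed ScaNN retrieval stage could look like, building on the Two-Tower sketch above; `posts_ds` and the user ID are illustrative, and the `scann` package must be installed alongside TensorFlow Recommenders.

```python
# Approximate top-k retrieval with ScaNN over the indexed post embeddings.
import tensorflow as tf
import tensorflow_recommenders as tfrs

# Index every post embedding so candidates can be fetched approximately.
scann_index = tfrs.layers.factorized_top_k.ScaNN(model.user_tower, k=100)
scann_index.index_from_dataset(
    tf.data.Dataset.zip(
        (posts_ds.batch(128), posts_ds.batch(128).map(model.post_tower))
    )
)

# Retrieve the 100 highest dot-product posts for a single user,
# to be re-ranked by the DLRM.
scores, post_ids = scann_index(tf.constant(["user_42"]))
```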