🚀 Installation and Usage Guide

This guide will help you set up and run a machine learning pipeline that includes feature engineering, model training, and deployment using Hopsworks and OpenAI.

📋 Prerequisites

Local Tools

You'll need the following tools installed locally:

| Tool | Version | Purpose | Installation Link |
| --- | --- | --- | --- |
| Python | 3.11 | Programming language runtime | Download |
| uv | ≥ 0.4.30 | Python package installer and virtual environment manager | Download |
| GNU Make | ≥ 3.81 | Build automation tool | Download |
| Git | ≥ 2.44.0 | Version control | Download |
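
Before continuing, it can be worth confirming that the installed tools meet these minimums. A quick check, assuming the tools are already on your `PATH`:

```shell
# Sanity-check local tool versions against the table above
python --version              # expect Python 3.11.x
uv --version                  # expect uv >= 0.4.30
make --version | head -n 1    # expect GNU Make >= 3.81
git --version                 # expect git >= 2.44.0
```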

Cloud Services

The project requires access to these cloud services:

| Service | Purpose | Cost | Required Credentials | Setup Guide |
| --- | --- | --- | --- | --- |
| Hopsworks | AI Lakehouse for feature store, model registry, and serving | Free tier available | `HOPSWORKS_API_KEY` | Create API Key |
| GitHub Actions | Compute & automation | Free for public repos | - | - |
| OpenAI API | LLM API for the recommender system | Pay-per-use | `OPENAI_API_KEY` | Quick Start Guide |

🎯 Getting Started

1. Clone the Repository

Start by cloning the repository and navigating to the project directory:

git clone https://github.com/decodingml/personalized-recommender-course.git
cd personalized-recommender-course

Next, prepare your Python environment and install the project's dependencies.

2. Installation

Set up the project environment by running the following:

make install

Test that you have Python 3.11.8 installed in your new uv environment:

uv run python --version
# Output: Python 3.11.8

This command will:

  • Create a virtual environment using uv
  • Activate the virtual environment
  • Install all dependencies from pyproject.toml

Note

Normally, uv picks up the Python version specified in .python-version and installs it automatically if it is not already on your system. If you run into issues, explicitly install the right Python version by running make install-python.

3. Environment Configuration

Before running any components:

  1. Create your environment file:
    cp .env.example .env
  2. Open .env and configure the required credentials following the inline comments and the recommendations from the Cloud Services section.
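
Before moving on, it can help to confirm the credentials are actually visible to your tooling. A small illustrative pre-flight check (the variable names come from the Cloud Services section; this assumes the variables are loaded into the environment, e.g. via the project's own .env handling):

```python
# Illustrative pre-flight check that the required credentials are set
# before running any pipelines. Variable names match the Cloud Services table.
import os

REQUIRED = ("HOPSWORKS_API_KEY", "OPENAI_API_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All required credentials are set.")
```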

⚡️ Running the H&M Personalized Recommender

Notebooks

For instructions on exploring the Notebooks, check out the 📚 Course section from the main README.

Running the ML Pipelines

You can run the entire pipeline at once or execute individual components.

Running Everything in One Go (Quick)

Execute all the ML pipelines in a sequence:

make all

It will take ~1.5 hours to run, depending on your machine.

This runs the following steps:

  1. Feature engineering
  2. Retrieval model training
  3. Ranking model training
  4. Candidate embeddings creation
  5. Inference pipeline deployment
  6. Materialization job scheduling
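
Steps 2–4 implement the classic two-stage recommender pattern: a retrieval model narrows the full catalog to a small candidate set via embedding similarity, and a ranking model then re-orders those candidates. A toy sketch of that idea (names and data here are purely illustrative, not the project's actual code):

```python
# Toy two-stage recommender: retrieval narrows the catalog, ranking re-orders.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Candidate item embeddings (what step 4 materializes in the feature store).
item_embeddings = {
    "jacket": [0.9, 0.1],
    "socks":  [0.1, 0.9],
    "jeans":  [0.8, 0.3],
}


def retrieve(user_embedding, k=2):
    """Stage 1: top-k nearest items by dot-product similarity."""
    scored = sorted(item_embeddings,
                    key=lambda i: dot(user_embedding, item_embeddings[i]),
                    reverse=True)
    return scored[:k]


def rank(user_embedding, candidates):
    """Stage 2: re-score the candidates (here with the same similarity)."""
    return sorted(candidates,
                  key=lambda i: dot(user_embedding, item_embeddings[i]),
                  reverse=True)


user = [1.0, 0.2]
print(rank(user, retrieve(user)))  # → ['jacket', 'jeans']
```

In the real pipeline the two stages are separate trained models served behind a deployment, but the control flow is the same.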

View results in Hopsworks Serverless: Data Science → Deployments

Start the Streamlit UI:

make start-ui

Accessible at http://localhost:8501/

Running Individual Components (Recommended)

Each component can be run separately:

  1. Feature Engineering
make feature-engineering

It will take ~1 hour to run, depending on your machine.

View results in Hopsworks Serverless: Feature Store → Feature Groups

  2. Retrieval Model Training
make train-retrieval

View results in Hopsworks Serverless: Data Science → Model Registry

  3. Ranking Model Training
make train-ranking

View results in Hopsworks Serverless: Data Science → Model Registry

  4. Embeddings Creation
make create-embeddings

View results in Hopsworks Serverless: Feature Store → Feature Groups

  5. Deployment Creation
make create-deployments

View results in Hopsworks Serverless: Data Science → Deployments

*(Screenshot: Hopsworks deployments view)*

Start the Streamlit UI:

make start-ui

Accessible at http://localhost:8501/

Important

The demo is in 0-cost mode, which means that when there is no traffic, the deployment scales to 0 instances. The first time you interact with it, give it 1-2 minutes to warm up to 1+ instances. Afterward, everything will become smoother.
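
If you prefer not to click around while the deployment warms up, a hypothetical polling helper like the one below can wait until the local UI responds (the URL and retry counts are assumptions; adjust to taste):

```shell
# Poll the local Streamlit UI until the scale-from-zero deployment warms up
for attempt in $(seq 1 30); do
  code="$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8501/ || true)"
  if [ "$code" = "200" ]; then
    echo "UI is up (attempt $attempt)"
    break
  fi
  sleep 5
done
```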

  6. Materialization Job Scheduling
make schedule-materialization-jobs

View results in Hopsworks Serverless: Compute → Ingestions

  7. Deployment Creation with LLM Ranking (Optional)

An optional step that replaces the standard deployments (created in step 5) with LLM-powered ones:

make create-deployments-llm-ranking

NOTE: If the script fails, go to Hopsworks Serverless: Data Science → Deployments, forcefully stop all the deployments, and run the command again.

Warning

The LLM ranking deployment overrides the deployment created in step 5 (Deployment Creation).

Start the Streamlit UI that interfaces the LLM deployment:

make start-ui-llm-ranking

Accessible at http://localhost:8501/

Warning

Each Streamlit UI command works only with its corresponding deployment. For example, pairing the deployment from step 5 (Deployment Creation) with make start-ui-llm-ranking won't work.

Clean Up Resources

Remove all created resources from Hopsworks Serverless:

make clean-hopsworks-resources

🚨 Important Notes

  • Ensure uv is properly installed and configured before running any commands
  • All notebooks are executed using IPython through the uv virtual environment
  • When running components individually, execute them in the order listed above

🤖 Running the ML Pipelines in GitHub Actions

This project supports running ML pipelines automatically through GitHub Actions, providing an alternative to local or Colab execution.

Note

This is handy if you hit network errors, such as timeouts, on your local machine; GitHub Actions runners have reliable networking and will run the ML pipelines smoothly.

Pipeline Triggers

The ML pipelines can be triggered in three ways:

  • Manual trigger through GitHub UI
  • Scheduled execution (configurable)
  • On push to main branch (configurable)

Setup Process

1. Fork Repository

Create your own copy of the repository to access GitHub Actions:

# Use GitHub's UI to fork the repository
https://github.com/original-repo/name → Your-Username/name

📚 GitHub Fork Guide

2. Configure Secrets

Set up required environment variables as GitHub Actions secrets:

Option A: Using GitHub UI

  1. Navigate to: Repository → Settings → Secrets and variables → Actions
  2. Click "New repository secret"
  3. Add required secrets:
    • HOPSWORKS_API_KEY
    • OPENAI_API_KEY

📚 Set up GitHub Actions Secrets Guide

*(Screenshot: GitHub Actions secrets configuration)*

Option B: Using GitHub CLI

If you have the GitHub CLI installed, instead of setting the GitHub Actions secrets manually, you can set them by running the following:

gh secret set HOPSWORKS_API_KEY
gh secret set OPENAI_API_KEY
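
Run interactively, each `gh secret set` command prompts for the value. A hypothetical non-interactive variant reads the values straight from your local `.env` (this assumes `.env` lines look like `NAME=value`, as created in the Environment Configuration step, and that `gh` is already authenticated):

```shell
# Push credentials from the local .env to GitHub Actions secrets
for key in HOPSWORKS_API_KEY OPENAI_API_KEY; do
  value="$(grep "^${key}=" .env | cut -d= -f2-)"
  gh secret set "$key" --body "$value"
done
```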

3. Execute Pipeline

Manual Execution

  1. Go to Actions → ML Pipelines
  2. Click "Run workflow"
  3. Select branch (default: main)
  4. Click "Run workflow"

*(Screenshot: manual workflow trigger in GitHub Actions)*

After triggering the pipeline, you will see it running, signaled by a yellow circle. Click on it to see the progress.

*(Screenshot: workflow run in progress)*

After it is finished, it should look like this:

*(Screenshot: completed workflow run)*

Automated Execution

Another option is to run the ML pipelines automatically on a schedule or when new commits are pushed to the main branch.

Edit .github/workflows/ml_pipelines.yaml to enable automatic triggers:

name: ML Pipelines

on:
  # schedule: # Uncomment to run the pipelines every 2 hours. All the pipelines take ~1.5 hours to run.
  #   - cron: '0 */2 * * *'
  # push: # Uncomment to run pipelines on every new commit to main
  #   branches:
  #     - main
  workflow_dispatch:  # Allows manual triggering from GitHub UI

Monitoring & Results

  1. Pipeline Progress

    • View real-time execution in Actions tab
    • Each step shows detailed logs and status
  2. Output Verification

    • Access results in Hopsworks Serverless
    • Check Feature Groups, Feature Views, Model Registry, and Deployments
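
If you prefer the terminal, the GitHub CLI can monitor runs as well. A small sketch (the workflow file name is taken from the Automated Execution section above):

```shell
# Show the most recent run of the ML pipelines workflow, then follow it live
gh run list --workflow ml_pipelines.yaml --limit 1
gh run watch
```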

⚠️ Important Notes

  • Full pipeline execution takes approximately 1.5 hours
  • Ensure sufficient GitHub Actions minutes available
  • Monitor usage when enabling automated triggers

🌐 Live Demo

Try out our deployed H&M real-time personalized recommender to see what you'll learn to build by the end of this course: 💻 Live H&M Recommender Streamlit Demo

Important

The demo is in 0-cost mode, which means that when there is no traffic, the deployment scales to 0 instances. The first time you interact with it, give it 1-2 minutes to warm up to 1+ instances. Afterward, everything will become smoother.

*(Screenshot: the recommender's Streamlit UI)*

☁️ Deploying the Streamlit App

Deploying a Streamlit app to Streamlit's cloud is free and straightforward once the GitHub repository contains the required files:

  • uv.lock - pins the Python dependencies
  • packages.txt - lists the system dependencies
  • streamlit_app.py - entrypoint to the Streamlit application

Deployment Steps

1. Repository Setup

Fork the repository if you haven't already:

# Use GitHub's UI to fork the repository
https://github.com/original-repo/name → Your-Username/name

📚 GitHub Fork Guide

2. Streamlit Cloud Setup

  1. Create a free account on Streamlit Cloud
  2. Navigate to New App Deployment
  3. Configure deployment settings:
| Setting | Configuration | Description |
| --- | --- | --- |
| App Type | App Type | Select "Deploy a public app from GitHub" |
| Main Settings | Main Settings | Configure your repository |
| Advanced Settings | Advanced Settings | Set Python 3.11 and `HOPSWORKS_API_KEY` |

⚠️ Important Notes

  • Ensure all required files are present in your repository
  • Python version must be set to 3.11
  • HOPSWORKS_API_KEY must be configured in environment variables
  • Repository must be public for free tier deployment

📚 More on Streamlit Cloud deployments