laplace is an end-to-end ML framework to train and predict on neurally-enhanced graphs for recommendation.
The pipeline is designed for self-supervised edge prediction on heterogeneous graphs.
- Multi-step, hybrid recommendation pipeline:
- Candidate Selection:
- Integrating LightGCN recommendations (can also be run on its own)
- Multiple, custom heuristics
- Strategies can be mixed and matched
- Ranking: graph convolutional network prediction on candidate edges
- Candidate Selection:
- Works on heterogeneous graphs
- User based training, validation and test splitting
- N-hop neighborhood aggregation
- Node Features
- Works on any number of node types
- Advanced preprocessing of tabular data into graphs
- Neo4j integration for better visualization and handling of large graphs.
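The two-stage idea in the feature list (mixable candidate selection strategies followed by a ranking step) can be sketched in plain Python. This is only an illustration of the control flow: the `Strategy` and `Scorer` types, the toy strategies, and the scores are all hypothetical and are not laplace's real components.

```python
from typing import Callable, Iterable

# Hypothetical types: a strategy maps a user id to candidate item ids,
# a scorer assigns a relevance score to one (user, item) edge.
Strategy = Callable[[str], Iterable[str]]
Scorer = Callable[[str, str], float]


def select_candidates(user: str, strategies: list[Strategy]) -> list[str]:
    """Union the candidates of all strategies, keeping first-seen order."""
    seen: dict[str, None] = {}
    for strategy in strategies:
        for item in strategy(user):
            seen.setdefault(item, None)
    return list(seen)


def recommend(user: str, strategies: list[Strategy], scorer: Scorer, k: int) -> list[str]:
    """Stage 1: candidate selection. Stage 2: rank the candidates by score."""
    candidates = select_candidates(user, strategies)
    ranked = sorted(candidates, key=lambda item: scorer(user, item), reverse=True)
    return ranked[:k]


# Toy stand-ins for a heuristic, a LightGCN-style recommender and a ranking model.
def popular(user: str) -> list[str]:
    return ["a1", "a2"]


def embedding_based(user: str) -> list[str]:
    return ["a2", "a3"]


TOY_SCORES = {"a1": 0.2, "a2": 0.9, "a3": 0.5}


def toy_scorer(user: str, item: str) -> float:
    return TOY_SCORES[item]


top = recommend("u1", [popular, embedding_based], toy_scorer, k=2)
print(top)
```

Because the strategies are passed in as a list, adding or removing one does not touch the ranking step, which is one way to read "mixed and matched".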
Install the environment with:
conda env create -n fashion --file environment.yml
Activate the environment:
conda activate fashion
Download the required data.
- Upload your data to a server. You should have a separate file for each of:
- articles.parquet
- customers.parquet
- transactions_splitted.parquet
- Create an .env file in the root directory with a DATA_HOST_URL variable.
- Run the following script from the terminal:
python run-download-data.py fashion
Currently the system works with the following datasets:
- H&M Fashion Recommendation Kaggle challenge dataset
python run-download-data.py fashion
- MovieLens dataset
python run-download-data.py movielens
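As a rough illustration of how the DATA_HOST_URL variable and the three fashion file names might fit together, the sketch below parses `.env`-style text and joins each file name onto the host URL. How run-download-data.py actually resolves its URLs is not documented here, so `parse_env`, `file_urls`, and the example host are assumptions.

```python
from urllib.parse import urljoin

# File names taken from the list above; everything else is illustrative.
FASHION_FILES = [
    "articles.parquet",
    "customers.parquet",
    "transactions_splitted.parquet",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def file_urls(host_url: str, files: list[str]) -> list[str]:
    """Join each file name onto the host URL (assumed to point at a directory)."""
    base = host_url if host_url.endswith("/") else host_url + "/"
    return [urljoin(base, name) for name in files]


# Example .env content with a placeholder host.
sample_env = "DATA_HOST_URL=https://example.com/data\n"
env = parse_env(sample_env)
for url in file_urls(env["DATA_HOST_URL"], FASHION_FILES):
    print(url)
```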
To run the pipeline, four steps are required:
- Adjust the config file under config.py -> link_pred_config and config.py -> preprocessing_config
- Run preprocessing with run_preprocessing.py
- Run training with run_pipeline.py
- Run inference and save the results with run_submission.py
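Once the config is set, the three scripts can be chained from a small driver. The sketch below is a convenience wrapper under the assumption that the scripts take no required arguments and are run from the repository root; laplace does not ship such a driver, and `build_commands` and `run_all` are hypothetical helpers.

```python
import subprocess
import sys

# Script names come from the steps above; chaining them like this is an assumption.
STEPS = ["run_preprocessing.py", "run_pipeline.py", "run_submission.py"]


def build_commands(steps: list[str], python: str = sys.executable) -> list[list[str]]:
    """Return one argv list per step, using the current interpreter by default."""
    return [[python, script] for script in steps]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        subprocess.run(argv, check=True)  # raises CalledProcessError on non-zero exit


commands = build_commands(STEPS)
for argv in commands:
    print(" ".join(argv))
# Call run_all(commands) from the repository root to actually execute the steps.
```

Using `check=True` means a failed preprocessing run prevents training from starting on stale or missing data.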
Step 1: Preprocessing
Preprocessing turns tabular data into a graph and (optionally) loads it into a neo4j database.
- First, download the data as described in 'Get Started'
- Set the preprocessing configuration in config.py -> preprocessing_config
- Run run_preprocessing.py
Data will be saved under data/derived.
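To make "turning tabular data into a graph" concrete, here is a minimal sketch that converts transaction rows into a bipartite edge list with contiguous node indices. It is not laplace's preprocessing code; the helper `build_bipartite_edges` is hypothetical, and the column names `customer_id` and `article_id` are assumed to match the transactions table.

```python
def build_bipartite_edges(rows, src_col="customer_id", dst_col="article_id"):
    """Map raw ids to contiguous node indices and return edges plus both id maps."""
    src_index, dst_index, edges = {}, {}, []
    for row in rows:
        # setdefault assigns the next free index the first time an id is seen.
        s = src_index.setdefault(row[src_col], len(src_index))
        d = dst_index.setdefault(row[dst_col], len(dst_index))
        edges.append((s, d))
    return edges, src_index, dst_index


# Toy transactions table: one row per purchase.
transactions = [
    {"customer_id": "c1", "article_id": "a1"},
    {"customer_id": "c1", "article_id": "a2"},
    {"customer_id": "c2", "article_id": "a1"},
]

edges, customers, articles = build_bipartite_edges(transactions)
print(edges)
print(customers, articles)
```

Keeping the id maps alongside the edges makes it possible to translate predicted edges back to the original customer and article ids.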
Note on neo4j:
Using neo4j is recommended, as it is the officially supported database of laplace. Enable it by setting these parameters in config.py:
preprocessing_config.save_to_neo4j = True
link_pred_config.neo4j = True
You can view the graph and run queries after running the preprocessing pipeline (it automatically starts the neo4j server).
If neo4j stops running, you can restart it with neo4j start in the terminal. More info on neo4j.
Step 2: Training
- Set the training configuration in config.py -> link_pred_config
- Run training with run_pipeline.py
Step 3: Get Inference
- Run inference by launching
run_submission.py
wandb is integrated into laplace.
- Create an .env file in the root of the project and add your wandb API key: WANDB_API_KEY=12345random678letters91011example121314
- Configure the sweep in sweep.yaml
- Then run run_sweep.py
Note: some sweep parameters are overwritten in run_sweep.py.
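The note about overwritten parameters can be pictured as a script-level override applied on top of the sweep configuration. The sketch below uses a plain dict in the shape of a wandb sweep config (a `parameters` mapping whose entries hold `values` or a single `value`); the parameter names, the `overrides` dict, and `apply_overrides` are made up for illustration and do not reflect what run_sweep.py really overrides.

```python
import copy

# In practice this dict would be loaded from sweep.yaml; the names are invented.
sweep_config = {
    "method": "random",
    "parameters": {
        "learning_rate": {"values": [0.01, 0.001]},
        "num_layers": {"values": [1, 2, 3]},
    },
}

# Hypothetical script-level overrides that pin a parameter to one value.
overrides = {"num_layers": 2}


def apply_overrides(config: dict, fixed: dict) -> dict:
    """Return a copy of config where each overridden parameter becomes a constant."""
    merged = copy.deepcopy(config)
    for name, value in fixed.items():
        merged["parameters"][name] = {"value": value}
    return merged


effective = apply_overrides(sweep_config, overrides)
print(effective["parameters"])
```

If a value set in sweep.yaml seems to have no effect, checking for an override of this kind in run_sweep.py is a sensible first step.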
To do:
- Benchmark different implementations
- Additional matchers