This repository contains the code for my submission to the RecSys Challenge 2018 (team name 'Team Radboud'). Recommendations are generated from a bipartite graph representation (playlists and tracks) by running random walks over the graph.
The graph is built from the Million Playlist Dataset (MPD), available from the official website at https://recsys-challenge.spotify.com/
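As a rough illustration of the idea above, here is a minimal sketch of a random walk with restarts over a toy playlist-track graph. It is not the code in this repository: the function names, the `continue_prob` parameter, the toy playlists, and the ranking by visit counts are all assumptions made for the example.

```python
import random
from collections import Counter


def build_bipartite(playlists):
    """Build both sides of the adjacency: playlist -> tracks and track -> playlists."""
    p2t = {pid: list(tracks) for pid, tracks in playlists.items()}
    t2p = {}
    for pid, tracks in p2t.items():
        for track in tracks:
            t2p.setdefault(track, []).append(pid)
    return p2t, t2p


def recommend(seed_tracks, p2t, t2p, continue_prob=0.9, n_steps=10000, k=3, seed=1):
    """Rank tracks by how often a walk restarting at the seed tracks visits them."""
    rng = random.Random(seed)
    seeds = [t for t in seed_tracks if t in t2p]
    if not seeds:
        return []  # no known seed track, nothing to walk from
    visits = Counter()
    current = rng.choice(seeds)
    for _ in range(n_steps):
        if rng.random() > continue_prob:
            current = rng.choice(seeds)  # restart at a seed track
            continue
        playlist = rng.choice(t2p[current])  # hop: track -> playlist
        current = rng.choice(p2t[playlist])  # hop: playlist -> track
        visits[current] += 1
    exclude = set(seed_tracks)
    ranked = [t for t, _ in visits.most_common() if t not in exclude]
    return ranked[:k]


# Hypothetical toy data: p3 is disconnected from the seed track "a".
toy_playlists = {"p1": ["a", "b", "c"], "p2": ["b", "c", "d"], "p3": ["x", "y"]}
p2t, t2p = build_bipartite(toy_playlists)
recs = recommend(["a"], p2t, t2p, k=3)
print(recs)
```

Tracks that share playlists with the seed are visited most often, while tracks in the disconnected playlist are never reached.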
Requirements:
- python >= 3.6
- numpy
- scipy
- tqdm
- spotipy (and Spotify API credentials)
- whoosh
First, download the metadata using utils/get_metadata.py. Note that this script requires the Spotify API credentials to be set in the environment (os.environ):
- 'SPOTIPY_CLIENT_ID': your client id
- 'SPOTIPY_CLIENT_SECRET': your client secret
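Before starting the long metadata download, it can help to confirm that both variables are visible to Python. The check below is a small sketch; the helper name `missing_credentials` is hypothetical and not part of this repository.

```python
import os

# The two variable names listed above.
REQUIRED_VARS = ("SPOTIPY_CLIENT_ID", "SPOTIPY_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing Spotify API credentials:", ", ".join(missing))
else:
    print("Spotify API credentials found.")
```

The variables can be exported in the shell before running the script, or assigned to os.environ from Python.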
Then, build the graph using utils/build_graph.py. Both operations take a long time; the -quick flag can be passed to either script to generate a very small graph from the first MPD files, as a quick check that everything is working. The graph is needed as input to run_mpd.py and run_challenge.py.
run_mpd.py runs the random walk methods on a validation set taken from the MPD; this file is used for experimentation. run_challenge.py runs the methods on the challenge set and generates a CSV in the format described on the challenge page.
The final score on the leaderboard was generated by running the command:
python run_challenge.py <challenge_set.json> <graph.npz> <name_index> -alpha 0.96 -N 100000 -n_p 100000 -seed 1 -switch_d_prune
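To make the shape of this command easier to read, the sketch below parses the same arguments with argparse. It only mirrors the command line shown above: the argument types are guessed from the example values, and the parser is an assumption for illustration, not the one in run_challenge.py.

```python
import argparse

# Hypothetical parser mirroring the command above; types are inferred from the example values.
parser = argparse.ArgumentParser(prog="run_challenge.py")
parser.add_argument("challenge_set")  # <challenge_set.json>
parser.add_argument("graph")          # <graph.npz>
parser.add_argument("name_index")     # <name_index>
parser.add_argument("-alpha", type=float)
parser.add_argument("-N", type=int)
parser.add_argument("-n_p", type=int)
parser.add_argument("-seed", type=int)
parser.add_argument("-switch_d_prune", action="store_true")

args = parser.parse_args(
    ["challenge_set.json", "graph.npz", "name_index",
     "-alpha", "0.96", "-N", "100000", "-n_p", "100000",
     "-seed", "1", "-switch_d_prune"]
)
print(args)
```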