RecSys Challenge 2018

Scripts for the RecSys Challenge 2018 from the D2KLab team.

Install Dependencies

pip install -r requirements.txt

Dataset

We converted the original JSON files into an equivalent CSV version.

python evaluation/mpd2csv.py --mpd_path /path/to/mpd --out_path dataset
python evaluation/challenge2csv.py --challenge_path /path/to/challenge.json --out_path dataset
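
As an illustration of what such a conversion involves, the sketch below flattens one MPD slice into playlist rows and playlist-track rows. It is not the code of evaluation/mpd2csv.py: the helper names and the CSV columns are assumptions chosen for the example. Only the JSON keys (playlists, pid, name, tracks, pos, track_uri) come from the published MPD layout.

```python
import csv
import io
import json


def slice_to_rows(mpd_slice):
    """Flatten one MPD slice (a dict parsed from JSON) into two row lists.

    Hypothetical helper: the real mpd2csv.py may emit different columns.
    """
    playlists, items = [], []
    for playlist in mpd_slice["playlists"]:
        pid = playlist["pid"]
        playlists.append([pid, playlist.get("name", ""), len(playlist["tracks"])])
        for track in playlist["tracks"]:
            items.append([pid, track["pos"], track["track_uri"]])
    return playlists, items


def rows_to_csv(rows, header):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Tiny hand-made slice in the MPD layout, used only as example input.
example = json.loads("""
{"playlists": [{"pid": 0, "name": "Road trip",
                "tracks": [{"pos": 0, "track_uri": "spotify:track:a"},
                           {"pos": 1, "track_uri": "spotify:track:b"}]}]}
""")
playlist_rows, item_rows = slice_to_rows(example)
print(rows_to_csv(playlist_rows, ["pid", "name", "num_tracks"]))
print(rows_to_csv(item_rows, ["pid", "pos", "track_uri"]))
```

Keeping playlists and playlist-track pairs in separate tables mirrors the playlists.csv and items.csv inputs used by the split commands.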

We divided the MPD dataset into training, validation and test sets. The validation and test sets mirror the characteristics of the official challenge set.

python evaluation/split.py --path dataset --input_playlists playlists.csv --input_items items.csv --output_playlists playlists_training_validation.csv --output_items items_training_validation.csv --output_playlists_split playlists_test.csv --output_playlists_split_pid playlists_test_pid.csv --output_items_split items_test.csv --output_items_split_x items_test_x.csv --output_items_split_y items_test_y.csv --scale 1000
python evaluation/split.py --path dataset --input_playlists playlists_training_validation.csv --input_items items_training_validation.csv --output_playlists playlists_training.csv --output_items items_training.csv --output_playlists_split playlists_validation.csv --output_playlists_split_pid playlists_validation_pid.csv --output_items_split items_validation.csv --output_items_split_x items_validation_x.csv --output_items_split_y items_validation_y.csv --scale 1000
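
The general idea behind such a split can be sketched as follows: some playlists are held out, a prefix of each held-out playlist stays visible as the seed (x), and the remaining tracks are hidden as the ground truth (y). This is a deliberate simplification with hypothetical function and parameter names; the official challenge set mixes several seed scenarios, and evaluation/split.py may behave differently.

```python
import random


def split_playlists(playlists, n_holdout, n_seed, rng):
    """Split {pid: [track, ...]} into training data and held-out x/y parts.

    Hypothetical sketch: n_holdout playlists are sampled at random; for each
    of them the first n_seed tracks stay visible (x) and the rest is hidden (y).
    """
    # Only playlists with at least one track left to hide are eligible.
    eligible = sorted(pid for pid, tracks in playlists.items() if len(tracks) > n_seed)
    held_out = set(rng.sample(eligible, n_holdout))
    training = {pid: tracks for pid, tracks in playlists.items() if pid not in held_out}
    x = {pid: playlists[pid][:n_seed] for pid in held_out}
    y = {pid: playlists[pid][n_seed:] for pid in held_out}
    return training, x, y


# Synthetic data: 20 playlists of 10 tracks each.
playlists = {pid: [f"t{pid}_{i}" for i in range(10)] for pid in range(20)}
training, x, y = split_playlists(playlists, n_holdout=5, n_seed=3, rng=random.Random(0))
print(len(training), len(x), len(y))
```

Applying the same procedure twice, first to carve out the test set and then to carve the validation set out of what remains, matches the two-step structure of the commands above.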

Embeddings

We rely on two sets of embeddings that we use as input to a neural network.

Word2Rec

Embeddings representing the tracks/albums/artists based on their co-occurrence in playlists.

python main.py word2rec_item word2rec.csv --w2r models/embs/1M/word2rec_dry.w2v --dataset dataset
python main.py word2rec_album word2rec_album.csv --w2r models/embs/1M/word2rec_dry_albums.w2v --dataset dataset
python main.py word2rec_artist word2rec_artist.csv --w2r models/embs/1M/word2rec_dry_artists.w2v --dataset dataset
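
Co-occurrence embeddings of this kind are commonly trained by treating each playlist as a "sentence" whose "words" are track, album or artist identifiers, and feeding those sequences to a word2vec-style model. The sketch below only builds such sequences; the row layout and the function name are assumptions, and the training step itself is left out.

```python
from collections import defaultdict


def playlists_to_sentences(item_rows, key_index=2):
    """Group rows into one ordered token sequence per playlist.

    Assumed row layout (hypothetical): (pid, pos, track, album, artist).
    key_index selects which identifier becomes the token.
    """
    by_playlist = defaultdict(list)
    for row in item_rows:
        pid, pos = row[0], row[1]
        by_playlist[pid].append((pos, row[key_index]))
    sentences = []
    for pid in sorted(by_playlist):
        # Restore the original track order inside each playlist.
        entries = sorted(by_playlist[pid], key=lambda entry: entry[0])
        sentences.append([token for _, token in entries])
    return sentences


rows = [
    (0, 1, "track:b", "album:x", "artist:p"),
    (0, 0, "track:a", "album:x", "artist:p"),
    (1, 0, "track:c", "album:y", "artist:q"),
]
track_sentences = playlists_to_sentences(rows, key_index=2)
artist_sentences = playlists_to_sentences(rows, key_index=4)
print(track_sentences)
print(artist_sentences)
```

Changing key_index is what distinguishes the track, album and artist variants in this sketch.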

Title2Rec

FastText embeddings representing the playlists' titles, computed on clusters of playlists based on the word2rec embeddings of their tracks.

python main.py title2rec title2rec.csv
python main.py title2rec_embs models/fast_text/title2rec.npy

Creative Track features

mpd_uri_topics and spotify_uri_features.pickle are pickle files containing features extracted from song lyrics, such as the dominant topics, the emotions and the style.

To use these features (Creative Track only):

download the 2 files in this folder;
unzip them into <projectpath>\models\lyrics.

RNN

Add --lyrics to include the Creative Track features.

Training

python recommender/mpd_rnn.py --data_path=dataset --model=optimal --save_path=/path/to/model/optimal --embs=models/embs/1M --title_embs=models/fast_text/title2rec.npy

The trained models used for the submissions are available at: http://eventmedia.eurecom.fr/recsys2018/

Generation

python recommender/mpd_rnn.py --data_path=dataset --model=optimal --save_path=/path/to/model --embs=models/embs/1M --title_embs=models/fast_text/title2rec.npy --smooth=linear --sample_file=solution.csv --is_dry=False

Ensemble

The ensemble combines the predictions of a set of RNN configurations to increase accuracy. To use it, put the submission files in a folder named submissions/dry. The script tries all possible combinations of the submission files and saves the results to files.

python recommender/ensemble.py
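
Trying every combination of submission files can be sketched as below. The merging rule (a Borda-style rank vote), the restriction to subsets of at least two files, and all names are assumptions made for illustration; this README does not state how recommender/ensemble.py merges the rankings.

```python
from collections import defaultdict
from itertools import combinations


def all_combinations(names):
    """Yield every subset of at least two submission names."""
    for size in range(2, len(names) + 1):
        yield from combinations(names, size)


def borda_merge(rankings, top_n):
    """Merge ranked track lists for one playlist with a Borda-style vote.

    A track at 0-based position p in a ranking of length n scores n - p points.
    Ties are broken by track identifier to keep the output deterministic.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for position, track in enumerate(ranking):
            scores[track] += len(ranking) - position
    ordered = sorted(scores, key=lambda track: (-scores[track], track))
    return ordered[:top_n]


# Hypothetical recommendations for a single playlist from three submissions.
submissions = {
    "a.csv": ["t1", "t2", "t3"],
    "b.csv": ["t2", "t1", "t4"],
    "c.csv": ["t2", "t3", "t1"],
}
for combo in all_combinations(sorted(submissions)):
    merged = borda_merge([submissions[name] for name in combo], top_n=3)
    print("+".join(combo), merged)
```

The number of subsets grows exponentially with the number of files, so an exhaustive search like this is only practical for a handful of submissions.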

The following configurations were combined; they correspond to the models available at http://eventmedia.eurecom.fr/recsys2018/.

Main:

rnn_1M_300_e1.csv
rnn_1M_300_e2.csv
rnn_1M_400_e1.csv
rnn_1M_400_e2.csv
rnn_1M_e1.csv
rnn_1M_e2.csv

Creative:

rnn_1M_400_emotion_e2.csv
rnn_1M_400_emotion_e1.csv
rnn_1M_400_fuzzy_e1.csv
rnn_1M_400_fuzzy_e2.csv
rnn_1M_400_e1.csv
rnn_1M_300_e1.csv
rnn_1M_400_e2.csv
rnn_1M_e1.csv
rnn_1M_e2.csv

The file names should be read as follows:

rnn_1M_d_creative_epoch.csv

d = dimension of the embedding vector

d = 300 when using track, album and artist embeddings from the word2vec model

d = 300 + 100 = 400 when also using title2rec embedding vectors

creative = 'emotion' when using emotion detection features

creative = 'fuzzy' when using emotion, PoS tagging, and sentiment features

rnn_1M_epoch.csv corresponds to the RNN using the SmallConfig rather than the OptimalConfig (see mpd_rnn.py for the detailed list of hyper-parameters)
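
The naming scheme can be decoded mechanically. The following sketch is a hypothetical helper, not part of the repository; the regular expression and the returned field names are assumptions derived from the rules listed above.

```python
import re

# rnn_1M[_d][_creative]_e<epoch>.csv, following the naming rules above.
NAME_PATTERN = re.compile(
    r"^rnn_1M(?:_(?P<d>\d+))?(?:_(?P<creative>emotion|fuzzy))?_e(?P<epoch>\d+)\.csv$"
)


def parse_submission_name(name):
    """Decode a submission file name into its configuration fields."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized submission name: {name}")
    d = match.group("d")
    return {
        # Names without a dimension correspond to the SmallConfig.
        "config": "OptimalConfig" if d else "SmallConfig",
        "d": int(d) if d else None,
        # d = 400 means the title2rec vectors are included; unknown without d.
        "title2rec": (d == "400") if d else None,
        "creative": match.group("creative"),
        "epoch": int(match.group("epoch")),
    }


for name in ["rnn_1M_400_emotion_e2.csv", "rnn_1M_300_e1.csv", "rnn_1M_e1.csv"]:
    print(name, parse_submission_name(name))
```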
