MaxHam/MDS_Spotify-Semantic-Search

Music Similarity Search

Motivation

This is a project for the "Modern Database Systems" lecture held at the Technische Hochschule Köln. The aim of the project is to find a use case where modern NoSQL databases outperform SQL databases. We decided to build a music similarity search engine: you search for a song and get similar songs back. Similarity is based on the lyrics and the audio features of each song. The project uses the Spotify Dataset from Kaggle.

Requirements

  • Node.js
  • Docker
  • Python 3.8
  • Spotify API credentials

Installation

  1. Install the Python requirements with
pip install -r requirements.txt
  2. Download the dataset from Kaggle.

  3. Unzip the dataset and place it in the data directory.

  4. Clean the dataset with
python3 clean_data.py
  5. Create a .env file in the webapp directory with the following content. It is needed to access the Spotify API, so you first have to set up a Spotify application; see the Spotify developer documentation for more information.
SPOTIFY_CLIENT_ID=your_spotify_client_id
SPOTIFY_CLIENT_SECRET=your_spotify_client_secret
PORT=5001
  6. When using the project for the first time, create the Weaviate.io and vectorizer server containers and start the web server with
docker-compose up
  7. Import the dataset into Weaviate.io with
python weaviate_import.py
  8. Go to localhost:3000 and enjoy listening to songs!
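
A missing or incomplete .env file is an easy mistake to make before starting the containers. The following is a minimal sketch, not part of the repository, that checks whether the three keys listed above are present. The function names `parse_env` and `missing_keys` are hypothetical, and the path `webapp/.env` is assumed from the installation steps.

```python
from pathlib import Path

# Keys taken from the .env example in the installation steps.
REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET", "PORT")


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("webapp") / ".env"  # assumed location of the file
    if not env_path.is_file():
        print(f"{env_path} not found")
    else:
        missing = missing_keys(parse_env(env_path.read_text()))
        print("missing: " + ", ".join(missing) if missing else "all keys set")
```

The check only confirms that the keys exist; it does not verify that the credentials are accepted by Spotify.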

Benchmark

  1. If not already done, complete all steps in Installation to initialize the vector database.

  2. Clean the dataset and create the required .sql script with
python3 clean_sql_data.py
  3. Upload the dataset to the SQL server with
python3 sql_import.py
  4. Run the benchmark script and see the results in the console or in the benchmark_results.csv file
python3 benchmark.py
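
To illustrate what a benchmark of this kind measures, here is a rough sketch of a timing harness that runs a query function several times and writes summary statistics as CSV. It is not the repository's benchmark.py: the helpers `time_query` and `write_results`, the column names, and the placeholder workloads are all assumptions made for illustration. In a real run, the lambdas would be replaced by calls that query Weaviate and the SQL server.

```python
import csv
import statistics
import sys
import time


def time_query(run_query, repeats=5):
    """Call run_query() `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def write_results(rows, out_file):
    """Write (backend, query, durations) rows as CSV with mean and median columns."""
    writer = csv.writer(out_file)
    writer.writerow(["backend", "query", "runs", "mean_s", "median_s"])
    for backend, query, durations in rows:
        writer.writerow([
            backend,
            query,
            len(durations),
            f"{statistics.mean(durations):.6f}",
            f"{statistics.median(durations):.6f}",
        ])


if __name__ == "__main__":
    # Placeholder workloads standing in for real database queries.
    rows = [
        ("placeholder_a", "example", time_query(lambda: sum(range(10_000)))),
        ("placeholder_b", "example", time_query(lambda: sorted(range(10_000)))),
    ]
    write_results(rows, sys.stdout)
```

Reporting the median alongside the mean helps when the first run is slower than the rest, for example because of cold caches.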

Dataset

Spotify Dataset

Authors

  • Max Hammer
  • Dennis Goessler