
Towards Training Music Taggers on Synthetic Data

This repository contains code accompanying our paper "Towards Training Music Taggers on Synthetic Data" (Kroher et al., 2024), which explores the potential of training music tagging systems on synthetically generated music excerpts.

Our paper is available on arXiv.

Co-authors: Nadine Kroher, Stephen Manangu and Aggelos Pikrakis

Overview

This initial study uses the GTZAN dataset; a download link is given in step 1 below.

We compare different ways of incorporating synthetically generated music excerpts into the training process:

  • training on synthetic data only
  • adding synthetic to real data
  • adding synthetic data via domain adaptation
  • training on synthetic data and using transfer learning to real data
  • training on synthetic data and fine-tuning on real data

In order to generate synthetic music excerpts, we implemented the following pipeline:

  • Leverage OpenAI's GPT-3.5 turbo to create genre-specific textual descriptions (e.g., "A lively instrumental Country track featuring twangy guitars and upbeat fiddle melodies, perfect for a barn dance.")
  • Use these descriptions as text prompts to guide Meta's medium-sized MusicGen model to generate genre-specific music excerpts
  • Train and evaluate the MusiCNN architecture for the tagging task using the different strategies outlined above.

For more detail and results, please read the paper.

Installation

Create a virtual environment and install the dependencies:

python -m venv .venv

source .venv/bin/activate

pip install -r requirements.txt

Reproducing the results

Follow these steps to reproduce the results from our paper:

  1. Download the GTZAN Dataset: Obtain the GTZAN music genre dataset, which contains 1,000 30-second music excerpts evenly split across 10 genres. You can download it here: https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification

  2. Generate Music Descriptions: Create 10,000 text prompts for each genre using a large language model; we used OpenAI's GPT-3.5 turbo via the generate_descriptions.py script.
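
For illustration, a minimal sketch of this step using the OpenAI Python client is shown below; the prompt wording, batching, and output file layout are assumptions, and generate_descriptions.py is the reference implementation:

# Minimal sketch: generate genre-specific track descriptions via the OpenAI API.
# Prompt wording, batching, and file layout are illustrative assumptions;
# see generate_descriptions.py for the actual implementation.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]

for genre in genres:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Write a one-sentence description of an instrumental {genre} track."}],
        n=10,  # several descriptions per request
    )
    with open(f"descriptions_{genre}.txt", "a") as f:
        for choice in response.choices:
            f.write(choice.message.content.strip() + "\n")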

  3. Generate Synthetic Data: Use the generated text prompts to create genre-specific synthetic music with MusicGen's medium-sized model, which is available via the audiocraft library. This results in a dataset of 100K audio files in total. Use generate_synth_GTZAN.py to generate the synthetic data.
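
A minimal sketch of the generation step with the audiocraft API (excerpt duration and output naming are assumptions; generate_synth_GTZAN.py is the reference implementation):

# Minimal sketch: text-to-music generation with MusicGen (medium) via audiocraft.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-medium")
model.set_generation_params(duration=30)  # 30-second excerpts (assumption)

descriptions = ["A lively instrumental Country track featuring twangy guitars "
                "and upbeat fiddle melodies, perfect for a barn dance."]
wav = model.generate(descriptions)  # tensor of shape (batch, channels, samples)

for i, one_wav in enumerate(wav):
    # audio_write appends the file extension and applies loudness normalization
    audio_write(f"synth_country_{i}", one_wav.cpu(), model.sample_rate,
                strategy="loudness")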

  4. Data Splits: Our code uses the artist-filtered splits of the GTZAN dataset, which we added in the folder artist_filtered_splits.
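
If you want to load these splits in your own code, a minimal sketch is below; it assumes each split file is a plain-text list of filenames, one per line (check the files in artist_filtered_splits for the actual format and names):

# Minimal sketch: read an artist-filtered split file.
# The file name and one-filename-per-line format are assumptions.
from pathlib import Path

def load_split(path):
    return [line.strip() for line in Path(path).read_text().splitlines()
            if line.strip()]

train_files = load_split("artist_filtered_splits/train.txt")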

  5. Extract Features: The MusiCNN architecture takes a log-mel spectrogram as input, which you can extract and store with the extract_features.py script using the following arguments:

    • --source-data: Path to the synthetic data.
    • --target-data: Path to the real data (GTZAN validation split).
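
For reference, log-mel features of this kind can be computed with librosa; the sample rate, FFT size, hop length, and mel-band count below are assumptions, and extract_features.py defines the parameters actually used:

# Minimal sketch: compute and store a log-mel spectrogram with librosa.
# All signal parameters here are assumptions, not the paper's exact settings.
import librosa
import numpy as np

y, sr = librosa.load("example.wav", sr=16000)  # mono, resampled
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=256, n_mels=96)
log_mel = librosa.power_to_db(mel)  # shape: (n_mels, n_frames)
np.save("example_logmel.npy", log_mel)
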
  6. Model Training: Use the train.py script to train the model with the following arguments:

    • --source-data: Synthetic data embeddings.
    • --target-data: Human/GTZAN data embeddings.
    • --mode: Type of training: src_only (train on synthetic data only), trg_only (train on real data only), both (train on synthetic and real data), DA (train on synthetic and real data with supervised domain adaptation), TL (transfer learning from synthetic to real data), or FT (train on synthetic data and fine-tune on real data).

Note: The TL and FT modes both assume that you have already trained a model on the source (synthetic) data.
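
For example, training with supervised domain adaptation might look like this (the feature paths are placeholders):

python train.py --source-data features/synth --target-data features/gtzan --mode DA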

Visualising and analysing the results

You can run plot_embeddings.py to visualize an intermediate representation with and without domain adaptation.

The plots should look like this:

[Figure: Domain adaptation in music tagging (embeddings with and without domain adaptation)]
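
If you want to build a similar visualization for your own checkpoints, a common approach is a 2-D t-SNE projection of the embeddings, colored by domain. The sketch below illustrates the idea with hypothetical embedding files; it is not the implementation of plot_embeddings.py:

# Minimal sketch: 2-D t-SNE projection of embeddings, colored by domain.
# The embedding file names and shapes are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

synth = np.load("embeddings_synth.npy")  # (n_samples, dim)
real = np.load("embeddings_gtzan.npy")   # (n_samples, dim)

points = TSNE(n_components=2, random_state=0).fit_transform(
    np.vstack([synth, real]))
plt.scatter(points[:len(synth), 0], points[:len(synth), 1], s=5, label="synthetic")
plt.scatter(points[len(synth):, 0], points[len(synth):, 1], s=5, label="real (GTZAN)")
plt.legend()
plt.savefig("embedding_projection.png")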

Run the plot_confusion_matrices.py script (pass the source and target data paths together with the model type) to plot the confusion matrices.
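
As a generic illustration of this plotting step (not the script's internals), a confusion matrix can be computed and rendered with scikit-learn:

# Minimal sketch: plot a confusion matrix from predictions with scikit-learn.
# The labels and predictions are placeholders; plot_confusion_matrices.py
# handles loading the data and running the trained model.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

y_true = ["rock", "jazz", "rock", "blues"]   # placeholder ground truth
y_pred = ["rock", "jazz", "blues", "blues"]  # placeholder predictions

ConfusionMatrixDisplay.from_predictions(y_true, y_pred)
plt.savefig("confusion_matrix.png")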

Citation

If you are going to use this work as part of your research, please cite the following paper:

@inproceedings{kroher2024Towards,
  title={Towards training music taggers on synthetic data},
  author={N. Kroher and S. Manangu and A. Pikrakis},
  booktitle={Proceedings of the 21st International Conference on Content-based Multimedia Indexing},
  year={2024},
}
