(c) 2017 by Thomas Lidy, TU Wien - http://ifs.tuwien.ac.at/~lidy
This is a set of tutorials showing how to use Deep learning algorithms for music analysis and retrieval problems. More specifically, we use Convolutional Neural Networks to classify (categorize) music into genres.
It uses Python 2.7 as the programming language with the popular Keras and Theano Deep Learning libraries underneath.
For the tutorials, we use IPython / Jupyter notebook, which lets you write and execute Python code interactively in the browser.
If you do not want to install anything, you can simply view the tutorials' content in your browser by clicking on the tutorial filenames in the Git file listing (above, or on https://github.com/tuwien-musicir/DL_MIR_Tutorial ).
The tutorial will open in your browser for viewing.
If you want to follow the tutorials by actually executing the code on your computer, please first install the prerequisites as described below.
After that, to run the tutorials, go into the DL_MIR_Tutorial folder and start from the command line:
ipython notebook
or jupyter notebook
Your web browser will open showing a list of files. Start the tutorials one after another by clicking on the following:
Music_genre_classification.ipynb
This tutorial shows how music is categorized into 1 of 10 music genres using the GTZAN music collection (see below).
It covers audio and data preprocessing for Deep Learning, as well as creating and training Convolutional Neural Networks with different architectures and parameters. It also covers techniques such as Batch Normalization, ReLU activation and Dropout.
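As a rough intuition for the three techniques named above, the following sketch implements them in plain Python on a small list of numbers. It is an illustration only and not the tutorial's Keras code; the function names and the example values are made up for this sketch, and the learnable scale and shift of a real Batch Normalization layer are left out.

```python
import math
import random


def relu(values):
    """ReLU activation: negative inputs become 0, positive inputs pass through."""
    return [max(0.0, v) for v in values]


def batch_normalize(values, eps=1e-5):
    """Normalize a batch to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate` and
    rescale the surviving values by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


batch = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical example values
activated = relu(batch)
normalized = batch_normalize(batch)
dropped = dropout(batch, rate=0.5, rng=random.Random(0))
```

In the tutorial itself these operations are provided by the Deep Learning library as ready-made layers, so none of this has to be written by hand.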
Note: On most Mac and Linux systems Python is already pre-installed. Check with python --version on the command line whether you have Python 2.7.x installed.
Otherwise install Python 2.7 from https://www.python.org/download/releases/2.7/
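The same check can also be done from within Python. The following is a minimal sketch; the helper name is_python_27 is made up for this example.

```python
import sys


def is_python_27(version_info=None):
    """Return True if the given (or the running) interpreter is Python 2.7.x."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == (2, 7)


if is_python_27():
    print("Python 2.7 detected")
else:
    print("Running Python %d.%d, but the tutorials were written for 2.7"
          % tuple(sys.version_info[:2]))
```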
(on Windows leave out sudo)
sudo pip install ipython
Check whether ipython notebook starts from the command line. If it does not, try to install:
sudo pip install jupyter
Then download or clone the Tutorials from this GIT repository:
git clone https://github.com/tuwien-musicir/DL_MIR_Tutorial.git
or download https://github.com/tuwien-musicir/DL_MIR_Tutorial/archive/master.zip
unzip it and rename the folder to DL_MIR_Tutorial.
Install the remaining Python libraries needed:
Either by:
sudo pip install Keras==1.2.1 Theano==0.8.2 "scikit-learn>=0.17" pandas librosa
or, if you downloaded or cloned this repository, by:
cd DL_MIR_Tutorial
sudo pip install -r requirements.txt
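To see whether the installation worked, you can try importing each library. The sketch below is one way to do that; the helper find_missing is made up for this example, and the list uses the import names of the packages above (scikit-learn is imported as sklearn).

```python
import importlib

# Import names of the libraries installed above
REQUIRED_MODULES = ["keras", "theano", "sklearn", "pandas", "librosa"]


def find_missing(module_names):
    """Return the names from `module_names` that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:  # a broken install can raise more than ImportError
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Not importable: " + ", ".join(missing))
    else:
        print("All required libraries can be imported.")
```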
If you want to use audio formats other than .wav files (e.g. .mp3, .flac, .au, .mp4), you have to install FFmpeg on your computer:
- Linux: install ffmpeg, via sudo apt-get install ffmpeg (for Ubuntu 14.04: see http://fcorti.com/2014/04/22/ffmpeg-ubuntu-14-04-lts )
- Mac: download FFmpeg for Mac from http://ffmpegmac.net or, if you use brew, execute:
brew install ffmpeg
- Windows: download ffmpeg.exe from https://github.com/tuwien-musicir/rp_extract/blob/master/bin/external/win/ffmpeg.exe
Make sure that the executable is in a directory listed in the system's PATH.
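Whether the executable can be found via the PATH can be checked with a few lines of Python. This is a minimal sketch; the helper find_on_path is made up for this example and only looks at the directories listed in the PATH variable.

```python
import os


def find_on_path(program, search_path=None):
    """Return the full path of `program` if it is found in one of the
    directories of `search_path` (default: the PATH variable), else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    # On Windows the executable carries an .exe suffix
    candidates = [program, program + ".exe"] if os.name == "nt" else [program]
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full_path = os.path.join(directory, candidate)
            if os.path.isfile(full_path) and os.access(full_path, os.X_OK):
                return full_path
    return None


ffmpeg_path = find_on_path("ffmpeg")
print(ffmpeg_path if ffmpeg_path else "ffmpeg was not found on the PATH")
```

On Python 3.3 and later, shutil.which does the same job; the manual search is used here because it also runs on Python 2.7.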
Since we use Theano as the Deep Learning computation backend, but Keras is configured to use TensorFlow by default, we have to change this in the keras.json configuration file, which is in the .keras folder of the user's HOME directory.
Copy the keras.json included in the DL_MIR_Tutorial folder to the target directory for your system (you can overwrite an existing file):
- Windows: C:\Users\<user>\.keras\
- Mac: /Users/<user>/.keras
- Linux: /home/<user>/.keras
An alternative is to change these 2 lines in your keras.json file to the following:
{
"image_dim_ordering": "th",
"backend": "theano"
}
See https://keras.io/backend/ for details or http://ankivil.com/installing-keras-theano-and-dependencies-on-windows-10/ for a step by step guide.
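If you prefer to keep the other settings of an existing keras.json, the two entries can also be changed with a small script. The following is a sketch under that assumption; the helper set_theano_backend is made up for this example, and the call that would modify your own configuration is left commented out.

```python
import json
import os


def set_theano_backend(config_path):
    """Set the two Theano-related keys in a keras.json file, keeping all
    other settings. The file (and its folder) is created if it is missing."""
    config = {}
    if os.path.isfile(config_path):
        with open(config_path) as f:
            config = json.load(f)
    config["backend"] = "theano"
    config["image_dim_ordering"] = "th"
    folder = os.path.dirname(config_path)
    if folder and not os.path.isdir(folder):
        os.makedirs(folder)
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config


# Default location: the .keras folder in the user's home directory
default_path = os.path.join(os.path.expanduser("~"), ".keras", "keras.json")
# set_theano_backend(default_path)  # uncomment to modify your own configuration
```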
If you want to train your neural networks on your GPU, also install the following (not needed for the tutorials):
- NVidia drivers
- CUDA
- cuDNN (optional, for further speedup)
To permanently configure Keras/Theano to use the GPU, place a file .theanorc in your home directory with the following content:
[global]
device = gpu
floatX = float32
mode=FAST_RUN
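The file can be created with any text editor. As an alternative, the sketch below writes the same content from Python and reads it back to verify it; the helper names are made up for this example, and the call that would write into your home directory is left commented out.

```python
import os

try:
    import configparser  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2.7

THEANORC_TEXT = """[global]
device = gpu
floatX = float32
mode=FAST_RUN
"""


def write_theanorc(path):
    """Write the GPU configuration shown above to `path`."""
    with open(path, "w") as f:
        f.write(THEANORC_TEXT)


def read_global_settings(path):
    """Return the [global] section of a .theanorc file as a dict."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep the case of option names such as floatX
    parser.read(path)
    if not parser.has_section("global"):
        return {}
    return dict(parser.items("global"))


theanorc_path = os.path.join(os.path.expanduser("~"), ".theanorc")
# write_theanorc(theanorc_path)  # uncomment to create the file
```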
To check whether Python, Keras and Theano were installed correctly, run:
python test_keras.py
If everything is installed correctly, it should print "Using Theano backend."
If the GPU is configured correctly, it should also print "Using gpu device 0: GeForce GTX 980 Ti" or similar.
The following helper Python libraries are used in these tutorials:
- audiofile_read.py and rp_extract.py: by Thomas Lidy and Alexander Schindler, taken from the RP_extract git repository
- wavio.py: by Warren Weckesser
The data sets we use in the tutorials are from the following sources:
- GTZAN music genre data set: by George Tzanetakis. 1000 audio files of 30 seconds each, across 10 music genres with 100 audio files each.
- GTZAN music speech data set (currently not used): by George Tzanetakis. Collected for the purposes of music/speech discrimination. 128 tracks, each 30 seconds long. Each class (music or speech) has 64 examples in 22050 Hz mono 16-bit WAV audio format.
Both data sets are available from: http://marsyasweb.appspot.com/download/data_sets/
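After downloading and unpacking a data set, a quick count of the files per class helps to confirm that nothing is missing. The sketch below assumes that the unpacked data set contains one subfolder per class (for example one per genre); this folder layout and the helper name are assumptions of the example, not something stated above.

```python
import os


def count_files_per_class(root_dir):
    """Count the files in each direct subfolder of `root_dir`,
    treating every subfolder as one class (e.g. one genre)."""
    counts = {}
    for entry in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, entry)
        if os.path.isdir(class_dir):
            counts[entry] = len([f for f in os.listdir(class_dir)
                                 if os.path.isfile(os.path.join(class_dir, f))])
    return counts


# Example call with a hypothetical folder name:
# print(count_files_per_class("genres"))
```

For the music genre data set, the description above suggests 10 classes with 100 files each; for the music speech data set, 2 classes with 64 files each.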