This code can be used to reproduce the results presented in the NIPS ML4Audio workshop paper: https://arxiv.org/abs/1712.00254. Feel free to get in touch if you spot errors, have questions, or would like to discuss follow-up results ([email protected]).
To get training:
- download the esc50 dataset from: https://github.com/karoldvl/ESC-50 or https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YDEPUT.
- install the necessary dependencies as listed below.
- clone the GitHub repository, and install the library by running
pip install .
in the folder containing the setup.py file. - process the ESC50 dataset by running (./) the
make_esc50_data.sh
file in theExperiments
folder after making it executable (chmod +x) and specifying the necessary paths. - train the MSTmodel for each of the folds by running the
mst_learning.sh
file in theExperiments
folder (…). Provide the path to thefeatures
subfolder that is created during processing of the dataset. - train the classifiers for each of the folds by running the
classifier_training.sh
file in theExperiments
folder (…). Provide the path to thefeatures
subfolder that is created during processing of the dataset. - use the
retrieve_fold_results
function in theUtilities
folder to collect and average the results over the folds. Plotting was done using standard matplotlib functions.
Dependencies
- librosa (tested with version 0.5.0), pyaudio, pydub, numpy, pandas, scipy, tensorflow (tested with version 1.2).
Notes:
- This is a work in progress and it’s not focussed on production environments (i.e. before using in any non-research application I would strongly advice to spend time on optimization).
- Without access to a GPU, training these models might be prohibitively expensive. I have not tried this.
- Multi-GPU training is not supported at the moment.
- The code has been kept somewhat modular so that you can change individual components (model, trainer, dataset) relatively easily for running new experiments.