VGGish: A VGG-like audio classification model

This repository provides a VGGish model, implemented in Keras with tensorflow backend. This repository is developed based on the model for AudioSet. For more details, please visit the slim version.

Pretrained weights in Keras h5py:

They should be downloaded into weights directory

Model with the top fully connected layers
Model without the top fully connected layers

Test it!

You can check it with a sub-set of the Speech Commands dataset

Download it and move a sub set (i.e. some folders) of the dataset.

The directory should look like this:

VGGish/dataset/data/class_01/*.wav
VGGish/dataset/data/class_02/*.wav
VGGish/dataset/data/class_03/*.wav

where class_01, class_02, and class_03 are classes from the Speech Commands dataset, and VGGish is the root of this project. You can check that the dataset is correctly placed, and see a summary by running python dataset/dataset_utils.py which, for yes/no categories, should output something like that:

        Dataset summary:

        training   ->  6358 examples
                   no ->  3130 examples
                  yes ->  3228 examples

        testing    ->   824 examples
                   no ->   405 examples
                  yes ->   419 examples

        validation ->   803 examples
                   no ->   406 examples
                  yes ->   397 examples

Then you can run it with python evaluation.py.

Here you can see an output example for yes/no categories

loading training data...
loading test data...
loading validation data...
Training...
Report for training
              precision    recall  f1-score   support

          no       0.94      0.99      0.96      2850
         yes       0.99      0.94      0.96      2990

    accuracy                           0.96      5840
   macro avg       0.96      0.96      0.96      5840
weighted avg       0.96      0.96      0.96      5840

Report for validation
              precision    recall  f1-score   support

          no       0.92      0.98      0.95       363
         yes       0.98      0.92      0.95       356

    accuracy                           0.95       719
   macro avg       0.95      0.95      0.95       719
weighted avg       0.95      0.95      0.95       719

Report for testing
              precision    recall  f1-score   support

          no       0.92      0.98      0.95       386
         yes       0.98      0.92      0.95       397

    accuracy                           0.95       783
   macro avg       0.95      0.95      0.95       783
weighted avg       0.95      0.95      0.95       783

Notice that I removed some logging info from TF which is not relevant at this point.

Reference:

Gemmeke, J. et. al., AudioSet: An ontology and human-labelled dataset for audio events, ICASSP 2017
Hershey, S. et. al., CNN Architectures for Large-Scale Audio Classification, ICASSP 2017

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
dataset		dataset
weights		weights
.gitignore		.gitignore
README.MD		README.MD
evaluation.py		evaluation.py
mel_features.py		mel_features.py
preprocess_sound.py		preprocess_sound.py
requirements-cpu.txt		requirements-cpu.txt
requirements-gpu.txt		requirements-gpu.txt
vggish.py		vggish.py
vggish_inputs.py		vggish_inputs.py
vggish_params.py		vggish_params.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VGGish: A VGG-like audio classification model

Pretrained weights in Keras h5py:

Test it!

Reference:

About

Releases

Packages

Languages

rola93/VGGish

Folders and files

Latest commit

History

Repository files navigation

VGGish: A VGG-like audio classification model

Pretrained weights in Keras h5py:

Test it!

Reference:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages