[MXNET-1210 ] Gluon Audio #13241
@@ -0,0 +1,22 @@
# Urban Sounds Classification in MXNet

Urban Sounds Dataset:

## Description
The dataset contains 8732 wav files, which are audio samples (<= 4 s) of street sounds such as engine_idling, car_horn, children_playing, dog_barking, and so on. The task is to classify these audio samples into one of the 10 labels.

To be able to run this example:

1. Download the dataset (train.zip, test.zip) required for this example from:
   **https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU**

   > **Review comment:** Why this drive folder? Who owns this?
   > **Reply:** It is from an urban sounds challenge/contest hosted by https://datahack.analyticsvidhya.com/contest/practice-problem-urban-sound-classification/

2. Extract both zip archives into the **current directory**. After unzipping, you will get two new folders, **Train** and **Test**, and two csv files, **train.csv** and **test.csv**.

3. Apache MXNet is installed on the machine. For instructions, see: **https://mxnet.incubator.apache.org/install/**

4. Librosa is installed. To install, use:
   `pip install librosa`
   For more details, refer to: **https://librosa.github.io/librosa/install.html**
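The setup steps above can be sketched as a small preflight check. This is a minimal sketch: the paths are the ones the README says unzipping produces, and the helper name `preflight` is hypothetical, not part of the example scripts.

```python
import os

def preflight(paths=("./Train", "./Test", "./train.csv", "./test.csv")):
    """Return the expected paths that are missing, so setup problems
    surface before training or prediction is attempted."""
    return [p for p in paths if not os.path.exists(p)]

missing = preflight()
if missing:
    print("Missing before you can run the example:", missing)
```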
@@ -0,0 +1,34 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""This module builds an MLP model with a configurable output layer
(the number of units in the last layer). Users can pass any number of
units for the last layer. Since this dataset has 10 labels, the default
value is num_labels = 10.
"""
import mxnet as mx
from mxnet import gluon


# Defining a neural network with a configurable number of output labels
def get_net(num_labels=10):
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(256, activation="relu"))  # first hidden layer (256 units)
        net.add(gluon.nn.Dense(256, activation="relu"))  # second hidden layer (256 units)
        net.add(gluon.nn.Dense(num_labels))              # output layer
    net.collect_params().initialize(mx.init.Normal(1.))
    return net
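As a rough sanity check on the MLP above, its parameter count can be worked out by hand. This is a sketch: the input dimension depends on the length of the MFCC feature vector (the value 20 below is an assumption), and the helper name `mlp_param_count` is hypothetical.

```python
def mlp_param_count(input_dim, hidden=256, num_labels=10):
    # Each Dense layer holds in*out weights plus out biases, matching
    # the 256-256-num_labels stack that get_net builds.
    layer1 = input_dim * hidden + hidden
    layer2 = hidden * hidden + hidden
    output = hidden * num_labels + num_labels
    return layer1 + layer2 + output

print(mlp_param_count(20))  # parameter count for an assumed 20-dim input
```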
@@ -0,0 +1,91 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""Prediction module for Urban Sounds Classification"""
import os
import warnings
import mxnet as mx
from mxnet import nd
from mxnet.gluon.contrib.data.audio.transforms import MFCC
from model import get_net


def predict(prediction_dir='./Test'):
    """Run predictions on the audio files in the directory `prediction_dir`.

    Parameters
    ----------
    prediction_dir: string, default './Test'
        The directory that contains the audio files on which predictions are to be made

    """
    try:
        import librosa
    except ImportError:
        warnings.warn("Librosa is not installed! Please run the following command: pip install librosa.")
        return

> **Review comment:** can we move all these checks to a separate function so it does not confuse the actual prediction logic.

    if not os.path.exists(prediction_dir):
        warnings.warn("The directory on which predictions are to be made is not found!")
        return

    if len(os.listdir(prediction_dir)) == 0:
        warnings.warn("The directory on which predictions are to be made is empty! Exiting...")
        return

    # Loading synsets
    if not os.path.exists('./synset.txt'):
        warnings.warn("The synset or labels for the dataset do not exist. Please run the training script first.")
        return

    with open("./synset.txt", "r") as f:
        synset = [l.rstrip() for l in f]
    net = get_net(len(synset))
    print("Trying to load the model with the saved parameters...")
    if not os.path.exists("./net.params"):
        warnings.warn("The model does not have any saved parameters... Cannot proceed! Train the model first.")
        return

    net.load_parameters("./net.params")
    file_names = os.listdir(prediction_dir)
    full_file_names = [os.path.join(prediction_dir, item) for item in file_names]
    mfcc = MFCC()
    print("\nStarting predictions for audio files in ", prediction_dir, " ....\n")
    for filename in full_file_names:
        # res_type='kaiser_fast' is faster than the default 'kaiser_best';
        # it is passed here to reduce the audio load time.
        X1, _ = librosa.load(filename, res_type='kaiser_fast')

> **Review comment:** Please add a comment on why kaiser fast

        transformed_test_data = mfcc(mx.nd.array(X1))
        output = net(transformed_test_data.reshape((1, -1)))
        prediction = nd.argmax(output, axis=1)
        print(filename, " -> ", synset[int(prediction.asscalar())])


if __name__ == '__main__':
    try:
        import argparse
        parser = argparse.ArgumentParser(description="Urban Sounds classification example - MXNet")
        parser.add_argument('--pred', '-p', help="Enter the folder path that contains your audio files", type=str)
        args = parser.parse_args()
        pred_dir = args.pred if args.pred else './Test'

    except ImportError:
        warnings.warn("Argparse module not installed! Passing default arguments.")
        pred_dir = './Test'
    predict(prediction_dir=pred_dir)
    print("Urban sounds classification prediction DONE!")
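The last step of the prediction loop above (argmax over the network output, then a lookup into the labels read from synset.txt) can be sketched in plain Python. The function name `decode_prediction` is hypothetical, and the label list below is only illustrative.

```python
def decode_prediction(logits, synset):
    # Pick the index of the largest score and map it to its label,
    # mirroring the nd.argmax + synset lookup in predict() above.
    best = max(range(len(logits)), key=lambda i: logits[i])
    return synset[best]

print(decode_prediction([0.1, 2.3, -1.0],
                        ["car_horn", "dog_barking", "engine_idling"]))
```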
@@ -0,0 +1,165 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
"""The module to run training on the Urban Sounds dataset"""
import os
import time
import warnings
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon.contrib.data.audio.datasets import AudioFolderDataset
from mxnet.gluon.contrib.data.audio.transforms import MFCC
import model


def evaluate_accuracy(data_iterator, net):
    """Evaluate the accuracy of the model on any data iterator passed as an argument."""
    acc = mx.metric.Accuracy()
    for _, (data, label) in enumerate(data_iterator):
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        predictions = predictions.reshape((-1, 1))
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]


def train(train_dir=None, train_csv=None, epochs=30, batch_size=32):
    """Run the training of the model."""
    try:
        import librosa
    except ImportError:
        warnings.warn("The dependency librosa is not installed. Cannot continue.")
        return
    if not train_dir or not os.path.exists(train_dir) or not train_csv:
        warnings.warn("No train directory could be found.")
        return
    # Make a dataset from the local folder containing audio data
    print("\nMaking an Audio Dataset...\n")
    tick = time.time()
    aud_dataset = AudioFolderDataset(train_dir, train_csv=train_csv, file_format='.wav', skip_rows=1)
    tock = time.time()

    print("Loading the dataset took ", (tock - tick), " seconds.")
    print("\n=======================================\n")
    print("Number of output classes = ", len(aud_dataset.synsets))
    print("\nThe labels are: \n")
    print(aud_dataset.synsets)
    # Get the model to train
    net = model.get_net(len(aud_dataset.synsets))
    print("\nNeural Network = \n")
    print(net)
    print("\nModel - Neural Network Generated!\n")
    print("=======================================\n")

    # Define the loss - Softmax CE Loss
    softmax_loss = gluon.loss.SoftmaxCELoss(from_logits=False, sparse_label=True)
    print("Loss function initialized!\n")
    print("=======================================\n")

    # Define the trainer with the optimizer
    trainer = gluon.Trainer(net.collect_params(), 'adadelta')
    print("Optimizer - Trainer function initialized!\n")
    print("=======================================\n")
    print("Loading the dataset into Gluon's out-of-the-box DataLoader...")

    # Get the data loader out of the AudioDataset and pass the transform
    aud_transform = MFCC()
    tick = time.time()

    audio_train_loader = gluon.data.DataLoader(aud_dataset.transform_first(aud_transform),
                                               batch_size=batch_size, shuffle=True)
    tock = time.time()
    print("Time taken to load data and apply transform here is ", (tock - tick), " seconds.")
    print("=======================================\n")

    print("Starting the training....\n")
    # Training loop
    tick = time.time()
    num_examples = len(aud_dataset)

    for e in range(epochs):
        cumulative_loss = 0
        for _, (data, label) in enumerate(audio_train_loader):
            with autograd.record():
                output = net(data)
                loss = softmax_loss(output, label)
            loss.backward()

            trainer.step(batch_size)
            cumulative_loss += mx.nd.sum(loss).asscalar()

        if e % 5 == 0:
            train_accuracy = evaluate_accuracy(audio_train_loader, net)
            print("Epoch %s. Loss: %s Train accuracy : %s " % (e, cumulative_loss / num_examples, train_accuracy))
            print("\n------------------------------\n")

    train_accuracy = evaluate_accuracy(audio_train_loader, net)
    tock = time.time()
    print("\nFinal training accuracy: ", train_accuracy)

    print("Training the sound classification MLP model for ", epochs, " epochs took ", (tock - tick), " seconds.")
    print("====================== END ======================\n")

    print("Trying to save the model parameters here...")
    net.save_parameters("./net.params")
    print("Saved the model parameters in the current directory.")


if __name__ == '__main__':

    try:
        import argparse
        parser = argparse.ArgumentParser(description="Urban Sounds classification example - MXNet")
        parser.add_argument('--train', '-t', help="Enter the folder path that contains your audio files", type=str)
        parser.add_argument('--csv', '-c', help="Enter the filename of the csv that contains the filename-to-label mapping", type=str)
        parser.add_argument('--epochs', '-e', help="Enter the number of epochs you would want to run the training for.", type=int)
        parser.add_argument('--batch_size', '-b', help="Enter the batch_size of data", type=int)
        args = parser.parse_args()

        training_dir = args.train if args.train else './Train'
        training_csv = args.csv if args.csv else './train.csv'
        eps = args.epochs if args.epochs else 30
        batch_sz = args.batch_size if args.batch_size else 32

    except ImportError:
        warnings.warn("Argument parsing module could not be imported. Passing default arguments.")
        training_dir = './Train'
        training_csv = './train.csv'
        eps = 30
        batch_sz = 32

> **Review comment:** duplicated try/except across train and pred files. Should be moved out to common utils file?

    train(train_dir=training_dir, train_csv=training_csv, epochs=eps, batch_size=batch_sz)
    print("Urban sounds classification training DONE!")
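The loss bookkeeping in the training loop above (summing per-sample batch losses into `cumulative_loss`, then dividing by the dataset size when reporting) can be sketched without MXNet. The helper name `epoch_average_loss` is hypothetical, and the batch losses below are made-up numbers for illustration.

```python
def epoch_average_loss(batch_losses, num_examples):
    # cumulative_loss in train() sums the per-sample losses of every
    # batch; the reported figure is that sum divided by dataset size.
    cumulative = sum(sum(batch) for batch in batch_losses)
    return cumulative / num_examples

print(epoch_average_loss([[1.0, 3.0], [2.0]], 3))  # (1+3+2)/3 = 2.0
```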
@@ -22,3 +22,5 @@
from . import text

from .sampler import *

from . import audio
@@ -0,0 +1,20 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# coding: utf-8
# pylint: disable=wildcard-import
"""Audio utilities."""
> **Review comment:** Will be good to add what this dataset is and what is the problem being solved.