This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-1210 ] Gluon Audio #13241

Closed
wants to merge 20 commits
Changes from 13 commits
20 commits
18f0634
Added AudioFolderDataset and some popular audio transforms
gaurav-gireesh Nov 13, 2018
03e956f
Removed referencing in imports and added a check for ndarray in trans…
gaurav-gireesh Nov 13, 2018
d278672
Module import enclosed within try/catch
gaurav-gireesh Nov 13, 2018
9a49683
Reordering imports and fixing lint errors
gaurav-gireesh Nov 13, 2018
f771728
Reordering imports and fixing lint errors, ModuleNotFoundError not pr…
gaurav-gireesh Nov 13, 2018
73ebd25
removed dependency from sklearn
gaurav-gireesh Nov 13, 2018
e1a9c9a
Transforms - added Documentation of parameters
gaurav-gireesh Nov 14, 2018
a40890f
Organized imports
gaurav-gireesh Nov 14, 2018
16bb51d
Test cases for the transforms added
gaurav-gireesh Nov 14, 2018
7096331
Adding librosa in the ci install script
gaurav-gireesh Nov 14, 2018
8d6fd4d
Reverting the change to see if CI passes
gaurav-gireesh Nov 14, 2018
eabb682
Added an example on Urban Sounds dataset using AudioFolderDataset
gaurav-gireesh Nov 15, 2018
9a24af5
Adding prediction support in the example, Loader removed from the tra…
gaurav-gireesh Nov 15, 2018
4b62ee6
Modularized the example into train, model and predict. Added the logi…
gaurav-gireesh Nov 16, 2018
219b08e
removed unused imports
gaurav-gireesh Nov 16, 2018
e1a8037
Addressed PR Comments
gaurav-gireesh Nov 16, 2018
21c7a4b
synset.txt filename changed
gaurav-gireesh Nov 16, 2018
bbebd25
Parameter name updated
gaurav-gireesh Nov 16, 2018
65f3491
Updated README
gaurav-gireesh Nov 16, 2018
ececde7
Addressed PR comments
gaurav-gireesh Nov 17, 2018
222 changes: 222 additions & 0 deletions example/gluon/urban_sounds/urban_sounds.py
@@ -0,0 +1,222 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""
Urban Sounds Dataset:

To be able to run this example:

1. Download the dataset(train.zip, test.zip) required for this example from the location:
**https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU**
2. Extract both the zip archives into the **current directory** -
after unzipping you would get 2 new folders namely,\
**Train** and **Test** and two csv files - **train_csv.csv**, **test_csv.csv**
3. Apache MXNet is installed on the machine. For instructions, go to the link:
**https://mxnet.incubator.apache.org/install/ **
4. Librosa is installed. To install, follow the instructions here:
**https://librosa.github.io/librosa/install.html**

Contributor

Could you move these instructions to a README file?

Contributor Author

Made a README for the example. Thanks!

"""
import os
import time
import warnings
import mxnet as mx
from mxnet import gluon, nd, autograd
from mxnet.gluon.contrib.data.audio.datasets import AudioFolderDataset
from mxnet.gluon.contrib.data.audio.transforms import MFCC
try:
    import argparse
except ImportError:
    argparse = None
    warnings.warn("The argparse module could not be imported; arguments passed to "
                  "the script cannot be parsed and default values will be used.")
try:
    import librosa
except ImportError:
    librosa = None
    warnings.warn("The librosa module could not be imported; audio files cannot be "
                  "loaded into NumPy arrays.")

Contributor

A more concise warning would be good. Maybe it should fail on ImportError?
Also, what happens to argparse.ArgumentParser below when this import fails?

Contributor Author

If this fails, I pass some default arguments. There is a check which returns if directories are empty.
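For illustration, a minimal sketch of the fallback pattern the author describes; the helper name parse_args_or_defaults and the default values are assumptions, not code from this PR:

import warnings

try:
    import argparse
except ImportError:
    argparse = None
    warnings.warn("argparse could not be imported; default arguments will be used.")


def parse_args_or_defaults():
    # Hard-coded defaults used when argparse is unavailable.
    defaults = {"train": "./Train", "csv": "./train.csv", "epochs": 30,
                "batch_size": 32, "pred": "./Test"}
    if argparse is None:
        return defaults
    parser = argparse.ArgumentParser(description="Urban Sounds classification example - MXNet")
    parser.add_argument("--train", "-t", type=str, default=defaults["train"])
    parser.add_argument("--csv", "-c", type=str, default=defaults["csv"])
    parser.add_argument("--epochs", "-e", type=int, default=defaults["epochs"])
    parser.add_argument("--batch_size", "-b", type=int, default=defaults["batch_size"])
    parser.add_argument("--pred", "-p", type=str, default=defaults["pred"])
    return vars(parser.parse_args())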


# Define a neural network with `num_labels` output labels
def get_net(num_labels=10):
    net = gluon.nn.Sequential()
    with net.name_scope():
        net.add(gluon.nn.Dense(256, activation="relu"))  # 1st hidden layer (256 nodes)
        net.add(gluon.nn.Dense(256, activation="relu"))  # 2nd hidden layer (256 nodes)
        net.add(gluon.nn.Dense(num_labels))              # output layer
    net.collect_params().initialize(mx.init.Normal(1.))
    return net


# Define a function to evaluate accuracy
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    for _, (data, label) in enumerate(data_iterator):
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        predictions = predictions.reshape((-1, 1))
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]


def train(train_dir=None, pred_directory='./Test', train_csv=None, epochs=30, batch_size=32):
    """Train a sound classification model on the dataset described by `train_dir` and `train_csv`."""
    if not train_dir or not os.path.exists(train_dir) or not train_csv:
        warnings.warn("No train directory could be found.")
        return
    # Make a dataset from the local folder containing audio data
    print("\nMaking an Audio Dataset...\n")
    tick = time.time()
    aud_dataset = AudioFolderDataset(train_dir, has_csv=True, train_csv=train_csv,
                                     file_format='.wav', skip_rows=1)
    tock = time.time()

    print("Loading the dataset took ", (tock - tick), " seconds.")
    print("\n=======================================\n")
    print("Number of output classes = ", len(aud_dataset.synsets))
    print("\nThe labels are : \n")
    print(aud_dataset.synsets)
    # Get the model to train
    net = get_net(len(aud_dataset.synsets))
    print("\nNeural Network = \n")
    print(net)
    print("\nModel - Neural Network Generated!\n")
    print("=======================================\n")

    # Define the loss - softmax cross-entropy loss
    softmax_loss = gluon.loss.SoftmaxCELoss(from_logits=False, sparse_label=True)
    print("Loss function initialized!\n")
    print("=======================================\n")

    # Define the trainer with the optimizer
    trainer = gluon.Trainer(net.collect_params(), 'adadelta')
    print("Optimizer - Trainer function initialized!\n")
    print("=======================================\n")
    print("Loading the dataset into Gluon's out-of-the-box DataLoader...")

    # Get a DataLoader from the AudioFolderDataset and apply the transform
    aud_transform = gluon.data.vision.transforms.Compose([MFCC()])
    tick = time.time()

    audio_train_loader = gluon.data.DataLoader(aud_dataset.transform_first(aud_transform),
                                               batch_size=batch_size, shuffle=True)
    tock = time.time()
    print("Time taken to load data and apply transform here is ", (tock - tick), " seconds.")
    print("=======================================\n")


    print("Starting the training....\n")
    # Training loop
    tick = time.time()
    num_examples = len(aud_dataset)

    for e in range(epochs):
        cumulative_loss = 0
        for _, (data, label) in enumerate(audio_train_loader):
            with autograd.record():
                output = net(data)
                loss = softmax_loss(output, label)
            loss.backward()
            trainer.step(batch_size)
            cumulative_loss += mx.nd.sum(loss).asscalar()

        if e % 5 == 0:
            train_accuracy = evaluate_accuracy(audio_train_loader, net)
            print("Epoch %s. Loss: %s Train accuracy : %s " % (e, cumulative_loss / num_examples, train_accuracy))
            print("\n------------------------------\n")

    train_accuracy = evaluate_accuracy(audio_train_loader, net)
    tock = time.time()
    print("\nFinal training accuracy: ", train_accuracy)

    print("Training the sound classification MLP model for ", epochs, " epochs took ", (tock - tick), " seconds")
    print("====================== END ======================\n")
    predict(net, aud_transform, aud_dataset.synsets, pred_directory=pred_directory)


def predict(net, audio_transform, synsets, pred_directory='./Test'):
    """Run predictions on the audio files in the directory `pred_directory`.

    Parameters
    ----------
    net : gluon.Block
        The model that has been trained.
    audio_transform : gluon.Block
        The transform applied to each loaded audio signal before prediction.
    synsets : list of str
        The label names corresponding to the network's output indices.
    pred_directory : str, default './Test'
        The directory that contains the audio files on which predictions are to be made.
    """
    if not librosa:
        warnings.warn("The librosa dependency is not installed! Cannot load the audio to make predictions. Exiting.")
Contributor

some typos

Contributor Author

Corrected the typos.

        return

    if not os.path.exists(pred_directory):
        warnings.warn("The directory on which predictions are to be made is not found!")
        return

    if len(os.listdir(pred_directory)) == 0:
        warnings.warn("The directory on which predictions are to be made is empty! Exiting...")
        return

    file_names = os.listdir(pred_directory)
    full_file_names = [os.path.join(pred_directory, item) for item in file_names]

    print("\nStarting predictions for audio files in ", pred_directory, " ....\n")
    for filename in full_file_names:
        X1, _ = librosa.load(filename, res_type='kaiser_fast')
        transformed_test_data = audio_transform(mx.nd.array(X1))
        output = net(transformed_test_data.reshape((1, -1)))
        prediction = nd.argmax(output, axis=1)
        print(filename, " -> ", synsets[int(prediction.asscalar())])


if __name__ == '__main__':

    parser = argparse.ArgumentParser(description="Urban Sounds classification example - MXNet")
    parser.add_argument('--train', '-t', type=str,
                        help="Folder path that contains the training audio files")
    parser.add_argument('--csv', '-c', type=str,
                        help="Filename of the csv that contains the filename-to-label mapping")
    parser.add_argument('--epochs', '-e', type=int,
                        help="Number of epochs to run the training for")
    parser.add_argument('--batch_size', '-b', type=int,
                        help="Batch size of the data")
    parser.add_argument('--pred', '-p', type=str,
                        help="Folder path that contains the audio files to make predictions on")
    args = parser.parse_args()
    pred_directory = args.pred

    if args:
        if args.train:
            train_dir = args.train
        else:
            train_dir = './Train'

        if args.csv:
            train_csv = args.csv
        else:
            train_csv = './train.csv'

        if args.epochs:
            epochs = args.epochs
        else:
            epochs = 35
Contributor

default number of epochs is 35 here but 30 in train() above. this should be same?

Contributor Author

Yes. Thanks.
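One way to avoid this kind of drift is to define each default once and reuse it in both the function signature and the CLI. A small standalone sketch under that assumption (DEFAULT_EPOCHS and DEFAULT_BATCH_SIZE are illustrative names, not from this PR):

import argparse

DEFAULT_EPOCHS = 30
DEFAULT_BATCH_SIZE = 32


def train(epochs=DEFAULT_EPOCHS, batch_size=DEFAULT_BATCH_SIZE):
    # Placeholder body; the real train() lives in urban_sounds.py.
    print("training for", epochs, "epochs with batch size", batch_size)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', '-e', type=int, default=DEFAULT_EPOCHS)
    parser.add_argument('--batch_size', '-b', type=int, default=DEFAULT_BATCH_SIZE)
    args = parser.parse_args()
    train(epochs=args.epochs, batch_size=args.batch_size)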


        if args.batch_size:
            batch_size = args.batch_size
        else:
            batch_size = 32

    train(train_dir=train_dir, train_csv=train_csv, epochs=epochs,
          batch_size=batch_size, pred_directory=pred_directory)
    print("Urban sounds classification DONE!")
2 changes: 2 additions & 0 deletions python/mxnet/gluon/contrib/data/__init__.py
@@ -22,3 +22,5 @@
from . import text

from .sampler import *

from . import audio
24 changes: 24 additions & 0 deletions python/mxnet/gluon/contrib/data/audio/__init__.py
@@ -0,0 +1,24 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# coding: utf-8
# pylint: disable=wildcard-import
"""Audio utilities."""

from .datasets import *

from . import transforms
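A minimal usage sketch for the new audio package, based only on the classes the example above already imports (AudioFolderDataset, MFCC); the paths are placeholders for a folder of .wav files and its csv label mapping:

from mxnet import gluon
from mxnet.gluon.contrib.data.audio.datasets import AudioFolderDataset
from mxnet.gluon.contrib.data.audio.transforms import MFCC

# Placeholder paths, laid out as in the Urban Sounds example above.
dataset = AudioFolderDataset('./Train', has_csv=True, train_csv='./train.csv',
                             file_format='.wav', skip_rows=1)
transform = gluon.data.vision.transforms.Compose([MFCC()])
loader = gluon.data.DataLoader(dataset.transform_first(transform),
                               batch_size=32, shuffle=True)
for features, labels in loader:
    print(features.shape, labels.shape)
    break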