End-to-end Dialect Identification (implementation on MGB-3 Arabic dialect dataset)

TensorFlow implementation of end-to-end dialect identification in Arabic. If you are familiar with language or speaker identification/verification, it can easily be adapted to other dialects, languages, or even speaker identification/verification tasks.

Requirements

Python, tested on 2.7.6
TensorFlow > v1.0
Python library sox, tested on 1.3.2
Python library librosa, tested on 0.5.1

Data list format

Each line of the data list consists of (location of wav file) and (label as a digit).

Example: "train.txt"

./data/wav/EGY/EGY000001.wav 0
./data/wav/EGY/EGY000002.wav 0
./data/wav/NOR/NOR000001.wav 4

Labels of Dialect:

Egyptian (EGY) : 0
Gulf (GLF) : 1
Levantine (LAV) : 2
Modern Standard Arabic (MSA) : 3
North African (NOR): 4
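
A list in this format could be read with a few lines of Python. The sketch below is illustrative only: `parse_datalist` and `DIALECT_LABELS` are hypothetical names that do not come from this repository, and it assumes one whitespace-separated `path label` pair per line.

```python
# Hypothetical helper, not part of this repository.
# Label-to-dialect mapping taken from the table above.
DIALECT_LABELS = {0: "EGY", 1: "GLF", 2: "LAV", 3: "MSA", 4: "NOR"}


def parse_datalist(lines):
    """Return a list of (wav_path, label) tuples from data-list lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last whitespace so paths containing spaces still work.
        path, label = line.rsplit(None, 1)
        label = int(label)
        if label not in DIALECT_LABELS:
            raise ValueError("unknown dialect label: %d" % label)
        entries.append((path, label))
    return entries


example = [
    "./data/wav/EGY/EGY000001.wav 0",
    "./data/wav/EGY/EGY000002.wav 0",
    "./data/wav/NOR/NOR000001.wav 4",
]
for path, label in parse_datalist(example):
    print("%s -> %s" % (path, DIALECT_LABELS[label]))
```

To read an actual file, pass the open file object instead of a list, for example `parse_datalist(open("train.txt"))`.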

Dataset Augmentation

Augmentation was done with two different methods. The first takes random segments of the input utterance; the second perturbs the speech by modifying its speed and volume.
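
The two ideas can be illustrated on a raw sample sequence. This is a minimal, dependency-free sketch and not the code used in this repository: the function names, the segment length, and the gain and speed factors are all assumptions chosen for illustration, and speed is changed here by plain linear-interpolation resampling (which, like any simple resampling, also shifts pitch).

```python
import random


def random_segment(samples, segment_len, rng=random):
    """Return a random contiguous slice of `segment_len` samples."""
    if len(samples) <= segment_len:
        return list(samples)  # utterance too short: keep it whole
    start = rng.randint(0, len(samples) - segment_len)
    return list(samples[start:start + segment_len])


def perturb_volume(samples, gain):
    """Scale the amplitude of every sample by `gain`."""
    return [s * gain for s in samples]


def perturb_speed(samples, factor):
    """Resample by linear interpolation; factor > 1 shortens the signal."""
    n_out = max(1, int(round(len(samples) / float(factor))))
    out = []
    for i in range(n_out):
        pos = min(i * factor, len(samples) - 1)  # position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Toy "utterance": a ramp of 100 samples (illustrative values only).
wave = [float(i) for i in range(100)]
segment = random_segment(wave, 10, random.Random(0))
faster = perturb_speed(wave, 2.0)
quieter = perturb_volume(wave, 0.5)
print(len(segment), len(faster), len(quieter))
```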

Model definition

Simple description of the DNN model:

We used four 1-dimensional CNN (1d-CNN) layers followed by two FC layers. The CNN layers use filter sizes of 40x5, 500x7, 500x1 and 500x1 with strides of 1, 2, 1 and 1, and have 500, 500, 500 and 3000 filters respectively. A global average pooling layer connects the two parts: it averages the CNN outputs over time to produce a fixed output size of 3000x1, which feeds the two FC layers (1500 and 600 units).
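
To see how the filter widths and strides affect the sequence length, the sketch below traces the number of frames through the four convolutional layers and shows why global average pooling yields a fixed-size vector. It is an illustration under stated assumptions, not the repository's model code: it assumes convolutions without padding (the description does not say which padding is used), and all function names are hypothetical.

```python
# (filter width in frames, stride, number of filters), from the description above
CONV_LAYERS = [(5, 1, 500), (7, 2, 500), (1, 1, 500), (1, 1, 3000)]


def conv_output_frames(n_frames, width, stride):
    """Frames left after a 1-D convolution without padding (an assumption)."""
    if n_frames < width:
        raise ValueError("input shorter than the filter width")
    return (n_frames - width) // stride + 1


def trace_shapes(n_frames):
    """Return the (frames, channels) shape after each convolutional layer."""
    shapes = []
    for width, stride, n_filters in CONV_LAYERS:
        n_frames = conv_output_frames(n_frames, width, stride)
        shapes.append((n_frames, n_filters))
    return shapes


def global_average_pool(frames):
    """Average a list of equal-length frame vectors into one vector."""
    n = float(len(frames))
    return [sum(column) / n for column in zip(*frames)]


for n_in in (200, 500):
    print(n_in, trace_shapes(n_in))
```

Under these assumptions a 200-frame input leaves 95 frames of 3000 channels and a 500-frame input leaves 245 frames, but pooling over the frame axis gives 3000 values in both cases, so the FC layers always see the same input size.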

Figure: End-to-end DID accuracy by epoch

Figure: End-to-end DID accuracy by epoch using augmented dataset

Figure: Performance comparison with and without Random Segmentation (RS)

Performance evaluation

The best performance is 73.39% accuracy (as of Feb. 28, 2018).

For reference:

Conventional i-vector with SVM : 60.32%
Conventional i-vector with LDA and Cosine Distance : 62.60%
End-to-End model without dataset augmentation(MFCC): 65.55%
End-to-End model without dataset augmentation(FBANK): 64.81%
End-to-End model without dataset augmentation(Spectrogram): 57.57%

End-to-End model with volume perturbation(MFCC) : 67.49%
End-to-End model with speed perturbation(MFCC) : 70.51%

End-to-End model with speed and volume perturbation (MFCC) : 70.91%
End-to-End model with speed and volume perturbation (FBANK) : 71.92%
End-to-End model with speed and volume perturbation (Spectrogram) : 68.83%

End-to-End model with speed and volume perturbation + random segmentation (MFCC) : 71.05%
End-to-End model with speed and volume perturbation + random segmentation (FBANK) : 73.39%
End-to-End model with speed and volume perturbation + random segmentation (Spectrogram) : 70.17%

Offline test

The offline test can be run with offline_test.ipynb using our pretrained model. Set the FILENAME variable to the wav file whose Arabic dialect you want to identify.

FILENAME = ['/data/test/NOR_00001.wav']

The result is shown as a bar plot of the likelihoods of the 5 Arabic dialects.
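
As a rough idea of how five per-dialect scores could be turned into such a plot, the sketch below normalizes scores with a softmax and prints a text bar chart. It is not taken from offline_test.ipynb: the score values are made up, the use of a softmax is an assumption, and `softmax` and `text_bar_plot` are hypothetical helpers.

```python
import math

# Label order from the data list section.
DIALECTS = ["EGY", "GLF", "LAV", "MSA", "NOR"]


def softmax(scores):
    """Convert raw scores to values that are positive and sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_bar_plot(probs, width=40):
    """Render one line per dialect with a bar proportional to its value."""
    lines = []
    for name, p in zip(DIALECTS, probs):
        lines.append("%s %s %.3f" % (name, "#" * int(round(p * width)), p))
    return "\n".join(lines)


scores = [0.2, -1.0, 0.1, 0.5, 2.3]  # made-up scores, not real model output
probs = softmax(scores)
print(text_bar_plot(probs))
print("predicted: %s" % DIALECTS[probs.index(max(probs))])
```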

Relevant publication

[1] Suwon Shon, Ahmed Ali, James Glass,
Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition,
Proc. Odyssey 2018 The Speaker and Language Recognition Workshop, 98-104
https://arxiv.org/abs/1803.04567

Citing

@inproceedings{Shon2018,
  author={Suwon Shon and Ahmed Ali and James Glass},
  title={Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition},
  year=2018,
  booktitle={Proc. Odyssey 2018 The Speaker and Language Recognition Workshop},
  pages={98--104},
  doi={10.21437/Odyssey.2018-14},
  url={http://dx.doi.org/10.21437/Odyssey.2018-14}
}
