PyTorch and Kaldi Speaker Identification and Diarization

A speaker identification and diarization solution based on PyTorch and the VoxCeleb v2 example from Kaldi.

What is this

This work is a speaker identification system based on the Kaldi VoxCeleb v2 example. It replaces the nnet3-based neural network of that example with one implemented in the PyTorch machine learning framework, which makes the network architecture easier to change and experiment with.

In addition to speaker identification with VoxCeleb, this project adds the ability to run diarization tasks.

Setup

Make sure the requirements listed in What you need are met. Then follow the steps described in How To Install.

What you need

Before you can run this, make sure you have the required tools available. You need:

An Nvidia graphics card with CUDA support and more than 2 GB of memory
A current Linux distribution on an x86 computer
A fully operational installation of the Kaldi framework
PyTorch with CUDA support
A copy of the VoxCeleb v1 and VoxCeleb v2 datasets
A copy of the MUSAN dataset
sox and ffmpeg for audio handling
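
Part of this list can be checked automatically. The following is a minimal sketch, not part of the project: the function name check_requirements is made up for illustration, and it only covers sox, ffmpeg, PyTorch and CUDA availability. Kaldi, the GPU memory size and the datasets still need to be checked by hand.

```python
import importlib.util
import shutil


def check_requirements():
    """Return a dict mapping a requirement name to True (found) or False."""
    status = {
        # Command-line audio tools are looked up on PATH.
        "sox": shutil.which("sox") is not None,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only checks that the torch package can be located, without importing it.
        "torch": importlib.util.find_spec("torch") is not None,
    }
    cuda_ok = False
    if status["torch"]:
        try:
            import torch

            # True only if PyTorch was built with CUDA and a usable GPU is visible.
            cuda_ok = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch installation counts as "no CUDA".
            cuda_ok = False
    status["cuda"] = cuda_ok
    return status


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it inside the environment you plan to use for training, since PyTorch may be installed in a virtual environment only.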

How To Install

If something does not work or is unclear, please open an issue and ask. I'll be happy to help. To run this project, follow these steps in order:

Make sure Kaldi and CUDA are installed and work correctly.
Clone this repository: git clone https://github.com/theScrabi/kaldi_voxceleb_pytorch
Enter the root directory of the project: cd kaldi_voxceleb_pytorch
Create a new Python virtual environment: virtualenv venv
Activate the virtual environment: source venv/bin/activate
Install the required Python packages: pip install -r requirements.txt
Edit the file sid/path.sh and set the KALDI variable to the path of your Kaldi installation (e.g. KALDI=/opt/kaldi).
If you want to use diarization, also edit diarization/path.sh and set the KALDI variable there.
Enter the diarization directory and run ./install.sh. This will set the required symlinks.
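
A wrong KALDI path is an easy mistake to make in the steps above, so it can help to verify it before running anything. The sketch below is an illustration only: read_kaldi_root and kaldi_root_exists are hypothetical helpers, and the code assumes that path.sh assigns the variable on a line of the form KALDI=/opt/kaldi, optionally prefixed with export and optionally quoted.

```python
import re
from pathlib import Path

# Matches lines such as `KALDI=/opt/kaldi` or `export KALDI="/opt/kaldi"`.
# Commented-out lines and other variables (e.g. KALDI_ROOT) do not match.
_KALDI_LINE = re.compile(r'^\s*(?:export\s+)?KALDI=["\']?([^"\'\s#]+)')


def read_kaldi_root(path_sh_text):
    """Return the last KALDI value assigned in the text of a path.sh, or None."""
    value = None
    for line in path_sh_text.splitlines():
        match = _KALDI_LINE.match(line)
        if match:
            value = match.group(1)
    return value


def kaldi_root_exists(path_sh_file):
    """Return True if path_sh_file sets KALDI to an existing directory."""
    path = Path(path_sh_file)
    if not path.is_file():
        return False
    root = read_kaldi_root(path.read_text())
    return root is not None and Path(root).is_dir()
```

From the root directory of the project, kaldi_root_exists("sid/path.sh") and kaldi_root_exists("diarization/path.sh") should both return True once the variable is set correctly.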

How To Use

Training and testing are run through the run.sh scripts: use the one in the sid folder for speaker identification and the one in the diarization folder for diarization.

For speaker identification, please read the README.md inside the sid folder. For diarization, read the README.md in the diarization folder.

Purpose

The purpose of this work was to find out whether Angular Softmax combined with cosine distance scoring can improve end-to-end speaker identification and diarization, and whether it could eventually outperform and replace the additional PLDA step. It also examined whether adding an attention layer improves speaker identification and diarization.
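
To make the two ideas concrete, here is a small dependency-free sketch in plain Python. It is not the project's PyTorch code. It shows cosine scoring of two speaker embeddings (the comparison that would stand in for PLDA) and the angular margin function psi(theta) from the A-Softmax loss introduced in SphereFace. The function names, the threshold of 0.5 and the default m=4 are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.5):
    """Accept two embeddings as the same speaker if they are similar enough.

    The threshold is a hypothetical value; in practice it is tuned on held-out data.
    """
    return cosine_similarity(a, b) >= threshold


def angular_margin(theta, m=4):
    """A-Softmax margin: psi(theta) = (-1)^k * cos(m * theta) - 2k,
    for theta in [k * pi / m, (k + 1) * pi / m] and k in 0..m-1.

    psi replaces cos(theta) for the target class during training. It decreases
    monotonically from 1 to -(2m - 1) and never exceeds cos(theta), which
    forces a larger angular gap between speakers.
    """
    k = min(int(theta * m / math.pi), m - 1)
    return (-1) ** k * math.cos(m * theta) - 2 * k
```

Because the margin is only applied during training, scoring at test time needs nothing beyond the cosine similarity of the two embeddings.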

This was part of my bachelor's thesis.

Also Interesting

SphereFace: The original implementation of the angular-margin-based softmax for face recognition.
SpeechBrain: An all-in-one PyTorch speech recognition framework.
pyannote.metrics: A framework for diarization evaluation and error analysis.
kaldi with tensorflow dnn: A TensorFlow implementation of the x-vector topology on top of Kaldi.
