# pyannote.audio

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.
```python
# instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_A
# start=1.8s stop=3.9s speaker_B
# start=4.2s stop=5.7s speaker_A
# ...
```
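The returned `diarization` object is a regular `pyannote.core.Annotation`. As a quick follow-up (a minimal sketch, assuming the `write_rttm` helper available in recent versions of pyannote.core), it can be saved to disk in the standard RTTM format:

```python
# dump the diarization result to disk using the RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```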
For version 2.0 of `pyannote.audio`, I decided to rewrite almost everything from scratch.
Highlights of this release are:
- 🤯 much better performance (see Benchmark)
- 🐍 Python-first API
- 🤗 pretrained pipelines (and models) on 🤗 model hub
- ⚡ multi-GPU training with pytorch-lightning
- 🎛️ data augmentation with torch-audiomentations
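To give a feel for the last three bullets, here is a hedged sketch of what training a model looks like in v2.0. The `VoiceActivityDetection` task, the `PyanNet` model, and the `augmentation` keyword come from the toolkit itself; the protocol name is a placeholder for your own pyannote.database setup, and exact arguments may differ:

```python
import pytorch_lightning as pl
from torch_audiomentations import Compose, Gain
from pyannote.database import get_protocol
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.audio.models.segmentation import PyanNet

# hypothetical protocol name: describes your training data (see pyannote.database)
protocol = get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# waveform-level data augmentation, applied on the fly during training
augmentation = Compose(transforms=[Gain(min_gain_in_db=-6.0, max_gain_in_db=6.0, p=0.5)])

# combine a trainable model with the task it should be optimized for
task = VoiceActivityDetection(protocol, augmentation=augmentation)
model = PyanNet(task=task)

# multi-GPU training is delegated to pytorch-lightning
trainer = pl.Trainer(gpus=2, max_epochs=1)
trainer.fit(model)
```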
## Installation

Only Python 3.8+ is officially supported (though it might work with Python 3.7).

```bash
conda create -n pyannote python=3.8
conda activate pyannote
conda install pytorch torchaudio -c pytorch
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
```
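To make sure the installation went fine, a quick sanity check (assuming nothing beyond the package itself; the printed version will obviously vary):

```python
# print the installed version of pyannote.audio
import pyannote.audio
print(pyannote.audio.__version__)
```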
## Documentation

- Models
  - Available tasks explained
  - Applying a pretrained model
  - Training, fine-tuning, and transfer learning
- Pipelines
  - Available pipelines explained
  - Applying a pretrained pipeline
  - Training a pipeline
- Contributing
  - Adding a new model
  - Adding a new task
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Miscellaneous
  - Training with the `pyannote-audio-train` command line tool
  - Speaker verification
  - Visualization and debugging
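As a taste of the model-level API covered in those notebooks, here is a hedged sketch of applying a pretrained model (as opposed to a full pipeline); `pyannote/segmentation` is one of the pretrained models available on the hub:

```python
from pyannote.audio import Inference, Model

# load a pretrained segmentation model from the 🤗 model hub
model = Model.from_pretrained("pyannote/segmentation")

# Inference takes care of sliding a window over the whole file
inference = Inference(model)
output = inference("audio.wav")
```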
## Benchmark

The pretrained speaker diarization pipeline with default parameters is expected to perform much better in v2.0 than in v1.1:
| Diarization error rate (%)       | v1.1 | v2.0 | ∆DER |
| -------------------------------- | ---- | ---- | ---- |
| AMI `only_words` evaluation set  | 29.7 | 21.5 | -28% |
| DIHARD 3 evaluation set          | 29.2 | 22.2 | -23% |
| VoxConverse 0.0.2 evaluation set | 21.5 | 12.8 | -40% |
Here is the (pseudo-)code used to obtain those numbers:
```python
# v1.1
import torch
pipeline = torch.hub.load("pyannote/pyannote-audio", "dia")
diarization = pipeline({"audio": "audio.wav"})

# v2.0
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("audio.wav")

# evaluation
from pyannote.metrics.diarization import DiarizationErrorRate
metric = DiarizationErrorRate(collar=0.0, skip_overlap=False)
for audio, reference in evaluation_set:  # pseudo-code
    diarization = pipeline(audio)
    _ = metric(reference, diarization)
der = abs(metric)
```
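When a per-file breakdown is more useful than a single aggregated number, pyannote.metrics can also produce one (to the best of my knowledge, `report()` is part of its base metric API and returns a pandas DataFrame):

```python
# per-file (and aggregated) breakdown of the accumulated metric
report = metric.report(display=True)
```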
## Support

For commercial enquiries and scientific consulting, please contact me.
## Development

The commands below will set up pre-commit hooks and the packages needed for developing the `pyannote.audio` library:

```bash
pip install -e .[dev,testing]
pre-commit install
```
## Testing

Tests rely on a set of debugging files available in the `tests/data` directory. Set the `PYANNOTE_DATABASE_CONFIG` environment variable to `tests/data/database.yml` before running the tests:

```bash
PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest
```