# pyannote.audio

`pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.
```python
# instantiate pretrained speaker diarization pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")

# apply pretrained pipeline
diarization = pipeline("audio.wav")

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_A
# start=1.8s stop=3.9s speaker_B
# start=4.2s stop=5.7s speaker_A
# ...
```
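The returned `diarization` object is a regular `pyannote.core.Annotation`. As a quick follow-up (a minimal sketch, assuming the `write_rttm` helper available in recent versions of pyannote.core), it can be saved to disk in the standard RTTM format:

```python
# dump the diarization result to disk using the RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)
```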
For version 2.0 of `pyannote.audio`, I decided to rewrite almost everything from scratch.
Highlights of this release are:
- 🤯 much better performance (see Benchmark)
- 🐍 Python-first API
- 🤗 pretrained pipelines (and models) on 🤗 model hub
- ⚡ multi-GPU training with pytorch-lightning
- 🎛️ data augmentation with torch-audiomentations
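To give a feel for the last three bullets, here is a hedged sketch of what training a model looks like in v2.0. The `VoiceActivityDetection` task, the `PyanNet` model, and the `augmentation` keyword come from the toolkit itself; the protocol name is a placeholder for your own pyannote.database setup, and exact arguments may differ:

```python
import pytorch_lightning as pl
from torch_audiomentations import Compose, Gain
from pyannote.database import get_protocol
from pyannote.audio.tasks import VoiceActivityDetection
from pyannote.audio.models.segmentation import PyanNet

# hypothetical protocol name: describes your training data (see pyannote.database)
protocol = get_protocol("MyDatabase.SpeakerDiarization.MyProtocol")

# waveform-level data augmentation, applied on the fly during training
augmentation = Compose(transforms=[Gain(min_gain_in_db=-6.0, max_gain_in_db=6.0, p=0.5)])

# combine a trainable model with the task it should be optimized for
task = VoiceActivityDetection(protocol, augmentation=augmentation)
model = PyanNet(task=task)

# multi-GPU training is delegated to pytorch-lightning
trainer = pl.Trainer(gpus=2, max_epochs=1)
trainer.fit(model)
```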
## Installation

Only Python 3.8+ is officially supported (though it might work with Python 3.7).

```bash
conda create -n pyannote python=3.8
conda activate pyannote
conda install pytorch torchaudio -c pytorch
pip install https://github.com/pyannote/pyannote-audio/archive/develop.zip
```
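To make sure the installation went fine, a quick sanity check (assuming nothing beyond the package itself; the printed version will obviously vary):

```python
# print the installed version of pyannote.audio
import pyannote.audio
print(pyannote.audio.__version__)
```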
## Documentation

- Models
  - Available tasks explained
  - Applying a pretrained model
  - Training, fine-tuning, and transfer learning
- Pipelines
  - Available pipelines explained
  - Applying a pretrained pipeline
  - Training a pipeline
- Contributing
  - Adding a new model
  - Adding a new task
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Miscellaneous
  - Training with the `pyannote-audio-train` command line tool
  - Speaker verification
  - Visualization and debugging
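As a taste of the model-level API covered in those notebooks, here is a hedged sketch of applying a pretrained model (as opposed to a full pipeline); `pyannote/segmentation` is one of the pretrained models available on the hub:

```python
from pyannote.audio import Inference, Model

# load a pretrained segmentation model from the 🤗 model hub
model = Model.from_pretrained("pyannote/segmentation")

# Inference takes care of sliding a window over the whole file
inference = Inference(model)
output = inference("audio.wav")
```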
## Benchmark

The pretrained speaker diarization pipeline with default parameters is expected to perform much better in v2.0 than in v1.1:
| Diarization error rate (%)       | v1.1 | v2.0 | ∆DER |
| -------------------------------- | ---- | ---- | ---- |
| AMI `only_words` evaluation set  | 29.7 | 21.5 | -28% |
| DIHARD 3 evaluation set          | 29.2 | 22.2 | -23% |
| VoxConverse 0.0.2 evaluation set | 21.5 | 12.8 | -40% |
Here is the (pseudo-)code used to obtain those numbers:
```python
# v1.1
import torch
pipeline = torch.hub.load("pyannote/pyannote-audio", "dia")
diarization = pipeline({"audio": "audio.wav"})

# v2.0
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("audio.wav")

# evaluation
from pyannote.metrics.diarization import DiarizationErrorRate
metric = DiarizationErrorRate(collar=0.0, skip_overlap=False)
for audio, reference in evaluation_set:  # pseudo-code
    diarization = pipeline(audio)
    _ = metric(reference, diarization)
der = abs(metric)
```
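When a per-file breakdown is more useful than a single aggregated number, pyannote.metrics can also produce one (to the best of my knowledge, `report()` is part of its base metric API and returns a pandas DataFrame):

```python
# per-file (and aggregated) breakdown of the accumulated metric
report = metric.report(display=True)
```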
## Support

For commercial enquiries and scientific consulting, please contact me.
## Development

The commands below will set up pre-commit hooks and the packages needed for developing the `pyannote.audio` library:

```bash
pip install -e .[dev,testing]
pre-commit install
```
## Testing

Tests rely on a set of debugging files available in the `tests/data` directory. Set the `PYANNOTE_DATABASE_CONFIG` environment variable to `tests/data/database.yml` before running the tests:

```bash
PYANNOTE_DATABASE_CONFIG=tests/data/database.yml pytest
```