pyannote wrapper for lhotse #883

hbredin · 2022-11-08T15:10:07Z

Thanks for this great contribution to the community!

@FrenchKrab and I are considering a (progressive) switch from pyannote.database to lhotse for training pyannote.audio models and speaker diarization pipelines.

We will most likely start by writing a wrapper around lhotse manifests to turn them into pyannote.database protocols (more precisely pyannote.database.protocol.SpeakerDiarizationProtocol instances) and see how it goes.
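As a rough illustration of what such a wrapper might do, here is a dependency-free sketch. The `Segment` dataclass is a hypothetical stand-in for a lhotse supervision (its fields mirror `recording_id`, `start`, `duration`, and `speaker`), and `iter_protocol_files` is an invented helper name. A real wrapper would presumably yield these dictionaries from the iterator methods of a `SpeakerDiarizationProtocol` subclass and build a `pyannote.core.Annotation` rather than a list of tuples.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one lhotse supervision segment."""
    recording_id: str
    start: float      # seconds
    duration: float   # seconds
    speaker: str


def iter_protocol_files(segments: Iterable[Segment]) -> Iterator[dict]:
    """Group segments by recording and yield one file dictionary per recording.

    The "annotation" value is a plain list of (start, end, speaker) tuples;
    this is an assumption made to keep the sketch free of dependencies.
    """
    by_recording: Dict[str, List[Segment]] = defaultdict(list)
    for segment in segments:
        by_recording[segment.recording_id].append(segment)

    for uri in sorted(by_recording):
        # Sort speaker turns chronologically within each recording.
        turns = sorted(
            (s.start, s.start + s.duration, s.speaker) for s in by_recording[uri]
        )
        yield {"uri": uri, "annotation": turns}


# Made-up segments, for illustration only.
segments = [
    Segment("meeting_b", 0.0, 2.5, "spk1"),
    Segment("meeting_a", 3.0, 1.0, "spk2"),
    Segment("meeting_a", 0.5, 2.0, "spk1"),
]
files = list(iter_protocol_files(segments))
```

Each yielded dictionary carries the recording identifier under "uri" and its speaker turns under "annotation", which is the general shape of the files that pyannote.database protocols iterate over.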

We would also be happy to contribute more datasets if the wrapper experiment goes well.

But, before jumping in, I wanted to ask you about your plans regarding lhotse and speaker diarization. As of today, it seems (but I'd be more than happy to be proven wrong here) that speaker diarization datasets and API are not the main priority of lhotse.

pzelasko · 2022-11-08T18:37:32Z

pzelasko
Nov 8, 2022
Maintainer

Thanks @hbredin! I'm humbled as pyannote is an amazing project that I regard very highly. I'd be happy to support you in this process.

For future plans for speaker diarization I'll mostly defer to @desh2608 as he has been contributing various recipes and features in lhotse to support that. That said, if you have any suggestions for features that would better support your use-case, I'm happy to help -- depending on their complexity and on my own availability, I can either suggest a few helpful pointers or code them up.

2 replies

hbredin Nov 14, 2022
Author

One thing that I would really use is combining multiple datasets: basically the equivalent of pyannote.database meta protocols.

For instance pyannote/segmentation has been trained based on a combination of the training sets of 5 datasets (AISHELL-4, AMI, DIHARD3, REPERE, and VoxConverse).

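from pyannote.database import get_protocol

# Load the meta-protocol that combines the five datasets.
protocol = get_protocol('X.SpeakerDiarization.AISHELL+AMI+DIHARD+REPERE+VoxConverse')

# Iterate over the combined training set.
for training_file in protocol.train():
    pass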

desh2608 Nov 14, 2022
Collaborator

Yes, combining different datasets is extremely easy in Lhotse. Once you have created manifests (and CutSets) for the different corpora, all you need to do is: cs = cs1 + cs2 + .... There are some ASR recipes in icefall which use this for ASR training.
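To make the idea concrete without depending on either library, here is a minimal sketch in which each corpus is a plain iterable of file dictionaries. The function name `combine_training_sets`, the "database" tag, and the sample entries are all assumptions made for illustration. In lhotse itself the combination is the `cs1 + cs2` expression quoted above.

```python
from typing import Dict, Iterable, Iterator


def combine_training_sets(corpora: Dict[str, Iterable[dict]]) -> Iterator[dict]:
    """Lazily chain the training files of several corpora into one stream.

    Each file is tagged with the name of the corpus it came from, so its
    origin is still known after combination.
    """
    for name, files in corpora.items():
        for file in files:
            # Copy the file dictionary instead of mutating the caller's data.
            yield {**file, "database": name}


# Made-up file entries, for illustration only.
corpora = {
    "AMI": [{"uri": "ami_file_1"}, {"uri": "ami_file_2"}],
    "VoxConverse": [{"uri": "voxconverse_file_1"}],
}
combined = list(combine_training_sets(corpora))
```

Because the function is a generator, no corpus is read until the combined stream is actually iterated, which matters when the manifests are large.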

@Gnosil is, in fact, working on training a "universal" VAD and overlap detector by combining several (~10) datasets, and the data pipeline is all built with Lhotse. We hope to release the code and models soon.

desh2608 · 2022-11-08T19:37:36Z

desh2608
Nov 8, 2022
Collaborator

@hbredin Yeah, I don't think we have many (or any?) diarization datasets or API at the moment. What we do have is a bunch of recipes for diarization-related corpora such as DIHARD, AMI, etc., so I hope it would be straightforward to create Lhotse manifests from those. The wrapper you suggest could then easily convert the manifests to pyannote.database protocols. I imagine this would be similar to how we have functionalities to import/export Kaldi data directories.

3 replies

hbredin Nov 14, 2022
Author

I am a bit confused. Existing AMI supervisions do contain speaker labels (and I guess this is also true for DIHARD). Therefore, what do you mean by "I don't think we have [...] diarization datasets"? I might not be familiar enough with lhotse concepts...

desh2608 Nov 14, 2022
Collaborator

Sorry, I should have clarified more. A Lhotse dataset is similar to a PyTorch dataset (and is, in fact, derived from it): see this. "Recipes", on the other hand, provide a way to represent existing corpora as standardized manifests (think Kaldi-style data dirs).

A standard training pipeline for, say, EEND on AMI, would involve:

1. Create manifests using the AMI recipe.
2. Create dataset (see above), sampler, and dataloader.
3. Train.

hbredin Nov 16, 2022
Author

I see. Thanks for the explanation.

For now, I think we are going to stick with current pyannote code for steps 2 and 3.
However, we might end up using lhotse for step 1. I'll update this discussion when/if we do.
