VAD workflow with Silero #1160

rilshok · 2023-09-24T18:59:28Z

Description of Changes

This PR integrates Silero VAD into Lhotse, enabling voice activity detection on the audio recordings of a Lhotse RecordingSet. The workflow analyzes each track of every recording and stores the results in a SupervisionSet.

In the current change, we have introduced the following:

A base class called ActivityDetector to allow for the addition of new activity detectors in the future.
A runner class named ActivityDetectionProcessor for parallel execution of activity detection on the RecordingSet.
Two classes, SileroVAD8k and SileroVAD16k, for the integration of Silero VAD.
A workflow named activity-detection for running activity detection on the RecordingSet.
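
To make the division of responsibilities above concrete, here is a minimal, dependency-free sketch of the same pattern: an abstract detector that handles one track at a time, a toy concrete detector, and a runner that fans recordings out to workers. All names here (`Segment`, `Detector`, `ThresholdDetector`, `run_parallel`) are hypothetical and are not the classes added in this PR; the sketch also uses threads for brevity, which is an assumption and may differ from how the actual runner parallelises.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Segment:
    """A detected activity region (hypothetical stand-in for a supervision)."""
    recording_id: str
    channel: int
    start: float
    duration: float


class Detector(ABC):
    """Hypothetical base class: subclasses implement detection for one track."""

    def __init__(self, sampling_rate: int):
        self.sampling_rate = sampling_rate

    @abstractmethod
    def detect_track(self, samples: Sequence[float]) -> List[Tuple[float, float]]:
        """Return (start_sec, duration_sec) pairs for a single channel."""

    def __call__(self, recording_id: str, tracks: Sequence[Sequence[float]]) -> List[Segment]:
        # Process every channel independently and tag results with the channel index.
        segments = []
        for channel, samples in enumerate(tracks):
            for start, duration in self.detect_track(samples):
                segments.append(Segment(recording_id, channel, start, duration))
        return segments


class ThresholdDetector(Detector):
    """Toy detector: marks runs of samples whose magnitude exceeds a threshold."""

    def __init__(self, sampling_rate: int, threshold: float = 0.5):
        super().__init__(sampling_rate)
        self.threshold = threshold

    def detect_track(self, samples):
        regions, run_start = [], None
        # A trailing 0.0 sentinel closes a run that reaches the end of the track.
        for i, x in enumerate(list(samples) + [0.0]):
            active = abs(x) > self.threshold
            if active and run_start is None:
                run_start = i
            elif not active and run_start is not None:
                sr = self.sampling_rate
                regions.append((run_start / sr, (i - run_start) / sr))
                run_start = None
        return regions


def run_parallel(detector: Detector, recordings: Dict[str, list], jobs: int = 2) -> List[Segment]:
    """Run the detector over many recordings with a small worker pool."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = pool.map(lambda item: detector(*item), recordings.items())
        return [seg for segs in results for seg in segs]
```

Keeping the per-track logic in one abstract method means a new detector only has to say how a single channel is analysed; channel iteration, segment numbering and parallel execution stay in shared code.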

You can find the Silero VAD model for this integration on the Silero VAD GitHub project.

Example usage in CLI

Prepare the model for work:

lhotse workflows activity-detection \
  --model-name silero-vad-16k \
  --chore

Run activity detection by Silero VAD:

lhotse workflows activity-detection \
  --model-name silero-vad-16k \
  --recordings-manifest data/librispeech_recordings_train-clean-5.jsonl.gz \
  --output-supervisions-manifest librispeech_recordings_train-clean-5.jsonl.gz \
  --jobs 2 \
  --device cpu

Loading recordings from data/librispeech_recordings_train-clean-5.jsonl.gz...
Making activity detection processor for 'silero-vad-16k'...
Running activity detection using 'silero-vad-16k'...
Using cache found in ~/.cache/torch/hub/snakers4_silero-vad_master
...
Detecting activities: 100%|████████████████| 1519/1519 [04:50<00:00,  5.22rec/s]
Saving 'silero-vad-16k' results ...
Results saved to:
.../librispeech_recordings_train-clean-5.jsonl.gz

Example usage in code

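from lhotse.workflows.activity_detection.silero_vad import SileroVAD16k
from lhotse.audio import RecordingSet

# Create the 16 kHz Silero VAD wrapper on the GPU (use device="cpu" if CUDA is unavailable)
vad = SileroVAD16k(device="cuda")

recordings = RecordingSet.from_file("data/librispeech_recordings_train-clean-5.jsonl.gz")
record = recordings[25]

# Returns a list of SupervisionSegment objects, one per detected speech region
vad(record)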

[SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00000', recording_id='6272-70171-0025', start=0.194, duration=2.396, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None),
 SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00001', recording_id='6272-70171-0025', start=3.682, duration=1.02, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None),
 SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00002', recording_id='6272-70171-0025', start=4.994, duration=0.956, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None),
 SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00003', recording_id='6272-70171-0025', start=6.146, duration=2.652, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None),
 SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00004', recording_id='6272-70171-0025', start=9.122, duration=4.316, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None),
 SupervisionSegment(id='6272-70171-0025-SileroVAD_16kHz-0-00005', recording_id='6272-70171-0025', start=13.634, duration=3.006, channel=0, text=None, language=None, speaker=None, gender=None, custom=None, alignment=None)]
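
As a small illustration of what can be done with this output, the sketch below sums the detected speech and lists the pauses between consecutive segments. It works on plain `(start, duration)` pairs copied from the segments above so that it runs without Lhotse; with real `SupervisionSegment` objects the same values are available as `.start` and `.duration`. The helper names `total_speech` and `pauses` are made up for this example.

```python
# (start, duration) pairs copied from the output above, in seconds
segments = [
    (0.194, 2.396),
    (3.682, 1.02),
    (4.994, 0.956),
    (6.146, 2.652),
    (9.122, 4.316),
    (13.634, 3.006),
]


def total_speech(segments):
    """Total detected speech time in seconds."""
    return sum(duration for _, duration in segments)


def pauses(segments):
    """Silence between consecutive segments (assumes non-overlapping segments)."""
    ordered = sorted(segments)
    return [
        round(next_start - (start + duration), 3)
        for (start, duration), (next_start, _) in zip(ordered, ordered[1:])
    ]


print(f"speech: {total_speech(segments):.3f} s")
print("pauses:", pauses(segments))
```

For this recording the six segments add up to about 14.35 s of speech, separated by five short pauses.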

Related Issues

Related tasks and links to discussions related to this integration:

#1041 - Add Silero VAD integration

desh2608

Thanks for this contribution! Left some minor comments.

Out of curiosity, do you have any speed benchmarks? For example, if I were to run the VAD on a 1 hour recording with 1 GPU, how long would it take?

csukuangfj · 2023-09-24T23:13:24Z

Thanks for this contribution! Left some minor comments.

Out of curiosity, do you have any speed benchmarks? For example, if I were to run the VAD on a 1 hour recording with 1 GPU, how long would it take?

I think it is faster on CPU for silero VAD.

Silero VAD uses LSTM and you need to process the file sequentially.

rilshok · 2023-09-25T08:42:40Z

I haven't made precise performance measurements. The Silero VAD authors claim that processing one audio chunk (30+ ms) takes less than 1 ms on a single CPU thread, and my own experience is consistent with that. Note, however, that this figure may not hold strictly for a single thread: when I run two processes in parallel on the CPU (each with its own model state), Torch already fully loads my CPU, and adding more workers does not bring a significant gain. On my inexpensive GPU, by contrast, running multiple processes in parallel gave roughly a 3x speedup. I would be grateful for contributions that document the performance of this model.
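
For a rough answer to the 1-hour question, the claimed figures can be turned into a back-of-the-envelope bound. This is only arithmetic on the numbers quoted above (30 ms chunks, at most 1 ms of compute per chunk, strictly sequential processing), not a measurement; the function name and defaults are chosen for this example.

```python
def estimate_vad_time(audio_seconds, chunk_ms=30.0, per_chunk_ms=1.0):
    """Upper-bound estimate of sequential processing time, in seconds.

    Assumes the audio is cut into fixed chunks of `chunk_ms` and that each
    chunk costs at most `per_chunk_ms` of compute (the claimed figure).
    """
    n_chunks = audio_seconds * 1000.0 / chunk_ms
    return n_chunks * per_chunk_ms / 1000.0


one_hour = estimate_vad_time(3600)   # seconds of compute for 1 hour of audio
rtf = one_hour / 3600                # real-time factor (compute time / audio time)
print(f"{one_hour:.0f} s, RTF {rtf:.3f}")
```

Under these assumptions, one hour of audio is 120,000 chunks and takes at most about 120 s on one CPU thread, a real-time factor of roughly 0.033. Actual throughput depends on hardware, I/O and resampling, so it should be measured before being documented.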

pzelasko

Thanks for your contribution, it looks great! Can you fix that doc comment before we merge?

rilshok · 2023-09-26T14:02:46Z

The tests fail on older versions of torch because of the trust_repo argument passed to torch.hub. I'm working on it.
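
One way to handle this kind of incompatibility is to pass `trust_repo` only when the installed torch understands it. The sketch below is an illustration, not the fix that was merged: the helper `hub_load_kwargs` is hypothetical, and the 1.12 cut-off is taken from the commit titles in this PR ("trust the repository since torch>=1.12").

```python
import re


def hub_load_kwargs(torch_version: str) -> dict:
    """Return extra keyword arguments for torch.hub.load for a given torch version.

    `trust_repo` is only passed for torch >= 1.12; older versions would
    reject the unknown argument.
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"Unrecognised torch version: {torch_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return {"trust_repo": True} if major_minor >= (1, 12) else {}


# Intended use (commented out because it needs torch and network access):
# import torch
# model, utils = torch.hub.load(
#     "snakers4/silero-vad", "silero_vad", **hub_load_kwargs(torch.__version__)
# )
```

Parsing only the leading major and minor numbers keeps the check working for local version suffixes such as `1.13.1+cu117`.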

pzelasko

LGTM

rilshok · 2023-09-27T16:46:30Z

Do we have a plan for solving the test failure in the Python 3.11 environment? The problem seems to come from outside and is unrelated to my changes.

rilshok · 2023-09-27T16:47:35Z

I tried to reproduce the problem with this test, but unfortunately it does not reproduce for me.

* initialise the script for activity detection
* init lhotse.workflows.activity_distillation module
* add the Silero VAD model wrapper
* inherit SileroVAD from ActivityDetector
* pass parameters to the model explicitly
* process each channel and return the supervision
* parallel processing by activity detector
* number the segments found
* make abstract processing of an individual track
* rename module and workflow to activity_detection
* standardise detectors by sampling rate
* rename silero vad models
* implement a script for supervisory with silero-vad
* handle exceptions and user input
* allow the path to the output dir
* reset the cached state of the model if necessary
* add docs for activity_detection module
* fix if dir does not exist
* fix cuda issue
* add RecordingSet python example
* add base test for silero vad workflow
* add test for silero vad in parallel
* change detector name
* replace the chore option with force_download
* improve user experience
* add simple test for activity_detection workflow
* clarify the need to use --force_download
* rm slash
* trust the repository since torch>=1.12
* skip tests if torch version <1.12
* exclude non-coverage eligible code
* change the behaviour of the force_download option

---------

Co-authored-by: Piotr Żelasko <[email protected]>

rilshok added 20 commits September 23, 2023 12:41

initialise the script for activity detection

8636a80

init lhotse.workflows.activity_distillation module

6c50efe

add the Silero VAD model wrapper

9fb33f2

inherit SileroVAD from ActivityDetector

0419467

pass parameters to the model explicitly

bbc5d09

process each channel and return the supervision

7011d06

parallel processing by activity detector

ae29304

number the segments found

c686432

make abstract processing of an individual track

8ad86be

rename module and workflow to activity_detection

c42fafe

standardise detectors by sampling rate

e2db456

rename silero vad models

9117d98

implement a script for supervisory with silero-vad

bd279ae

handle exceptions and user input

9279bd9

allow the path to the output dir

3cf621e

reset the cached state of the model if necessary

1a2acdc

add docs for activity_detection module

3638360

fix if dir does not exist

1d7cf8c

fix cuda issue

2d9e258

add RecordingSet python example

7ce7f32

desh2608 reviewed Sep 24, 2023

rilshok added 6 commits September 25, 2023 13:23

add base test for silero vad workflow

b335187

add test for silero vad in parallel

819e020

change detector name

7bd2c69

replace the chore option with force_download

729f279

improve user experience

1d33460

add simple test for activity_detection workflow

80163fe

pzelasko previously approved these changes Sep 26, 2023

clarify the need to use --force_download

1bc6d6e

rilshok dismissed pzelasko’s stale review via 1bc6d6e September 26, 2023 18:25

rilshok added 2 commits September 26, 2023 22:36

rm slash

c04558e

trust the repository since torch>=1.12

41a9085

pzelasko previously approved these changes Sep 26, 2023

pzelasko enabled auto-merge (squash) September 26, 2023 19:38

desh2608 added this to the v1.17 milestone Sep 26, 2023

skip tests if torch version <1.12

5544c25

auto-merge was automatically disabled September 26, 2023 19:56
Head branch was pushed to by a user without write access

rilshok dismissed pzelasko’s stale review via 5544c25 September 26, 2023 19:56

Merge branch 'master' into feature/distill-cutset

4a342d2

pzelasko enabled auto-merge (squash) September 27, 2023 01:19

pzelasko previously approved these changes Sep 27, 2023

rilshok added 2 commits September 27, 2023 09:46

exclude non-coverage eligible code

03a17ba

Merge branch 'feature/distill-cutset' of github.com:rilshok/lhotse into feature/distill-cutset

0bb19d9

auto-merge was automatically disabled September 27, 2023 05:47
Head branch was pushed to by a user without write access

rilshok dismissed pzelasko’s stale review via 0bb19d9 September 27, 2023 05:47

desh2608 reviewed Sep 27, 2023

change the behaviour of the force_download option

ed71b42

pzelasko approved these changes Sep 28, 2023

pzelasko enabled auto-merge (squash) September 28, 2023 18:11

pzelasko merged commit b138baf into lhotse-speech:master Sep 28, 2023
