VAD workflow with Silero #1160
Thanks for this contribution! Left some minor comments.
Out of curiosity, do you have any speed benchmarks? For example, if I were to run the VAD on a 1-hour recording with one GPU, how long would it take?
I think Silero VAD is faster on CPU. It uses an LSTM, so the file has to be processed sequentially.
I haven't made precise performance measurements. The Silero VAD authors claim that processing one audio chunk (30+ ms) takes less than 1 ms on a single CPU thread, and my own experience is consistent with this. However, this figure may not hold for a single thread in practice. When processing in parallel on a CPU with two processes (each with its own model state), Torch fully loads my CPU, and adding more workers brings no significant gain. On my inexpensive GPU, however, running multiple processes in parallel gave about a 3x speed-up. I would be grateful for contributions that document the performance of this model.
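To put these figures in perspective for the one-hour question, here is a back-of-the-envelope estimate. It only assumes the authors' claim (under 1 ms per 30 ms chunk on one CPU thread) and strictly sequential processing within a track; it is an upper-bound calculation, not a measurement, and the function name is hypothetical.

```python
def estimate_vad_seconds(recording_s: int, chunk_ms: int = 30, per_chunk_ms: int = 1) -> float:
    """Upper-bound wall-clock time for sequential, chunk-by-chunk VAD on one track.

    Assumes every chunk costs at most `per_chunk_ms` and chunks are processed
    one after another, as a stateful LSTM requires within a single track.
    """
    total_ms = recording_s * 1000
    # Ceiling division: a final partial chunk still has to be processed.
    n_chunks = -(-total_ms // chunk_ms)
    return n_chunks * per_chunk_ms / 1000


# One hour of audio, 30 ms chunks, at most 1 ms per chunk.
print(estimate_vad_seconds(3600))  # 120.0
```

Under those assumptions a one-hour track needs roughly two minutes on one CPU thread. Because the recurrent state forces sequential processing inside a track, the remaining lever is running independent tracks or recordings in parallel, each with its own model state, which is what the CPU and GPU observations above describe.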
Thanks for your contribution, it looks great! Can you fix that doc comment before we merge?
The tests fail on older versions of torch because of the trust_repo argument passed to torch.hub.load. I'm working on it.
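The trust_repo argument was added to torch.hub.load in torch 1.12, so older versions reject it. One way to stay compatible is to pass it only when the installed version supports it. The helper below is a hypothetical sketch, not the code merged in this PR; the repository and model names are the ones Silero VAD documents for torch.hub.

```python
def silero_hub_kwargs(torch_version: str, force_reload: bool = False) -> dict:
    """Build keyword arguments for torch.hub.load that work across torch versions.

    Hypothetical helper: `trust_repo` is only added for torch >= 1.12,
    the first release whose torch.hub.load accepts it.
    """
    # Drop local build tags such as "+cu118" and keep "major.minor".
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    kwargs = {
        "repo_or_dir": "snakers4/silero-vad",
        "model": "silero_vad",
        "force_reload": force_reload,
    }
    if (major, minor) >= (1, 12):
        kwargs["trust_repo"] = True
    return kwargs
```

It would be used as `model, utils = torch.hub.load(**silero_hub_kwargs(torch.__version__))`. The PR itself takes the complementary route for the test suite and skips the tests on torch < 1.12.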
LGTM
Is there a plan for solving the test failure in the Python 3.11 environment? The problem seems to come from outside and is not related to my changes.
I tried to reproduce the failure with this test, but unfortunately it does not reproduce for me.
* initialise the script for activity detection
* init lhotse.workflows.activity_distillation module
* add the Silero VAD model wrapper
* inherit SileroVAD from ActivityDetector
* pass parameters to the model explicitly
* process each channel and return the supervision
* parallel processing by activity detector
* number the segments found
* make abstract processing of an individual track
* rename module and workflow to activity_detection
* standardise detectors by sampling rate
* rename silero vad models
* implement a script for supervisory with silero-vad
* handle exceptions and user input
* allow the path to the output dir
* reset the cached state of the model if necessary
* add docs for activity_detection module
* fix if dir does not exist
* fix cuda issue
* add RecordingSet python example
* add base test for silero vad workflow
* add test for silero vad in parallel
* change detector name
* replace the chore option with force_download
* improve user experience
* add simple test for activity_detection workflow
* clarify the need to use --force_download
* rm slash
* trust the repository since torch>=1.12
* skip tests if torch version <1.12
* exclude non-coverage eligible code
* change the behaviour of the force_download option
---------
Co-authored-by: Piotr Żelasko <[email protected]>
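The commit "reset the cached state of the model if necessary" matters because a recurrent model such as Silero's LSTM carries state from one chunk to the next. The toy sketch below uses a hypothetical smoothing model (not Silero) to show how state from one track leaks into the next unless it is reset; the method name reset_states() mirrors the one the Silero model exposes.

```python
from typing import List


class ToyStatefulVAD:
    """Hypothetical stand-in for a recurrent VAD: each score depends on carried-over state."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.state = 0.0

    def reset_states(self) -> None:
        self.state = 0.0

    def __call__(self, chunk: List[float]) -> float:
        energy = sum(x * x for x in chunk) / len(chunk)
        # Exponential smoothing: the new score mixes this chunk's energy with the previous state.
        self.state = self.alpha * self.state + (1 - self.alpha) * energy
        return self.state


def score_track(model: ToyStatefulVAD, chunks: List[List[float]], reset: bool = True) -> List[float]:
    # Resetting before each track keeps state from the previous track from leaking in.
    if reset:
        model.reset_states()
    return [model(chunk) for chunk in chunks]


model = ToyStatefulVAD()
loud = [[1.0, -1.0]] * 3
silent = [[0.0, 0.0]] * 2

score_track(model, loud)
leaked = score_track(model, silent, reset=False)
print(leaked)  # [0.4375, 0.21875]: a silent track still looks "active"

score_track(model, loud)
clean = score_track(model, silent)
print(clean)  # [0.0, 0.0]
```

The same reasoning explains why tracks are processed sequentially inside one model instance, while parallelism comes from separate instances with separate state.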
Description of Changes
We have integrated Silero VAD into Lhotse. This enables voice activity detection with Silero VAD on the audio recordings of a Lhotse RecordingSet. The solution analyzes each track (channel) of every recording and stores the detected speech segments in a SupervisionSet.
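The per-track flow can be pictured as: turn each channel's frame-level speech probabilities into contiguous segments, then number them so that each becomes one supervision. This is a minimal sketch using a plain threshold, a simplification rather than Silero's own get_speech_timestamps post-processing (which also enforces minimum speech and silence durations). The Segment class and the ID format are hypothetical stand-ins for Lhotse's SupervisionSegment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical stand-in for a supervision: one detected speech region."""

    id: str
    recording_id: str
    channel: int
    start: float  # seconds
    duration: float  # seconds


def probs_to_segments(
    probs: List[float],
    frame_s: float,
    recording_id: str,
    channel: int,
    threshold: float = 0.5,
) -> List[Segment]:
    """Merge consecutive frames whose speech probability is >= threshold into numbered segments."""
    segments: List[Segment] = []
    start_idx: Optional[int] = None
    # A trailing -inf sentinel closes a segment that runs to the end of the track.
    for i, p in enumerate(probs + [float("-inf")]):
        if p >= threshold and start_idx is None:
            start_idx = i
        elif p < threshold and start_idx is not None:
            segments.append(
                Segment(
                    id=f"{recording_id}-{channel}-{len(segments):05d}",  # hypothetical ID format
                    recording_id=recording_id,
                    channel=channel,
                    start=start_idx * frame_s,
                    duration=(i - start_idx) * frame_s,
                )
            )
            start_idx = None
    return segments


# One call per channel; a multi-channel recording simply loops over its channels.
segs = probs_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_s=0.5, recording_id="rec1", channel=0)
print([(s.id, s.start, s.duration) for s in segs])
```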
This change introduces the following:
* ActivityDetector, a base class that allows new activity detectors to be added in the future.
* ActivityDetectionProcessor, for parallel execution of activity detection on a RecordingSet.
* SileroVAD8k and SileroVAD16k, for the integration of Silero VAD.
* activity-detection, a workflow for running activity detection on a RecordingSet.
You can find the Silero VAD model used in this integration in the Silero VAD GitHub project.
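The division of roles in the list above (a base detector class, concrete detectors tied to a sampling rate, and a processor that fans work out in parallel) can be sketched as follows. All class names, signatures, the placeholder energy detector, and the use of a thread pool are assumptions for illustration, not the PR's actual API; the discussion above describes using separate processes with their own model states for Torch models.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]  # (start, duration) in seconds


class BaseDetector(ABC):
    """Hypothetical base class: every detector declares the sampling rate it expects."""

    sampling_rate: int

    @abstractmethod
    def __call__(self, track: List[float]) -> List[Interval]:
        ...


class ToyEnergyDetector16k(BaseDetector):
    """Hypothetical placeholder, not Silero: marks the whole track active if any sample is loud."""

    sampling_rate = 16000

    def __init__(self, threshold: float = 0.25) -> None:
        self.threshold = threshold

    def __call__(self, track: List[float]) -> List[Interval]:
        if any(abs(x) > self.threshold for x in track):
            return [(0.0, len(track) / self.sampling_rate)]
        return []


class ParallelDetection:
    """Hypothetical processor: one fresh detector per track, tracks processed concurrently."""

    def __init__(self, detector_factory: Callable[[], BaseDetector], num_jobs: int = 2) -> None:
        self.detector_factory = detector_factory
        self.num_jobs = num_jobs

    def _process_track(self, track: List[float]) -> List[Interval]:
        # A new detector per task means no model state is shared between tracks.
        return self.detector_factory()(track)

    def __call__(self, tracks: Dict[str, List[float]]) -> Dict[str, List[Interval]]:
        with ThreadPoolExecutor(max_workers=self.num_jobs) as pool:
            return dict(zip(tracks.keys(), pool.map(self._process_track, tracks.values())))


processor = ParallelDetection(ToyEnergyDetector16k, num_jobs=2)
result = processor({"a": [0.0, 0.5, 0.0, 0.0], "b": [0.0] * 4})
print(result)
```

Keeping the sampling rate on the detector class is one way to read "standardise detectors by sampling rate": the caller can resample each track to detector.sampling_rate before calling it, which is why separate 8 kHz and 16 kHz variants make sense.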
Example usage in CLI
Prepare the model for use:
Run activity detection with Silero VAD:
Example usage in code
Related Issues
Related tasks and links to discussions about this integration: