Cleanup and segment long utterances using nnet3 #2581

vimalmanohar · 2018-07-27T16:56:02Z

Adding the scripts used during MGB-3 challenge. This could be tested by someone needing this for e2e training.

aarora8 · 2018-07-27T17:33:27Z

Thanks. Is it related to, or can it be used for, cleaning corrupt training data?

vimalmanohar · 2018-07-27T17:39:43Z

Yes, it would be great if you can test it.


aarora8 · 2018-07-27T17:40:29Z

Ok, thank you. I will test it with Yomdle datasets.

danpovey · 2018-07-27T17:54:42Z

Did this perform better than the existing data-cleanup script based on GMMs? I can see that, regardless, it could be useful in situations where GMMs do not work, like OCR.


danpovey · 2018-08-21T04:23:06Z

egs/wsj/s5/steps/compute_vad_decision.sh

+
+# Copyright    2017  Vimal Manohar
+# Apache 2.0
+


@vimalmanohar, this VAD and its use when extracting i-vectors... what scenario was this helpful for?

vimalmanohar · 2018-08-21

It might be useful for segmenting long utterances when there is a lot of silence in the recordings. I have not tried it without this VAD, so that still needs to be tested.

danpovey · 2018-08-21T18:23:33Z

This only affects the i-vectors, right?


vimalmanohar · 2018-08-21T18:52:42Z

Yes, only i-vectors.

danpovey · 2018-08-21T22:46:06Z

@david-ryan-snyder, what do you think about the VAD-related part of this?

david-ryan-snyder · 2018-08-22T17:24:35Z

If I understand this PR correctly, a) a SAD system is used to segment long recordings, b) I-vectors are included in the input and c) because of a concern about the effect of nonspeech in the i-vectors, a frame-level weighting of the i-vector stats is introduced. (@vimalmanohar, please correct me if I misunderstood what's happening here...)

In my view, SAD should be a lightweight system. If it works adequately, I'd rather see a simple SAD model that just uses MFCCs and nothing else. Adding in a component (i-vectors) which in turn requires an energy SAD seems a bit heavyweight to me.

Since @danpovey asked my opinion on this, I'd be interested to know the following:

- Has anyone looked at the performance of the SAD system without i-vectors (or simple alternatives like a pooling layer in the DNN)? I think it makes the most sense to measure the downstream performance on an application like ASR (rather than trying to measure the SAD system directly with something like DER). If you've determined that there's not much difference, you can do away with the i-vector subsystem, and eliminate the need for so many code changes.
- If you've determined that the benefit from including i-vectors in the SAD system is large enough to outweigh the added complexity and code changes, have you performed any experiment to determine if the frame-level weights are necessary? If there's not much difference in performance, I would again prefer the simpler system. It also removes some lines of code from this PR.

david-ryan-snyder · 2018-08-22T17:38:22Z

egs/wsj/s5/steps/compute_vad_decision.sh

+fi
+
+
+echo "Created VAD output for $name"


Does this script do the same thing that https://github.com/kaldi-asr/kaldi/blob/master/egs/sre08/v1/sid/compute_vad_decision.sh does? Why not use that one?

vimalmanohar · 2018-08-22T17:47:59Z

I think there is a misunderstanding. This PR is for segmenting recordings using their transcripts, based on decoding with a chain model, which already uses i-vectors (trained without silence by default). The recordings might contain long silences or noise, and there is no SAD system for pre-processing.

I am using 30s uniform segmentation, similar to the one in Aspire. In Aspire, decoding was done in two stages, with a first-stage decode used to get the frame weights for i-vector extraction, and this was very important for good performance. Here, I am just adding the energy VAD to alleviate the mismatch problem.

However, it would be better to add a separate SAD system as a pre-processor to the transcript-based segmentation. This is an extra top-level addition, which can be made both to the recipe for segmenting long utterances using GMMs and to the one in this PR that uses a chain model. It would involve a run-level script that first creates segments using SAD and then skips the uniform segmentation stage. I will add the run-level script doing this.

Regarding compute_vad_decision.sh: it might be the same as https://github.com/kaldi-asr/kaldi/blob/master/egs/sre08/v1/sid/compute_vad_decision.sh. I can move the script in sid to steps and create a soft link there.
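To make the "30s uniform segmentation" step concrete, here is a minimal sketch of cutting one long recording into fixed-length pieces and printing them as Kaldi-style `segments` lines (`<utt-id> <rec-id> <start> <end>`). It is not the code from this PR: the function name `uniform_segments`, the `min_last` parameter, and the utterance-id naming scheme are all assumptions made for illustration, and the sketch uses no overlap between windows.

```python
def uniform_segments(rec_id, duration, window=30.0, min_last=5.0):
    """Split [0, duration] seconds into consecutive windows of `window` seconds.

    Hypothetical helper: a trailing piece shorter than `min_last` seconds is
    merged into the previous window instead of being emitted on its own.
    Returns lines in the Kaldi segments format: <utt-id> <rec-id> <start> <end>.
    """
    if duration <= 0:
        return []
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        bounds.append([start, end])
        start = end
    # Avoid a very short final segment by folding it into its predecessor.
    if len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < min_last:
        bounds[-2][1] = bounds[-1][1]
        bounds.pop()
    lines = []
    for s, e in bounds:
        # Utterance id encodes start/end in centiseconds (illustrative scheme).
        utt_id = f"{rec_id}-{int(round(s * 100)):07d}-{int(round(e * 100)):07d}"
        lines.append(f"{utt_id} {rec_id} {s:.2f} {e:.2f}")
    return lines


if __name__ == "__main__":
    for line in uniform_segments("rec1", 62.0):
        print(line)
```

With a SAD pre-processor, as proposed above, this stage would be skipped and the segment boundaries would come from the SAD output instead.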


david-ryan-snyder · 2018-08-22T18:14:04Z

@vimalmanohar, thanks for the explanation. Yes, I misunderstood what the PR is doing, but I think I see now.

danpovey · 2018-08-22T18:20:24Z

Vimal, please put comments at the top of any example scripts stating clearly what the scenario is, so it's clear to others as well.


david-ryan-snyder · 2018-08-22T18:28:26Z

In that case, I think using the energy SAD for this purpose is reasonable, and the corresponding changes to ivector-extract-online2.cc are also reasonable.
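The idea discussed in this thread, using energy-based speech/non-speech decisions as frame weights so that silence contributes nothing to the i-vector statistics, can be illustrated with a small sketch. This is not the change made to ivector-extract-online2.cc, which is not shown in the thread. The function names, the thresholding rule, and its default parameters are assumptions, and a weighted feature mean stands in for the real i-vector statistics.

```python
def energy_vad_weights(log_energy, threshold_offset=5.0, mean_scale=0.5):
    """Return 1.0 for frames judged to be speech and 0.0 otherwise.

    Assumed rule for illustration: a frame counts as speech when its
    log-energy exceeds threshold_offset + mean_scale * mean(log_energy).
    The defaults are illustrative, not tuned values.
    """
    if not log_energy:
        return []
    cutoff = threshold_offset + mean_scale * (sum(log_energy) / len(log_energy))
    return [1.0 if e > cutoff else 0.0 for e in log_energy]


def weighted_mean(frames, weights):
    """Weighted average of per-frame feature vectors.

    A simplified stand-in for the weighted statistics an i-vector
    extractor would accumulate; frames with weight 0.0 are ignored.
    """
    total = sum(weights)
    if total == 0.0:
        return None  # no frame was judged to be speech
    dim = len(frames[0])
    return [
        sum(w * f[d] for f, w in zip(frames, weights)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    log_energy = [2.0, 2.0, 12.0, 12.0]          # two quiet frames, two loud ones
    frames = [[0.0], [0.0], [4.0], [6.0]]        # one-dimensional toy features
    weights = energy_vad_weights(log_energy)
    print(weights, weighted_mean(frames, weights))
```

In the toy example the two low-energy frames receive weight zero, so the resulting statistic reflects only the frames judged to be speech.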

…asr#2581) This came from Vimal's work on the MGB-3 challenge. Interface is similar to the existing GMM-based cleanup/segmentation scripts.

vimalmanohar added 6 commits July 26, 2018 11:57

Adding some scripts from mgb3-clean

6c88ba1

Minor changes

f92694b

Segment long-utterances using chain

e5b3719

Add weights for ivectors

dbcdf1d

Adding chain cleanup

cb4c3ca

Deferring decode-faster update for later

9bc2985

Adding clean_and_segment_data_nnet3

91d71d6

danpovey reviewed Aug 21, 2018


david-ryan-snyder reviewed Aug 22, 2018


Adding some better comments

c4d7750

danpovey merged commit ed74857 into kaldi-asr:master Aug 24, 2018
