@vimalmanohar
Contributor

Adding the scripts used during the MGB-3 challenge. Someone who needs this for e2e training could test it.

@aarora8
Contributor

aarora8 commented Jul 27, 2018

Thanks. Is it related to, or can it be used for, cleaning corrupt training data?

@vimalmanohar
Contributor Author

vimalmanohar commented Jul 27, 2018 via email

@aarora8
Contributor

aarora8 commented Jul 27, 2018

Ok, thank you. I will test it with Yomdle datasets.

@danpovey
Contributor

danpovey commented Jul 27, 2018 via email


# Copyright 2017 Vimal Manohar
# Apache 2.0

@vimalmanohar, this VAD and its use when extracting i-vectors... what scenario was this helpful for?

Contributor Author

It might be useful for segmenting long utterances when there is a lot of silence in the recordings. I have not tried it without this VAD, so that still needs to be tested.

@danpovey
Contributor

danpovey commented Aug 21, 2018 via email

@vimalmanohar
Contributor Author

vimalmanohar commented Aug 21, 2018 via email

@danpovey
Contributor

@david-ryan-snyder, what do you think about the VAD-related part of this?

@david-ryan-snyder
Contributor

david-ryan-snyder commented Aug 22, 2018

If I understand this PR correctly, a) a SAD system is used to segment long recordings, b) I-vectors are included in the input and c) because of a concern about the effect of nonspeech in the i-vectors, a frame-level weighting of the i-vector stats is introduced. (@vimalmanohar, please correct me if I misunderstood what's happening here...)

In my view, SAD should be a lightweight system. If it works adequately, I'd rather see a simple SAD model that just uses MFCCs and nothing else. Adding in a component (i-vectors) which in turn requires an energy SAD seems a bit heavyweight to me.

Since @danpovey asked my opinion on this, I'd be interested to know the following:

  • Has anyone looked at the performance of the SAD system without i-vectors (or with simple alternatives such as a pooling layer in the DNN)? I think it makes the most sense to measure the downstream performance on an application like ASR (rather than trying to measure the SAD system directly with something like DER). If you've determined that there's not much difference, you can do away with the i-vector subsystem, and eliminate the need for so many code changes.

  • If you've determined that the benefit from including i-vectors in the SAD system is large enough to outweigh the added complexity and code changes, have you performed any experiment to determine if the frame-level weights are necessary? If there's not much difference in performance, I would again prefer the simpler system. It also removes some lines of code from this PR.
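
For readers following point (c) above, here is a minimal sketch of what frame-level weighting of accumulated statistics could look like. It is an illustration under stated assumptions, not the code in this PR or in ivector-extract-online2.cc: the function names, the hard 0/1 energy threshold, and the single-component statistics are all hypothetical simplifications.

```python
from typing import List, Sequence, Tuple


def energy_vad_weights(log_energies: Sequence[float], threshold: float) -> List[float]:
    """Hypothetical energy-based decision: weight 1.0 above the threshold, else 0.0."""
    return [1.0 if e > threshold else 0.0 for e in log_energies]


def weighted_stats(
    frames: Sequence[Sequence[float]], weights: Sequence[float]
) -> Tuple[float, List[float]]:
    """Accumulate zeroth-order (total weight) and first-order (weighted sum) stats."""
    if len(frames) != len(weights):
        raise ValueError("need exactly one weight per frame")
    dim = len(frames[0]) if frames else 0
    count = 0.0
    total = [0.0] * dim
    for frame, w in zip(frames, weights):
        count += w
        for d, x in enumerate(frame):
            total[d] += w * x
    return count, total


def weighted_mean(
    frames: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Mean of the frames, with each frame scaled by its weight."""
    count, total = weighted_stats(frames, weights)
    if count == 0.0:
        raise ValueError("no frames with nonzero weight")
    return [s / count for s in total]


# Toy data (made up): the middle frame is low-energy and gets weight 0.
frames = [[1.0, 2.0], [9.0, 9.0], [3.0, 4.0]]
weights = energy_vad_weights([10.0, 2.0, 12.0], threshold=5.0)
print(weighted_mean(frames, weights))  # the low-energy frame does not contribute
```

With all weights set to 1.0 this reduces to plain unweighted accumulation, which is the simpler system the second question asks to be compared against.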

fi


echo "Created VAD output for $name"

@david-ryan-snyder david-ryan-snyder Aug 22, 2018

@vimalmanohar
Contributor Author

vimalmanohar commented Aug 22, 2018 via email

@david-ryan-snyder
Contributor

@vimalmanohar, thanks for the explanation. Yes, I misunderstood what the PR is doing, but I think I see now.

@danpovey
Contributor

danpovey commented Aug 22, 2018 via email

@david-ryan-snyder
Contributor

In that case, I think using the energy SAD for this purpose is reasonable, and the corresponding changes to ivector-extract-online2.cc are also reasonable.

@danpovey danpovey merged commit ed74857 into kaldi-asr:master Aug 24, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
…asr#2581)

This came from Vimal's work on the MGB-3 challenge.  Interface is similar to the existing GMM-based cleanup/segmentation scripts.
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
…asr#2581)

This came from Vimal's work on the MGB-3 challenge.  Interface is similar to the existing GMM-based cleanup/segmentation scripts.
4 participants