Preprocessing audio #630

JRMeyer · 2021-03-08T03:12:28Z

JRMeyer
Mar 8, 2021
Maintainer

>>> pete
[May 14, 2019, 8:09am]

Hello!

I am forced to use 8000 Hz mono audio (phone calls). I know DeepSpeech
works best at 16000 Hz, so my questions are:

Does DeepSpeech (version DeepSpeech: v0.4.1-0-g0e40db6) upsample my
training material from 8000 Hz to 16000 Hz? What about the dev and test
material, does it also upsample those from 8000 Hz to 16000 Hz?

Has anyone studied badly labeled audio data and how it affects results?
Of course it will have an effect, but let's say 51% of my data is
labeled correctly and the rest is gibberish, wrong words, etc. Do you
think it might still do the job if I have enough of it and 50% or more
of the labels are OK?

I am doing some semi-automatic labeling of audio, and this method is
producing those figures. Doing the same job manually would be expensive
and very time-consuming, as you know.

[This is an archived TTS discussion thread from discourse.mozilla.org/t/preprocessing-audio]

JRMeyer · 2021-03-08T03:12:30Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[May 14, 2019, 12:27pm]

> Does DeepSpeech (version DeepSpeech: v0.4.1-0-g0e40db6) upsample my
> training material from 8000 Hz to 16000 Hz? What about the dev and test
> material, does it also upsample those from 8000 Hz to 16000 Hz?

What are we talking about here: training? Inference? If you're
training a new model from scratch, you can do it at 8kHz.

[Archived Post]

JRMeyer · 2021-03-08T03:12:33Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> pete
[May 14, 2019, 1:33pm]

Sorry, I was talking about training.

[Archived Post]

JRMeyer · 2021-03-08T03:12:36Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> lissyx
[May 14, 2019, 2:26pm]

Then you should be able to train with those 8kHz data. I'm sure
somebody has already shared such feedback on the forum. There may be
adjustments to make to a few hyper-parameters, but it should work.

Regarding labelling, I fear that having 49% of broken labelled data

do you have an opinion?

[Archived Post]

JRMeyer · 2021-03-08T03:12:38Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> carlfm01
[May 15, 2019, 6:50am]

I would suggest upsampling the data that has correct labels and
fine-tuning an existing model. Then upsample and evaluate the whole
set; you can remove the sorting code and save the results to a file.
That way you will see which audio files score a CER (character error
rate) of 0%, and then you can fine-tune again. The results depend on
how many correct labels you start with; let's say 5k+ clips of 5s-7s
is working for me.
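
As a rough illustration of the upsampling step, here is a minimal sketch
that doubles the sample rate of an 8 kHz, 16-bit mono WAV file using only
the Python standard library. It uses plain linear interpolation, which is
a crude stand-in and not what DeepSpeech itself does; a dedicated
resampler (SoX, for example) would normally be the better choice. The
function names and the file layout are assumptions made for this example.
Note that upsampling cannot restore content above 4 kHz that the phone
channel never captured.

```python
import array
import sys
import wave


def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbours."""
    if not samples:
        return []
    out = []
    for cur, nxt in zip(samples, samples[1:]):
        out.append(cur)
        out.append((cur + nxt) // 2)  # linearly interpolated sample
    # Emit the last sample twice so the output is exactly twice as long.
    out.append(samples[-1])
    out.append(samples[-1])
    return out


def upsample_wav_8k_to_16k(src_path, dst_path):
    """Hypothetical helper: read 8 kHz 16-bit mono WAV, write it at 16 kHz."""
    with wave.open(src_path, "rb") as src:
        if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != (1, 2, 8000):
            raise ValueError("expected 8 kHz, 16-bit, mono input")
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":  # WAV data is little-endian
        pcm.byteswap()

    out = array.array("h", upsample_2x(list(pcm)))
    if sys.byteorder == "big":
        out.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(16000)
        dst.writeframes(out.tobytes())
```

Calling `upsample_wav_8k_to_16k("call.wav", "call_16k.wav")` on each clip
(the file names are placeholders) would produce files at the sample rate
an existing 16 kHz model expects.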

If you don't have a good amount of correctly labeled data, you can start
by scoring the whole set with an existing model and sorting by CER, then
use the ones with CER close to 0%.
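
The scoring-and-sorting idea can be sketched roughly as follows. This is
not the DeepSpeech evaluation code; it is a small self-contained example
that assumes you already have, for every clip, the semi-automatic label
and the transcript produced by an existing model. CER is computed here as
the character-level edit distance divided by the label length, and the
names `rank_by_cer` and `max_cer` are made up for the illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, character by character."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = cur
    return prev[-1]


def cer(label, transcript):
    """Character error rate of the transcript measured against the label."""
    if not label:
        return float(len(transcript) > 0)
    return edit_distance(label, transcript) / len(label)


def rank_by_cer(rows, max_cer=None):
    """rows: iterable of (path, label, transcript) tuples.

    Returns (path, cer) pairs sorted from best to worst, optionally
    keeping only the clips whose CER is at most max_cer.
    """
    scored = [(path, cer(label, hyp)) for path, label, hyp in rows]
    if max_cer is not None:
        scored = [item for item in scored if item[1] <= max_cer]
    return sorted(scored, key=lambda item: item[1])
```

With `max_cer=0.0` only clips whose label and transcript agree exactly
are kept; a slightly looser threshold keeps more data at the cost of
letting some wrong labels through.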

Please don't use any of the labels from the set you are evaluating to
train a new LM when scoring the dataset.

[Archived Post]

JRMeyer · 2021-03-08T03:12:41Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> pete
[May 15, 2019, 6:39am]

Hello, and thanks for your reply!

So, I take my existing model, which was trained on about 80 hrs of
manually labeled data BUT from a different 'domain' than the one I am
going to train on. Let's say those 80 hrs are about boats and I am
trying to train a model about cars.

So I use that model (trained on the 80 hrs) to evaluate my
semi-automatically labeled data about cars, pick only the clips nearest
to CER 0%, use them on top of the 80 hrs, repeat this process, and hope
to get a model that understands talk about cars ... just to put it simply.

[Archived Post]

JRMeyer · 2021-03-08T03:12:43Z

JRMeyer
Mar 8, 2021
Maintainer Author

>>> carlfm01
[May 15, 2019, 7:06am]

Hello,

> 80hrs

If you are using English data, try the first evaluation with an existing
trained model. If you are not using English, I'm afraid that 80h may not
do the trick.

> those which are nearest to CER 0% and use them on top of that 80hrs
> and repeat this process

Yes, you can also play around and use only the ones that scored CER 0%.

I did something similar with data labeled by the Windows speech
recognition; at some point the correct ones will start to pop out.
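
The repeated select-and-fine-tune loop could look roughly like the sketch
below. It is only an outline of the process described in this thread:
`transcribe` and `fine_tune` are hypothetical callables standing in for
whatever inference and training commands you actually run, and the
strict rule "label equals transcript" corresponds to keeping only the
clips that score CER 0%.

```python
def normalize(text):
    """Lower-case and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def bootstrap(model, labeled, pool, transcribe, fine_tune, rounds=3):
    """Grow the training set from a pool of noisily labeled clips.

    labeled:    list of (audio, label) pairs that are already trusted
    pool:       list of (audio, label) pairs with unverified labels
    transcribe: hypothetical callable (model, audio) -> text
    fine_tune:  hypothetical callable (model, pairs) -> new model
    """
    accepted = list(labeled)
    remaining = list(pool)
    for _ in range(rounds):
        hits, misses = [], []
        for audio, label in remaining:
            # CER 0%: the model's transcript agrees with the label.
            if normalize(transcribe(model, audio)) == normalize(label):
                hits.append((audio, label))
            else:
                misses.append((audio, label))
        if not hits:
            break  # nothing new was confirmed, so another round won't help
        accepted.extend(hits)
        remaining = misses
        model = fine_tune(model, accepted)
    return model, accepted, remaining
```

Clips left in `remaining` after the last round are the ones whose labels
the model never confirmed; they are candidates for manual review or for
being dropped.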

[Archived Post]
