Augmentation recipe for swbd #1112
Conversation
Does it bring any improvement to use speed and volume perturbation in addition to adding noises?

Tom, it would be good if you could give us a little background on this. (Replying to Rémi Francis, Oct 12, 2016.)
egs/swbd/s5c/RESULTS
Outdated
%WER 11.6 | 1831 21395 | 89.7 7.0 3.3 1.4 11.6 47.0 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_tg/score_10_1.0/eval2000_hires.ctm.swbd.filt.sys
# results with chain TDNNs (2 epoch training on data reverberated with room impulse responses) (see local/chain/multi_condition/run_tdnn_7b.sh)
can you clarify at this point how many repetitions of the data there were, and whether or not it included speed perturbation?
stage=1
train_stage=-10
get_egs_stage=-10
speed_perturb=true
OK, I see that by default this uses the speed-perturbed data as input. You might want to have a comment at the top that clarifies this.
@@ -0,0 +1,78 @@
#!/bin/bash

# Copyright 2014 Johns Hopkins University (author: Vijayaditya Peddinti)
I don't think it will be necessary to add this script once you make those other changes.
I'll talk to David separately to see if we can come up with a roadmap to make it unnecessary to have alignments, for purposes of training the speaker-id systems... this is very inconvenient. I don't want to have a situation where the need for these alignments is making the scripts super complicated [see my other, long, comment].
--num-replications $num_data_reps \
--max-noises-per-minute 1 \
--source-sampling-rate 8000 \
data/${clean_data_dir} data/${train_set}
Some more comments after looking at the script in more detail:
If we want to use this script as a starting point for how to do data reverberation it's going to be quite hard.
For a start, it has a very messy dependency on the regular run_ivector_common.sh, because it assumes that the train_nodup_sp directory already exists. There are scenarios where we'd want to run the reverberation stuff from scratch without running the regular run_ivector_common.sh. It would be better to have this script do both the speed perturbing and the RIR stuff; to avoid overwriting features that might have been already written by run_ivector_common.sh, you could put that step first so the --stage parameter can be used to avoid redoing stuff, and have the script die if the feats.scp exists in the _sp directory (and would be overwritten if the script were to run).
To keep things simple, you can just make the speed-perturbing non-optional. I don't think anyone will want to do this without speed perturbing; if you want to do so for your own experiments you can do it locally.
Also, this script seems to have a very detailed dependency on the Switchboard setup, because of the way you prepare the data subset for the lda_mllt stage, and the way you re-use the alignments.
The LDA+MLLT transform is extremely non-critical, because it only affects the diagonal GMM which is just used to initialize the full GMM and to pre-prune when getting GMM posteriors. And the number of parameters is tiny. You could use a very small data subset for that, and it would be best to eliminate all of the detailed dependency on the Switchboard setup. What I think would make more sense is, after you create the _mix data, to select a fairly small subset of it (with a num-utts that can be specified on the command line, like 20k), and to dump parallel features with regular MFCC and _hires, and then use a supplied model [which should be defined at the top of the script, and not specified in the body of the script], to get the alignments so you can train a system on the _hires features.
Also, with regard to the way you're mixing it with the un-perturbed data (_mix)...
is there a reason why it's better to mix it manually like that, instead of specifying (say) a number less than 1 for the following:
--speech-rvb-probability 1
--pointsource-noise-addition-probability 1
--isotropic-noise-addition-probability 1
and maybe using 2 replications instead of 1? It seems to me that that would be more elegant and simpler, if the results were the same.
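As a sanity check on that suggestion, here is a small sketch of the expected data composition. The arithmetic and the function are mine, not from the thread; I'm assuming each replication reverberates an utterance independently with probability `--speech-rvb-probability`:

```python
def expected_mix(num_replications, p):
    """Expected number of reverberated vs. untouched copies per utterance,
    assuming each of num_replications copies is reverberated independently
    with probability p (hypothetical model of --speech-rvb-probability)."""
    reverberated = num_replications * p
    clean = num_replications * (1.0 - p)
    return clean, reverberated

# Manual mixing gives exactly 1 clean + 1 reverberated copy per utterance;
# p=0.5 with 2 replications gives the same proportions in expectation:
print(expected_mix(2, 0.5))  # (1.0, 1.0)
```

The difference is only variance: manual mixing fixes the split per utterance, while the probabilistic version matches it in expectation.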
unzip rirs_noises.zip
fi

# corrupt the data to generate reverberated data
please clarify via a comment that this doesn't do any real computation, it just writes commands in the wav.scp file.
egs/swbd/s5c/RESULTS
Outdated
# results with chain TDNNs (2 epoch training on data reverberated with room impulse responses) (see local/chain/multi_condition/run_tdnn_7b.sh)
%WER 10.0 | 1831 21395 | 91.0 6.0 3.0 1.1 10.0 43.8 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.5/eval2000_hires.ctm.swbd.filt.sys
%WER 20.0 | 2628 21594 | 82.1 11.7 6.2 2.1 20.0 55.6 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
%WER 15.0 | 4459 42989 | 86.5 8.8 4.7 1.6 15.0 50.7 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.5/eval2000_hires.ctm.filt.sys
It would be easier to parse these results in context if you make it compatible with the numbers just above and below, by putting the whole-test-set number (15.0) first, the 10.0 number second, and removing the 20.0 number [people rarely quote the callhome-only subset... I don't mind normally but this is just for consistency with the numbers above and below]
@danpovey Regarding your suggestion of using a smaller value for --speech-rvb-probability, say 0.5, …

I responded by email a day or two ago, but github lost my comment. Resending: How about adding an option --include-original-data in your script, so it will know to always include one copy of the original (checking first that the num-copies > 1)?

@danpovey I have thought about that too, but then the features will have to be extracted again for the original data. Is that fine?

I replied by email, but github reply-by-email is very flaky right now, so I'm posting it directly too. That's OK.

@danpovey Do you think it is critical to let the perturbed data be included in the training of the UBM and ivector-extractor? Actually I have been verifying this recently, and the result is a little bit worse if the perturbed data is not included.

[reposting since my email was lost.]
…ript; modify swbd-rvb script
I think you should generate all the perturbed data, then if the amount of … (Replying to Tom Ko, Oct 22, 2016.)
@danpovey Do you have time to see if further modification is needed?

[reposting directly since github is still delaying my email]
help="Sampling rate of the source data. If a positive integer is specified with this option, "
     "the RIRs/noises will be resampled to the rate of the source data.")
parser.add_argument("--include-original-data", type=str, help="If true, the output data includes one copy of the original data",
                    choices=['true', 'false'], default = False)
This doesn't look right. If you have string-valued choices then you have to have default = 'true', and check it with args.include_original_data == 'true'.
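Dan's fix, sketched against a standalone parser rather than the actual reverberate_data_dir.py:

```python
import argparse

parser = argparse.ArgumentParser()
# With string-valued choices, the default must be one of those strings;
# a boolean default like False could never equal 'true' or 'false'.
parser.add_argument("--include-original-data", type=str,
                    choices=['true', 'false'], default='true',
                    help="If true, the output data includes one copy of the original data")

args = parser.parse_args(["--include-original-data", "false"])
# ...and the value is checked by string comparison, not truthiness:
include_original = (args.include_original_data == 'true')
print(include_original)  # False
```

Note that argparse never coerces "true"/"false" strings to booleans on its own, which is why the explicit string comparison is needed.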
Can you please run a version of this based on the 7e script, which is the current best?
I'll try to in a day or two, but I was kind of hoping Vijay would chime in. (Replying to Tom Ko, Oct 26, 2016.)
@danpovey these are the results of running data reverberation on the 7e script: the callhm result is improved from 7b to 7e. I can't find the 7e baseline result in RESULTS.
It's at the top of the script itself. The comparable number to your 14.6% … Can you please move the script to local/chain/tuning/run_tdnn_7f.sh, make … (Replying to Tom Ko, Oct 28, 2016.)
... also, you can make a comment that the difference may not be 100% due to … (Follow-up by Daniel Povey, Oct 28, 2016.)
but per-component max-change is already added in 7e, so from 7e to 7f only reverberation is added
No, the 7e script and the results in it are from before the per-component … (Replying to Tom Ko, Oct 29, 2016.)
Then maybe I will rerun the normal 7e script with un-reverberated data, to check the improvement from per-component max-change and from reverberation separately.
stage=1
num_data_reps=1 # number of reverberated copies of data to generate
clean_data_dir=train_nodup
I don't like the fact that your script changes this variable by adding _sp to it, because it could mislead a reader into thinking that they understand what the variable is.
Better to use different variable names-- you could call this 'input_data_dir', and have 'clean_data_dir' be either …
speed_perturb=true
dir=exp/chain/tdnn_7b # Note: _sp will get added to this if $speed_perturb == true.
decode_iter=
iv_dir=exp/nnet3_rvb
please call this something like ivector_dir if that's what you mean by 'iv'.
# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing
# smoothing options
pool_window=
These pooling options were deprecated long ago-- please remove them. When you get the final 7f script, please also 'diff' with the 7e script and see if there are any other respects in which your script is outdated-- I want it as similar as possible with 7e.. You may need to change some things and rerun.
# if we are using the speed-perturbed data we need to generate
# alignments for it.
# Also the data reverberation will be done in this script.
echo local/nnet3/multi_condition/run_ivector_common.sh --stage $stage \
I assume this 'echo' should not be there.
sort -u $rvb_lat_dir/temp/combined_lats.scp > $rvb_lat_dir/temp/combined_lats_sorted.scp

lattice-copy scp:$rvb_lat_dir/temp/combined_lats_sorted.scp "ark:|gzip -c >$rvb_lat_dir/lat.1.gz" || exit 1;
echo "1" > $rvb_lat_dir/num_jobs
This looks like it would be extremely slow for large data sets, it's all in one job.
In any case, all the chain-training script (get_egs) does with the lattices is to immediately copy them to ark,scp format, which is what you've done here. So better to change get_egs.sh so that it requires either lat.*.gz or lat.scp to exist. If lat.scp exists, all get_egs.sh has to do is copy it to the right directory.
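A hypothetical stand-in for the proposed get_egs.sh behaviour (written in Python rather than shell; the function name and layout are mine, not Kaldi's):

```python
import glob
import os
import shutil

def prepare_lattices(lat_dir, egs_dir):
    """Accept either a pre-built lat.scp or gzipped archives lat.*.gz.
    If lat.scp exists, the only work needed is copying it into place,
    which avoids funnelling every lattice through one lattice-copy job.
    Sketch of the behaviour proposed in the review comment above."""
    scp = os.path.join(lat_dir, "lat.scp")
    if os.path.exists(scp):
        shutil.copy(scp, os.path.join(egs_dir, "lat.scp"))
        return "scp"
    if glob.glob(os.path.join(lat_dir, "lat.*.gz")):
        return "gz"
    raise RuntimeError("expected lat.scp or lat.*.gz in " + lat_dir)
```

The point of the design is that the scp path degrades to a cheap file copy, while the archive path keeps backward compatibility with existing alignment directories.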
If you do that, then please change the numbering and make the rerun 7f, and … (Replying to Tom Ko, Oct 29, 2016.)
In that case, the 7f (rerun) script will be exactly the same as the 7e script.
Yes.
@danpovey Here is the comparison from 7e->7f and 7f->7g. [results table: System, 7f, 7g] For the 7e result, I just manually copied it from the top of the 7e script. You can see there is no obvious improvement from 7e -> 7f (adding per-component max-change); I don't know if this was due to randomness between different runs. Do you still want to add the 7f script, where 7g represents the reverberation script?
Yes, please create both scripts, it will more accurately document the … Dan (Replying to Tom Ko, Oct 31, 2016.)
This should be very close to ready to merge, or maybe ready. |
# current best 'chain' models with TDNNs (see local/chain/run_tdnn_7d.sh)
%WER 10.4 | 1831 21395 | 90.7 6.1 3.2 1.2 10.4 44.6 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_fsh_fg/score_11_1.0/eval2000_hires.ctm.swbd.filt.sys
%WER 11.6 | 1831 21395 | 89.7 7.0 3.3 1.4 11.6 47.0 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_tg/score_10_1.0/eval2000_hires.ctm.swbd.filt.sys
# current best 'chain' models with TDNNs (see local/chain/run_tdnn_7g.sh)
Does the model converge after 2 epochs of training? Could you please post the log-likelihood plots here.
@@ -0,0 +1,210 @@
#!/bin/bash

# 7e is as 7f, but adding the max-change-per-component to the neural net training
7f is as 7e
# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing
# smoothing options
what smoothing are you referring to?
# which leads to better results
# This script assumes a mixing of the original training data with its reverberated copy
# and results in a 2-fold training set. Thus the number of epochs is halved to
# keep the same training time.
Add a comment describing what happens if you train for more epochs.
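The epoch-halving described in the quoted comment amounts to keeping epochs times fold constant; a one-line sketch (my arithmetic, not code from the recipe, and the base epoch count of 4 is only an illustration):

```python
def scaled_epochs(base_epochs, fold):
    """Scale the epoch count down for an n-fold augmented training set so the
    total number of frames processed stays roughly the same (sketch only)."""
    return max(1, base_epochs // fold)

# A 2-fold (original + reverberated) set trained for half the epochs
# sees about as many frames as the original set trained for the full count:
print(scaled_epochs(4, 2))  # 2
```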
# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing
move this comment to the splice indexes specification.
rm -r data/temp1 data/temp2

mfccdir=mfcc_perturbed
steps/make_mfcc.sh --cmd "$train_cmd" --nj 50 \
add a comment describing why you need these features.
clean_data_dir=${input_data_dir}_sp
else
clean_data_dir=${input_data_dir}
add a comment here saying we recommend speed perturbation as the gains are significant.
# if --include-original-data is true, the original data will be mixed with its reverberated copies
python steps/data/reverberate_data_dir.py \
  --prefix "rev" \
  --rir-set-parameters "0.3, simulated_rirs_8k/smallroom/rir_list" \
what happens to the other 0.1 probability mass? could you add a comment here describing how these weights are used?
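One plausible interpretation, written only to make the question concrete; this is an assumption, not the measured behaviour of reverberate_data_dir.py: per-set weights may simply be renormalized to sum to 1, in which case leftover mass is redistributed proportionally.

```python
def normalize(weights):
    """ASSUMPTION: renormalize per-RIR-set weights to sum to 1.
    Not taken from reverberate_data_dir.py; shown only to illustrate
    what could happen to probability mass that isn't explicitly assigned."""
    total = sum(weights)
    return [w / total for w in weights]

# If three RIR sets are each given weight 0.3, renormalization spreads the
# remaining 0.1 of mass evenly, so each set ends up with probability 1/3:
probs = normalize([0.3, 0.3, 0.3])
print(probs)
```

Whether the script actually renormalizes, or treats the remainder as "no reverberation", is exactly what the requested comment should pin down.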
if [ $stage -le 5 ]; then
steps/train_lda_mllt.sh --cmd "$train_cmd" --num-iters 13 \
  --splice-opts "--left-context=3 --right-context=3" \
  5500 90000 data/train_100k_nodup_hires \
don't you want to train the lda_mllt transform on a mix of reverberated and clean data?
@vijayaditya just training the lda_mllt transform on clean data is good enough, and this avoids copying the alignments.
OK, in that case could you add a comment here saying the same; this would help avoid any confusion.
if reverberate_opts == "":

# prefix with index 0, e.g. rvb0_swb0035, stands for the original data
# prefix using index 0 is reserved for original data e.g. rvb0_swb0035 corresponds to the swb0035 recording in original data
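The prefixing convention described above can be sketched as follows (a hypothetical helper; the actual script builds these ids internally):

```python
def prefixed_id(utt_id, copy_index, prefix="rvb"):
    """Build the per-copy utterance id: copy 0 is reserved for the original
    recording, and copies 1..N are reverberated versions (sketch of the
    naming convention, not code from reverberate_data_dir.py)."""
    return "%s%d_%s" % (prefix, copy_index, utt_id)

print(prefixed_id("swb0035", 0))  # rvb0_swb0035 -> the original recording
print(prefixed_id("swb0035", 1))  # rvb1_swb0035 -> first reverberated copy
```

Reserving index 0 keeps the original and augmented copies sortable under one scheme, which matters for Kaldi's sorted scp files.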
That's OK with me. The LDA+MLLT is the least critical part of the whole … (Replying to Tom Ko, Nov 4, 2016.)
# smoothing options
self_repair_scale=0.00001
# training options
num_epochs=2
Did you get a chance to check the log-likelihood values at the end of training? Did the training converge? Is there no improvement from running the training for a few more epochs?
@vijayaditya I have checked that there is no improvement from training for more epochs. I guess we have already shown the convergence and the likelihood values in our paper.
vijayaditya left a comment:
Will merge after the two requested minor changes have been made.
. ./cmd.sh

stage=1
stage=3
Did you forget to change this back?
egs/swbd/s5c/local/chain/run_tdnn.sh
Outdated
@@ -1 +1 @@
-tuning/run_tdnn_7e.sh
\ No newline at end of file
+tuning/run_tdnn_7g.sh
\ No newline at end of file
I am asking this question as we will not be able to compare our results with other papers. We don't already do it anyway, as we use speed perturbation. So @tomkocse could you just add a commented line in this script:
# for the swbd recipe without reverberation of the training data, use the following script;
# it is similar to run_tdnn_7g.sh except for the run_ivector_common.sh being called.
# tuning/run_tdnn_7f.sh
Actually, I'm not sure, regarding making it the preferred Switchboard … Dan (Replying to Vijayaditya Peddinti, Nov 7, 2016.)
What about moving the reverberated recipe (7g) to local/chain/multi_condition, then making the non-reverberated one (7f) the preferred recipe?
OK. (Replying to Tom Ko, Nov 7, 2016.)
No description provided.