
Conversation

@tomkocse

No description provided.

@francisr commented Oct 12, 2016

Does it bring any improvement to use speed and volume perturbation in addition to adding noises?
EDIT: I just realised that you are indeed using the speed-perturbed data.

@danpovey

Tom, it would be good if you could give us a little background on this, including how the results compare to un-corrupted and speed-perturbed data. It's surprising if you're not combining it with speed perturbation.


@tomkocse

@francisr Yes, the improvement brought by reverberation is additive to speed and volume perturbation.
@danpovey I have discussed this with Vijay, and I got the impression that this would be better as a separate recipe. Or should I merge it into the current recipe?

%WER 11.6 | 1831 21395 | 89.7 7.0 3.3 1.4 11.6 47.0 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_tg/score_10_1.0/eval2000_hires.ctm.swbd.filt.sys


# results with chain TDNNs (2 epoch training on data reverberated with room impulse responses) (see local/chain/multi_condition/run_tdnn_7b.sh)

Can you clarify at this point how many repetitions of the data there were, and whether or not it included speed perturbation?

stage=1
train_stage=-10
get_egs_stage=-10
speed_perturb=true

OK, I see that by default this uses the speed-perturbed data as input. You might want to have a comment at the top that clarifies this.

@@ -0,0 +1,78 @@
#!/bin/bash

# Copyright 2014 Johns Hopkins University (author: Vijayaditya Peddinti)

I don't think it will be necessary to add this script once you make those other changes.
I'll talk to David separately to see if we can come up with a roadmap to make it unnecessary to have alignments, for purposes of training the speaker-id systems... this is very inconvenient. I don't want to have a situation where the need for these alignments is making the scripts super complicated [see my other, long, comment].

--num-replications $num_data_reps \
--max-noises-per-minute 1 \
--source-sampling-rate 8000 \
data/${clean_data_dir} data/${train_set}

Some more comments after looking at the script in more detail:
If we want to use this script as a starting point for how to do data reverberation it's going to be quite hard.

For a start, it has a very messy dependency on the regular run_ivector_common.sh, because it assumes that the train_nodup_sp directory already exists. There are scenarios where we'd want to run the reverberation stuff from scratch without running the regular run_ivector_common.sh. It would be better to have this script do both the speed perturbing and the RIR stuff. To avoid overwriting features that might already have been written by run_ivector_common.sh, you could put that step first, so the --stage parameter can be used to avoid redoing stuff, and have the script die if feats.scp exists in the _sp directory (and would be overwritten if the script were to run).
To keep things simple, you can just make the speed-perturbing non-optional. I don't think anyone will want to do this without speed perturbing; if you want to do so for your own experiments, you can do it locally.
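A minimal sketch of that guard, assuming a variable name like input_data_dir and the standard 3-way speed-perturbation script (the stage numbers here are illustrative):

if [ $stage -le 0 ]; then
  if [ -f data/${input_data_dir}_sp/feats.scp ]; then
    # Refuse to clobber features that run_ivector_common.sh may already have written.
    echo "$0: data/${input_data_dir}_sp/feats.scp already exists;"
    echo "$0: refusing to overwrite it; use a later --stage to skip speed perturbation."
    exit 1
  fi
  utils/data/perturb_data_dir_speed_3way.sh data/${input_data_dir} data/${input_data_dir}_sp
fi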

Also, this script seems to have a very detailed dependency on the Switchboard setup, because of the way you prepare the data subset for the lda_mllt stage and the way you re-use the alignments.
The LDA+MLLT transform is extremely non-critical, because it only affects the diagonal GMM, which is just used to initialize the full GMM and to pre-prune when getting GMM posteriors; and the number of parameters is tiny. You could use a very small data subset for that, and it would be best to eliminate all of the detailed dependency on the Switchboard setup. What I think would make more sense is, after you create the _mix data, to select a fairly small subset of it (with a num-utts that can be specified on the command line, like 20k), dump parallel features with regular MFCC and _hires, and then use a supplied model [which should be defined at the top of the script, not specified in the body] to get the alignments, so you can train a system on the _hires features.

Also, with regard to the way you're mixing it with the un-perturbed data (_mix): is there a reason why it's better to mix it manually like that, instead of specifying (say) a number less than 1 for the following options:
--speech-rvb-probability 1
--pointsource-noise-addition-probability 1
--isotropic-noise-addition-probability 1
and maybe using 2 replications instead of 1? It seems to me that that would be more elegant and simpler, if the results were the same.
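For illustration, the alternative being suggested would look something like this (the probability values and replication count are placeholders, and the --rir-set-parameters options are omitted):

python steps/data/reverberate_data_dir.py \
  --prefix "rev" \
  --speech-rvb-probability 0.5 \
  --pointsource-noise-addition-probability 0.5 \
  --isotropic-noise-addition-probability 0.5 \
  --num-replications 2 \
  --source-sampling-rate 8000 \
  data/${clean_data_dir} data/${train_set}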

unzip rirs_noises.zip
fi

# corrupt the data to generate reverberated data

Please clarify via a comment that this doesn't do any real computation; it just writes commands into the wav.scp file.
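For example, after this step each wav.scp entry is just a pipe command that gets executed later, at feature-extraction time; roughly (the utterance ID and RIR path are invented for illustration):

rev1_sw02001-A wav-reverberate --impulse-response=<some-rir.wav> sw02001-A.wav - |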

# results with chain TDNNs (2 epoch training on data reverberated with room impulse responses) (see local/chain/multi_condition/run_tdnn_7b.sh)
%WER 10.0 | 1831 21395 | 91.0 6.0 3.0 1.1 10.0 43.8 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.5/eval2000_hires.ctm.swbd.filt.sys
%WER 20.0 | 2628 21594 | 82.1 11.7 6.2 2.1 20.0 55.6 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
%WER 15.0 | 4459 42989 | 86.5 8.8 4.7 1.6 15.0 50.7 | exp/chain/tdnn_7b_sp_rvb1_mix/decode_eval2000_sw1_fsh_fg/score_10_0.5/eval2000_hires.ctm.filt.sys

It would be easier to parse these results in context if you make them compatible with the numbers just above and below, by putting the whole-test-set number (15.0) first and the 10.0 number second, and removing the 20.0 number [people rarely quote the callhome-only subset... I don't mind normally, but this is just for consistency with the numbers above and below].

@tomkocse

@danpovey Regarding your suggestion of using a smaller value for --speech-rvb-probability, say 0.5, instead of mixing the perturbed data with the un-perturbed data: I have tried your suggested way, and the result is a little bit worse. This is because we can't guarantee that the same utterance will have both its perturbed and unperturbed versions under the random process in the script. Also, the old way avoids re-making the features for the unperturbed data.

@danpovey

I responded by email a day or two ago, but github lost my comment. Resending:

How about adding an option --include-original-data to your script, so it will know to always include one copy of the original (checking first that num-copies > 1)? That would be much easier from a user perspective than having to manually add the original data.

@tomkocse

@danpovey I have thought about that too, but then the features will have to be extracted again for the original data. Is that fine?

@danpovey

I replied by email, but github reply-by-email is very flaky right now, so I'm posting it directly too.

That's OK.
For typical setups, we're extracting 40-dim features for all the perturbed data, and we probably won't have the 40-dim features for the un-perturbed data, so there is no duplication.

@tomkocse

@danpovey Do you think it is critical to include the perturbed data in the training of the UBM and i-vector extractor? I have actually been verifying this recently, and the result is a little bit worse if the perturbed data is not included.

@danpovey

[reposting since my email was lost.]
I think you should generate all the perturbed data; then, if the amount of data is extremely large (say, more than 400 hours), randomly sub-sample it (subset_data_dir.sh) for training the i-vector extractor.
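The sub-sampling step would be something like this (the subset size here is arbitrary):

# randomly select 100k utterances for training the i-vector extractor
utils/subset_data_dir.sh data/${train_set} 100000 data/${train_set}_100k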




@tomkocse

@danpovey Do you have time to see if further modification is needed?

@danpovey

[Reposting directly since github is still delaying my email.]
I'll try to in a day or two, but I was kind of hoping Vijay would chime in.

help="Sampling rate of the source data. If a positive integer is specified with this option, "
"the RIRs/noises will be resampled to the rate of the source data.")
parser.add_argument("--include-original-data", type=str, help="If true, the output data includes one copy of the original data",
choices=['true', 'false'], default = False)

This doesn't look right. If you have string-valued choices then you have to have default = 'true', and check it with args.include_original_data == 'true'.
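In other words, a sketch of the fix (the surrounding parser setup is shown only to make it self-contained):

import argparse

parser = argparse.ArgumentParser()
# string-valued choices need a string-valued default
parser.add_argument("--include-original-data", type=str,
                    help="If true, the output data includes one copy of the original data",
                    choices=['true', 'false'], default='true')
args = parser.parse_args()

# later, test it as a string:
include_original_data = (args.include_original_data == 'true')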

@danpovey

Can you please run a version of this based on the 7e script, which is the current best?
Be careful: I just committed a change to the neural net training (max-change-per-component) which affects results slightly.
But if you run it with that change and it's significantly better than the existing 7e result, we can just accept the result without finding out exactly how much of the improvement is due to reverberation and how much to the max-change alteration.


@tomkocse

@danpovey These are the results of running data reverberation on the 7e script:
%WER 9.8 | 1831 21395 | 91.3 5.8 3.0 1.1 9.8 43.5 | exp/chain/tdnn_7e2_sp_rvb1/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.swbd.filt.sys
%WER 19.4 | 2628 21594 | 82.9 11.8 5.2 2.4 19.4 55.0 | exp/chain/tdnn_7e2_sp_rvb1/decode_eval2000_sw1_fsh_fg/score_9_0.0/eval2000_hires.ctm.callhm.filt.sys
%WER 14.6 | 4459 42989 | 87.1 8.9 4.0 1.8 14.6 50.2 | exp/chain/tdnn_7e2_sp_rvb1/decode_eval2000_sw1_fsh_fg/score_9_0.0/eval2000_hires.ctm.filt.sys

The callhm result improved from 7b to 7e. I can't find the 7e baseline result in RESULTS; can you tell me the 7e baseline result?

@danpovey

It's at the top of the script itself. The comparable number to your 14.6% is 15.3. So it does look like the reverberation is helping.

Can you please move the script to local/chain/tuning/run_tdnn_7f.sh, make sure you also test on train_dev, and run the script local/chain/compare_wer.sh 7e 7f? Then you can link run_tdnn.sh to tuning/run_tdnn_7f.sh, as that's the new best script.


@danpovey

... also, you can make a comment that the difference may not be 100% due to the reverberation, because we also added per-component max-change between 7e and 7f.


@tomkocse

But per-component max-change is already added in 7e, so from 7e to 7f only reverberation is added.

@danpovey

No, the 7e script and the results in it are from before the per-component max-change was committed.


@tomkocse

Then maybe I will rerun the normal 7e script with un-reverberated data, to check the improvement gained from per-component max-change and from reverberation separately.


stage=1
num_data_reps=1 # number of reverberated copies of data to generate
clean_data_dir=train_nodup

I don't like the fact that your script changes this variable by adding _sp to it, because it could mislead a reader into thinking that they understand what the variable is.
Better to use different variable names: you could call this 'input_data_dir', and have 'clean_data_dir' be either $input_data_dir or ${input_data_dir}_sp [if/else].
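i.e., something like this (it matches the if/else that appears later in the diff):

input_data_dir=train_nodup
if [ "$speed_perturb" == "true" ]; then
  clean_data_dir=${input_data_dir}_sp
else
  clean_data_dir=${input_data_dir}
fi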

speed_perturb=true
dir=exp/chain/tdnn_7b # Note: _sp will get added to this if $speed_perturb == true.
decode_iter=
iv_dir=exp/nnet3_rvb

Please call this something like ivector_dir, if that's what you mean by 'iv'.

# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing
# smoothing options
pool_window=

These pooling options were deprecated long ago; please remove them. When you get to the final 7f script, please also 'diff' it with the 7e script and see if there are any other respects in which your script is outdated; I want it as similar as possible to 7e. You may need to change some things and rerun.

# if we are using the speed-perturbed data we need to generate
# alignments for it.
# Also the data reverberation will be done in this script/
echo local/nnet3/multi_condition/run_ivector_common.sh --stage $stage \

I assume this 'echo' should not be there.

sort -u $rvb_lat_dir/temp/combined_lats.scp > $rvb_lat_dir/temp/combined_lats_sorted.scp

lattice-copy scp:$rvb_lat_dir/temp/combined_lats_sorted.scp "ark:|gzip -c >$rvb_lat_dir/lat.1.gz" || exit 1;
echo "1" > $rvb_lat_dir/num_jobs

This looks like it would be extremely slow for large data sets; it's all in one job.
In any case, all the chain-training script (get_egs) does with the lattices is immediately copy them to ark,scp format, which is what you've done here. So it would be better to change get_egs.sh so that it requires either lat.*.gz or lat.scp to exist; if lat.scp exists, all get_egs.sh has to do is copy it to the right directory.
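A sketch of the check get_egs.sh could do (the directory variable names are assumed):

if [ -f $latdir/lat.scp ]; then
  # lattices are already in ark,scp format; just copy the scp into place
  cp $latdir/lat.scp $dir/lat.scp
elif [ ! -f $latdir/lat.1.gz ]; then
  echo "$0: expected either $latdir/lat.scp or $latdir/lat.1.gz to exist" && exit 1
fi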

@danpovey

If you do that, then please change the numbering: make the rerun 7f, and your new script 7g. That way you can put the proper comparisons there (e.g. the 7e->7f comparison and the 7f->7g comparison).
The script expects 2 dirs, but just give it the same one twice, and then work out the output by hand with reference to the contents of the 7e log file.


@tomkocse

In that case, the 7f (rerun) script will be exactly the same as the 7e script.
I want to clarify that per-component max-change is added not to the script itself but to the core training recipe, am I right?

@danpovey

> In that case, the 7f (rerun) script will be exactly the same as the 7e script.

Yes, but with different results and an appropriate comment at the top.

> I want to clarify that per-component max-change is added not to the script itself but to the core training recipe, am I right?

Yes.



@tomkocse commented Nov 1, 2016

@danpovey Here are the comparisons from 7e->7f and 7f->7g:
System                    7e          7f
WER on train_dev(tg)      14.41       14.46
WER on train_dev(fg)      13.39       13.23
WER on eval2000(tg)       16.9        17.0
WER on eval2000(fg)       15.3        15.4
Final train prob          -0.0853629  -0.0882071
Final valid prob          -0.110972   -0.107545
Final train prob (xent)   -1.25237    -1.26246
Final valid prob (xent)   -1.36715    -1.35525

System                    7f          7g
WER on train_dev(tg)      14.46       14.27
WER on train_dev(fg)      13.23       13.16
WER on eval2000(tg)       17.0        16.3
WER on eval2000(fg)       15.4        14.6
Final train prob          -0.0882071  -0.123325
Final valid prob          -0.107545   -0.131798
Final train prob (xent)   -1.26246    -1.6196
Final valid prob (xent)   -1.35525    -1.60244

For the 7e result, I just manually copied it from the top of the 7e script. You can see there is no obvious improvement from 7e -> 7f (adding per-component max-change); I don't know if this is due to the randomness between different runs. Do you still want to add the 7f script, with 7g being the reverberation script?

@danpovey commented Nov 1, 2016

Yes, please create both scripts; it will more accurately document the changes that have been made.


@danpovey commented Nov 2, 2016

This should be very close to ready to merge, or maybe ready.
@vijayaditya, do you want to check it first?

# current best 'chain' models with TDNNs (see local/chain/run_tdnn_7d.sh)
%WER 10.4 | 1831 21395 | 90.7 6.1 3.2 1.2 10.4 44.6 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_fsh_fg/score_11_1.0/eval2000_hires.ctm.swbd.filt.sys
%WER 11.6 | 1831 21395 | 89.7 7.0 3.3 1.4 11.6 47.0 | exp/chain/tdnn_7d_sp/decode_eval2000_sw1_tg/score_10_1.0/eval2000_hires.ctm.swbd.filt.sys
# current best 'chain' models with TDNNs (see local/chain/run_tdnn_7g.sh)

Does the model converge after 2 epochs of training? Could you please post the log-likelihood plots here.

@@ -0,0 +1,210 @@
#!/bin/bash

# 7e is as 7f, but adding the max-change-per-component to the neural net training

7f is as 7e
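i.e., the header comment should presumably read:

# 7f is as 7e, but adding the max-change-per-component to the neural net training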


# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing
# smoothing options

What smoothing are you referring to?

# which leads to better results
# This script assumes a mixing of the original training data with its reverberated copy
# and results in a 2-fold training set. Thus the number of epochs is halved to
# keep the same training time.

Add a comment describing what happens if you train for more epochs.



# TDNN options
# this script uses the new tdnn config generator so it needs a final 0 to reflect that the final layer input has no splicing

move this comment to the splice indexes specification.

rm -r data/temp1 data/temp2

mfccdir=mfcc_perturbed
steps/make_mfcc.sh --cmd "$train_cmd" --nj 50 \

add a comment describing why you need these features.


clean_data_dir=${input_data_dir}_sp
else
clean_data_dir=${input_data_dir}

add a comment here saying we recommend speed perturbation as the gains are significant.

# if --include-original-data is true, the original data will be mixed with its reverberated copies
python steps/data/reverberate_data_dir.py \
--prefix "rev" \
--rir-set-parameters "0.3, simulated_rirs_8k/smallroom/rir_list" \

What happens to the other 0.1 probability mass? Could you add a comment here describing how these weights are used?
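For context, the script is presumably invoked with several RIR sets whose weights sum to 0.9 here, e.g. (the mediumroom/largeroom paths are assumptions, and whether the leftover mass is dropped or the weights are renormalized is exactly what the comment asks to document):

python steps/data/reverberate_data_dir.py \
  --rir-set-parameters "0.3, simulated_rirs_8k/smallroom/rir_list" \
  --rir-set-parameters "0.3, simulated_rirs_8k/mediumroom/rir_list" \
  --rir-set-parameters "0.3, simulated_rirs_8k/largeroom/rir_list" \
  data/${clean_data_dir} data/${train_set}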

if [ $stage -le 5 ]; then
steps/train_lda_mllt.sh --cmd "$train_cmd" --num-iters 13 \
--splice-opts "--left-context=3 --right-context=3" \
5500 90000 data/train_100k_nodup_hires \

Don't you want to train the lda_mllt transform on a mix of reverberated and clean data?

@tomkocse

@vijayaditya Just training the lda_mllt transform on clean data is good enough, and this avoids copying the alignments.

@vijayaditya

OK, in that case could you add a comment here saying the same; this would help avoid any confusion.


if reverberate_opts == "":

# prefix with index 0, e.g. rvb0_swb0035, stangs for the original data

# prefix using index 0 is reserved for original data e.g. rvb0_swb0035 corresponds to the swb0035 recording in original data

@danpovey commented Nov 4, 2016

That's OK with me. The LDA+MLLT is the least critical part of the whole thing; I wouldn't even bother testing whether it makes a difference, because you'd just see noise.
David is still looking into whether we can replace it with PCA.


# smoothing options
self_repair_scale=0.00001
# training options
num_epochs=2

Did you get a chance to check the log-likelihood values at the end of training? Did the training converge? Is there no improvement from running the training for a few more epochs?

@tomkocse

@vijayaditya I have checked that there is no improvement from training for more epochs. I guess we have already shown the convergence and the likelihood values in our paper.

@vijayaditya left a comment

Will merge after the two requested minor changes have been made.

. ./cmd.sh

stage=1
stage=3

Did you forget to change this back?


@@ -1 +1 @@
-tuning/run_tdnn_7e.sh (no newline at end of file)
+tuning/run_tdnn_7g.sh (no newline at end of file)

@danpovey @tomkocse @freewym Do you actually want to make this the preferred swbd recipe ?


I am asking this question because we will not be able to compare our results with other papers. (We already don't do that anyway, since we use speed perturbation.) So @tomkocse, could you just add a commented line in this script:

#for swbd recipe without the reverberation of training data use the following script
# it is similar to run_tdnn_7g.sh except for the run_ivector_common.sh being called.
# tuning/run_tdnn_7f.sh

@danpovey commented Nov 7, 2016

Actually, I'm not sure about making it the preferred Switchboard recipe.
Maybe leave the current one as the preferred recipe, but put a note at the top saying you can run [this recipe], which has reverberation and gives better results, although it will take a little longer (and more disk space) to dump egs.


@tomkocse commented Nov 8, 2016

What about moving the reverberated recipe (7g) to local/chain/multi_condition, and then making the non-reverberated one (7f) the preferred recipe?

@danpovey commented Nov 8, 2016

OK.


@vijayaditya merged commit f4495be into kaldi-asr:master on Nov 8, 2016.