
Conversation

@vince62s (Contributor) commented Nov 1, 2016

This is a modification of the s5_r2 recipe to take into account the LM from the TED-LIUM 2 paper. It gives better results, and much better results than in the paper.
Please review.
Vincent

@danpovey (Contributor) commented Nov 1, 2016

@david-ryan-snyder, if you were going to test out the ivector change, perhaps you could do it on this setup and kill two birds with one stone, making sure this PR runs smoothly?

@david-ryan-snyder (Contributor)

@danpovey, sure, no problem.

@david-ryan-snyder (Contributor)

@danpovey, @vince62s, sorry for the delay. I still haven't gotten to this, but I will try to do so by the end of the week.

@danpovey (Contributor)

@david-ryan-snyder, don't forget to test this!

@david-ryan-snyder (Contributor)

On it now. Will update ASAP.

@@ -59,7 +59,8 @@ if [ $stage -le 0 ]; then
rm ${dir}/data/text/* 2>/dev/null || true

# cantab-TEDLIUM is the larger data source. gzip it.

The comment on line 61 is out of date, isn't it?

# cantab-TEDLIUM is the larger data source. gzip it.
- sed 's/ <\/s>//g' < db/cantab-TEDLIUM/cantab-TEDLIUM.txt | gzip -c > ${dir}/data/text/train.txt.gz
+ gunzip db/TEDLIUM_release2/LM/*.en.gz
+ cat db/TEDLIUM_release2/LM/*.en | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c > ${dir}/data/text/train.txt.gz
@david-ryan-snyder (Contributor) commented Nov 21, 2016

Do you need to unzip these files on the disk? If not, I think you should replace it with this:

gunzip -c db/TEDLIUM_release2/LM/*.en.gz | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c > ${dir}/data/text/train.txt.gz
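(A nice side effect of the gunzip -c form is that it decompresses to stdout, so the original .gz files under db/ are left untouched and this stage of the data prep can be rerun cleanly.)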

- # get_data_prob.py: log-prob of data/local/local_lm/data/real_dev_set.txt given model data/local/local_lm/data/wordlist_4.pocolm was -5.13902242865 per word [perplexity = 170.514153159] over 18290.0 words.
- # even older results, before adding min-counts:
- # get_data_prob.py: log-prob of data/local/local_lm/data/real_dev_set.txt given model data/local/local_lm/data/lm_4 was -5.10576291033 per word [perplexity = 164.969879761] over 18290.0 words.
+ #[perplexity = 157.87] over 18290.0 words

The deleted lines appear to have provided a more detailed comment, about the logprob and the model being used. Is it possible to do the same with your updates?

# This will only work if you have GPUs on your system (and note that it requires
# you to have the queue set up the right way... see kaldi-asr.org/doc/queue.html)
- local/chain/run_tdnn.sh
+ local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""

In your RESULTS file, you say that

> This is about 0.6% worse than the corresponding results with cleanup.

If that's the case, shouldn't the version with cleanup be the default here (like it was before)?

@david-ryan-snyder (Contributor)

I did something dumb with the PCA vs LDA test; I have to rerun something.

In the meantime, I added some comments to the pull request that I hope will be helpful in getting it through.

@vince62s (Contributor, Author)

I will fix the first two comments.
On the last two:
No, I did not rerun with both min-count perplexities.
The "0.6% worse" comment is a bad copy-paste from Dan's previous results. But I did not run the cleaned version.
If you rerun the whole thing to update the results on a grid, that's better anyway.

@danpovey (Contributor)

@david-ryan-snyder, is this ready to merge?

@david-ryan-snyder (Contributor) commented Nov 25, 2016

@danpovey, not quite. @vince62s said he didn't run the cleaned version with his changes. I ran both on the CLSP grid. Hopefully @vince62s can decide what he needs from the following results to complete his RESULTS file.

%WER 27.8 | 507 17783 | 75.7 17.5 6.8 3.4 27.8 96.6 | 0.071 | exp/tri1/decode_nosp_dev/score_10_0.0/ctm.filt.filt.sys
%WER 26.3 | 507 17783 | 76.8 16.1 7.1 3.1 26.3 95.9 | 0.080 | exp/tri1/decode_nosp_dev_rescore/score_11_0.0/ctm.filt.filt.sys
%WER 27.3 | 1155 27500 | 75.3 18.4 6.3 2.7 27.3 93.0 | 0.119 | exp/tri1/decode_nosp_test/score_11_0.0/ctm.filt.filt.sys
%WER 26.2 | 1155 27500 | 76.6 17.3 6.1 2.8 26.2 92.6 | 0.081 | exp/tri1/decode_nosp_test_rescore/score_11_0.0/ctm.filt.filt.sys
%WER 22.5 | 507 17783 | 80.5 14.0 5.5 3.1 22.5 94.7 | 0.092 | exp/tri2/decode_dev/score_15_0.0/ctm.filt.filt.sys
%WER 21.3 | 507 17783 | 81.8 13.1 5.1 3.1 21.3 93.7 | 0.038 | exp/tri2/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 23.6 | 507 17783 | 79.6 14.8 5.6 3.2 23.6 95.1 | 0.024 | exp/tri2/decode_nosp_dev/score_12_0.0/ctm.filt.filt.sys
%WER 22.3 | 507 17783 | 80.7 13.5 5.8 3.0 22.3 93.7 | -0.002 | exp/tri2/decode_nosp_dev_rescore/score_13_0.0/ctm.filt.filt.sys
%WER 23.2 | 1155 27500 | 79.5 15.5 5.0 2.7 23.2 91.1 | 0.070 | exp/tri2/decode_nosp_test/score_12_0.0/ctm.filt.filt.sys
%WER 21.9 | 1155 27500 | 80.7 14.6 4.7 2.6 21.9 90.2 | 0.026 | exp/tri2/decode_nosp_test_rescore/score_12_0.0/ctm.filt.filt.sys
%WER 22.1 | 1155 27500 | 80.7 14.9 4.3 2.8 22.1 90.6 | 0.089 | exp/tri2/decode_test/score_13_0.0/ctm.filt.filt.sys
%WER 20.9 | 1155 27500 | 81.9 14.0 4.1 2.8 20.9 90.5 | 0.046 | exp/tri2/decode_test_rescore/score_13_0.0/ctm.filt.filt.sys
%WER 19.0 | 507 17783 | 83.9 11.4 4.7 2.9 19.0 92.1 | -0.054 | exp/tri3_cleaned/decode_dev/score_13_0.5/ctm.filt.filt.sys
%WER 17.9 | 507 17783 | 85.1 10.5 4.4 3.0 17.9 90.9 | -0.055 | exp/tri3_cleaned/decode_dev_rescore/score_15_0.0/ctm.filt.filt.sys
%WER 22.9 | 507 17783 | 80.0 14.0 5.9 3.0 22.9 94.1 | -0.098 | exp/tri3_cleaned/decode_dev.si/score_14_0.5/ctm.filt.filt.sys
%WER 17.6 | 1155 27500 | 84.8 11.7 3.5 2.4 17.6 87.6 | 0.001 | exp/tri3_cleaned/decode_test/score_15_0.0/ctm.filt.filt.sys
%WER 16.6 | 1155 27500 | 85.8 10.9 3.4 2.4 16.6 86.4 | -0.058 | exp/tri3_cleaned/decode_test_rescore/score_15_0.0/ctm.filt.filt.sys
%WER 22.3 | 1155 27500 | 80.7 15.2 4.1 3.1 22.3 91.0 | -0.092 | exp/tri3_cleaned/decode_test.si/score_13_0.0/ctm.filt.filt.sys
%WER 18.7 | 507 17783 | 83.9 11.4 4.7 2.6 18.7 92.3 | -0.006 | exp/tri3/decode_dev/score_17_0.0/ctm.filt.filt.sys
%WER 17.6 | 507 17783 | 85.0 10.5 4.4 2.6 17.6 90.5 | -0.030 | exp/tri3/decode_dev_rescore/score_16_0.0/ctm.filt.filt.sys
%WER 22.8 | 507 17783 | 80.4 14.1 5.5 3.2 22.8 93.1 | -0.130 | exp/tri3/decode_dev.si/score_12_0.5/ctm.filt.filt.sys
%WER 17.6 | 1155 27500 | 84.7 11.6 3.7 2.4 17.6 87.2 | 0.013 | exp/tri3/decode_test/score_15_0.0/ctm.filt.filt.sys
%WER 16.7 | 1155 27500 | 85.7 10.9 3.4 2.4 16.7 86.4 | -0.044 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 22.3 | 1155 27500 | 80.6 15.2 4.1 3.0 22.3 91.3 | -0.076 | exp/tri3/decode_test.si/score_13_0.0/ctm.filt.filt.sys
%WER 9.8 | 507 17783 | 91.5 6.1 2.4 1.3 9.8 79.1 | 0.121 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 9.1 | 507 17783 | 92.3 5.4 2.3 1.3 9.1 76.5 | 0.083 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.8 | 1155 27500 | 91.5 6.0 2.5 1.2 9.8 74.1 | 0.096 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.3 | 1155 27500 | 91.9 5.6 2.5 1.2 9.3 72.6 | 0.073 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.1 | 507 17783 | 91.2 6.0 2.7 1.3 10.1 80.7 | 0.077 | exp/chain/tdnn_sp_bi/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 9.3 | 507 17783 | 92.1 5.6 2.3 1.4 9.3 77.3 | 0.022 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 10.1 | 1155 27500 | 91.2 5.9 2.9 1.3 10.1 74.5 | 0.076 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.5 | 1155 27500 | 91.7 5.4 2.9 1.2 9.5 72.6 | 0.043 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_9_0.0/ctm.filt.filt.sys

@david-ryan-snyder (Contributor) commented Nov 25, 2016

@danpovey, the previous results used the normal LDA features for the ivector. Here are the results using PCA.

%WER 9.5 (vs 9.8) | 507 17783 | 91.8 5.8 2.4 1.3 9.5 76.9 | 0.075 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 8.8 (vs 9.1) | 507 17783 | 92.4 5.3 2.3 1.3 8.8 76.3 | 0.092 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.9 (vs 9.8) | 1155 27500 | 91.4 6.0 2.6 1.3 9.9 74.2 | 0.129 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.3 (vs 9.3) | 1155 27500 | 91.9 5.6 2.5 1.3 9.3 72.3 | 0.100 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.0 (vs 10.1) | 507 17783 | 91.3 5.9 2.8 1.3 10.0 78.1 | 0.101 | exp/chain/tdnn_sp_bi/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 9.2 (vs 9.3) | 507 17783 | 92.1 5.3 2.6 1.3 9.2 75.5 | 0.061 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_9_0.0/ctm.filt.filt.sys
%WER 10.2 (vs 10.1) | 1155 27500 | 91.0 6.0 3.0 1.2 10.2 75.5 | 0.063 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.8 (vs 9.5) | 1155 27500 | 91.2 5.4 3.4 1.1 9.8 73.9 | 0.095 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

@danpovey (Contributor) commented Nov 25, 2016 via email

@david-ryan-snyder (Contributor)

I added a (vs XX.X) in each line so that you can see the corresponding results with LDA.

Bottom line: PCA is better in 4 of the 8 results, LDA is better in 3, and they are tied in 1. Averaging all 8 results, PCA is at 9.59% and LDA at 9.63%.

@danpovey (Contributor) commented Nov 25, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

@danpovey, are you referring to @vince62s's LM stuff or the PCA vs LDA stuff?

@danpovey (Contributor) commented Nov 26, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

There was no fix. The PCA vs LDA results in #1123 use the same setup as here.

@david-ryan-snyder (Contributor) commented Nov 26, 2016

> I did something dumb with the PCA vs LDA test; I have to rerun something.

@danpovey I was just referring to this PR in particular. The earlier PCA vs LDA results in #1123 are still valid.

@david-ryan-snyder (Contributor)

@danpovey Since we want to try the PCA vs LDA thing on more recipes, is it OK if we allow @vince62s to finish up the PR without that change? Since we already know the results, it won't need to be rerun later, when we decide to add in PCA for ivectors.

@vince62s I think the main thing is that you need to update your RESULTS file with the cleaned results I posted in an earlier comment. Also, since the cleaned results are better, I imagine you will want to make them the default in the run.sh file (like it was before).
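Per the diff shown earlier, making the cleaned run the default presumably just means restoring the plain call:

local/chain/run_tdnn.sh

and keeping the non-cleaned variant as a commented-out alternative:

# local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""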

@danpovey (Contributor) commented Nov 26, 2016 via email

@vince62s (Contributor, Author)

Well, what should we do? Just remove all the old LM results, or leave them for reference?

Also, in @david-ryan-snyder's results there are no "standard" nnet3 results; do I just omit/remove them from the file?

@david-ryan-snyder (Contributor)

I think that's a @danpovey question.

I can run the other nnet3 results if we need them.

@danpovey (Contributor) commented Nov 26, 2016 via email

# local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""
# for d in exp/chain/tdnn_sp_bi/decode_*; do grep Sum $d/*/*ys | utils/best_wer.sh; done
# This is about 0.6% worse than the corresponding results with cleanup.
AFTER MAX-CHANGE PER COMPONENT

Please remove this "AFTER MAX-CHANGE PER COMPONENT" line; that's history now.

@vince62s (Contributor, Author)

Right, and the 0.6% figure is no longer 0.6%; I'll fix it.

--chain.l2-regularize 0.00005 \
--chain.apply-deriv-weights false \
- --chain.lm-opts="--num-extra-lm-states=2000" \
+ --chain.lm-opts="--ngram-order=5 --num-extra-lm-states=2000" \

Did you really find that --ngram-order=5 was better? By how much?
In general I don't like too much tuning that's specific to particular egs directories. People copy them to other setups, and I prefer to have settings that will work everywhere.
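(For context: as far as I can tell, the --chain.lm-opts string is passed through to chain-est-phone-lm, which estimates the phone-level LM used to build the chain denominator graph, so --ngram-order here raises the order of that phone LM rather than anything about the word-level LM; the default order is 4, as noted later in this thread.)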

@vince62s (Contributor, Author)

Hmm, I don't recall exactly. It was very slightly better, but I'm not sure by how much. Also, I'm wondering if this is a change I may have reset after my first commit.
@david-ryan-snyder: in your run, was it --ngram-order=5, or just not specified, i.e. the default of 4?

@vince62s (Contributor, Author)

Here I mentioned what it was: https://groups.google.com/forum/#!topic/kaldi-help/N4NeQ0g4B7Y

baseline: phone LM order 4, --no-prune-ngram-order 3, extra 2000: 10.9 - 10.4 - 10.6 - 10.2
phone LM order 5, --no-prune-ngram-order 3, extra 2000: 10.5 - 10.2 - 10.5 - 10.1

@david-ryan-snyder (Contributor)

I ran it with what you had in the PR, --ngram-order=5.

@danpovey (Contributor) commented Nov 26, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

I can rerun it on the CLSP grid without the --ngram-order=5.

@vince62s (Contributor, Author)

Yes, go ahead; we'll see how it goes. I just pushed the change.

@david-ryan-snyder (Contributor) commented Nov 28, 2016

@danpovey, below are the results with ngram-order=4 on the CLSP grid. The average WER (across both the cleaned and regular versions) is 9.43% with ngram-order=4, versus 9.63% with ngram-order=5. @vince62s, should we update the RESULTS file to reflect this?

%WER 9.8 | 507 17783 | 91.6 6.0 2.4 1.5 9.8 80.1 | -0.038 | exp/chain/tdnn_sp_bi/decode_dev/score_8_0.0/ctm.filt.filt.sys
%WER 9.1 | 507 17783 | 92.3 5.5 2.3 1.4 9.1 77.5 | 0.011 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 9.9 | 1155 27500 | 91.4 5.7 2.9 1.3 9.9 74.9 | 0.083 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.4 | 1155 27500 | 91.9 5.6 2.5 1.4 9.4 72.7 | 0.018 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 9.7 | 507 17783 | 91.7 5.8 2.5 1.4 9.7 78.7 | 0.097 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 9.0 | 507 17783 | 92.3 5.3 2.4 1.3 9.0 76.7 | 0.067 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.5 | 1155 27500 | 91.7 5.8 2.5 1.2 9.5 72.5 | 0.079 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.0 | 1155 27500 | 92.2 5.3 2.5 1.2 9.0 71.3 | 0.064 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
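For reference, a quick way to compute such an average from best_wer.sh output; a minimal sketch that assumes the %WER value is the second whitespace-separated field, as in the lines above:

for d in exp/chain/tdnn_sp_bi/decode_* exp/chain_cleaned/tdnn_sp_bi/decode_*; do
  grep Sum $d/*/*ys | utils/best_wer.sh
done | awk '{ sum += $2; n++ } END { printf("average WER: %.2f%%\n", sum / n) }'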

@vince62s (Contributor, Author)

I will update it accordingly, but it is annoying how consistently you get better results with ngram-order=4, while on a single server I consistently did better with ngram-order=5.

@vince62s (Contributor, Author)

OK, I think it should now be OK to merge.

--chain.l2-regularize 0.00005 \
--chain.apply-deriv-weights false \
- --chain.lm-opts="--num-extra-lm-states=2000" \
+ --chain.lm-opts="--ngram-order=4 --num-extra-lm-states=2000" \

Can you please remove the --ngram-order=4, since it's the default? Then it should be ready to merge.

@danpovey (Contributor)

Thanks! Merging.

@danpovey merged commit b710d78 into kaldi-asr:master on Nov 28, 2016.