
Conversation

@vince62s (Contributor) commented Nov 1, 2016

This is a modification of the s5_r2 recipe to take into account the LM from the TED-LIUM 2 paper. It gives better results, and much better results than in the paper.
Please review.
Vincent

@danpovey (Contributor) commented Nov 1, 2016

@david-ryan-snyder, if you were going to test out the ivector change, perhaps you could do it on this setup and kill two birds with one stone, making sure this PR runs smoothly?

@david-ryan-snyder (Contributor)

@danpovey, sure, no problem.

@david-ryan-snyder (Contributor)

@danpovey, @vince62s, sorry for the delay. I still haven't gotten to this, but I will try to do so by the end of the week.

@danpovey (Contributor)

@david-ryan-snyder, don't forget to test this!

@david-ryan-snyder (Contributor)

On it now. Will update ASAP.

@@ -59,7 +59,8 @@ if [ $stage -le 0 ]; then
rm ${dir}/data/text/* 2>/dev/null || true

# cantab-TEDLIUM is the larger data source. gzip it.

The comment on line 61 is out of date, isn't it?

# cantab-TEDLIUM is the larger data source. gzip it.
- sed 's/ <\/s>//g' < db/cantab-TEDLIUM/cantab-TEDLIUM.txt | gzip -c > ${dir}/data/text/train.txt.gz
+ gunzip db/TEDLIUM_release2/LM/*.en.gz
+ cat db/TEDLIUM_release2/LM/*.en | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c > ${dir}/data/text/train.txt.gz
@david-ryan-snyder (Contributor) commented Nov 21, 2016

Do you need to unzip these files on the disk? If not, I think you should replace it with this:

gunzip -c db/TEDLIUM_release2/LM/*.en.gz | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c > ${dir}/data/text/train.txt.gz
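(A nice side effect of the gunzip -c form is that it decompresses to stdout, so the original .gz files under db/ are left untouched and this stage of the data prep can be rerun cleanly.)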

- # get_data_prob.py: log-prob of data/local/local_lm/data/real_dev_set.txt given model data/local/local_lm/data/wordlist_4.pocolm was -5.13902242865 per word [perplexity = 170.514153159] over 18290.0 words.
- # even older results, before adding min-counts:
- # get_data_prob.py: log-prob of data/local/local_lm/data/real_dev_set.txt given model data/local/local_lm/data/lm_4 was -5.10576291033 per word [perplexity = 164.969879761] over 18290.0 words.
+ #[perplexity = 157.87] over 18290.0 words

The deleted lines appear to have provided a more detailed comment, about the logprob and the model being used. Is it possible to do the same with your updates?

# This will only work if you have GPUs on your system (and note that it requires
# you to have the queue set up the right way... see kaldi-asr.org/doc/queue.html)
- local/chain/run_tdnn.sh
+ local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""

In your RESULTS file, you say that

> This is about 0.6% worse than the corresponding results with cleanup.

If that's the case, shouldn't the version with cleanup be the default here (like it was before)?

@david-ryan-snyder (Contributor)

I did something dumb with the PCA vs LDA test; I have to rerun something.

In the meantime, I added some comments to the pull request that I hope will be helpful in getting it through.

@vince62s (Contributor, Author)

I will fix the first two comments.
On the last two:
No, I did not rerun with both min-count perplexities.
The "0.6% worse" comment is a bad copy-paste from Dan's previous results. But I did not run the cleaned version.
If you rerun the whole thing to update the results on a grid, that's better anyway.

@danpovey (Contributor)

@david-ryan-snyder, is this ready to merge?

@david-ryan-snyder (Contributor) commented Nov 25, 2016

@danpovey, not quite. @vince62s said he didn't run the cleaned version with his changes. I ran both on the CLSP grid. Hopefully @vince62s can decide what he needs from the following results to complete his RESULTS file.

%WER 27.8 | 507 17783 | 75.7 17.5 6.8 3.4 27.8 96.6 | 0.071 | exp/tri1/decode_nosp_dev/score_10_0.0/ctm.filt.filt.sys
%WER 26.3 | 507 17783 | 76.8 16.1 7.1 3.1 26.3 95.9 | 0.080 | exp/tri1/decode_nosp_dev_rescore/score_11_0.0/ctm.filt.filt.sys
%WER 27.3 | 1155 27500 | 75.3 18.4 6.3 2.7 27.3 93.0 | 0.119 | exp/tri1/decode_nosp_test/score_11_0.0/ctm.filt.filt.sys
%WER 26.2 | 1155 27500 | 76.6 17.3 6.1 2.8 26.2 92.6 | 0.081 | exp/tri1/decode_nosp_test_rescore/score_11_0.0/ctm.filt.filt.sys
%WER 22.5 | 507 17783 | 80.5 14.0 5.5 3.1 22.5 94.7 | 0.092 | exp/tri2/decode_dev/score_15_0.0/ctm.filt.filt.sys
%WER 21.3 | 507 17783 | 81.8 13.1 5.1 3.1 21.3 93.7 | 0.038 | exp/tri2/decode_dev_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 23.6 | 507 17783 | 79.6 14.8 5.6 3.2 23.6 95.1 | 0.024 | exp/tri2/decode_nosp_dev/score_12_0.0/ctm.filt.filt.sys
%WER 22.3 | 507 17783 | 80.7 13.5 5.8 3.0 22.3 93.7 | -0.002 | exp/tri2/decode_nosp_dev_rescore/score_13_0.0/ctm.filt.filt.sys
%WER 23.2 | 1155 27500 | 79.5 15.5 5.0 2.7 23.2 91.1 | 0.070 | exp/tri2/decode_nosp_test/score_12_0.0/ctm.filt.filt.sys
%WER 21.9 | 1155 27500 | 80.7 14.6 4.7 2.6 21.9 90.2 | 0.026 | exp/tri2/decode_nosp_test_rescore/score_12_0.0/ctm.filt.filt.sys
%WER 22.1 | 1155 27500 | 80.7 14.9 4.3 2.8 22.1 90.6 | 0.089 | exp/tri2/decode_test/score_13_0.0/ctm.filt.filt.sys
%WER 20.9 | 1155 27500 | 81.9 14.0 4.1 2.8 20.9 90.5 | 0.046 | exp/tri2/decode_test_rescore/score_13_0.0/ctm.filt.filt.sys
%WER 19.0 | 507 17783 | 83.9 11.4 4.7 2.9 19.0 92.1 | -0.054 | exp/tri3_cleaned/decode_dev/score_13_0.5/ctm.filt.filt.sys
%WER 17.9 | 507 17783 | 85.1 10.5 4.4 3.0 17.9 90.9 | -0.055 | exp/tri3_cleaned/decode_dev_rescore/score_15_0.0/ctm.filt.filt.sys
%WER 22.9 | 507 17783 | 80.0 14.0 5.9 3.0 22.9 94.1 | -0.098 | exp/tri3_cleaned/decode_dev.si/score_14_0.5/ctm.filt.filt.sys
%WER 17.6 | 1155 27500 | 84.8 11.7 3.5 2.4 17.6 87.6 | 0.001 | exp/tri3_cleaned/decode_test/score_15_0.0/ctm.filt.filt.sys
%WER 16.6 | 1155 27500 | 85.8 10.9 3.4 2.4 16.6 86.4 | -0.058 | exp/tri3_cleaned/decode_test_rescore/score_15_0.0/ctm.filt.filt.sys
%WER 22.3 | 1155 27500 | 80.7 15.2 4.1 3.1 22.3 91.0 | -0.092 | exp/tri3_cleaned/decode_test.si/score_13_0.0/ctm.filt.filt.sys
%WER 18.7 | 507 17783 | 83.9 11.4 4.7 2.6 18.7 92.3 | -0.006 | exp/tri3/decode_dev/score_17_0.0/ctm.filt.filt.sys
%WER 17.6 | 507 17783 | 85.0 10.5 4.4 2.6 17.6 90.5 | -0.030 | exp/tri3/decode_dev_rescore/score_16_0.0/ctm.filt.filt.sys
%WER 22.8 | 507 17783 | 80.4 14.1 5.5 3.2 22.8 93.1 | -0.130 | exp/tri3/decode_dev.si/score_12_0.5/ctm.filt.filt.sys
%WER 17.6 | 1155 27500 | 84.7 11.6 3.7 2.4 17.6 87.2 | 0.013 | exp/tri3/decode_test/score_15_0.0/ctm.filt.filt.sys
%WER 16.7 | 1155 27500 | 85.7 10.9 3.4 2.4 16.7 86.4 | -0.044 | exp/tri3/decode_test_rescore/score_14_0.0/ctm.filt.filt.sys
%WER 22.3 | 1155 27500 | 80.6 15.2 4.1 3.0 22.3 91.3 | -0.076 | exp/tri3/decode_test.si/score_13_0.0/ctm.filt.filt.sys
%WER 9.8 | 507 17783 | 91.5 6.1 2.4 1.3 9.8 79.1 | 0.121 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 9.1 | 507 17783 | 92.3 5.4 2.3 1.3 9.1 76.5 | 0.083 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.8 | 1155 27500 | 91.5 6.0 2.5 1.2 9.8 74.1 | 0.096 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.3 | 1155 27500 | 91.9 5.6 2.5 1.2 9.3 72.6 | 0.073 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.1 | 507 17783 | 91.2 6.0 2.7 1.3 10.1 80.7 | 0.077 | exp/chain/tdnn_sp_bi/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 9.3 | 507 17783 | 92.1 5.6 2.3 1.4 9.3 77.3 | 0.022 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 10.1 | 1155 27500 | 91.2 5.9 2.9 1.3 10.1 74.5 | 0.076 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.5 | 1155 27500 | 91.7 5.4 2.9 1.2 9.5 72.6 | 0.043 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_9_0.0/ctm.filt.filt.sys

@david-ryan-snyder (Contributor) commented Nov 25, 2016

@danpovey, the previous results used the normal LDA features for the ivector. Here are the results using PCA.

%WER 9.5 (vs 9.8) | 507 17783 | 91.8 5.8 2.4 1.3 9.5 76.9 | 0.075 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 8.8 (vs 9.1) | 507 17783 | 92.4 5.3 2.3 1.3 8.8 76.3 | 0.092 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.9 (vs 9.8) | 1155 27500 | 91.4 6.0 2.6 1.3 9.9 74.2 | 0.129 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.3 (vs 9.3) | 1155 27500 | 91.9 5.6 2.5 1.3 9.3 72.3 | 0.100 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.0 (vs 10.1) | 507 17783 | 91.3 5.9 2.8 1.3 10.0 78.1 | 0.101 | exp/chain/tdnn_sp_bi/decode_dev/score_9_0.0/ctm.filt.filt.sys
%WER 9.2 (vs 9.3) | 507 17783 | 92.1 5.3 2.6 1.3 9.2 75.5 | 0.061 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_9_0.0/ctm.filt.filt.sys
%WER 10.2 (vs 10.1) | 1155 27500 | 91.0 6.0 3.0 1.2 10.2 75.5 | 0.063 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.8 (vs 9.5) | 1155 27500 | 91.2 5.4 3.4 1.1 9.8 73.9 | 0.095 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

@danpovey (Contributor) commented Nov 25, 2016 via email

@david-ryan-snyder (Contributor)

I added a (vs XX.X) in each line so that you can see the corresponding results with LDA.

Bottom line: PCA is better in 4 of the 8 results, LDA is better in 3, and they are tied in 1. Averaging all 8 results, PCA is at 9.59% and LDA at 9.63%.

@danpovey (Contributor) commented Nov 25, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

@danpovey, are you referring to @vince62s's LM stuff or the PCA vs LDA stuff?

@danpovey (Contributor) commented Nov 26, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

There was no fix. The PCA vs LDA results in #1123 use the same setup as here.

@david-ryan-snyder (Contributor) commented Nov 26, 2016

> I did something dumb with the PCA vs LDA test; I have to rerun something.

@danpovey I was just referring to this PR in particular. The earlier PCA vs LDA results in #1123 are still valid.

@david-ryan-snyder (Contributor)

@danpovey Since we want to try the PCA vs LDA thing on more recipes, is it OK if we allow @vince62s to finish up the PR without that change? Since we already know the results, it won't need to be rerun later, when we decide to add in PCA for ivectors.

@vince62s I think the main thing is that you need to update your RESULTS file with the cleaned results I posted in an earlier comment. Also, since the cleaned results are better, I imagine you will want to make them the default in the run.sh file (like it was before).
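Per the diff shown earlier, making the cleaned run the default presumably just means restoring the plain call:

local/chain/run_tdnn.sh

and keeping the non-cleaned variant as a commented-out alternative:

# local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""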

@danpovey (Contributor) commented Nov 26, 2016 via email

@vince62s (Contributor, Author)

Well, what should we do? Just remove all the old LM results, or leave them for reference?

Also, in @david-ryan-snyder's results there are no "standard" nnet3 results; do I just omit/remove them from the file?

@david-ryan-snyder (Contributor)

I think that's a @danpovey question.

I can run the other nnet3 results if we need them.

@danpovey (Contributor) commented Nov 26, 2016 via email

# local/chain/run_tdnn.sh --train-set train --gmm tri3 --nnet3-affix ""
# for d in exp/chain/tdnn_sp_bi/decode_*; do grep Sum $d/*/*ys | utils/best_wer.sh; done
# This is about 0.6% worse than the corresponding results with cleanup.
AFTER MAX-CHANGE PER COMPONENT

Please remove this "AFTER MAX-CHANGE PER COMPONENT" line; that's history now.

@vince62s (Contributor, Author)

Right, and the 0.6% figure is no longer 0.6%; I'll fix it.

--chain.l2-regularize 0.00005 \
--chain.apply-deriv-weights false \
- --chain.lm-opts="--num-extra-lm-states=2000" \
+ --chain.lm-opts="--ngram-order=5 --num-extra-lm-states=2000" \

Did you really find that --ngram-order=5 was better? By how much?
In general I don't like too much tuning that's specific to particular egs directories. People copy them to other setups, and I prefer to have settings that will work everywhere.
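(For context: as far as I can tell, the --chain.lm-opts string is passed through to chain-est-phone-lm, which estimates the phone-level LM used to build the chain denominator graph, so --ngram-order here raises the order of that phone LM rather than anything about the word-level LM; the default order is 4, as noted later in this thread.)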

@vince62s (Contributor, Author)

Hmm, I don't recall exactly. It was very slightly better, but I'm not sure by how much. Also, I'm wondering if this is a change I may have reset after my first commit.
@david-ryan-snyder: in your run, was it --ngram-order=5, or just not specified, i.e. the default of 4?

@vince62s (Contributor, Author)

Here I mentioned what it was: https://groups.google.com/forum/#!topic/kaldi-help/N4NeQ0g4B7Y

baseline: phone LM order 4, --no-prune-ngram-order 3, extra 2000: 10.9 - 10.4 - 10.6 - 10.2
phone LM order 5, --no-prune-ngram-order 3, extra 2000: 10.5 - 10.2 - 10.5 - 10.1

@david-ryan-snyder (Contributor)

I ran it with what you had in the PR, --ngram-order=5.

@danpovey (Contributor) commented Nov 26, 2016 via email

@david-ryan-snyder (Contributor) commented Nov 26, 2016

I can rerun it on the CLSP grid without the --ngram-order=5.

@vince62s (Contributor, Author)

Yes, go ahead; we'll see how it goes. I just pushed the change.

@david-ryan-snyder (Contributor) commented Nov 28, 2016

@danpovey, below are the results with ngram-order=4 on the CLSP grid. The average WER (across both the cleaned and regular versions) is 9.43% with ngram-order=4, versus 9.63% with ngram-order=5. @vince62s, should we update the RESULTS file to reflect this?

%WER 9.8 | 507 17783 | 91.6 6.0 2.4 1.5 9.8 80.1 | -0.038 | exp/chain/tdnn_sp_bi/decode_dev/score_8_0.0/ctm.filt.filt.sys
%WER 9.1 | 507 17783 | 92.3 5.5 2.3 1.4 9.1 77.5 | 0.011 | exp/chain/tdnn_sp_bi/decode_dev_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 9.9 | 1155 27500 | 91.4 5.7 2.9 1.3 9.9 74.9 | 0.083 | exp/chain/tdnn_sp_bi/decode_test/score_9_0.0/ctm.filt.filt.sys
%WER 9.4 | 1155 27500 | 91.9 5.6 2.5 1.4 9.4 72.7 | 0.018 | exp/chain/tdnn_sp_bi/decode_test_rescore/score_8_0.0/ctm.filt.filt.sys
%WER 9.7 | 507 17783 | 91.7 5.8 2.5 1.4 9.7 78.7 | 0.097 | exp/chain_cleaned/tdnn_sp_bi/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 9.0 | 507 17783 | 92.3 5.3 2.4 1.3 9.0 76.7 | 0.067 | exp/chain_cleaned/tdnn_sp_bi/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 9.5 | 1155 27500 | 91.7 5.8 2.5 1.2 9.5 72.5 | 0.079 | exp/chain_cleaned/tdnn_sp_bi/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.0 | 1155 27500 | 92.2 5.3 2.5 1.2 9.0 71.3 | 0.064 | exp/chain_cleaned/tdnn_sp_bi/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys
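For reference, a quick way to compute such an average from best_wer.sh output; a minimal sketch that assumes the %WER value is the second whitespace-separated field, as in the lines above:

for d in exp/chain/tdnn_sp_bi/decode_* exp/chain_cleaned/tdnn_sp_bi/decode_*; do
  grep Sum $d/*/*ys | utils/best_wer.sh
done | awk '{ sum += $2; n++ } END { printf("average WER: %.2f%%\n", sum / n) }'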

@vince62s (Contributor, Author)

I will update it accordingly, but it is annoying how consistently you get better results with ngram-order=4, while on a single server I consistently did better with ngram-order=5.

@vince62s (Contributor, Author)

OK, I think it should now be OK to merge.

--chain.l2-regularize 0.00005 \
--chain.apply-deriv-weights false \
- --chain.lm-opts="--num-extra-lm-states=2000" \
+ --chain.lm-opts="--ngram-order=4 --num-extra-lm-states=2000" \

Can you please remove the --ngram-order=4, since it's the default? Then it should be ready to merge.

@danpovey (Contributor)

Thanks! Merging.

@danpovey merged commit b710d78 into kaldi-asr:master on Nov 28, 2016.