Conversation

@hhadian (Contributor) commented Aug 7, 2018

Add a BPE version of the IAM recipe.

Since there were quite a lot of changes to the scripts, I created a v2 recipe (which has only e2e and e2e+chain training). There are still useful scripts in the v1 recipe, so it might be worth keeping it around, at least for a while.
Here is a comparison of the same model with and without BPE:
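
For readers who haven't seen BPE before, here is a toy, illustrative sketch of what byte-pair encoding does (my own example, not the recipe's actual scripts): starting from characters, it repeatedly merges the most frequent adjacent symbol pair across the training words, so common character sequences become single subword units that the lexicon and LM can use.

```python
# Toy byte-pair encoding sketch -- illustrative only, not the v2 recipe's
# actual BPE scripts. Starting from characters, repeatedly merge the most
# frequent adjacent symbol pair across the training words.
from collections import Counter

def learn_bpe(words, num_merges):
    corpus = Counter(tuple(w) for w in words)  # each word as a char tuple
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges
```

For example, `learn_bpe(["the", "then", "there"], 2)` first merges `('t', 'h')` and then `('th', 'e')`, so all three words start with the single subword unit "the".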

# local/chain/compare_wer.sh v1/exp/chain/cnn_e2eali_1b v2/exp/chain/cnn_e2eali_1b
# System                         non-BPE       BPE
# WER                             12.40     10.33
# WER (rescored)                     --     10.10
# CER                              5.59      5.00
# CER (rescored)                     --      4.88
# Final train prob              -0.0322   -0.0428
# Final valid prob              -0.0563   -0.0666
# Final train prob (xent)       -0.6891   -0.9210
# Final valid prob (xent)       -0.8309   -1.0264
# Parameters                      3.95M     3.98M

I also added (in v2) a new chain script (fewer but bigger layers, smaller l2, dropout, and more epochs), which improves on 1b a bit:

# local/chain/compare_wer.sh exp/chain/cnn_e2eali_1b exp/chain/cnn_e2eali_1c
# System                      cnn_e2eali_1b cnn_e2eali_1c
# WER                             10.33     10.05
# WER (rescored)                  10.10      9.75
# CER                              5.00      4.76
# CER (rescored)                   4.88      4.68
# Final train prob              -0.0428   -0.0317
# Final valid prob              -0.0666   -0.0630
# Final train prob (xent)       -0.9210   -0.5413
# Final valid prob (xent)       -1.0264   -0.7096
# Parameters                      3.98M     5.12M

@danpovey (Contributor) commented Aug 7, 2018 via email

@hhadian (Contributor, Author) commented Aug 7, 2018

Do you mean for non-BPE? Because there are actually "rescored" lines in the results for the BPE models.

The graph for the BPE system was very big, so I used a highly pruned 6-gram LM (created using pocolm) to build the graph and decode, and then rescored with an unpruned 6-gram pocolm LM. I also tried rescoring with an 8-gram LM (created using SRILM), but it was not helpful.
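
In other words (a minimal sketch of the idea with a hypothetical LM interface, not the actual Kaldi lattice-rescoring scripts): the first pass decodes with the small pruned LM so the graph stays tractable, and the second pass re-ranks the hypotheses using the unpruned LM's scores.

```python
# Sketch of decode-with-small-LM / rescore-with-big-LM. The `big_lm.score`
# interface is assumed for illustration; the real pipeline rescores
# lattices, not an n-best list.

def rescore_nbest(nbest, big_lm, lm_weight=1.0):
    """nbest: (words, am_score, small_lm_score) hypotheses from the
    first-pass decode with the highly pruned 6-gram LM."""
    best = None
    for words, am_score, _small_lm_score in nbest:
        # Replace the pruned LM's contribution with the unpruned LM's score.
        total = am_score + lm_weight * big_lm.score(words)
        if best is None or total > best[1]:
            best = (words, total)
    return best
```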

@danpovey (Contributor) commented Aug 7, 2018 via email

@danpovey merged commit 6926b60 into kaldi-asr:master on Aug 11, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request on Sep 13, 2018
@hhadian deleted the add-iam-bpe branch on October 4, 2018