[egs, script] Zeroth-Korean: Korean open-source corpus and its script #2296
Merged

Changes from all commits (44 commits)
06f0bc8            Merge remote-tracking branch 'kaldi-asr/master'
d03fa19            Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml efdd432   Merge remote-tracking branch 'kaldi-asr/master'
cc19bbf            Merge remote-tracking branch 'kaldi-asr/master'
1887b1d            Merge remote-tracking branch 'kaldi-asr/master'
954bdb5            Merge remote-tracking branch 'kaldi-asr/master'
87d3740            Merge remote-tracking branch 'kaldi-asr/master'
c0c8039            Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml 41f3d33   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml c4ec4a2   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml ac8551f   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml f83073b   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml df3826d   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml ceb9d32   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml ff1a880   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml 0e5f310   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml ee2612d   Merge remote-tracking branch 'kaldi-asr/master'
wonkyuml 13071f1   initial setting
d2856ba            main script
9dc8f81            cleaning script
wonkyuml 6bd06af   cmd.sh cleaninig
wonkyuml b00a813   run.sh script fix
wonkyuml 8b499a5   add RESULTS page with minor typo fix
wonkyuml 5c22bab   run_tdnn_1a.sh fix
wonkyuml 73b9bdb   tdnn_opgru_1a change
wonkyuml bc67d9f   add README.txt
wonkyuml 6554ff0   compare_wer.sh script
wonkyuml 7e82148   result and diagnostics added
wonkyuml 62d6b36   frames-per-chunk added on decoding script
wonkyuml 8ec0079   chunk left right
wonkyuml 3526f60   omit $mfccdir
wonkyuml 6d19ab2   removed locale dependency
11d1a07            removed locale dependency
lucas-jo f9dca8f   Merge branch 'zeroth_egs' of https://github.com/wonkyuml/kaldi into z…
lucas-jo c75b1f8   changed filename
lucas-jo d1b2277   re-indented with no tab
lucas-jo 17378f2   changed to use PCA instead of LDA+MLLT
lucas-jo 90400dd   added -bash on echo statements
lucas-jo 6d010aa   fix pointing update_segmentation.sh in run.sh
bd8094f            simplified and added echo statement
7b55b5f            results updated
cb817af            data prep interface change
6259aed            cosmetic fix for ivector script
7e14701            increase parameter for TDNN-F
s5/README.txt (new file, @@ -0,0 +1,13 @@):

The Zeroth-Korean Kaldi example comes from the Zeroth project. The Zeroth project provides a free Korean speech corpus and aims to make Korean speech recognition broadly accessible to everyone. It was developed in collaboration between Lucas Jo (Atlas Guide Inc.) and Wonkyum Lee (Gridspace Inc.).

In this example, we use 51.6 hours of transcribed Korean audio as training data (22,263 utterances, 105 speakers, 3,000 sentences) and 1.2 hours of transcribed Korean audio as test data (457 utterances, 10 speakers). Besides the audio and transcriptions, we provide a pre-trained language model, a lexicon, and a morpheme-based segmenter (Morfessor).

The database can also be downloaded from openslr:
http://www.openslr.org/40

The database is licensed under Attribution 4.0 International (CC BY 4.0).

This folder contains a speech recognition recipe based on the WSJ/Librispeech examples.

For more details about the Zeroth project, please visit:
https://github.com/goodatlas/zeroth
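As a rough sketch of obtaining the data from openslr, the download command can be assembled as below. The archive filename is an assumption on my part, not something this PR states; verify it against the file list on http://www.openslr.org/40 before running the printed command.

```shell
#!/bin/sh
# Hedged sketch: assemble (but do not run) the fetch-and-unpack command for
# openslr resource 40.  The archive name below is an assumption.
slr_url="http://www.openslr.org/resources/40"
archive="zeroth_korean.tar.gz"   # assumed name -- check the openslr page
echo "wget -c ${slr_url}/${archive} && tar -xzf ${archive}"
```

Printing the command rather than executing it keeps the sketch safe to run without network access.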
s5/RESULTS (new file, @@ -0,0 +1,63 @@):

#!/bin/bash

# this RESULTS file was obtained by Wonkyum Lee in July 2018.

for dir in exp/*; do
  steps/info/gmm_dir_info.pl $dir
  for x in $dir/decode*test*; do [ -d $x ] && [[ $x =~ "$1" ]] && grep WER $x/wer_* | utils/best_wer.sh; done
done
exit 0

# monophone, trained on the 2k shortest utterances
exp/mono: nj=16 align prob=-99.85 over 2.66h [retry=0.8%, fail=0.3%] states=130 gauss=1004
%WER 70.24 [ 6499 / 9253, 295 ins, 1399 del, 4805 sub ] exp/mono/decode_nosp_fglarge_test_clean/wer_8_0.5
%WER 71.28 [ 6596 / 9253, 185 ins, 1721 del, 4690 sub ] exp/mono/decode_nosp_tglarge_test_clean/wer_9_1.0
%WER 78.83 [ 7294 / 9253, 218 ins, 1752 del, 5324 sub ] exp/mono/decode_nosp_tgsmall_test_clean/wer_10_0.0

# first triphone build, trained on 5k utterances
exp/tri1: nj=16 align prob=-98.34 over 11.55h [retry=1.6%, fail=0.6%] states=1568 gauss=10030 tree-impr=4.07
%WER 37.44 [ 3464 / 9253, 258 ins, 725 del, 2481 sub ] exp/tri1/decode_nosp_fglarge_test_clean/wer_15_0.5
%WER 38.85 [ 3595 / 9253, 347 ins, 633 del, 2615 sub ] exp/tri1/decode_nosp_tglarge_test_clean/wer_15_0.0
%WER 53.23 [ 4925 / 9253, 296 ins, 1060 del, 3569 sub ] exp/tri1/decode_nosp_tgsmall_test_clean/wer_15_0.0

# tri2 is an LDA+MLLT system, trained on 10k utterances
exp/tri2: nj=16 align prob=-49.63 over 23.00h [retry=1.7%, fail=0.8%] states=2000 gauss=15039 tree-impr=4.70 lda-sum=18.11 mllt:impr,logdet=0.99,1.39
%WER 33.50 [ 3100 / 9253, 248 ins, 626 del, 2226 sub ] exp/tri2/decode_nosp_fglarge_test_clean/wer_16_0.5
%WER 34.55 [ 3197 / 9253, 315 ins, 537 del, 2345 sub ] exp/tri2/decode_nosp_tglarge_test_clean/wer_16_0.0
%WER 48.98 [ 4532 / 9253, 303 ins, 903 del, 3326 sub ] exp/tri2/decode_nosp_tgsmall_test_clean/wer_14_0.0

# tri3 is an LDA+MLLT+SAT system, trained on the entire clean training set
exp/tri3: nj=16 align prob=-48.95 over 51.22h [retry=1.6%, fail=0.7%] states=3336 gauss=40065 fmllr-impr=2.72 over 19.18h tree-impr=7.23
%WER 23.89 [ 2211 / 9253, 233 ins, 404 del, 1574 sub ] exp/tri3/decode_nosp_fglarge_test_clean/wer_15_0.0
%WER 24.47 [ 2264 / 9253, 252 ins, 385 del, 1627 sub ] exp/tri3/decode_nosp_tglarge_test_clean/wer_13_0.0
%WER 37.81 [ 3499 / 9253, 274 ins, 671 del, 2554 sub ] exp/tri3/decode_nosp_tgsmall_test_clean/wer_13_0.0
%WER 49.00 [ 4534 / 9253, 302 ins, 874 del, 3358 sub ] exp/tri3/decode_nosp_tgsmall_test_clean.si/wer_14_0.0
%WER 21.68 [ 2006 / 9253, 226 ins, 346 del, 1434 sub ] exp/tri3/decode_fglarge_test_clean/wer_15_0.0
%WER 22.59 [ 2090 / 9253, 231 ins, 372 del, 1487 sub ] exp/tri3/decode_tglarge_test_clean/wer_15_0.0
%WER 34.83 [ 3223 / 9253, 294 ins, 605 del, 2324 sub ] exp/tri3/decode_tgsmall_test_clean/wer_12_0.0
%WER 45.28 [ 4190 / 9253, 270 ins, 880 del, 3040 sub ] exp/tri3/decode_tgsmall_test_clean.si/wer_15_0.0

# tri4 is an LDA+MLLT+SAT system after estimating pronunciation probabilities
# and word-and-pronunciation-dependent silence probabilities.
exp/tri4: nj=16 align prob=-48.70 over 51.22h [retry=1.5%, fail=0.7%] states=3368 gauss=40039 fmllr-impr=0.23 over 42.91h tree-impr=7.87
%WER 21.61 [ 2000 / 9253, 210 ins, 379 del, 1411 sub ] exp/tri4/decode_fglarge_test_clean/wer_14_0.5
%WER 22.59 [ 2090 / 9253, 237 ins, 371 del, 1482 sub ] exp/tri4/decode_tglarge_test_clean/wer_15_0.0
%WER 34.57 [ 3199 / 9253, 285 ins, 595 del, 2319 sub ] exp/tri4/decode_tgsmall_test_clean/wer_12_0.0
%WER 45.82 [ 4240 / 9253, 270 ins, 833 del, 3137 sub ] exp/tri4/decode_tgsmall_test_clean.si/wer_13_0.0

for dir in exp/chain/tdnn*_sp; do
  steps/info/chain_dir_info.pl $dir
  for x in ${dir}_online/decode*test*; do [ -d $x ] && [[ $x =~ "$1" ]] && grep WER $x/wer_* | utils/best_wer.sh; done
done
exit 0

# tdnn_1a is a kind of factorized TDNN, with skip connections.

Contributor review comment (on the line above): These systems seem to be underfitting -- I don't see any train/valid difference. Therefore you might want to choose a larger config for this TDNN-F system. (The one with OPGRU is already quite large, but you could try more epochs.)

exp/chain/tdnn1b_sp: num-iters=174 nj=2..8 num-params=12.9M dim=40+100->3040 combine=-0.041->-0.041 (over 2) xent:train/valid[115,173,final]=(-1.14,-0.759,-0.751/-1.14,-0.788,-0.777) logprob:train/valid[115,173,final]=(-0.084,-0.047,-0.046/-0.080,-0.050,-0.048)
%WER 10.55 [ 976 / 9253, 122 ins, 166 del, 688 sub ] exp/chain/tdnn1b_sp_online/decode_fglarge_test_clean/wer_13_1.0
%WER 17.65 [ 1633 / 9253, 208 ins, 233 del, 1192 sub ] exp/chain/tdnn1b_sp_online/decode_tgsmall_test_clean/wer_10_0.0

# This chain system has TDNN+Norm-OPGRU architecture.
exp/chain/tdnn_opgru1a_sp: num-iters=99 nj=2..12 num-params=38.0M dim=40+100->3040 combine=-0.045->-0.045 (over 1) xent:train/valid[65,98,final]=(-1.18,-0.663,-0.651/-1.21,-0.698,-0.684) logprob:train/valid[65,98,final]=(-0.079,-0.038,-0.037/-0.076,-0.040,-0.039)
%WER 9.45 [ 874 / 9253, 109 ins, 159 del, 606 sub ] exp/chain/tdnn_opgru1a_sp_online/decode_fglarge_test_clean/wer_10_1.0
%WER 15.22 [ 1408 / 9253, 175 ins, 196 del, 1037 sub ] exp/chain/tdnn_opgru1a_sp_online/decode_tgsmall_test_clean/wer_8_0.0
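The scanning loops at the top of this RESULTS file rely on utils/best_wer.sh to pick the best-scoring wer_* file in each decode directory. A standalone sketch of that idea, using made-up sample lines rather than real decode output, looks like:

```shell
#!/bin/sh
# Minimal sketch of "pick the lowest %WER among wer_* files" -- the job
# utils/best_wer.sh performs in the loops above.  The two wer_* files and
# their numbers below are fabricated sample data.
mkdir -p demo_decode
echo '%WER 12.34 [ 1142 / 9253 ] demo_decode/wer_10_0.0' > demo_decode/wer_10_0.0
echo '%WER 10.55 [ 976 / 9253 ] demo_decode/wer_13_1.0'  > demo_decode/wer_13_1.0
# sort numerically on the second field (the WER value) and keep the best line
grep -h '%WER' demo_decode/wer_* | sort -k2,2 -g | head -n 1
```

Each wer_N_M file corresponds to one language-model-weight/word-penalty pair, so the smallest %WER line identifies the best tuning point.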
s5/cmd.sh (new file, @@ -0,0 +1,17 @@):

# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine). queue.pl works with GridEngine (qsub); slurm.pl works
# with Slurm. Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration. Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
export normalize_cmd="queue.pl --mem 4G"
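For a local machine without a grid, the comment above suggests swapping queue.pl for run.pl. A minimal sketch of that variant of cmd.sh follows; the --mem options are dropped because run.pl executes jobs directly on the host rather than submitting them to a queue:

```shell
#!/bin/sh
# Local-machine variant sketch of cmd.sh: run.pl runs jobs on this host.
# As the comment above warns, run steps one at a time and watch memory use.
export train_cmd="run.pl"
export decode_cmd="run.pl"
export mkgraph_cmd="run.pl"
export normalize_cmd="run.pl"
```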
New config file (@@ -0,0 +1 @@):

# empty config, just use the defaults.

New config file (@@ -0,0 +1 @@):

--use-energy=false # only non-default option.

New config file (@@ -0,0 +1,10 @@):

# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated), which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
              # there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to the Nyquist of 8000 (=7600)

New config file (@@ -0,0 +1 @@):

# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh
s5/local/chain/compare_wer.sh (new file, @@ -0,0 +1,107 @@):

#!/bin/bash

# this script is used for comparing decoding results between systems.
# e.g. local/chain/compare_wer.sh exp/chain/tdnn_{c,d}_sp
# For use with discriminatively trained systems you specify the epochs after a colon:
# for instance,
# local/chain/compare_wer.sh exp/chain/tdnn_c_sp exp/chain/tdnn_c_sp_smbr:{1,2,3}

if [ $# == 0 ]; then
  echo "Usage: $0: <dir1> [<dir2> ... ]"
  echo "e.g.: $0 exp/chain/tdnn_{b,c}_sp"
  echo "or (with epoch numbers for discriminative training):"
  echo "$0 exp/chain/tdnn_b_sp_disc:{1,2,3}"
  exit 1
fi

echo "# $0 $*"

used_epochs=false

# this function set_names is used to separate the epoch-related parts of the name
# [for discriminative training] and the regular parts of the name.
# If called with a colon-free directory name, like:
#  set_names exp/chain/tdnn_lstm1e_sp_bi_smbr
# it will set dir=exp/chain/tdnn_lstm1e_sp_bi_smbr and epoch_infix=""
# If called with something like:
#  set_names exp/chain/tdnn_d_sp_smbr:3
# it will set dir=exp/chain/tdnn_d_sp_smbr and epoch_infix="_epoch3"

set_names() {
  if [ $# != 1 ]; then
    echo "compare_wer_general.sh: internal error"
    exit 1  # exit the program
  fi
  dirname=$(echo $1 | cut -d: -f1)
  epoch=$(echo $1 | cut -s -d: -f2)
  if [ -z $epoch ]; then
    epoch_infix=""
  else
    used_epochs=true
    epoch_infix=_epoch${epoch}
  fi
}

echo -n "# System                  "
for x in $*; do printf "% 10s" " $(basename $x)"; done
echo

strings=(
  "#WER test_clean (tgsmall) "
  "#WER test_clean (fglarge) ")

for n in 0 1; do
  echo -n "${strings[$n]}"
  for x in $*; do
    set_names $x  # sets $dirname and $epoch_infix
    decode_names=(tgsmall_test_clean fglarge_test_clean)

    wer=$(grep WER ${dirname}_online/decode_${decode_names[$n]}/wer_* | utils/best_wer.sh | awk '{print $2}')
    printf "% 10s" $wer
  done
  echo
done

if $used_epochs; then
  exit 0;  # the diagnostics aren't comparable between regular and discriminatively trained systems.
fi

echo -n "# Final train prob        "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -v xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final valid prob        "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -v xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final train prob (xent) "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final valid prob (xent) "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Num-params              "
for x in $*; do
  printf "% 10s" $(grep num-parameters $x/log/progress.1.log | awk '{print $2}')
done
echo
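The colon-splitting behavior of set_names can be exercised on its own. This standalone sketch (the function name split_name is my own, chosen to avoid clashing with the script) reuses the same cut-based parsing as the function above:

```shell
#!/bin/sh
# Standalone demo of the epoch-splitting logic in set_names: everything
# before the colon is the directory name, and an optional ":<n>" suffix
# becomes the "_epoch<n>" infix used for discriminative-training decodes.
split_name() {
  dirname=$(echo "$1" | cut -d: -f1)
  epoch=$(echo "$1" | cut -s -d: -f2)   # -s: output nothing when no colon
  if [ -z "$epoch" ]; then
    epoch_infix=""
  else
    epoch_infix="_epoch${epoch}"
  fi
}
split_name exp/chain/tdnn_d_sp_smbr:3
echo "${dirname}${epoch_infix}"   # prints exp/chain/tdnn_d_sp_smbr_epoch3
split_name exp/chain/tdnn_lstm1e_sp_bi_smbr
echo "${dirname}${epoch_infix}"   # prints exp/chain/tdnn_lstm1e_sp_bi_smbr
```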
New file (@@ -0,0 +1 @@):

tuning/run_tdnn_1a.sh

New file (@@ -0,0 +1 @@):

tuning/run_tdnn_opgru_1a.sh
Review comment: Please add a file s5/README.txt that explains something about the data: what type of data, how much of it, how you can obtain it, what the license is, things like that.

Reply: I added README.txt.