Merged

44 commits
06f0bc8  Merge remote-tracking branch 'kaldi-asr/master' (Aug 20, 2016)
d03fa19  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Aug 24, 2016)
efdd432  Merge remote-tracking branch 'kaldi-asr/master' (Aug 18, 2017)
cc19bbf  Merge remote-tracking branch 'kaldi-asr/master' (Aug 19, 2017)
1887b1d  Merge remote-tracking branch 'kaldi-asr/master' (Aug 24, 2017)
954bdb5  Merge remote-tracking branch 'kaldi-asr/master' (Aug 24, 2017)
87d3740  Merge remote-tracking branch 'kaldi-asr/master' (Aug 25, 2017)
c0c8039  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Aug 26, 2017)
41f3d33  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Aug 29, 2017)
c4ec4a2  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Aug 29, 2017)
ac8551f  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Aug 30, 2017)
f83073b  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Sep 1, 2017)
df3826d  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Sep 5, 2017)
ceb9d32  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Oct 3, 2017)
ff1a880  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Oct 17, 2017)
0e5f310  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Feb 22, 2018)
ee2612d  Merge remote-tracking branch 'kaldi-asr/master' (wonkyuml, Mar 20, 2018)
13071f1  initial setting (Mar 20, 2018)
d2856ba  main script (Mar 20, 2018)
9dc8f81  cleaning script (wonkyuml, Jul 10, 2018)
6bd06af  cmd.sh cleaninig (wonkyuml, Jul 10, 2018)
b00a813  run.sh script fix (wonkyuml, Jul 10, 2018)
8b499a5  add RESULTS page with minor typo fix (wonkyuml, Jul 11, 2018)
5c22bab  run_tdnn_1a.sh fix (wonkyuml, Jul 12, 2018)
73b9bdb  tdnn_opgru_1a change (wonkyuml, Jul 13, 2018)
bc67d9f  add README.txt (wonkyuml, Jul 13, 2018)
6554ff0  compare_wer.sh script (wonkyuml, Jul 13, 2018)
7e82148  result and diagnostics added (wonkyuml, Jul 13, 2018)
62d6b36  frames-per-chunk added on decoding script (wonkyuml, Jul 13, 2018)
8ec0079  chunk left right (wonkyuml, Jul 13, 2018)
3526f60  omit $mfccdir (wonkyuml, Jul 13, 2018)
6d19ab2  removed locale dependency (Aug 20, 2018)
11d1a07  removed locale dependency (lucas-jo, Aug 20, 2018)
f9dca8f  Merge branch 'zeroth_egs' of https://github.com/wonkyuml/kaldi into z… (lucas-jo, Aug 20, 2018)
c75b1f8  changed filename (lucas-jo, Aug 20, 2018)
d1b2277  re-indented with no tab (lucas-jo, Aug 20, 2018)
17378f2  changed to use PCA instead of LDA+MLLT (lucas-jo, Aug 20, 2018)
90400dd  added -bash on echo statements (lucas-jo, Aug 20, 2018)
6d010aa  fix pointing update_segmentation.sh in run.sh (Aug 27, 2018)
bd8094f  simplified and added echo statement (Aug 27, 2018)
7b55b5f  results updated (Aug 28, 2018)
cb817af  data prep interface change (Aug 31, 2018)
6259aed  cosmetic fix for ivector script (Aug 31, 2018)
7e14701  increase parameter for TDNN-F (Aug 31, 2018)
13 changes: 13 additions & 0 deletions egs/zeroth_korean/s5/README.txt
@@ -0,0 +1,13 @@
The Zeroth-Korean Kaldi example comes from the Zeroth Project, which provides a free Korean speech corpus and aims to make Korean speech recognition broadly accessible to everyone. The project was developed in collaboration between Lucas Jo (@Atlas Guide Inc.) and Wonkyum Lee (@Gridspace Inc.).

In this example we use 51.6 hours of transcribed Korean audio as training data (22,263 utterances, 105 speakers, 3,000 sentences) and 1.2 hours of transcribed Korean audio as test data (457 utterances, 10 speakers). Besides the audio and transcriptions, we provide a pre-trained/designed language model, a lexicon, and a morpheme-based segmenter (Morfessor).

The database can also be downloaded from OpenSLR:
http://www.openslr.org/40

The database is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0).

This folder contains a speech recognition recipe based on the WSJ/LibriSpeech examples.

For more details about the Zeroth Project, please visit:
https://github.com/goodatlas/zeroth
63 changes: 63 additions & 0 deletions egs/zeroth_korean/s5/RESULTS
@@ -0,0 +1,63 @@
#!/bin/bash
[Review comment, Contributor] Please add a file s5/README.txt that explains something about the data: what type of data, how much of it, how you can obtain it, what the license is, things like that.

[Reply, Contributor Author] I added README.txt.

# this RESULTS file was obtained by Wonkyum Lee in July 2018.

for dir in exp/*; do
  steps/info/gmm_dir_info.pl $dir
  for x in $dir/decode*test*; do [ -d $x ] && [[ $x =~ "$1" ]] && grep WER $x/wer_* | utils/best_wer.sh; done
done
exit 0

# monophone, trained on the 2k shortest utterances
exp/mono: nj=16 align prob=-99.85 over 2.66h [retry=0.8%, fail=0.3%] states=130 gauss=1004
%WER 70.24 [ 6499 / 9253, 295 ins, 1399 del, 4805 sub ] exp/mono/decode_nosp_fglarge_test_clean/wer_8_0.5
%WER 71.28 [ 6596 / 9253, 185 ins, 1721 del, 4690 sub ] exp/mono/decode_nosp_tglarge_test_clean/wer_9_1.0
%WER 78.83 [ 7294 / 9253, 218 ins, 1752 del, 5324 sub ] exp/mono/decode_nosp_tgsmall_test_clean/wer_10_0.0

# first triphone build, trained on 5k utterances
exp/tri1: nj=16 align prob=-98.34 over 11.55h [retry=1.6%, fail=0.6%] states=1568 gauss=10030 tree-impr=4.07
%WER 37.44 [ 3464 / 9253, 258 ins, 725 del, 2481 sub ] exp/tri1/decode_nosp_fglarge_test_clean/wer_15_0.5
%WER 38.85 [ 3595 / 9253, 347 ins, 633 del, 2615 sub ] exp/tri1/decode_nosp_tglarge_test_clean/wer_15_0.0
%WER 53.23 [ 4925 / 9253, 296 ins, 1060 del, 3569 sub ] exp/tri1/decode_nosp_tgsmall_test_clean/wer_15_0.0

# tri2 is an LDA+MLLT system, trained on 10k utterances
exp/tri2: nj=16 align prob=-49.63 over 23.00h [retry=1.7%, fail=0.8%] states=2000 gauss=15039 tree-impr=4.70 lda-sum=18.11 mllt:impr,logdet=0.99,1.39
%WER 33.50 [ 3100 / 9253, 248 ins, 626 del, 2226 sub ] exp/tri2/decode_nosp_fglarge_test_clean/wer_16_0.5
%WER 34.55 [ 3197 / 9253, 315 ins, 537 del, 2345 sub ] exp/tri2/decode_nosp_tglarge_test_clean/wer_16_0.0
%WER 48.98 [ 4532 / 9253, 303 ins, 903 del, 3326 sub ] exp/tri2/decode_nosp_tgsmall_test_clean/wer_14_0.0

# tri3 is an LDA+MLLT+SAT system, trained on entire clean training set
exp/tri3: nj=16 align prob=-48.95 over 51.22h [retry=1.6%, fail=0.7%] states=3336 gauss=40065 fmllr-impr=2.72 over 19.18h tree-impr=7.23
%WER 23.89 [ 2211 / 9253, 233 ins, 404 del, 1574 sub ] exp/tri3/decode_nosp_fglarge_test_clean/wer_15_0.0
%WER 24.47 [ 2264 / 9253, 252 ins, 385 del, 1627 sub ] exp/tri3/decode_nosp_tglarge_test_clean/wer_13_0.0
%WER 37.81 [ 3499 / 9253, 274 ins, 671 del, 2554 sub ] exp/tri3/decode_nosp_tgsmall_test_clean/wer_13_0.0
%WER 49.00 [ 4534 / 9253, 302 ins, 874 del, 3358 sub ] exp/tri3/decode_nosp_tgsmall_test_clean.si/wer_14_0.0
%WER 21.68 [ 2006 / 9253, 226 ins, 346 del, 1434 sub ] exp/tri3/decode_fglarge_test_clean/wer_15_0.0
%WER 22.59 [ 2090 / 9253, 231 ins, 372 del, 1487 sub ] exp/tri3/decode_tglarge_test_clean/wer_15_0.0
%WER 34.83 [ 3223 / 9253, 294 ins, 605 del, 2324 sub ] exp/tri3/decode_tgsmall_test_clean/wer_12_0.0
%WER 45.28 [ 4190 / 9253, 270 ins, 880 del, 3040 sub ] exp/tri3/decode_tgsmall_test_clean.si/wer_15_0.0

# tri4 is an LDA+MLLT+SAT system after estimating pronunciation probabilities
# and word-and-pronunciation-dependent silence probabilities.
exp/tri4: nj=16 align prob=-48.70 over 51.22h [retry=1.5%, fail=0.7%] states=3368 gauss=40039 fmllr-impr=0.23 over 42.91h tree-impr=7.87
%WER 21.61 [ 2000 / 9253, 210 ins, 379 del, 1411 sub ] exp/tri4/decode_fglarge_test_clean/wer_14_0.5
%WER 22.59 [ 2090 / 9253, 237 ins, 371 del, 1482 sub ] exp/tri4/decode_tglarge_test_clean/wer_15_0.0
%WER 34.57 [ 3199 / 9253, 285 ins, 595 del, 2319 sub ] exp/tri4/decode_tgsmall_test_clean/wer_12_0.0
%WER 45.82 [ 4240 / 9253, 270 ins, 833 del, 3137 sub ] exp/tri4/decode_tgsmall_test_clean.si/wer_13_0.0

for dir in exp/chain/tdnn*_sp; do
  steps/info/chain_dir_info.pl $dir
  for x in ${dir}_online/decode*test*; do [ -d $x ] && [[ $x =~ "$1" ]] && grep WER $x/wer_* | utils/best_wer.sh; done
done
exit 0

# tdnn_1a is a kind of factorized TDNN, with skip connections.
[Review comment, Contributor] These systems seem to be underfitting; I don't see any train/valid difference. Therefore you might want to choose a larger config for this TDNN-F system (the one with OPGRU is already quite large, but you could try more epochs).

exp/chain/tdnn1b_sp: num-iters=174 nj=2..8 num-params=12.9M dim=40+100->3040 combine=-0.041->-0.041 (over 2) xent:train/valid[115,173,final]=(-1.14,-0.759,-0.751/-1.14,-0.788,-0.777) logprob:train/valid[115,173,final]=(-0.084,-0.047,-0.046/-0.080,-0.050,-0.048)
%WER 10.55 [ 976 / 9253, 122 ins, 166 del, 688 sub ] exp/chain/tdnn1b_sp_online/decode_fglarge_test_clean/wer_13_1.0
%WER 17.65 [ 1633 / 9253, 208 ins, 233 del, 1192 sub ] exp/chain/tdnn1b_sp_online/decode_tgsmall_test_clean/wer_10_0.0

# This chain system has TDNN+Norm-OPGRU architecture.
exp/chain/tdnn_opgru1a_sp: num-iters=99 nj=2..12 num-params=38.0M dim=40+100->3040 combine=-0.045->-0.045 (over 1) xent:train/valid[65,98,final]=(-1.18,-0.663,-0.651/-1.21,-0.698,-0.684) logprob:train/valid[65,98,final]=(-0.079,-0.038,-0.037/-0.076,-0.040,-0.039)
%WER 9.45 [ 874 / 9253, 109 ins, 159 del, 606 sub ] exp/chain/tdnn_opgru1a_sp_online/decode_fglarge_test_clean/wer_10_1.0
%WER 15.22 [ 1408 / 9253, 175 ins, 196 del, 1037 sub ] exp/chain/tdnn_opgru1a_sp_online/decode_tgsmall_test_clean/wer_8_0.0
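The RESULTS loops above pipe `grep WER` output through utils/best_wer.sh to keep only the best-scoring line. A minimal sketch of that selection idea (this is not the recipe's utils/best_wer.sh, and the sample lines are made up): sort the candidate scoring lines numerically by the %WER value and keep the lowest.

```shell
# Sketch of best-WER selection: field 2 of each "%WER ..." line is the error
# rate, so a general-numeric sort on that field puts the best result first.
printf '%s\n' \
  "%WER 10.55 [ 976 / 9253 ] decode_fglarge_test_clean/wer_13_1.0" \
  "%WER 9.45 [ 874 / 9253 ] decode_fglarge_test_clean/wer_10_1.0" \
  | sort -k2,2g | head -n1
```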

17 changes: 17 additions & 0 deletions egs/zeroth_korean/s5/cmd.sh
@@ -0,0 +1,17 @@
# you can change cmd.sh depending on what type of queue you are using.
# If you have no queueing system and want to run on a local machine, you
# can change all instances of 'queue.pl' to 'run.pl' (but be careful and run
# commands one by one: most recipes will exhaust the memory on your
# machine). queue.pl works with GridEngine (qsub). slurm.pl works
# with slurm. Different queues are configured differently, with different
# queue names and different ways of specifying things like memory;
# to account for these differences you can create and edit the file
# conf/queue.conf to match your queue's configuration. Search for
# conf/queue.conf in http://kaldi-asr.org/doc/queue.html for more information,
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl.

export train_cmd="queue.pl --mem 2G"
export decode_cmd="queue.pl --mem 4G"
export mkgraph_cmd="queue.pl --mem 8G"
export normalize_cmd="queue.pl --mem 4G"
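As the comments above note, running without a queueing system means swapping queue.pl for run.pl. A local-machine variant of these exports might look like the following (an assumption for illustration, not part of the committed cmd.sh):

```shell
# Local-machine variant of cmd.sh: run.pl executes jobs directly on this host.
# No queue is involved, so the --mem resource flags are unnecessary here.
export train_cmd="run.pl"
export decode_cmd="run.pl"
export mkgraph_cmd="run.pl"
export normalize_cmd="run.pl"
```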

1 change: 1 addition & 0 deletions egs/zeroth_korean/s5/conf/decode.config
@@ -0,0 +1 @@
# empty config, just use the defaults.
1 change: 1 addition & 0 deletions egs/zeroth_korean/s5/conf/mfcc.conf
@@ -0,0 +1 @@
--use-energy=false # only non-default option.
10 changes: 10 additions & 0 deletions egs/zeroth_korean/s5/conf/mfcc_hires.conf
@@ -0,0 +1,10 @@
# config for high-resolution MFCC features, intended for neural network training
# Note: we keep all cepstra, so it has the same info as filterbank features,
# but MFCC is more easily compressible (because less correlated) which is why
# we prefer this method.
--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
# there might be some information at the low end.
--high-freq=-400 # high cutoff frequency, relative to the Nyquist of 8000 (=7600)
1 change: 1 addition & 0 deletions egs/zeroth_korean/s5/conf/online_cmvn.conf
@@ -0,0 +1 @@
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh
107 changes: 107 additions & 0 deletions egs/zeroth_korean/s5/local/chain/compare_wer.sh
@@ -0,0 +1,107 @@
#!/bin/bash

# this script is used for comparing decoding results between systems.
# e.g. local/chain/compare_wer.sh exp/chain/tdnn_{c,d}_sp
# For use with discriminatively trained systems you specify the epochs after a colon:
# for instance,
# local/chain/compare_wer.sh exp/chain/tdnn_c_sp exp/chain/tdnn_c_sp_smbr:{1,2,3}


if [ $# == 0 ]; then
  echo "Usage: $0: <dir1> [<dir2> ... ]"
  echo "e.g.: $0 exp/chain/tdnn_{b,c}_sp"
  echo "or (with epoch numbers for discriminative training):"
  echo "$0 exp/chain/tdnn_b_sp_disc:{1,2,3}"
  exit 1
fi

echo "# $0 $*"

used_epochs=false

# this function set_names is used to separate the epoch-related parts of the name
# [for discriminative training] and the regular parts of the name.
# If called with a colon-free directory name, like:
#   set_names exp/chain/tdnn_lstm1e_sp_bi_smbr
# it will set dir=exp/chain/tdnn_lstm1e_sp_bi_smbr and epoch_infix=""
# If called with something like:
#   set_names exp/chain/tdnn_d_sp_smbr:3
# it will set dir=exp/chain/tdnn_d_sp_smbr and epoch_infix="_epoch3"

set_names() {
  if [ $# != 1 ]; then
    echo "compare_wer_general.sh: internal error"
    exit 1  # exit the program
  fi
  dirname=$(echo $1 | cut -d: -f1)
  epoch=$(echo $1 | cut -s -d: -f2)
  if [ -z $epoch ]; then
    epoch_infix=""
  else
    used_epochs=true
    epoch_infix=_epoch${epoch}
  fi
}

echo -n "# System                  "
for x in $*; do printf "% 10s" " $(basename $x)"; done
echo

strings=(
  "#WER test_clean (tgsmall) "
  "#WER test_clean (fglarge) ")

for n in 0 1; do
  echo -n "${strings[$n]}"
  for x in $*; do
    set_names $x  # sets $dirname and $epoch_infix
    decode_names=(tgsmall_test_clean fglarge_test_clean)

    wer=$(grep WER ${dirname}_online/decode_${decode_names[$n]}/wer_* | utils/best_wer.sh | awk '{print $2}')
    printf "% 10s" $wer
  done
  echo
done

if $used_epochs; then
  exit 0  # the diagnostics aren't comparable between regular and discriminatively trained systems.
fi

echo -n "# Final train prob        "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -v xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final valid prob        "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -v xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final train prob (xent) "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_train.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Final valid prob (xent) "
for x in $*; do
  prob=$(grep Overall $x/log/compute_prob_valid.final.log | grep -w xent | awk '{printf("%.4f", $8)}')
  printf "% 10s" $prob
done
echo

echo -n "# Num-params              "
for x in $*; do
  printf "% 10s" $(grep num-parameters $x/log/progress.1.log | awk '{print $2}')
done
echo
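The dir:epoch parsing inside set_names can be exercised on its own. A small standalone sketch (the example path is hypothetical), using the same `cut` invocations as the script:

```shell
# Split "dir:epoch" into the directory name and an epoch suffix.
# `cut -s` suppresses lines with no delimiter, so epoch is empty
# when the input has no colon.
input="exp/chain/tdnn_d_sp_smbr:3"
dirname=$(echo $input | cut -d: -f1)    # part before the colon
epoch=$(echo $input | cut -s -d: -f2)   # part after the colon, or empty
if [ -z "$epoch" ]; then
  epoch_infix=""
else
  epoch_infix=_epoch${epoch}
fi
echo "$dirname$epoch_infix"   # exp/chain/tdnn_d_sp_smbr_epoch3
```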
1 change: 1 addition & 0 deletions egs/zeroth_korean/s5/local/chain/run_tdnn.sh
1 change: 1 addition & 0 deletions egs/zeroth_korean/s5/local/chain/run_tdnn_opgru.sh