Small fix on scripts of Aishell2 by underdogliu · Pull Request #2522 · kaldi-asr/kaldi

underdogliu · 2018-06-26T14:21:32Z

This PR includes some slight modifications to the Aishell2 scripts, including:

- Add a mode option, which decides whether to train a simple model with 40-dim MFCC or a 'normal' model with 43-dim pitch-added MFCC, i-vector and dropout.
- Some minor comment fixes.

@danpovey This PR has been primarily checked by @dophist although I believe you may have more comments before merging. Thanks for checking!

danpovey · 2018-06-27T23:17:44Z

egs/aishell2/s5/local/chain/tuning/run_tdnn_1d.sh

 #!/bin/bash

-# _1d is as _1c, but with dropout schedule added, referenced from wsj
+# _1d is as _1a, but with i-vector and dropout schedule added, referenced from wsj


OK, I am assuming this is not the entire difference because this one has pitch too.

danpovey · 2018-06-29T21:33:45Z

Can you please rename 1d to 1b? Also, please make clear at the top of each chain script which 'mode' it was run with, i.e. whether it was run in 'normal' or 'simple' mode. Make sure that local/chain/compare_wer.sh prints out the number of parameters, and please also include the output of chain_dir_info.pl in the comments at the top. (That will also help clarify what the feature type was.)

dophist · 2018-07-02T10:13:06Z

egs/aishell2/s5/local/run_gmm.sh

 # nj for dev and test
-dev_nj=$(wc -l data/dev/utt2spk | awk '${print $1}' || exit 1;)
-test_nj=$(wc -l data/test/utt2spk | awk '${print $1}' || exit 1;)
+dev_nj=$(wc -l data/dev/spk2utt | awk '${print $1}' || exit 1;)


@underdogliu hi, Xuechen, still syntax error in awk command
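
The syntax error flagged here is the stray `$` in front of the awk action: `'${print $1}'` is not a valid awk program, while `'{print $1}'` is. A minimal sketch of one way to get the count follows. It assumes the goal is simply the number of lines (speakers) in `spk2utt`; `count_lines` is a hypothetical helper for illustration, not something in the Kaldi tree.

```shell
#!/bin/bash
# Sketch only: count_lines is a hypothetical helper, not part of Kaldi.
# The awk program '${print $1}' has a stray '$' before '{';
# '{print $1}' is the valid form. Note also that 'exit 1' inside $(...)
# only leaves the subshell, so the failure check belongs in the caller.
count_lines() {
  [ -f "$1" ] || { echo "count_lines: no such file: $1" >&2; return 1; }
  # NR in the END block is the total number of lines read.
  awk 'END { print NR }' "$1"
}

# Intended use in local/run_gmm.sh (path taken from the diff above):
#   dev_nj=$(count_lines data/dev/spk2utt) || exit 1
```

Reading the file directly avoids having to strip the filename that `wc -l FILE` prints, and putting `|| exit 1` outside the command substitution means a missing file actually stops the calling script.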

danpovey · 2018-07-04T18:27:09Z

egs/aishell2/s5/local/chain/tuning/run_tdnn_1b.sh


-# _1d is as _1c, but with dropout schedule added, referenced from wsj
+# _1b is as _1a, but with pitch feats, i-vector and dropout schedule added, referenced from wsj
+# this script is for 'normal' mode


When you say this script is for 'normal' mode: do you mean that the results you show here are from running it in 'normal' mode, or that it would only run correctly in 'normal' mode?

underdogliu · 2018-07-04

It would only run correctly in 'normal' mode because of the feature dimension. I think this is already indicated in run.sh, where setting different modes leads to different scripts. But of course I can make changes to make it potentially simpler.

danpovey · 2018-07-04T19:41:17Z

That script works out the feature dimension from the features-- it looks to me like it would work.

underdogliu · 2018-07-04T21:16:59Z

For alignments and question tree generation there may be some dimension mismatch. For example, when doing alignment conversion, if final.mat from ali_dir has, say, 144 = 16 x 9 columns (this is with pitch) while the input features have dimension 13, then it would fail. But it is possible that I am misremembering this. Even if it would not fail, I think it is still good practice to keep the dimensions consistent. If I am wrong, please point it out and I'd be happy to correct my mistakes in the script.

danpovey · 2018-07-04T21:37:28Z

The only thing I see in the script that wouldn't be compatible with the "simple" setting is that it does steps/make_mfcc_pitch.sh instead of steps/make_mfcc.sh. But that shouldn't cause a crash; it's just that it isn't really compatible with your intentions in the "simple" setting.

Anyway, I suppose my main problem is as follows: if your training script is going to have the "simple" option, that option should actually work. Currently it wouldn't work, because local/chain/run_tdnn.sh points to a script which does use pitch. IMO there are two reasonable possibilities:

(1) Take out the "simple" option.
(2) Split the scripts into local/chain_simple/ and local/chain/ (the latter for the 'normal' setup), with separate tuning sequences (e.g. 1a in both). In the "simple" case, the script would then call the version in local/chain_simple/.

Dan

underdogliu · 2018-07-05T09:25:39Z

@danpovey After some discussion with @dophist, I have removed the mode option and decided to simply use the 'normal' model with pitch, i-vector and dropout. I hope this makes everything clearer. I also changed the cmd options back to queue.pl, using GridEngine by default.

Please comment if there is anything that still needs to be modified or cleaned up.

danpovey · 2018-07-05T20:27:35Z

OK, great. You might want to remove the references to "mode" in the RESULTS file now that it no longer exists. And I think the "--stage 5" in the TDNN script invocation probably shouldn't be there.

underdogliu · 2018-07-06T08:24:49Z

@danpovey Thanks for pointing it out; I've made the change in the RESULTS file. However, I think '--stage 5' is necessary, since we need to extract hires features, train the i-vector extractor, and then extract i-vectors with it.

danpovey · 2018-07-06T18:36:15Z

Regarding --stage 5: those scripts, if run without the --stage option, should always run from the beginning. Since that was the first nnet3 or chain script in the run.sh, it should run from the beginning when invoked there, so it shouldn't require the --stage option.
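
The convention described here can be sketched as follows. This is an illustration under stated assumptions, not the actual Aishell2 script: `run_stages` and its three numbered stages are hypothetical, and the `--stage` handling is a minimal stand-in for Kaldi's `utils/parse_options.sh`.

```shell
#!/bin/bash
# Sketch of the stage convention: the default is 0, so a bare
# invocation runs every stage from the beginning.
# run_stages and the stage numbers are hypothetical.
run_stages() {
  stage=0   # default: start from the beginning
  # Minimal stand-in for utils/parse_options.sh; only --stage is handled.
  if [ "${1:-}" = "--stage" ]; then
    stage=$2
    shift 2
  fi
  for s in 0 1 2; do
    # A stage runs only if the requested start stage is <= its number.
    if [ "$stage" -le "$s" ]; then
      echo "running stage $s"
    fi
  done
}
```

Called with no arguments, `run_stages` executes stages 0, 1 and 2; with `--stage 2` it skips straight to the last one. That is why the first nnet3 or chain script invoked from run.sh should not need a `--stage` argument: passing one is only useful for resuming a partially completed run.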

underdogliu · 2018-07-06T20:31:26Z

ah I got what you meant. Fixed.

danpovey · 2018-07-06T20:32:36Z

Thanks! Merging.

searcher1994 and others added 30 commits April 17, 2018 16:12

update nnet/chain tdnn runner

28364d8

fix ivec dir in chain system

8fb613e

nj mods on chain, proved to be ran

12aeb50

remove affix default option

7ed676d

create tuning directory

620e6cf

same for chain

6a56fea

minor fix on stage for run_ivector_common.sh

1dcd3dc

add exp 2b for chain: list struc for frames_per_eg

39df270

fix typos and change ivector transform from lda to pca

79fa7ae

changes in terms of arameters for training, referencing wsj

ad4a0a9

add wer computation script

1080b34

some fixes for wer script

f108e2d

delete trial comments

57984fc

add numbers, comments and minor changes on wer computing script

2ad7775

add results and more comments

1a29c64

space

c97a8c2

minor comment fix

f9cd305

delete scripts that are not useful

e70aed9

add 2b runner script, which references 7p from swbd

05abf10

change dimension to suggested

1d8fd41

change number of jobs

b0f37f5

Merge branch 'master' of https://github.com/kaldi-asr/kaldi

ce82b67

intermediate changes for exp

9ed6c27

remove copyrights in tdnn runners

1a7fa0e

add separate wer calculation scripts for nnet3 and chain, for chain we output probs for diff. objectives

5aac8e0

style and grep fix

887d8a4

Merge branch 'master' of https://github.com/underdogliu/kaldi

d9d7bc5

update results from chain and change softlink, deleting files from chain

5aed27c

delete 1a which has been deprecated from nnet3 directory

88d4937

update manual in compre_wer.sh

3380e4c

danpovey reviewed Jun 27, 2018

xuechen added 6 commits June 28, 2018 13:02

more fixes on scripts&comments

e6608d5

add softlinks about scripts

cfb012b

acquire mode name to chain script caller

09f4423

add RESULTS file. will have gmm results from two modes

2685ead

result updated

a477b15

more comments

4dad44d

xuechen added 3 commits June 30, 2018 11:31

exp name fix

3e23ce0

update results and add info from steps/info/chain_dir_info.pl

67e5fe3

small fix on dev&test nj

3b52cc0

dophist reviewed Jul 2, 2018

xuechen added 2 commits July 2, 2018 11:25

more foo fix

835132c

change 'set -euxo pipefail' to 'set -e'

e672bc3

danpovey reviewed Jul 4, 2018

remove mode options; change cmd to take advantage of gridengine

fc742f7

remove results from old scripts(simple mode)

0012458

change stage option

5815dbc

danpovey merged commit 998a4d6 into kaldi-asr:master Jul 6, 2018

dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018

[egs] Some fixes and cleanup in Aishell2 scripts (kaldi-asr#2522)

4950974

Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018

[egs] Some fixes and cleanup in Aishell2 scripts (kaldi-asr#2522)

1427154
