Small fix on scripts of Aishell2 #2522

Merged
danpovey merged 52 commits into kaldi-asr:master from underdogliu:aishell2_release
Jul 6, 2018

@underdogliu
Contributor

This PR includes some slight modifications on Aishell2 scripts, including:

  • Add a mode option, which decides whether to train simple model with 40-dim mfcc or 'normal' model with 43-dim pitch-added mfcc, i-vector and dropout.

  • Some minor comment fix.
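
As a rough illustration of the mode option described above, the two modes and their feature setups can be sketched as a simple dispatch. The function name `describe_mode` is hypothetical and does not come from the Aishell2 scripts; only the mode names and feature descriptions are taken from the PR text.

```shell
#!/bin/bash
# Illustrative sketch only: describe_mode is a hypothetical helper, not part
# of the Aishell2 recipe. It maps each mode named in the PR description to
# the feature setup that the description associates with it.
describe_mode() {
  case "$1" in
    simple) echo "40-dim mfcc" ;;
    normal) echo "43-dim pitch-added mfcc, i-vector, dropout" ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

mode=normal
describe_mode "$mode"
```

Rejecting unknown values up front avoids silently training with the wrong feature type.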

@danpovey This PR has been primarily checked by @dophist although I believe you may have more comments before merging. Thanks for checking!

searcher1994 and others added 30 commits April 17, 2018 16:12
#!/bin/bash

# _1d is as _1c, but with dropout schedule added, referenced from wsj
# _1d is as _1a, but with i-vector and dropout schedule added, referenced from wsj
Contributor

OK, I am assuming this is not the entire difference because this one has pitch too.

@danpovey
Contributor

Can you please rename 1d to 1b? Also, please make clear at the top of each chain script which 'mode' it was run with, i.e. whether it was done in 'normal' or 'simple' mode. Make sure that local/chain/compare_wer.sh prints out the number of parameters, and also please include the output of chain_dir_info.pl in the comments at the top. (That will also help clarify what the feature type was.)

# nj for dev and test
dev_nj=$(wc -l data/dev/utt2spk | awk '${print $1}' || exit 1;)
test_nj=$(wc -l data/test/utt2spk | awk '${print $1}' || exit 1;)
dev_nj=$(wc -l data/dev/spk2utt | awk '${print $1}' || exit 1;)
Contributor

@underdogliu Hi Xuechen, there is still a syntax error in the awk command.
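
The quoted lines use `awk '${print $1}'`, where the stray `$` before the brace is not valid awk. A minimal sketch of one valid form follows, assuming the intent is to set the job count to the number of lines in spk2utt. The temporary file stands in for data/dev/spk2utt and is purely illustrative; this is not necessarily the form that was finally merged.

```shell
#!/bin/bash
# Sketch under assumptions: a throwaway file stands in for data/dev/spk2utt.
tmpdir=$(mktemp -d)
printf 'spk1 utt1 utt2\nspk2 utt3\nspk3 utt4\n' > "$tmpdir/spk2utt"

# The awk program is '{print $1}' (no '$' before the brace); the '|| exit 1'
# sits outside the command substitution so a failure aborts the script.
dev_nj=$(wc -l "$tmpdir/spk2utt" | awk '{print $1}') || exit 1

echo "dev_nj=$dev_nj"
rm -rf "$tmpdir"
```

Counting lines of spk2utt rather than utt2spk gives one job per speaker, which matches the change visible in the quoted diff lines.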


# _1d is as _1c, but with dropout schedule added, referenced from wsj
# _1b is as _1a, but with pitch feats, i-vector and dropout schedule added, referenced from wsj
# this script is for 'normal' mode
Contributor

When you say this script is for 'normal' mode: do you mean that the results you show here are from running it in 'normal' mode, or that it would only run correctly in 'normal' mode?

Contributor Author

It would only run correctly in 'normal' mode because of the feature dimension. I think this is already indicated in run.sh, where setting different modes leads to different scripts. But of course I can make changes to make it simpler.

@danpovey
Contributor

danpovey commented Jul 4, 2018 via email

@underdogliu
Contributor Author

For alignments and question tree generation there may be a dimension mismatch. For example, during alignment conversion, if final.mat from ali_dir has, say, 144 = 16x9 columns (this is with pitch) while the input features have dimension 13, it would fail. It is possible that I am misremembering this. Even if a mismatch would not cause a failure, I still think it is good practice to keep the dimensions consistent. If I am wrong, please say so plainly and I'd be happy to correct the script.
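
The arithmetic behind that example can be sketched as follows, using only the numbers from the comment (144 columns, 9 spliced frames, per-frame dimensions 16 and 13). The helper `spliced_dim` is hypothetical, and the sketch does not read a real final.mat or call any Kaldi tool.

```shell
#!/bin/bash
# Hypothetical helper: spliced dimension = per-frame dim * spliced frames.
spliced_dim() {
  echo $(( $1 * $2 ))
}

mat_cols=144       # columns of final.mat in the example (16 x 9, with pitch)
splice_frames=9    # number of spliced frames assumed in the example

for feat_dim in 16 13; do
  got=$(spliced_dim "$feat_dim" "$splice_frames")
  if [ "$got" -eq "$mat_cols" ]; then
    echo "feat_dim=$feat_dim: spliced dim $got matches $mat_cols columns"
  else
    echo "feat_dim=$feat_dim: spliced dim $got does not match $mat_cols columns"
  fi
done
```

With 13-dimensional input the spliced dimension is 117, not 144, which is the mismatch the comment describes.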

@danpovey
Contributor

danpovey commented Jul 4, 2018 via email

@underdogliu
Contributor Author

@danpovey After some discussion with @dophist, I have removed the mode option and decided to simply use the most decent model with pitch, ivector and dropout. Hope this will make everything clearer. Also I changed the cmd options back to queue.pl, using GridEngine by default.

Hope that helps, and please point out plainly anything that still needs to be modified or cleaned up.

@danpovey
Contributor

danpovey commented Jul 5, 2018

OK, great. You might want to remove the references to "mode" in the RESULTS file now that it no longer exists. And I think the "--stage 5" in the TDNN script invocation probably shouldn't be there.

@underdogliu
Contributor Author

@danpovey Thanks for pointing it out; I've made the change to the RESULTS file. However, for '--stage 5', I think it's necessary since we need to extract hires features, train the i-vector extractor, and then extract i-vectors with it.

@danpovey
Contributor

danpovey commented Jul 6, 2018 via email

@underdogliu
Contributor Author

Ah, I see what you meant. Fixed.

@danpovey
Contributor

danpovey commented Jul 6, 2018

Thanks! Merging.

@danpovey danpovey merged commit 998a4d6 into kaldi-asr:master Jul 6, 2018
dpriver pushed a commit to dpriver/kaldi that referenced this pull request Sep 13, 2018
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018
4 participants