Update from original #4

dresen · 2016-11-10T08:06:15Z

No description provided.

Add the capability to enforce a maximum change per component in nnet3, at the minibatch level. This CHANGES THE DEFAULT BEHAVIOR OF EXISTING SCRIPTS. The nnet3 and nnet3+chain config-generation scripts will now, by default, apply a max-change at the per-component level. You can override this (for backward compatibility) by adding --max-change-per-component=0.0 --max-change-per-component-final=0.0 to the make_configs.py scripts. However, we don't recommend doing this unless it is part of an existing experimental setup that you need to keep consistent, because we found that max-change-per-component is slightly helpful (0.1% or 0.2% absolute in our experiments).
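To make the idea of a per-component max-change concrete, here is a minimal Python sketch. It is not the nnet3 code: the function name `apply_max_change`, the component names, and the use of the Euclidean norm of the update are assumptions for illustration. Treating 0.0 as "no limit" is inferred from the override flags mentioned above.

```python
import math


def apply_max_change(delta, max_change):
    """Scale one component's proposed update so its 2-norm is at most max_change.

    Hypothetical helper for illustration only. A max_change of 0.0 (or less)
    is treated as "limit disabled", which is an assumption based on the
    --max-change-per-component=0.0 override described in the commit message.
    """
    if max_change <= 0.0:
        return list(delta)
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= max_change:
        return list(delta)
    scale = max_change / norm
    return [d * scale for d in delta]


# Hypothetical per-minibatch updates for two components.
updates = {"affine1": [3.0, 4.0], "affine2": [0.1, 0.2]}

# Each component is limited independently of the others.
limited = {name: apply_max_change(d, 0.75) for name, d in updates.items()}
```

In this sketch only `affine1` is rescaled, because its update has norm 5.0; `affine2` is already below the limit and passes through unchanged.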

… (#1146)
* Some changes to the unk-model implementation to ensure determinizability and to keep L.fst small.
* Fix a pathname in error message in make_unk_lm.sh (thanks: Jasper Ooster), comment out results in local/run_unk_model.sh

The latest version of sw-ms98-dict.text from http://www.openslr.org/resources/5/switchboard_word_alignments.tar.gz contains the following header on line 1:

file: $SWB/data/dictionary/sw-ms98-dict.text

This ends up in lexicon0.txt, which causes utils/validate_dict_dir.pl to fail with the following error:

--> ERROR: phone "$swb/data/dictionary/sw-ms98-dict.text" is not in {, non}silence.txt (line 10399)

This in turn causes utils/prepare_lang.sh to fail. Update dict.patch so that it removes the file header from sw-ms98-dict.text.
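The actual fix lives in dict.patch, which is not shown here. As a rough illustration of the same idea, the following Python sketch drops a leading "file:" header before the dictionary lines are used. The function name `strip_file_header` and the two placeholder entries are hypothetical and are not taken from the recipe or from the real dictionary.

```python
def strip_file_header(lines):
    """Return the lines without a leading 'file: ...' header, if one is present.

    Hypothetical helper for illustration; the recipe itself handles this
    through dict.patch rather than through Python.
    """
    if lines and lines[0].startswith("file:"):
        return lines[1:]
    return list(lines)


# Placeholder content: the header from the commit message plus two made-up entries.
raw = [
    "file: $SWB/data/dictionary/sw-ms98-dict.text",
    "word1 p1 p2",
    "word2 p3",
]

cleaned = strip_file_header(raw)
```

Without such a step, the header's second field would be read as a phone, which matches the validate_dict_dir.pl error quoted above.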

danpovey and others added 30 commits August 5, 2016 20:05

Replace invocations of local versions of remove_dup_utts.sh with the …

30eea4e

…shared script in utils/data, and delete the unnecessary scripts

Modified AMI s5b recipe to use left biphone context for TDNN training…

c486f29

… (improves results)

Bug fix in steps/paste_feats.sh

6f771c1

Several unrelated small fixes, mostly cosmetic.

475df25

Merge pull request #954 from vijayaditya/bugfix4

c6428c1

Bug fix in steps/paste_feats.sh

fixup of travis compilation issue

b068f1b

adding bwd-compatibility for LSTM/BLSTM models

eee1830

introducing 'ReadData' for getting 1 valid sentence

232e1a5

Bug fix to steps/get_ctm.sh (thanks: @xiaohui-zhang)

6954b54

Adding nnet-latgen-faster-parallel program for multi-threaded decodin…

0bea1be

…g with nnet3; refactoring the nnet3 decodable code.

Merge pull request #956 from danpovey/nnet3-parallel-decoding

727d39a

Adding nnet-latgen-faster-parallel program for multi-threaded decodin…

fix overlapping line bug

3720678

make make_index.sh able to index lattices with position-independent p…

0ca100f

…hones

Fix to egs/wsj/s5/steps/make_index.sh to check that align_lexicon.int…

d85a110

… exists.

Adding assert to probabilities sum up to one

194be8f

integrating changes proposed by Dan,

72ba09e

- using GetVerboseLevel(),
- avoiding 'WriteIntegerVector' for writing to KALDI_LOG by introducing: 'operator<< (std::ostream, std::vector<T>)' in kaldi-error.h

Add file accidentally omitted from commit 86417db, nnet3-latgen-faste…

e1dd41d

…r-parallel.cc

Adding tedlium release 2 results with chain system, with and without …

cad94a6

…cleanup; and nnet3+no-cleanup results

Adding more checks in input arguments

8d69299

Minor documentation change, regarding how to set GridEngine schedule …

82a7ae6

…interval

Fixed some bugs in librispeech tdnn+{xent,chain}+sMBR recipes and add…

3f7d404

…ed (#960) the results. Fixed a bug in steps/nnet3/align.sh when supplying online-ivector-period option to nnet3-align-compiled

Adding support of piped command in the rir_list and noise_list

9792b28

Adding scripts that briefly summarize logs from training directories,…

c5e6eff

… and add the output to RESULTS files for tedlium/s5_r2 and ami/s5b

Fix bug regarding MLLT logdet in steps/info/gmm_dir_info.pl; adjust r…

7262c7e

…esults

Merge pull request #950 from vesis84/recurrent

500b2eb

nnet1: redesigning LSTM, BLSTM code,

Merge pull request #961 from danpovey/add-info

ddbf31e

Scripts for model info

updating the 'rm' (B)LSTM results,

d2dc80a

Merge pull request #962 from vesis84/nnet1_rm_results

2d391d2

updating the 'rm' (B)LSTM results,

update argument names and comment

a16f437

Clarifying usage message of lattice-combine, and removing unused option.

22bfec3

aevernon and others added 29 commits October 26, 2016 17:02

Remove gawk dependency from s5c swbd1_prepare_dict.sh (#1144)

001e56e

* Remove gawk dependency from s5c swbd1_prepare_dict.sh (for #1143)
* Suppress warning about non-existent lexiconp.txt
* Check for 2,435 or 2,438 data files in SWB corpus

Fix the gawk-specific feature problems in prepare dict scripts. (#1147)

84c4738

* Fix gawk-specific features problem in prepare dict scripts

printed out stats for self-repair in nonlinearity (self-repaired-prop…

3fe38c8

…ortion) (#1141)

Fixed typos (#1150)

5f9e6a7

get_ctm_edits.py parses the transcriptions containing semicolon (#1151)

706afe6

* every third field as separator
* using internal get_ctm.edits.py
* remove other files have copies in interval
* fixed error messages
* removed duplicated make_one_biased_lm

Fix two issues with clean and segment (#1154)

b606490

a) Fix get_ctm_edits.py to be not too greedy with id's. It would go wrong if an id was a prefix of another id
b) Add lattice-1best command in ctm generation for the non-position-dependent-phones case
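The prefix problem described in (a) can be reproduced in a few lines. This Python sketch is only an illustration of that class of bug, not the code in get_ctm_edits.py: both function names and the sample lines are hypothetical.

```python
def lines_for_id_greedy(lines, utt_id):
    """Too greedy: also matches ids that merely start with utt_id."""
    return [line for line in lines if line.startswith(utt_id)]


def lines_for_id_exact(lines, utt_id):
    """Compare the whole first whitespace-separated field against utt_id."""
    return [line for line in lines if line.split(maxsplit=1)[:1] == [utt_id]]


# Made-up input where one id ("utt1") is a prefix of another ("utt10").
sample = ["utt1 hello", "utt10 world"]

greedy = lines_for_id_greedy(sample, "utt1")
exact = lines_for_id_exact(sample, "utt1")
```

Here `greedy` picks up both lines, while `exact` keeps only the line whose id is exactly `utt1`.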

cast double to BaseFloat (#1156)

419f78c

Put lattice-1best between lattice-align-words-lexicon and nbest-to-ctm (#1157)

e13dd85

* fix the scripts by putting lattice-1best
* Copy the descriptions from Dan and add a example for lattice-align-words-lexicon #1155 (comment)

Change to get_num_frames.sh to work with mawk.

17d8834

Make the AMI recipes clearer about what to do if you don't have the f…

c4c08c4

…isher data. Resolves issue #1139.

wer_per_spk_details.pl support that utt2spk contains unicode (#1149)

7de64a5

Fixed function calls + removed redundant def (#1158)

1829885

sendMail -> SendMail SendMail() removed from train_rnn.py (already declared in nnet3_train_lib)

build pipeline change -- the dependencies are generated when make is …

14a1365

…run for the first time (#1160)

Fix small bug in nnet3 (relates to printing of a warning message) (#1161)

fd83871

Update of lre07/v2 scripts (#999)

08869e3

Modify nnet3-acc-lda-stats so it works when supervision is non-sparse.

cfb1999

Modify read log info (#1166)

d5fcad4

For wider perl version compatibility

Print correct location of env.sh

b5d7c4e

* env.sh is under tools not tools/extras.
* Fixed a couple typos.

Merge pull request #1168 from aevernon/master

8462798

Print correct location of env.sh

Cleanup segmentation perl fixes (#1171)

f8b746f

* Make sure orig2utt will be sorted, otherwise decoding will later fail
* Don't use a syntax that causes 'Experimental keys on scalar is now forbidden' error with Perl 5.23 and later

Try harder to find 2000 HUB5 transcripts (#1172)

993612a

Depending on how LDC2002T43.tgz is unpacked, the transcripts might be under 2000_hub5_eng_eval_tr. As requested in #1169, we append it to tdir if that directory exists.

fix dead link readme (#1177)

f2dd45c

fix cpp style guide dead link

Recipe for Swedish (#1102)

20889ae

* added recipe for swedish
* requested changes made
* requested changes made
* exit 1 added and copyright
* removed dict_prep.sh
* changed sph2pipe

Augmentation recipe for swbd (#1112)

f4495be

* Reverberation based augmentation recipe added for swbd. Improves over the best TDNN recipe. BLSTM/LSTM recipes pending. Added minor modifications to reverberate_data_dir.py to enable inclusion of the original data directory.

adding CER scoring capability (#1174)

0de048e

Patch SRILM to fix invalid pointer in ngram (#1181)

0e3adc0

Fixes #1173 See http://mailman.speech.sri.com/pipermail/srilm-user/2016q4/001726.html

dresen merged commit 1472b0b into dresen:master Nov 10, 2016
