Conversation


@dresen dresen commented Dec 2, 2016

No description provided.

binghe and others added 30 commits November 10, 2016 19:28
* Fixed OpenFst building on Windows/cygwin64: OSTYPE doesn't exist there, so check OS = Windows_NT instead.

* The old OSTYPE branch is kept for safety.
Change Travis build to use shared libraries to avoid 'no space left on device' (#1183)

* Change Travis build to use shared libraries to avoid 'no space left on device' error

* Change instructions to make --shared the default (faster compile)
Added a reverberation-based data augmentation recipe for AMI. Gives gains in IHM, SDM and MDM settings. (TDNN + chain recipe checked in.)
This commit changes the way gradient clipping is done in LSTMs and BLSTMs to be a bit more similar to "truncated BPTT", where we zero out the gradient at the edges of blocks (default block size: 30).  In fact, we only do this if the gradient is above a certain size (3.0 by default).  As before, on all frames, we clip gradients that are too large (default threshold: 30.0).
This improves results slightly (or leaves them the same) and seems to be helpful in controlling instability that we used to occasionally see in BLSTM training.
Caution: the default options of the 'make_configs' scripts have changed, so if you rerun an old BLSTM setup from the config generation stage it will not be quite the same.
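A minimal sketch of the per-frame rule described above, written in NumPy rather than Kaldi's actual C++ (the function name and array layout are assumptions; the block size and thresholds are the defaults quoted in the commit message):

```python
import numpy as np

def truncate_and_clip(grads, block_size=30, zeroing_threshold=3.0,
                      clipping_threshold=30.0):
    """grads: array of shape (num_frames, dim), one gradient row per frame."""
    out = grads.copy()
    for t in range(out.shape[0]):
        norm = np.linalg.norm(out[t])
        if t % block_size == 0 and norm > zeroing_threshold:
            # At block edges, zero the gradient entirely if it is already
            # large -- the "truncated BPTT"-like behaviour.
            out[t] = 0.0
        elif norm > clipping_threshold:
            # On all frames, rescale gradients whose norm exceeds the
            # clipping threshold, as before.
            out[t] *= clipping_threshold / norm
    return out
```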
Fix arpa2fst to allow trailing whitespace in the headers
Fix nnet3 endpointing to correctly use frame subsampling factor (#1184)
… for computation in nnet3 setup. For more details see Issue #1190 (#1194)
Fix an asymmetry in how the derivatives were truncated outside the chunk for BLSTM training.  [Caution: may change BLSTM results.]

In the nnet3 training code there is a mechanism --{min,max}-deriv-time to stop processing derivatives outside a particular time range, which can be used to avoid wasteful and possibly harmful computation in, e.g., the +-40-frame context outside the chunk boundaries where the supervision lies. [E.g. the gradients may blow up there.] Due to an oversight, this was previously only applied on the left, i.e. the python script set --min-deriv-time but not --max-deriv-time. This commit fixes that, and also tunes the time values used in the scripts, to limit the derivatives to +-10 frames around the supervised chunk.

Results for BLSTM training are improved where tested.  Caution: if you are tuning BLSTM things, you may need to re-run baselines after you merge this change.
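Conceptually, the --{min,max}-deriv-time mechanism restricts which frames contribute derivatives. The rough NumPy sketch below only illustrates the masking idea (names and layout are assumptions); the real nnet3 computation simply never evaluates the derivatives outside the range, which is where the savings come from:

```python
import numpy as np

def mask_derivs(derivs, frame_times, min_deriv_time, max_deriv_time):
    """derivs: (num_frames, dim); frame_times: (num_frames,) time index per row."""
    keep = (frame_times >= min_deriv_time) & (frame_times <= max_deriv_time)
    # Derivatives outside [min_deriv_time, max_deriv_time] do not contribute.
    return derivs * keep[:, None]
```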
Added (B)LSTM scripts for ami/s5b and tedlium/s5_r2
danpovey and others added 29 commits November 21, 2016 17:24
Modify TransitionModel for more compact chain-model graphs
…odel]; also, cosmetic fix to steps/nnet3/chain/train.py
1. Added a TDNN+LSTM recipe which performs similarly to the BLSTM model with significantly smaller latency (21 frames vs. 51 frames).
2. Added BLSTM results in the xconfig setup, without layer-wise discriminative pre-training (2.7% rel. improvement).
3. Added an example TDNN recipe which uses a subset of the feature vector from neighboring time steps (results pending).

xconfig: added a tdnn layer which can handle the subset-dim option.
… (atomicAdd not supported there, needed for chain models).
…ow ';' as a word when those scripts are used. Bug fix in egs/wsj/s5/local/run_segmentation.sh.
…th zero learning rates, backprop does not have to be done.
…n is slow if using an older-style LM-formatting script. Do this by disabling a recently introduced optimization if the disambig-symbol is not specified.
Fix bugs for DOUBLE_PRECISION = 1
move propagate of norm component to cu math

move to cu math

fix cu math bug

CuMatrix::NormalizePerRow<float>,    16   0.015   0.001  13.16x
CuMatrix::NormalizePerRow<float>,    32   0.062   0.005  13.54x
CuMatrix::NormalizePerRow<float>,    64   0.239   0.019  12.77x
CuMatrix::NormalizePerRow<float>,   128   0.748   0.074  10.16x
CuMatrix::NormalizePerRow<float>,   256   2.255   0.289  7.79x
CuMatrix::NormalizePerRow<float>,   512   5.399   1.001  5.39x
CuMatrix::NormalizePerRow<float>,  1024  10.010   2.731  3.67x
CuMatrix::NormalizePerRow<double>,    16   0.015   0.001  12.45x
CuMatrix::NormalizePerRow<double>,    32   0.059   0.005  12.69x
CuMatrix::NormalizePerRow<double>,    64   0.236   0.018  12.81x
CuMatrix::NormalizePerRow<double>,   128   0.701   0.072  9.78x
CuMatrix::NormalizePerRow<double>,   256   1.738   0.279  6.23x
CuMatrix::NormalizePerRow<double>,   512   4.415   0.903  4.89x
CuMatrix::NormalizePerRow<double>,  1024   7.392   2.154  3.43x

fix small bug.

strictly follow the original impl.

fix kernel bug

add comment to the cuda kernel function
speed test for normalize per row

correctness test for normalize per row

move test to cu math test

fix test bug
New CUDA kernel for NormalizeComponent::propagate
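For reference, the per-row normalization the new kernel accelerates amounts to scaling each row to a fixed root-mean-square value. A NumPy sketch under that reading follows (the target_rms default and epsilon floor are assumptions, and this is not the CUDA kernel itself):

```python
import numpy as np

def normalize_per_row(x, target_rms=1.0, eps=1e-20):
    """Scale each row of x so its root-mean-square value equals target_rms."""
    dim = x.shape[1]
    row_norm = np.sqrt(np.maximum((x * x).sum(axis=1, keepdims=True), eps))
    # Each output row y satisfies ||y||^2 == dim * target_rms^2 (up to the eps floor).
    return x * (target_rms * np.sqrt(dim) / row_norm)
```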
Look in right location for new style subdirectories
* This commit modifies the tedlium s5_r2 setup to use the original LM from TedliumRelease2 (instead of an LM built from the cantab-tedlium data). (#1224)

* Changed the definition of the deriv-truncate-margin option. Now the margin is defined on top of the model contexts, which applies to more general network architectures. Changed the values used in the scripts from 10 to 8 to compensate for the script change, so we don't have to rerun experiments.
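As a rough illustration only (the exact formula is an assumption, not taken from this PR): if a model's left context were 40 frames, then under the new definition --deriv-truncate-margin=8 would permit derivatives up to about 40 + 8 = 48 frames to the left of the supervised chunk, whereas the old definition used the margin as a fixed frame count around the chunk regardless of the model's context.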
…de crash; compile problem on nvcc 8.0; fix thread-sync errors. (#1228)

This fixes a synchronization problem introduced by PR #1217 (merged yesterday) that can cause crashes in TDNN training.
swbd: added results for the TDNN recipe which uses the subset-dim option
…ailover to openslr faster (server is often down).
* This pull request implements an automatic method of finding likely bugs in a lexicon, and providing suggested fixes.  Useful if your lexicon is incomplete or contains errors.
@dresen dresen merged commit e840588 into dresen:master Dec 2, 2016