nnet1: redesigning LSTM, BLSTM code, #950
Conversation
- introducing the interface 'MultistreamComponent', which handles stream-lengths and stream-resets (a hedged sketch follows below),
- rewritten most of the training tools: 'nnet-train-lstm-streams', 'nnet-train-blstm-streams',
- introducing 'RecurrentComponent' with a simple forward recurrence,
- the LSTM/BLSTM components have clipping presets we recently found helpful for a BLSTM-CTC system,
- renaming tools and components (removing 'streams' from the names),
- updating the scripts for generating lstm/blstm prototypes,
- updating the 'rm' lstm/blstm examples.
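For orientation, here is a minimal sketch of the shape such an interface could have. Only 'SetSeqLengths' is taken from the reviewed diff; the class body and 'NumStreams' are illustrative assumptions, not the actual code from the patch:

```c++
#include <cstdint>
#include <vector>

typedef std::int32_t int32;  // Kaldi defines this in base/kaldi-types.h.

// Illustrative sketch of a 'MultistreamComponent' interface.
class MultistreamComponent {
 public:
  virtual ~MultistreamComponent() {}
  // Announce the per-stream sequence lengths of the current mini-batch,
  // so the recurrence knows where each stream's valid (non-padded)
  // frames end and where its state should be reset.
  virtual void SetSeqLengths(const std::vector<int32>& sequence_lengths) {
    sequence_lengths_ = sequence_lengths;
  }
  int32 NumStreams() const {
    return static_cast<int32>(sequence_lengths_.size());
  }
 protected:
  std::vector<int32> sequence_lengths_;
};
```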
```c++
// number of frames we'll pack as the streams,
std::vector<int32> frame_num_utt;
...
// pack the parallel data,
```
It seems to me like the code would be easier to follow if you broke some things out into functions.
This is way past the "soft limit" of 40 lines per function.
Hi, okay, I rewrote this part. I understand that short functions are easier to understand than the 'script-like' code of a binary with many dependencies. The benefit of 'script-like' code is that it is easy to modify. As a compromise, I used brace-blocks with a comment in some places. Yes, that particular place was tedious... Thank you for pointing it out.
What happens in the 'nnet-train-multistream' code is that, first, whole sentences are read into a 'vector<Matrix<>>', from which they are sliced into a multi-stream mini-batch with an interleaved layout of frames (a sketch of the packing follows the layout below):
```
[feaRow1_stream1,
 feaRow1_stream2,
 ...
 feaRow1_streamN,
 feaRow2_stream1,
 feaRow2_stream2,
 ...
 feaRow2_streamN]
```
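A minimal sketch of this packing, in plain C++ (not the actual Kaldi code; the name 'PackMultistream' is made up, and the padding of short utterances is omitted for brevity):

```c++
#include <cstddef>
#include <vector>

typedef std::vector<float> Row;   // one feature frame
typedef std::vector<Row> Matrix;  // one utterance (rows = frames)

// Interleave the frames of N streams into one mini-batch:
// output row order is [t0_s0, t0_s1, ..., t0_sN-1, t1_s0, ...].
Matrix PackMultistream(const std::vector<Matrix>& streams,
                       std::size_t frames_per_stream) {
  Matrix batch;
  batch.reserve(frames_per_stream * streams.size());
  for (std::size_t t = 0; t < frames_per_stream; ++t) {
    for (std::size_t s = 0; s < streams.size(); ++s) {
      // Assumes every stream has >= frames_per_stream rows; shorter
      // utterances would need padding here.
      batch.push_back(streams[s][t]);
    }
  }
  return batch;
}
```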
A similar approach can be found in EESEN, where the 'batch' is formed from whole utterances of roughly the same length. Multi-stream training with whole sentences is then implemented in 'nnet-train-multistream-perutt'.
[The compilation bug is fixed, and the backward compatibility of models is preserved.]
Karel, regarding back-compatibility, I'm not sure if it can read the old models -- if not, can you at least make sure it dies with an informative error message?
Also, notice that the build failed.
```c++
// pass the info about padding,
nnet.SetSeqLengths(frame_num_utt);
// Show the 'utt' lengths in the VLOG[2],
if (kaldi::g_kaldi_verbose_level >= 2) {
```
You're supposed to check the verbose level with GetVerboseLevel().
- using GetVerboseLevel(),
- avoiding 'WriteIntegerVector' for writing to KALDI_LOG, by introducing 'operator<< (std::ostream&, const std::vector<T>&)' in kaldi-error.h (a sketch follows below).
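A minimal sketch of what such an overload could look like, together with the verbosity check at the reviewed call site. This is illustrative, not the exact patch (the real overload lives in kaldi-error.h inside the kaldi namespace):

```c++
#include <cstddef>
#include <ostream>
#include <vector>

// Print a std::vector<T> space-separated, so vectors can be streamed
// into KALDI_LOG / KALDI_VLOG directly instead of calling
// WriteIntegerVector() on a log stream.
template <typename T>
std::ostream& operator<<(std::ostream& os, const std::vector<T>& v) {
  for (std::size_t i = 0; i < v.size(); ++i) {
    os << v[i] << ' ';
  }
  return os;
}

// Usage at the call site, now checking verbosity via GetVerboseLevel():
//   if (kaldi::GetVerboseLevel() >= 2) {
//     KALDI_VLOG(2) << "utt-lengths : " << frame_num_utt;
//   }
```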
Any volunteer for the review?