
Conversation

@KarelVesely84 (Contributor)

Any volunteers for the review?

- introducing the interface 'MultistreamComponent',
  - handles stream-lengths and stream-resets (see the sketch below),
- rewrote most of the training tools: 'nnet-train-lstm-streams',
  'nnet-train-blstm-streams',
- introducing 'RecurrentComponent' with a simple forward recurrence,
- the LSTM/BLSTM components have clipping presets we recently
  found helpful for the BLSTM-CTC system,
- renaming tools and components (removing 'streams' from the names),
- updating the scripts for generating lstm/blstm prototypes,
- updating the 'rm' lstm/blstm examples.

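For orientation, here is a minimal sketch of what the 'MultistreamComponent' interface adds on top of a plain component. 'SetSeqLengths' matches the call visible in the diff below; the rest of the names and the body are illustrative assumptions, not the actual Kaldi header:

```cpp
#include <vector>
#include "base/kaldi-common.h"  // kaldi::int32,

// Illustrative sketch: per-stream bookkeeping for recurrent components.
// Before each mini-batch, the training tool announces how many valid
// frames every parallel stream contributes, so the component knows
// where each sequence ends and where its state must be reset.
class MultistreamComponent {
 public:
  void SetSeqLengths(const std::vector<kaldi::int32> &sequence_lengths) {
    sequence_lengths_ = sequence_lengths;
  }
  kaldi::int32 NumStreams() const {
    return static_cast<kaldi::int32>(sequence_lengths_.size());
  }
 protected:
  std::vector<kaldi::int32> sequence_lengths_;
};
```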
// per-utterance frame counts that we'll pack into the streams,
std::vector<int32> frame_num_utt;

// pack the parallel data,
Contributor:

It seems to me like the code would be easier to follow if you broke some things out into functions.
This is way past the "soft limit" of 40 lines per function.

@KarelVesely84 (Contributor Author) commented Aug 8, 2016

Hi, okay, I rewrote this place. I understand that short functions are easier to follow than the 'script-like' code of a binary with many dependencies; the benefit of 'script-like' code is that it is easy to modify. As a compromise, I used brace-blocks with a comment in some places. Yes, that particular place was tedious... Thank you for pointing it out.

What happens in the 'nnet-train-multistream' code is that whole sentences are first filled into a 'vector<Matrix<>>', from which they are sliced into a multi-stream mini-batch with an interleaved layout of frames:
[feaRow1_stream1,
feaRow1_stream2,
...
feaRow1_streamN,
feaRow2_stream1,
feaRow2_stream2,
...
feaRow2_streamN
]
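Concretely, the slicing step could look like the following. This is a minimal sketch, assuming all streams share the feature dimension; the function name 'PackInterleaved' and the zero-padding choice are illustrative, not the actual tool code:

```cpp
#include <vector>
#include "matrix/kaldi-matrix.h"

using namespace kaldi;

// Interleave frame 't' of every stream into consecutive rows, so that
// row (t * num_streams + s) of 'batch' holds frame 't' of stream 's'.
void PackInterleaved(const std::vector<Matrix<BaseFloat> > &utts,
                     int32 frames_per_stream,
                     Matrix<BaseFloat> *batch) {
  int32 num_streams = utts.size(), dim = utts[0].NumCols();
  batch->Resize(frames_per_stream * num_streams, dim, kSetZero);
  for (int32 t = 0; t < frames_per_stream; t++) {
    for (int32 s = 0; s < num_streams; s++) {
      if (t < utts[s].NumRows()) {  // rows past an utterance end stay zero,
        batch->Row(t * num_streams + s).CopyFromVec(utts[s].Row(t));
      }
    }
  }
}
```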

A similar scheme can be found in EESEN, where the 'batch' is formed from whole utterances of roughly the same length. Multi-stream training with whole sentences is implemented in 'nnet-train-multistream-perutt'.

[The compilation bug is fixed, and the backward compatibility of models is restored as well.]

@danpovey (Contributor) commented Aug 5, 2016

Karel, regarding back-compatibility, I'm not sure if it can read the old models-- if not, can you at least make sure it dies with an informative error message?
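Something along these lines would cover the 'dies with an informative error message' case. The legacy marker token here is a hypothetical example of a pre-rename component name, not taken from the actual code:

```cpp
#include <string>
#include "base/kaldi-common.h"  // KALDI_ERR,

// While reading a model, reject markers from the old naming scheme
// with an actionable message instead of a generic parse failure:
void CheckLegacyMarker(const std::string &token) {
  if (token == "<LstmProjectedStreams>") {  // hypothetical old name,
    KALDI_ERR << "Found legacy component marker " << token
              << "; the component was renamed, please convert or "
              << "re-train the model with the current tools.";
  }
}
```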

@danpovey (Contributor) commented Aug 5, 2016

Also, notice that the build failed.

// pass the info about padding,
nnet.SetSeqLengths(frame_num_utt);
// Show the 'utt' lengths in the VLOG[2],
if (kaldi::g_kaldi_verbose_level >= 2) {
Contributor:

You're supposed to check the verbose level with GetVerboseLevel().
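The fix is mechanical; here is a minimal sketch of the corrected check, with a hypothetical helper name:

```cpp
#include <sstream>
#include <vector>
#include "base/kaldi-common.h"  // kaldi::GetVerboseLevel(), KALDI_VLOG,

// Show the per-utterance lengths at verbose level 2 and above,
// checking the level via the accessor rather than the global variable:
void ShowUttLengths(const std::vector<kaldi::int32> &frame_num_utt) {
  if (kaldi::GetVerboseLevel() >= 2) {
    std::ostringstream os;
    for (size_t i = 0; i < frame_num_utt.size(); i++)
      os << frame_num_utt[i] << " ";
    KALDI_VLOG(2) << "utterance lengths: " << os.str();
  }
}
```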

- using GetVerboseLevel(),
- avoiding 'WriteIntegerVector' for writing to KALDI_LOG by introducing:
  'operator<< (std::ostream, std::vector<T>)' in kaldi-error.h (sketched below)
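A minimal sketch of that operator; the exact formatting in kaldi-error.h may differ:

```cpp
#include <cstddef>
#include <ostream>
#include <vector>

// Stream a std::vector<T> as a bracketed, space-separated list, so it
// can be passed straight into KALDI_LOG / KALDI_VLOG messages:
template <typename T>
std::ostream &operator<<(std::ostream &os, const std::vector<T> &v) {
  os << "[ ";
  for (std::size_t i = 0; i < v.size(); i++) os << v[i] << " ";
  return os << "]";
}
```

With this in scope, a call like `KALDI_VLOG(2) << "utt lengths: " << frame_num_utt;` replaces the 'WriteIntegerVector' round-trip.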
@danpovey merged commit 500b2eb into kaldi-asr:master Aug 11, 2016
