WIP : nnet3 & python : rewrote the components.py and tdnn/lstm config gener… #1092
vijayaditya wants to merge 8 commits into kaldi-asr:master
Conversation
…ators. 1. We ensure the lstm and tdnn config generators are similar to each other, i.e., xent-separate-forward-affine is supported in both architectures. 2. We print the number of learnable parameters in the main branch of the network and in the xent branch.
@freewym @vimalmanohar Could you both please review this?

OK
@danpovey Could you please comment on the specification for interleaving, prepending and appending TDNNs with LSTMs? I am currently following the method below:

+ parser = argparse.ArgumentParser(description="Writes config files and variables for LSTMs creation and training,"
+ " it also supports adding TDNN layers before, in between and after LSTMs."
+ " This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n"
+ " When the splicing indexes at a layer corresponding to an LSTM is not [0] a TDNN layer is added before it.\n"
+ " e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+ " This will add 3 lstm layers.\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " and an LSTM layer\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " LSTM layer,\n"
+ " and an LSTM layer\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0 -3,0,3 -3,0,3 -3,0,3' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " and an LSTM layer\n"
+ ,

But I think something easier to read would be preferable, e.g.:

--layers "INPUT:-2,-1,0,1,2 AFFINE:-3,0,3 AFFINE:0 AFFINE:-3,0,3 LSTM:-3 AFFINE:-3,0,3 BLSTM:[-3,3] AFFINE:0 LSTM:-3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:0"
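As a minimal sketch (not the actual Kaldi implementation) of how the proposed `--layers` string could be interpreted, each token could be split into a layer type and its splice/delay indexes; the function name and the default-to-`[0]` behavior are illustrative assumptions:

```python
# Hypothetical parser for the proposed "TYPE:indexes" layer specification.
# Each whitespace-separated token becomes a (layer_type, indexes) pair;
# "BLSTM:[-3,3]"-style brackets are tolerated, and a missing index list
# defaults to [0] (an assumption, not part of the proposal above).
def parse_layer_spec(spec):
    layers = []
    for token in spec.split():
        layer_type, _, indexes = token.partition(':')
        if indexes:
            context = [int(i) for i in indexes.strip('[]').split(',')]
        else:
            context = [0]
        layers.append((layer_type, context))
    return layers
```

For example, `parse_layer_spec("INPUT:-2,-1,0,1,2 LSTM:-3 BLSTM:[-3,3]")` yields the pairs `('INPUT', [-2, -1, 0, 1, 2])`, `('LSTM', [-3])`, `('BLSTM', [-3, 3])`.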
BTW the numbers beside LSTM layers correspond to delays.
                'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_renorm'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }
You may also need to add counts for bias params.
| " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n" | ||
| " TDNN layer with splicing -3,0,3 \n" | ||
| " TDNN layer with splicing -3,0,3 + LSTM layer,\n" | ||
| " TDNN layer with splicing -3,0,3 + LSTM layer,\n" |
Should it be: LDA layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 + LSTM layer, LSTM layer, LSTM layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 ?
| parser.add_argument("--num-lstm-layers", type=int, | ||
| help="Number of LSTM layers to be stacked", default=1) | ||
| parser.add_argument("--lstm-start-layer-index", type=int, | ||
| help="Number of LSTM layers to be stacked", default=0) |
if (num_hidden_layers < lstm_start_layer_index + num_lstm_layers):
    raise Exception("num-lstm-layers : (number of lstm layers + lstm start layer index) "
                    " has to be greater than number of layers, decided based on splice-indexes")
"has to be smaller than..."
| " This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n" | ||
| " When the splicing indexes at a layer corresponding to an LSTM is not [0] a TDNN layer is added before it.\n" | ||
| " e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n" | ||
| " This will add 3 lstm layers.\n" |
It would be better to add the line "This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer" here as well.
egs/wsj/s5/steps/nnet3/components.py
                'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_affine'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }
(input['dimension'] + 1) for bias
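The bias-inclusive count the reviewers ask for can be sketched as follows: an affine layer mapping input_dim to output_dim has a weight matrix plus a bias vector, so the count is (input_dim + 1) * output_dim (the helper name is an illustrative assumption):

```python
# Learnable parameters of an affine layer, counting the bias:
# weights: input_dim * output_dim, bias: output_dim.
def num_affine_params(input_dim, output_dim):
    return (input_dim + 1) * output_dim
```

For example, a 40-dimensional input feeding a 512-dimensional affine layer has 41 * 512 = 20992 learnable parameters, not 20480 as the weight-only count would suggest.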
# a final layer is added after each new layer as we are generating
# configs for layer-wise discriminative training
num_params_final, num_params_final_xent = nodes.AddFinalLayerWithXentRegularizer(config_lines,
The indentation is incorrect here.
prev_layer_output = nodes.AddAffPnormLayer(config_lines, "Tdnn_{0}".format(i),
                                           prev_layer_output, nonlin_input_dim, nonlin_output_dim,
                                           norm_target_rms = 1.0 if i < num_hidden_layers - 1 else final_layer_normalize_target)
if splice_indexes[i] == [0]:
(force-pushed from 5b041f1 to 26d0795)
@vijayaditya, I just saw your question. I had some stuff that I had included in the reply here, but I've put it in a separate issue #1124 since I think it belongs there. The stuff I include in this post is a bit more specific to this issue.

I think a specification like:

However, I'm concerned about back compatibility. Perhaps you could give the option a different name in the command line, and have a mechanism to convert the old specifications into the new one (and print warnings if people use the old mechanism and say what the new-style equivalent would be)?

Also, I see no reason in principle why you can't use spliced input from the previous layer as an input to an LSTM or BLSTM layer. Is there a reason why this is never done, e.g. it doesn't work? In that case, it might be more natural to have the splicing indexes be optional, defaulting to 0, but make the LSTM and BLSTM delays like function parameters, and make it look like "... affine:-3,0,3 lstm(-3) affine:-3,0,3 blstm(-3,3) affine" or something like that. That way, if you wanted to have splicing on the LSTM or BLSTM layers, you could say e.g. "... lstm(-3,3):0,3 ...". It seems to me that that would lead to a more "orthogonal" implementation.

If we're going to re-do this stuff, I have another suggestion that relates to the way the dimensions are specified. @freewym found that it can be helpful to have geometrically increasing layer sizes [and I think this was merged]. It seems to me that we can specify the dimensions in this same string in an elegant way by saying e.g.:

--layers "input:-2,-1,0,1,2 affine(dim=512):-3,0,3 affine affine:-3,0,3 lstm(delay=-3) affine(splice=-3,0,3) blstm(delay=-3,3) affine:-3,0,3 blstm(delay=-3,3) affine(dim=1024)"

... and wherever the dim was not specified, it would be geometrically interpolated between the specified dims.

Note: there could be a process of taking a user-specified --layers input and "expanding it out" by filling in implicitly set parameters, and if that generated a string, the string could be printed as a help to the user and for debugging. Because the blstm delay parameter contains a comma in this case, a different separator would be needed for multiple parameters, e.g. semicolon.

Also bear in mind that @pegahgh is getting nice improvements from adding skip-layer connections with dropout that we gradually increase and then drop the skip-layer connection. There are various forms of this that she's trying. In this scenario, it turns out not to be necessary to initialize the network layer by layer. If there is going to be a rewrite of the config-generation, it might be a good idea to wait until that's done, or at least try to anticipate it, because getting rid of the layer-by-layer training would be a nice simplification. But it might be at least a couple of weeks before that's done, probably longer realistically, and would require more testing.
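The geometric interpolation of unspecified dims described above can be sketched as follows (the function name and the use of `None` for an unspecified dim are illustrative assumptions, not the actual Kaldi code):

```python
# Hypothetical dim interpolation: dims left unspecified (None) between two
# specified dims are filled in geometrically, i.e. each step multiplies by
# a constant ratio so the sizes grow (or shrink) smoothly between anchors.
# Leading/trailing None values (no anchor on one side) are left untouched.
def interpolate_dims(dims):
    dims = list(dims)
    known = [i for i, d in enumerate(dims) if d is not None]
    for a, b in zip(known, known[1:]):
        ratio = (dims[b] / float(dims[a])) ** (1.0 / (b - a))
        for i in range(a + 1, b):
            dims[i] = int(round(dims[a] * ratio ** (i - a)))
    return dims
```

For example, `interpolate_dims([512, None, None, 4096])` fills the gaps with a per-layer ratio of 2, giving `[512, 1024, 2048, 4096]`.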
I'm OK to commit these changes in principle, but be aware that I am working on the 'xconfig' stuff which will supersede all of this. So I think it might be better for you to focus your energy on helping make sure the 'xconfig' framework is working, and is compatible with the old setup, to the extent possible (e.g. it won't support discriminative pretraining since we haven't been finding that helpful).
No longer relevant in the context of #1170, so closing this.
…ators.
Testing in progress.