WIP : nnet3 & python : rewrote the components.py and tdnn/lstm config gener…#1092

Closed

vijayaditya wants to merge 8 commits into kaldi-asr:master from vijayaditya:nnet3_python_refactor

Conversation

@vijayaditya
Contributor

…ators.

  1. We ensure the lstm and tdnn config generators are similar to each other, i.e., xent-separate-forward-affine is supported in both architectures.
  2. We print the number of learnable parameters in the main branch of the network and in the xent branch.

Testing in progress.

@vijayaditya
Contributor Author

@freewym @vimalmanohar Could you both please review this?

@freewym
Contributor

freewym commented Oct 3, 2016

OK

@vijayaditya
Contributor Author

@danpovey Could you please comment on the specification for interleaving, prepending and appending TDNNs with LSTMs? I am currently following the method below.

+    parser = argparse.ArgumentParser(description="Writes config files and variables for LSTM creation and training;"
+                                                 " it also supports adding TDNN layers before, in between and after LSTMs."
+                                                 " This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n"
+                                                 " When the splicing indexes at a layer corresponding to an LSTM are not [0], a TDNN layer is added before it.\n"
+                                                 " e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+                                                 "      This will add 3 lstm layers.\n"
+                                                 " --splice-indexes '-2,-1,0,1,2 -3,0,3 -3,0,3 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+                                                 "      This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+                                                 "          TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+                                                 "          TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+                                                 "          and an LSTM layer\n"
+                                                 " --splice-indexes '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+                                                 "      This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+                                                 "          TDNN layer with splicing -3,0,3 \n"
+                                                 "          TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+                                                 "          LSTM layer,\n"
+                                                 "          and an LSTM layer\n"
+                                                 " --splice-indexes '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0 -3,0,3 -3,0,3 -3,0,3' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+                                                 "      This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+                                                 "          TDNN layer with splicing -3,0,3 \n"
+                                                 "          TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+                                                 "          TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+                                                 "          and an LSTM layer\n",

But I think something easier to read would be preferable, e.g.

--layers "INPUT:-2,-1,0,1,2 AFFINE:-3,0,3 AFFINE:0 AFFINE:-3,0,3 LSTM:-3 AFFINE:-3,0,3 BLSTM:[-3,3] AFFINE:0 LSTM:-3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:0"

@vijayaditya
Contributor Author

BTW the numbers beside LSTM layers correspond to delays.
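To make the proposal concrete, here is a minimal sketch of how such a --layers string could be tokenized; the function name, the default splicing of [0], and the lower-casing are my own assumptions, nothing like this exists in the PR yet:

    def parse_layers_spec(spec):
        # Tokenize a --layers string such as
        #   'INPUT:-2,-1,0,1,2 AFFINE:-3,0,3 LSTM:-3 BLSTM:[-3,3] AFFINE:0'
        # into (layer_type, values) pairs.  For INPUT/AFFINE the values are
        # splice indexes; for LSTM/BLSTM they are recurrence delays.
        layers = []
        for token in spec.split():
            if ':' in token:
                layer_type, value_str = token.split(':', 1)
                value_str = value_str.strip('[]')  # allow BLSTM:[-3,3]
                values = [int(v) for v in value_str.split(',')]
            else:
                layer_type, values = token, [0]  # assumed default splicing
            layers.append((layer_type.lower(), values))
        return layers

    # e.g. parse_layers_spec("INPUT:-2,-1,0,1,2 LSTM:-3 BLSTM:[-3,3] AFFINE:0")
    # -> [('input', [-2, -1, 0, 1, 2]), ('lstm', [-3]), ('blstm', [-3, 3]),
    #     ('affine', [0])]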

                        'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_renorm'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }

You may also need to add counts for bias params

" This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
" TDNN layer with splicing -3,0,3 \n"
" TDNN layer with splicing -3,0,3 + LSTM layer,\n"
" TDNN layer with splicing -3,0,3 + LSTM layer,\n"

Should it be: LDA layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 + LSTM layer, LSTM layer, LSTM layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 ?

parser.add_argument("--num-lstm-layers", type=int,
help="Number of LSTM layers to be stacked", default=1)
parser.add_argument("--lstm-start-layer-index", type=int,
help="Number of LSTM layers to be stacked", default=0)

The help info doesn't match the option.


    if (num_hidden_layers < lstm_start_layer_index + num_lstm_layers):
        raise Exception("num-lstm-layers : (number of lstm layers + lstm start layer index) "
                        " has to be greater than number of layers, decided based on splice-indexes")

"has to be smaller than..."

" This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n"
" When the splicing indexes at a layer corresponding to an LSTM is not [0] a TDNN layer is added before it.\n"
" e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
" This will add 3 lstm layers.\n"

It would be better to add the line "This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer" here as well.

                        'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_affine'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }

(input['dimension'] + 1) for bias
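i.e., something along these lines (a sketch of the suggested fix, not code from the PR):

    # count the weight matrix plus one bias term per output unit
    num_learnable_params = (input['dimension'] + 1) * output_dim
    return {'output' : {'descriptor': '{0}_affine'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : num_learnable_params}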


# a final layer is added after each new layer as we are generating
# configs for layer-wise discriminative training
num_params_final, num_params_final_xent = nodes.AddFinalLayerWithXentRegularizer(config_lines,

The indentation is incorrect here.

            prev_layer_output = nodes.AddAffPnormLayer(config_lines, "Tdnn_{0}".format(i),
                                                       prev_layer_output, nonlin_input_dim, nonlin_output_dim,
                                                       norm_target_rms = 1.0 if i < num_hidden_layers -1 else final_layer_normalize_target)
            if splice_indexes[i] == ]:

Typo here?

vijayaditya force-pushed the nnet3_python_refactor branch from 5b041f1 to 26d0795 on October 4, 2016 at 22:50
@danpovey
Contributor

@vijayaditya, I just saw your question.

I had some stuff that I'd originally included in this reply, but I've moved it to a separate issue #1124 since I think it belongs there. The stuff I include in this post is a bit more specific to this issue.
We'll have to decide the best way to proceed, regarding whether to continue with this pull request or not. Think about it and let me know your thoughts.

I think a specification like:
--layers "INPUT:-2,-1,0,1,2 AFFINE:-3,0,3 AFFINE:0 AFFINE:-3,0,3 LSTM:-3 AFFINE:-3,0,3 BLSTM:[-3,3] AFFINE:0 LSTM:-3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:0"
is clearer, although I'd use lower case, and I don't see why the brackets are needed in BLSTM:[-3,3].

However, I'm concerned about back compatibility. Perhaps you could give the option a different name in the command line, and have a mechanism to convert the old specifications into the new one (and print warnings if people use the old mechanism and say what the new-style equivalent would be)?
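As an illustration of such a conversion mechanism, a rough sketch that maps the old options onto the proposed --layers string, following the semantics in the help text quoted above; the function name and the default LSTM delay of -3 are assumptions, since the old options do not encode delays:

    def convert_old_spec(splice_indexes, num_lstm_layers, lstm_start_layer_index,
                         lstm_delay=-3):
        # splice_indexes is the old space-separated string, e.g.
        # '-2,-1,0,1,2 -3,0,3 -3,0,3 0'.  Hidden layer i is an LSTM layer when
        # lstm_start_layer_index <= i < lstm_start_layer_index + num_lstm_layers,
        # and if its splicing is not '0' a TDNN (affine) layer precedes it.
        splices = splice_indexes.split()
        tokens = ["input:" + splices[0]]
        for i, splice in enumerate(splices[1:]):
            if lstm_start_layer_index <= i < lstm_start_layer_index + num_lstm_layers:
                if splice != "0":
                    tokens.append("affine:" + splice)  # TDNN layer before the LSTM
                tokens.append("lstm:{0}".format(lstm_delay))
            else:
                tokens.append("affine:" + splice)
        new_spec = " ".join(tokens)
        print("WARNING: --splice-indexes/--num-lstm-layers/--lstm-start-layer-index "
              "are deprecated; the equivalent --layers value is: " + new_spec)
        return new_spec

    # convert_old_spec('-2,-1,0,1,2 -3,0,3 -3,0,3 0 0', 3, 1)
    # -> 'input:-2,-1,0,1,2 affine:-3,0,3 affine:-3,0,3 lstm:-3 lstm:-3 lstm:-3'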

Also, I see no reason in principle why you can't use spliced input from the previous layer, as an input to an LSTM or BLSTM layer. Is there a reason why this is never done, e.g. it doesn't work? In that case, it might be more natural to have the splicing indexes be optional, defaulting to 0, but make the LSTM and BLSTM delays like function parameters, and make it look like "... affine:-3,0,3 lstm(-3) affine:-3,0,3 blstm(-3,3) affine" or something like that. That way, if you wanted to have splicing on the LSTM or BLSTM layers, you could say e.g. "... lstm(-3,3):0,3... ". It seems to me that that would lead to a more "orthogonal" implementation.

If we're going to re-do this stuff, I have another suggestion that relates to the way the dimensions are specified. @freewym found that it can be helpful to have geometrically increasing layer sizes [and I think this was merged.] It seems to me that we can specify the dimensions in this same string in an elegant way by saying e.g.:

--layers "input:-2,-1,0,1,2 affine(dim=512):-3,0,3 affine affine:-3,0,3 lstm(delay=-3) affine(splice=-3,0,3) blstm(delay=-3,3) affine:-3,0,3 blstm(delay=-3,3) affine(dim=1024)"

... and wherever the dim was not specified, it would be geometrically interpolated between the specified dims. Note: there could be a process of taking a user-specified --layers input and "expanding it out" by setting implicitly set parameters, and if that generated a string, the string could be printed as a help to the user and for debugging. Because the blstm delay parameter contains a comma in this case, a different separator would be needed for multiple parameters, e.g. semicolon.
Here, I'm proposing to keep the splicing specification separate from the "optional parameters of the layers" because the splicing is in some sense separate from the layer itself, it's more about the input of the layer.
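To sketch how this might be parsed (all names, the token grammar name(key=value;...)[:splice], and the use of ';' as the multi-parameter separator are hypothetical, following the comma problem noted above):

    import re

    TOKEN_RE = re.compile(r'^(?P<name>\w+)(?:\((?P<params>[^)]*)\))?(?::(?P<splice>[-\d,]+))?$')

    def parse_layer_token(token):
        # Parse e.g. 'affine(dim=512):-3,0,3' or 'blstm(delay=-3,3)' into
        # (name, params_dict, splice_list).  Multiple parameters would be
        # separated by ';', since values such as 'delay=-3,3' contain commas.
        m = TOKEN_RE.match(token)
        if m is None:
            raise ValueError("bad layer token: " + token)
        params = {}
        if m.group('params'):
            for kv in m.group('params').split(';'):
                key, value = kv.split('=', 1)
                params[key] = value
        splice = ([int(x) for x in m.group('splice').split(',')]
                  if m.group('splice') else [0])
        return m.group('name'), params, splice

    def interpolate_dims(dims):
        # Geometrically interpolate the None entries between specified dims,
        # assuming the first and last entries are given:
        # [512, None, 1024] -> [512, 724, 1024].
        filled = list(dims)
        known = [i for i, d in enumerate(filled) if d is not None]
        for lo, hi in zip(known, known[1:]):
            for j in range(lo + 1, hi):
                ratio = filled[hi] / float(filled[lo])
                filled[j] = int(round(filled[lo] * ratio ** ((j - lo) / float(hi - lo))))
        return filled

The "expanded out" string mentioned above could then be regenerated from the parsed layers with all dims filled in, and printed for the user.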

Also bear in mind that @pegahgh is getting nice improvements from adding skip-layer connections with dropout, where the dropout is gradually increased and the skip-layer connection is eventually dropped. There are various forms of this that she's trying. In this scenario, it turns out not to be necessary to initialize the network layer by layer. If there is going to be a rewrite of the config generation, it might be a good idea to wait until that's done, or at least try to anticipate it, because getting rid of the layer-by-layer training would be a nice simplification. But it might be at least a couple of weeks before that's done, probably longer realistically, and it would require more testing.

@danpovey
Contributor

danpovey commented Nov 4, 2016

I'm OK to commit these changes in principle, but be aware that I am working on the 'xconfig' stuff which will supersede all of this. So I think it might be better for you to focus your energy on helping make sure the 'xconfig' framework is working, and is compatible with the old setup, to the extent possible (e.g. it won't support discriminative pretraining since we haven't been finding that helpful).
My plan is to finish it in the next couple of days to the extent that it supports non-recurrent setups, and have you help finish it by adding layer-types for LSTMs and the like, if you have time.

@vijayaditya
Contributor Author

No longer relevant in the context of #1170, so closing this.

