WIP : nnet3 & python : rewrote the components.py and tdnn/lstm config gener… #1092
vijayaditya wants to merge 8 commits into kaldi-asr:master
Conversation
…ators. 1. We ensure the lstm and tdnn config generators are similar to each other, i.e., xent-separate-forward-affine is supported in both architectures. 2. We print the number of learnable parameters in the main branch of the network and in the xent branch.
@freewym @vimalmanohar Could you both please review this?

OK
@danpovey Could you please comment on the specification for interleaving, prepending and appending TDNNs with LSTMs? I am currently following the method below:

+ parser = argparse.ArgumentParser(description="Writes config files and variables for LSTMs creation and training,"
+ " it also supports adding TDNN layers before, in between and after LSTMs."
+ " This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n"
+ " When the splicing indexes at a layer corresponding to an LSTM is not [0] a TDNN layer is added before it.\n"
+ " e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+ " This will add 3 lstm layers.\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " and an LSTM layer\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " LSTM layer,\n"
+ " and an LSTM layer\n"
+ " --splice-index '-2,-1,0,1,2 -3,0,3 -3,0,3 0 0 -3,0,3 -3,0,3 -3,0,3' --num-lstm-layers 3 --lstm-start-layer-index 1 \n"
+ " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n"
+ " TDNN layer with splicing -3,0,3 \n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " TDNN layer with splicing -3,0,3 + LSTM layer,\n"
+ " and an LSTM layer\n"
+ ,

But I think something easier to read would be preferable, e.g.:

--layers "INPUT:-2,-1,0,1,2 AFFINE:-3,0,3 AFFINE:0 AFFINE:-3,0,3 LSTM:-3 AFFINE:-3,0,3 BLSTM:[-3,3] AFFINE:0 LSTM:-3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:-3,0,3 AFFINE:0"
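As a minimal sketch (not the actual Kaldi implementation) of how the proposed `--layers` string could be interpreted, each token could be split into a layer type and its splice/delay indexes; the function name and the default-to-`[0]` behavior are illustrative assumptions:

```python
# Hypothetical parser for the proposed "TYPE:indexes" layer specification.
# Each whitespace-separated token becomes a (layer_type, indexes) pair;
# "BLSTM:[-3,3]"-style brackets are tolerated, and a missing index list
# defaults to [0] (an assumption, not part of the proposal above).
def parse_layer_spec(spec):
    layers = []
    for token in spec.split():
        layer_type, _, indexes = token.partition(':')
        if indexes:
            context = [int(i) for i in indexes.strip('[]').split(',')]
        else:
            context = [0]
        layers.append((layer_type, context))
    return layers
```

For example, `parse_layer_spec("INPUT:-2,-1,0,1,2 LSTM:-3 BLSTM:[-3,3]")` yields the pairs `('INPUT', [-2, -1, 0, 1, 2])`, `('LSTM', [-3])`, `('BLSTM', [-3, 3])`.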
BTW the numbers beside LSTM layers correspond to delays.
                'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_renorm'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }
You may also need to add counts for bias params.
| " This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer, \n" | ||
| " TDNN layer with splicing -3,0,3 \n" | ||
| " TDNN layer with splicing -3,0,3 + LSTM layer,\n" | ||
| " TDNN layer with splicing -3,0,3 + LSTM layer,\n" |
Should it be: LDA layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 + LSTM layer, LSTM layer, LSTM layer, TDNN with splicing -3,0,3, TDNN with splicing -3,0,3 ?
| parser.add_argument("--num-lstm-layers", type=int, | ||
| help="Number of LSTM layers to be stacked", default=1) | ||
| parser.add_argument("--lstm-start-layer-index", type=int, | ||
| help="Number of LSTM layers to be stacked", default=0) |
if (num_hidden_layers < lstm_start_layer_index + num_lstm_layers):
    raise Exception("num-lstm-layers : (number of lstm layers + lstm start layer index) "
                    " has to be greater than number of layers, decided based on splice-indexes")
"has to be smaller than..."
| " This is done by interpreting --splice-indexes, --num-lstm-layers and --lstm-start-layer-index.\n" | ||
| " When the splicing indexes at a layer corresponding to an LSTM is not [0] a TDNN layer is added before it.\n" | ||
| " e.g.\n --splice-indexes '-2,-1,0,1,2 0 0 0' --num-lstm-layers 3 --lstm-start-layer-index 0 \n" | ||
| " This will add 3 lstm layers.\n" |
It would be better to add the line "This will add input layer with splicing -2,-1,0,1,2 followed by LDA layer" here as well.
egs/wsj/s5/steps/nnet3/components.py
                'dimension': output_dim}
    return {'output' : {'descriptor': '{0}_affine'.format(name),
                        'dimension': output_dim},
            'num_learnable_params' : input['dimension'] * output_dim }
(input['dimension'] + 1) for bias
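The bias-inclusive count the reviewers ask for can be sketched as follows: an affine layer mapping input_dim to output_dim has a weight matrix plus a bias vector, so the count is (input_dim + 1) * output_dim (the helper name is an illustrative assumption):

```python
# Learnable parameters of an affine layer, counting the bias:
# weights: input_dim * output_dim, bias: output_dim.
def num_affine_params(input_dim, output_dim):
    return (input_dim + 1) * output_dim
```

For example, a 40-dimensional input feeding a 512-dimensional affine layer has 41 * 512 = 20992 learnable parameters, not 20480 as the weight-only count would suggest.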
# a final layer is added after each new layer as we are generating
# configs for layer-wise discriminative training
num_params_final, num_params_final_xent = nodes.AddFinalLayerWithXentRegularizer(config_lines,
The indentation is incorrect here.
prev_layer_output = nodes.AddAffPnormLayer(config_lines, "Tdnn_{0}".format(i),
                                           prev_layer_output, nonlin_input_dim, nonlin_output_dim,
                                           norm_target_rms = 1.0 if i < num_hidden_layers - 1 else final_layer_normalize_target)
if splice_indexes[i] == [0]:
(force-pushed from 5b041f1 to 26d0795)
@vijayaditya, I just saw your question. I had some stuff that I had included in the reply here, but I've put it in a separate issue #1124 since I think it belongs there. The stuff I include in this post is a bit more specific to this issue.

I think a specification like:

However, I'm concerned about back compatibility. Perhaps you could give the option a different name in the command line, and have a mechanism to convert the old specifications into the new one (and print warnings if people use the old mechanism and say what the new-style equivalent would be)?

Also, I see no reason in principle why you can't use spliced input from the previous layer as an input to an LSTM or BLSTM layer. Is there a reason why this is never done, e.g. it doesn't work? In that case, it might be more natural to have the splicing indexes be optional, defaulting to 0, but make the LSTM and BLSTM delays like function parameters, and make it look like "... affine:-3,0,3 lstm(-3) affine:-3,0,3 blstm(-3,3) affine" or something like that. That way, if you wanted to have splicing on the LSTM or BLSTM layers, you could say e.g. "... lstm(-3,3):0,3 ...". It seems to me that that would lead to a more "orthogonal" implementation.

If we're going to re-do this stuff, I have another suggestion that relates to the way the dimensions are specified. @freewym found that it can be helpful to have geometrically increasing layer sizes [and I think this was merged]. It seems to me that we can specify the dimensions in this same string in an elegant way by saying e.g.:

--layers "input:-2,-1,0,1,2 affine(dim=512):-3,0,3 affine affine:-3,0,3 lstm(delay=-3) affine(splice=-3,0,3) blstm(delay=-3,3) affine:-3,0,3 blstm(delay=-3,3) affine(dim=1024)"

... and wherever the dim was not specified, it would be geometrically interpolated between the specified dims.

Note: there could be a process of taking a user-specified --layers input and "expanding it out" by filling in implicitly set parameters, and if that generated a string, the string could be printed as a help to the user and for debugging. Because the blstm delay parameter contains a comma in this case, a different separator would be needed for multiple parameters, e.g. semicolon.

Also bear in mind that @pegahgh is getting nice improvements from adding skip-layer connections with dropout that we gradually increase and then drop the skip-layer connection. There are various forms of this that she's trying. In this scenario, it turns out not to be necessary to initialize the network layer by layer. If there is going to be a rewrite of the config-generation, it might be a good idea to wait until that's done, or at least try to anticipate it, because getting rid of the layer-by-layer training would be a nice simplification. But it might be at least a couple of weeks before that's done, probably longer realistically, and would require more testing.
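The geometric interpolation of unspecified dims described above can be sketched as follows (the function name and the use of `None` for an unspecified dim are illustrative assumptions, not the actual Kaldi code):

```python
# Hypothetical dim interpolation: dims left unspecified (None) between two
# specified dims are filled in geometrically, i.e. each step multiplies by
# a constant ratio so the sizes grow (or shrink) smoothly between anchors.
# Leading/trailing None values (no anchor on one side) are left untouched.
def interpolate_dims(dims):
    dims = list(dims)
    known = [i for i, d in enumerate(dims) if d is not None]
    for a, b in zip(known, known[1:]):
        ratio = (dims[b] / float(dims[a])) ** (1.0 / (b - a))
        for i in range(a + 1, b):
            dims[i] = int(round(dims[a] * ratio ** (i - a)))
    return dims
```

For example, `interpolate_dims([512, None, None, 4096])` fills the gaps with a per-layer ratio of 2, giving `[512, 1024, 2048, 4096]`.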
I'm OK to commit these changes in principle, but be aware that I am working on the 'xconfig' stuff which will supersede all of this. So I think it might be better for you to focus your energy on helping make sure the 'xconfig' framework is working, and is compatible with the old setup, to the extent possible (e.g. it won't support discriminative pretraining since we haven't been finding that helpful).
No longer relevant in the context of #1170, so closing this.
…ators.
Testing in progress.