Xconfigs : extension #1197
Conversation
@GaofengCheng Do you have time to write an xconfig class for the HLSTM? This would help us find any modifications necessary to support architectures which access internal nodes of "layers".
Force-pushed 1f990a4 to dc26602
@vijayaditya I could, if a slight delay is tolerable.
@GaofengCheng No hurry, do it at your convenience.
I think this is ready for a round of review.
danpovey left a comment:
Looks good, some small comments.
@@ -0,0 +1,233 @@
#!/bin/bash

# 6i is based on run_lstm_6h.sh, but changing the HMM context from triphone to left biphone.
is this comment out of date? Perhaps give it a different letter and run the comparison script so you can update the comment?
OK.
train_stage=-1
get_egs_stage=-10
speed_perturb=true
dir=exp/chain/lstm_6i_xconf # Note: _sp will get added to this if $speed_perturb == true.
I'd suggest 6i_xconf -> 6j or something like that... we don't want to start using _xconf in these names; we can just quietly switch over.
OK.
@@ -0,0 +1,30 @@
# This library has classes and methods to form neural network computation graphs,
you might want to add a copyright header.
# 'ref.config' : which is a version of the config file used to generate
# a model for getting left and right context it doesn't read anything for the
it -> (it
class XconfigLayerBase(object):
    """ A base-class for classes representing layers of xconfig files.
    This mainly just sets self.layer_type, self.name and self.config/
I'd remove "This mainly just sets self.layer_type, self.name and self.config", as the base-class has a bunch of functions... that comment was outdated I think.
    def set_default_configs(self):
        raise Exception("Child classes must override set_default_configs().")

    # this is expected to be called after set_configs and before check_configs()
is expected to be called -> is called in the constructor.
I think this default function definition is potentially dangerous, but I guess I could be OK with it.
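To make the lifecycle under discussion concrete, here is a hedged sketch (not the actual Kaldi code) of the constructor ordering: child-supplied defaults are installed first, then the user's key=value pairs override them, and validation runs last. The class and method names below are illustrative only.

```python
# Hedged sketch of the xconfig layer lifecycle; class names and the
# exact method set are assumptions, not the real Kaldi implementation.
class XconfigLayerSketch(object):
    def __init__(self, key_to_value):
        self.config = {}
        self.set_default_configs()        # child-supplied defaults
        self.config.update(key_to_value)  # user overrides (set_configs)
        self.set_derived_configs()        # called in the constructor
        self.check_configs()              # validation runs last

    def set_default_configs(self):
        # the base class forces children to define their own defaults
        raise Exception("Child classes must override set_default_configs().")

    def set_derived_configs(self):
        # default no-op; the reviewer notes a default here is risky,
        # since a child class may silently forget to override it
        pass

    def check_configs(self):
        pass


class LstmLayerSketch(XconfigLayerSketch):
    # hypothetical child layer, for illustration only
    def set_default_configs(self):
        self.config = {'input': '[-1]', 'cell-dim': -1}  # -1 = compulsory

    def check_configs(self):
        if self.config['cell-dim'] <= 0:
            raise ValueError("cell-dim is compulsory")


layer = LstmLayerSketch({'cell-dim': 512})
print(layer.config['cell-dim'])  # 512
```

Running `check_configs()` inside the constructor means a layer object can never exist in a half-configured state, which is the safety property the ordering comment is describing.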
    def input_dim(self):
        dim = 0
        for input_name in self.get_input_descriptor_names():
I don't think this function should exist, at least not with this implementation-- in general we don't want to make any assumptions that if there are multiple input descriptors, they will be appended together. Having just 1 input is the norm, and auxiliary inputs might not be appended with the primary one [if you wanted that, you could just use Append() in the descriptor.]
Elsewhere the code sets
input_dim = self.descriptors['input']['dim']
which I think makes more sense [if there is an 'input' descriptor]. If you must have this function, I'd prefer it to use that expression.
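The suggested replacement can be sketched as follows. The wrapper class and the example dimension are assumptions for illustration; the expression itself, `self.descriptors['input']['dim']`, is the one quoted in the review.

```python
# Hedged sketch of the reviewer's suggested input_dim(): read the
# dimension of the primary 'input' descriptor instead of summing over
# all input descriptors, since auxiliary inputs need not be appended.
class LayerSketch(object):
    def __init__(self):
        # assumed structure: descriptor name -> dict with a 'dim' entry,
        # matching how the code elsewhere reads the dimension
        self.descriptors = {'input': {'dim': 625}}

    def input_dim(self):
        return self.descriptors['input']['dim']


print(LayerSketch().input_dim())  # 625
```

If a layer genuinely needs several inputs appended together, the reviewer's point is that the user can express that explicitly with Append() in the descriptor, rather than having input_dim() assume it.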
    def set_default_configs(self):
        self.config = {'input' : '[-1]',
                       'cell-dim' : -1, # this is a compulsary argument
compulsory
        configs.append("component name={0}.c1 type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
        configs.append("component name={0}.c2 type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
        configs.append("component name={0}.m type=ElementwiseProductComponent input-dim={1} output-dim={2}".format(name, 2 * cell_dim, cell_dim))
        configs.append("component name={0}.c type=ClipGradientComponent dim={1} {2}".format(name, cell_dim, clipgrad_str))
Vijay, we have checked in a change to how the gradients are clipped, and that change should be applied to this PR. The component name is different and the list of options is different.
I have not included this for now, as I am running comparison experiments with the 6i setup (to see if I forgot some {param,bias}-stddev initialization). I will make the change when I am done with the experiments.
# Apache 2.0.

""" This module contains the top level xconfig parsing functions.
It has been separated from the utils module to
you don't have to explain changes-- no-one saw the original.
Force-pushed 75bed1c to d6fd64e
@freewym Could you please help check the blstm model? It is available here.
OK, will do.
Removing pretraining seems to help a lot.

# System                    7f          7g
# WER on train_dev(tg)      14.46       13.85
# WER on train_dev(fg)      13.23       12.67
# WER on eval2000(tg)       17.0        16.5
# WER on eval2000(fg)       15.4        14.8
# Final train prob          -0.0882071  -0.0885075
# Final valid prob          -0.107545   -0.113462
# Final train prob (xent)   -1.26246    -1.25788
# Final valid prob (xent)   -1.35525    -1.37058
Great! What is the improvement from, do you think?
The only thing that changed was the removal of layer-wise pretraining. (I updated the previous message when I realized that I had forgotten to write the details of the two experiments, but GitHub doesn't seem to send email alerts when an existing message is updated.)
@vijayaditya It seems that the max-change option is missing for all diagonal matrices in BLSTM, according to /export/b01/vpeddinti/xconfig/egs/swbd/s5c/exp/chain/blstm_6j_sp/configs/final.config
OK, thanks, I will modify that. --Vijay
I'd like to get this xconfig stuff checked in ASAP, assuming it won't break ...
The lstm training will be done in a few hours. I would like to check it in. There are still some Python style changes to be made, but I can do this. --Vijay
Force-pushed 72e3261 to 4611e41
I think this is ready for merge. I will complete any style changes necessary over the next few days. Note: I will clean up the local/chain/run_{tdnn,lstm,blstm}.sh scripts before merge.
It's OK with me -- merge it yourself when you think it's ready. Thanks!!
@vijayaditya I will help test after you have merged.
trying to fix it by code change
…ontext. This prevents a crash that has been happening since pull request #1197.
Force-pushed 9a817a3 to 5026607
2. Added ability to provide strings with '=' as values in xconfig.
3. Added swbd recipes for TDNN, LSTM and BLSTM using xconfig.
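Allowing '=' inside values means a config line can no longer be tokenized by splitting every token on every '='. A hedged sketch of one way to handle this (the real xconfig tokenizer may differ): let a shell-style lexer keep quoted substrings intact, then split each token on the first '=' only. The function name and example option string below are illustrative.

```python
import shlex

def parse_config_line(line):
    """Hypothetical sketch of parsing an xconfig line whose values may
    themselves contain '=' (e.g. quoted option strings)."""
    fields = shlex.split(line)  # keeps 'max-change=0.75' as one token
    layer_type = fields[0]
    config = {}
    for token in fields[1:]:
        # split on the FIRST '=' only, so embedded '=' survives in the value
        key, _, value = token.partition('=')
        config[key] = value
    return layer_type, config

# an '=' inside the quoted value is preserved
layer_type, cfg = parse_config_line(
    "lstm-layer name=lstm1 ng-affine-options='max-change=0.75'")
print(cfg['ng-affine-options'])  # max-change=0.75
```

str.partition is used instead of str.split('=') because it guarantees exactly one split at the first occurrence, which is what lets option strings like max-change=0.75 pass through unharmed.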
Force-pushed 5026607 to 22c45ba
This is an extension of PR #1170. I am currently testing the full pipeline. This also includes the commits from Xconfigs.