
Conversation

@freewym
Contributor

@freewym freewym commented Nov 28, 2016

…n is defined on top of the model contexts, which applies to more general network archs

@danpovey
Contributor

I wonder whether this will affect the optimal deriv-truncate setting, e.g. changing it from 10 to 5. Do you plan to do some experimentation before this is merged? I assume it would change the behavior of the existing setups.

@freewym
Contributor Author

freewym commented Nov 28, 2016

Since the model contexts include the input splicing (usually [-2,-1,0,1,2]), the {min,max}-deriv-time values are now offset by 2. If we want to keep the same behavior as before, we just need to set deriv-truncate-margin=8.
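To spell out the equivalence (a minimal sketch; the variable names are illustrative, not code from this PR):

```python
# Old behavior: min-deriv-time = -margin, so margin=10 gave min-deriv-time=-10.
# New behavior: the margin is measured outside the model's left context.
model_left_context = 2      # from the input splicing [-2,-1,0,1,2]
new_margin = 8
assert -new_margin - model_left_context == -10   # same as the old margin=10
```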

…n is defined on top of the model contexts, which applies to more general network archs
"e.g., During BLSTM model training if the chunk-width=150 and deriv-truncate-margin=5, then the derivative will be "
"backpropagated up to t=-5 and t=154 in the forward and backward LSTM sequence respectively; "
"e.g., if chunk-width=150, model-left-context=2, model-right-context=10 and deriv-truncate-margin=5, "
"then the derivative will be backpropagated up to t=-5-2=7 and t=149+5+10=164 to left and right respectively; "
@danpovey
Contributor

these lines are too long. Also -5-2 != 7. I think this option is unclear though. It might be better to say:

(Relevant only for recurrent models). If specified, gives the margin (in input frames) around
the 'required' part of each chunk that the derivatives are propagated to. If unset, the
derivatives are propagated all the way to the boundaries of the input data. E.g. 10 is a
reasonable setting. Note: the 'required' part of the chunk is defined by the
model's {left,right}-context.
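In code form, the intended semantics would be roughly the following (an illustrative sketch with hypothetical names, not the actual nnet3 implementation):

```python
def deriv_times(chunk_width, model_left_context, model_right_context,
                deriv_truncate_margin):
    # The 'required' part of the chunk covers input frames
    # [-model_left_context, chunk_width - 1 + model_right_context];
    # derivatives are propagated deriv_truncate_margin frames beyond it.
    min_deriv_time = -deriv_truncate_margin - model_left_context
    max_deriv_time = (chunk_width - 1) + deriv_truncate_margin \
        + model_right_context
    return min_deriv_time, max_deriv_time

# The example above: chunk-width=150, contexts 2/10 and margin=5 give
# (-7, 164), i.e. t = -5-2 = -7 on the left (not 7, as the quoted text says).
```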

…max deriv time is consistent with their xent counterparts and the previous chain recipes
@freewym
Contributor Author

freewym commented Nov 28, 2016

I changed the margin to 8, so it is now equivalent to the previous margin=10. I think we could change it in the future if we find a better margin.

@danpovey
Contributor

OK, I guess this is pretty harmless. Merging.

@danpovey danpovey merged commit e95aeee into kaldi-asr:master Nov 29, 2016
@freewym freewym deleted the deriv-trunc branch November 29, 2016 00:42
@freewym
Contributor Author

freewym commented Nov 29, 2016

@vimalmanohar you might also need to make a minor change to match the change made in train.py here.

@danpovey
Contributor

danpovey commented Nov 29, 2016 via email

@danpovey
Contributor

danpovey commented Nov 29, 2016 via email

@freewym
Contributor Author

freewym commented Nov 29, 2016

Right now train_rnn.py uses --trainer.rnn.num-bptt-steps to determine {min,max}-deriv-time, which assumes the network is purely recurrent; if necessary I can add the option --trainer.deriv-truncate-margin and deprecate num-bptt-steps.
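A deprecation shim could look roughly like this (a hypothetical sketch; it assumes the old option set min-deriv-time = chunk-width - num-bptt-steps in the pure-RNN case, which may differ from what train_rnn.py actually does):

```python
import warnings

def resolve_margin(chunk_width, deriv_truncate_margin=None,
                   num_bptt_steps=None):
    # Accept the deprecated --trainer.rnn.num-bptt-steps and translate it
    # into an equivalent --trainer.deriv-truncate-margin.
    if num_bptt_steps is not None:
        warnings.warn("--trainer.rnn.num-bptt-steps is deprecated; use "
                      "--trainer.deriv-truncate-margin instead.")
        if deriv_truncate_margin is None:
            # margin = num_bptt_steps - chunk_width reproduces
            # min-deriv-time = chunk_width - num_bptt_steps.
            deriv_truncate_margin = num_bptt_steps - chunk_width
    return deriv_truncate_margin
```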

@danpovey
Contributor

danpovey commented Nov 29, 2016 via email

@freewym
Contributor Author

freewym commented Nov 29, 2016

OK. + @vimalmanohar in case you didn't get notified from here.
