changed the definition of deriv-truncate-margin option. Now the margi… #1224
Conversation
I wonder if this will affect what the optimal deriv-truncate option is, e.g. from 10 to 5? Do you plan to do some experimentation before this is merged? I assume it would change the behavior of the existing setups.
Since the model contexts include the input splicing (usually [-2,-1,0,1,2]), the {min|max}-deriv-time values are now effectively offset by 2. If we want to make them the same as before, we just need to set deriv-truncate-margin=8.
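To make that offset concrete, here is a minimal sketch of the arithmetic being discussed; the names are illustrative, not the identifiers actually used in train.py.

# Illustrative arithmetic only; names are hypothetical, not those in train.py.
chunk_width = 150       # output frames per chunk
splice_context = 2      # input splicing [-2,-1,0,1,2] adds 2 frames per side

def deriv_times_old(margin):
    # Old definition: margin measured from the output-chunk boundaries.
    return -margin, chunk_width - 1 + margin

def deriv_times_new(margin):
    # New definition: margin measured outside the model context, so the
    # splicing offset is added on top of it.
    return -(margin + splice_context), chunk_width - 1 + margin + splice_context

print(deriv_times_old(10))  # (-10, 159)
print(deriv_times_new(8))   # (-10, 159) -> margin=8 matches the old margin=10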
changed the definition of deriv-truncate-margin option. Now the margin is defined on top of the model contexts, which applies to more general network archs
| "e.g., During BLSTM model training if the chunk-width=150 and deriv-truncate-margin=5, then the derivative will be " | ||
| "backpropagated up to t=-5 and t=154 in the forward and backward LSTM sequence respectively; " | ||
| "e.g., if chunk-width=150, model-left-context=2, model-right-context=10 and deriv-truncate-margin=5, " | ||
| "then the derivative will be backpropagated up to t=-5-2=7 and t=149+5+10=164 to left and right respectively; " |
These lines are too long. Also, -5-2 != 7. I think this option is unclear, though. It might be better to say:
(Relevant only for recurrent models). If specified, gives the margin (in input frames) around
the 'required' part of each chunk that the derivatives are propagated to. If unset, the
derivatives are propagated all the way to the boundaries of the input data. E.g. 10 is a
reasonable setting. Note: the 'required' part of the chunk is defined by the
model's {left,right}-context.
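As an illustration of the semantics proposed above, here is a hedged sketch, assuming the margin is counted outside the model's {left,right}-context; this is not the actual nnet3 training code.

# Hedged sketch of the proposed semantics; not the actual implementation.
def compute_deriv_time_limits(chunk_width, model_left_context,
                              model_right_context, deriv_truncate_margin=None):
    """Return (min_deriv_time, max_deriv_time) in input-frame indices."""
    if deriv_truncate_margin is None:
        # Unset: derivatives are propagated all the way to the input boundaries.
        return None, None
    # The 'required' part of the chunk is the output frames plus the model's
    # left/right context; the margin extends beyond that on each side.
    min_deriv_time = -deriv_truncate_margin - model_left_context
    max_deriv_time = (chunk_width - 1 + deriv_truncate_margin
                      + model_right_context)
    return min_deriv_time, max_deriv_time

# e.g. chunk_width=150, left=2, right=10, margin=5 -> (-7, 164)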
…max deriv time is consistent with their xent counterparts and the previous chain recipes
I changed the margin to 8. Now it is equivalent to the previous margin=10. I think we could change it in the future if we find a better margin.
OK, I guess this is pretty harmless. Merging.
@vimalmanohar you might also need to make a minor change according to the change made in train.py here.
I think it shows up as a merge conflict-- I know because I merged this
commit into my branch that has Vimal's changes. But it's straightforward to
resolve.
Hmm... the same option should probably appear in train_rnn.py and
train_raw_rnn.py, but it doesn't. Vimal, you should probably take care of
that (if you think that option should be there, which I think makes sense).
I was also a little confused by this code (from Vimal's branch)... I assume
it's for back compatibility?
parser.add_argument("--trainer.num-chunk-per-minibatch",
                    "--trainer.rnn.num-chunk-per-minibatch",
                    type=int,
                    dest='num_chunk_per_minibatch', default=512,
                    help="Number of sequences to be processed in "
                         "parallel every minibatch")
Right now train_rnn.py uses --trainer.rnn.num-bptt-steps to determine {min,max}-deriv-time, which assumes the network is "pure" recurrent; if necessary I can add the option --trainer.deriv-truncate-margin and deprecate num-bptt-steps.
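A hypothetical sketch of what such a deprecation could look like; the option wiring below is illustrative only, not the change actually proposed in this PR.

import argparse, warnings

parser = argparse.ArgumentParser()
parser.add_argument("--trainer.deriv-truncate-margin", type=int,
                    dest='deriv_truncate_margin', default=None,
                    help="Margin (in input frames) beyond the model context "
                         "up to which derivatives are back-propagated.")
parser.add_argument("--trainer.rnn.num-bptt-steps", type=int,
                    dest='num_bptt_steps', default=None,
                    help="Deprecated; use --trainer.deriv-truncate-margin.")

args = parser.parse_args()
if args.num_bptt_steps is not None:
    # Keep accepting the old option but tell the user it is going away.
    warnings.warn("--trainer.rnn.num-bptt-steps is deprecated; "
                  "use --trainer.deriv-truncate-margin instead.")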
I'm concerned that any changes you make to currently-checked-in code will
clash with Vimal's PR. Is it possible for you to make a patch against
Vimal's PR, and create a WIP PR that contains Vimal's PR plus your changes
on top of it? Then Vimal or I can pull changes from that branch.
Also, Vimal, can you please merge master into your branch? It will make my
life easier in figuring out what changes I have in my LSTM-integration
branch, versus your branch. There may be some bug fixes from me that
should be included.
BTW, one option I am considering is to merge your changes along with my
fast-lstm changes, since I'm working on top of your PR.
OK. + @vimalmanohar in case you didn't get notified from here.