Conversation

@freewym (Contributor) commented Nov 1, 2016

No description provided.

         deriv_time_opts += " --optimization.min-deriv-time={0}".format(left_deriv_truncate)
     if right_deriv_truncate is not None:
-        deriv_time_opts += " --optimization.max-deriv-time={0}".format(int(chunk-width-right_deriv_truncate))
+        deriv_time_opts += " --optimization.max-deriv-time={0}".format(int(chunk_width - 1 - right_deriv_truncate))
Contributor:

This was an issue with the code before your change, but we should still address it:
this code would only make sense if the "correct" settings of right-deriv-truncate were zero or negative.
E.g. suppose you wanted to process derivatives for up to 5 frames before the start and after the end of the supervision, you'd have to set --left-deriv-truncate=-5 and --right-deriv-truncate=-5.
This is kind of weird and unintuitive, IMO.
Also, it's not clear to me why these options are part of the 'chain' namespace (e.g. --chain.left-deriv-truncate), since they relate to the generic nnet3 framework and not to the chain models specifically.
What I propose is to add a new option --trainer.deriv-truncate-margin [default -1 meaning unset; but you can set it to any value >= 0].
Setting this to x >= 0 would lead it to set the command-line options --optimization.min-deriv-time=-x and --optimization.max-deriv-time=chunk_width - 1 + x.
The --chain.left-deriv-truncate option would be retained only for backward compatibility; if used, it would print a warning and set deriv-truncate-margin to the negative of that value.
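
A minimal sketch of the proposed mapping (the helper name is illustrative, not actual Kaldi code):

    def deriv_time_opts_from_margin(deriv_truncate_margin, chunk_width):
        # A margin x >= 0 extends the derivative computation x frames beyond
        # the chunk range [0, chunk_width - 1]; None leaves both options unset.
        if deriv_truncate_margin is None:
            return ""
        return (" --optimization.min-deriv-time={0}"
                " --optimization.max-deriv-time={1}".format(
                    -deriv_truncate_margin,
                    chunk_width - 1 + deriv_truncate_margin))

    # e.g. deriv_time_opts_from_margin(5, 150) gives
    # " --optimization.min-deriv-time=-5 --optimization.max-deriv-time=154"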

Contributor Author:

I guess left-deriv-truncate was originally intended to be non-negative, to truncate the derivative within chunk_width. Anyway, I have made changes to add --trainer.deriv-truncate-margin.

    args.deriv_truncate_margin = -args.left_deriv_truncate
    logger.warning("--chain.left-deriv-truncate (deprecated) is set by user, so "
                   "--trainer.deriv-truncate-margin is set to negative of that "
                   "value={0}.".format(args.deriv_truncate_margin))

if args.deriv_truncate_margin is not None and args.deriv_truncate_margin < 0:
Contributor:

Actually, since you're using None for the default, there is no need to specify that it must be >= 0. You can remove that check.

help="Number of sequences to be processed in parallel every minibatch" )
parser.add_argument("--trainer.deriv-truncate-margin", type=int, dest='deriv_truncate_margin',
default = None,
help="If specified, it is the number of frames that the deriv will be backproped through out of the range [0, chunk_width-1];"
Contributor:

Please watch the line length.
backproped -> backpropagated

@danpovey (Contributor) commented Nov 1, 2016

Thanks; please run a test on one of those setups with that value set to 5.

@freewym (Contributor Author) commented Nov 2, 2016

I set the value to 5, and min-deriv-time and max-deriv-time are set as expected.

@danpovey (Contributor) commented Nov 2, 2016

@vijayaditya, if this is OK with you I can merge now.

@danpovey (Contributor) commented Nov 2, 2016

OK good, but I was more wondering about the effect on WER.


@freewym (Contributor Author) commented Nov 2, 2016

I will run it to completion.

help="Number of sequences to be processed in parallel every minibatch" )
parser.add_argument("--trainer.deriv-truncate-margin", type=int, dest='deriv_truncate_margin',
default = None,
help="If specified, it is the number of frames that the deriv will be backpropagated through "
Contributor:

deriv --> derivative. Please provide an example of how this parameter is used.

help="If specified, it is the number of time steps the derivative will be backpropagated through. It takes the values between [0, chunk_width - 1].
 e.g. During BLSTM model training if the chunk-width is 150, chunk-left-context is 40 and chunk-right-context is 40 specifying  --trainer.deriv-truncate-margin as ......\
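
For concreteness, here is a worked instance of that example under the mapping proposed earlier in the thread (the margin value of 8 is purely hypothetical):

    chunk_width, left_context, right_context = 150, 40, 40
    margin = 8  # hypothetical value, for illustration only
    min_deriv_time = -margin                    # -8
    max_deriv_time = chunk_width - 1 + margin   # 157
    # Derivatives are then backpropagated only for t in [-8, 157], even
    # though the BLSTM sees input frames in [-40, 189] due to the contexts.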

    if args.chunk_right_context < 0:
        raise Exception("--egs.chunk-right-context should be non-negative")

    if args.left_deriv_truncate is not None:
Contributor:

We recommend using the option --trainer.deriv-truncate-margin.

@@ -463,10 +467,10 @@ def TrainOneIteration(dir, iter, srand, egs_dir,
TrainNewModels(dir, iter, srand, num_jobs, num_archives_processed, num_archives,
Contributor:

Use named arguments to avoid user errors when calling the function.

Contributor Author:

Are you suggesting using named arguments for all of them? If so, we might also need to use named arguments in other function calls, for the same reason. I think it might not be necessary, since this function is called only once, within this script.

Contributor:

I am suggesting that you use named arguments when calling the function, not that you change the function definition.

We have been constantly updating the argument lists of these functions, so it is better to change the calls to use named arguments to avoid user errors. Vimal and I have been modifying all the function calls with more than a few arguments to use named arguments, so I would recommend that here too.
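
A toy sketch of the difference (hypothetical function and arguments, not the actual TrainNewModels signature):

    def train_one_iteration(dirname, iteration, srand, num_jobs):
        print(dirname, iteration, srand, num_jobs)

    # Positional call: silently misassigns values if the parameter list
    # is ever reordered or extended.
    train_one_iteration("exp/chain/tdnn", 10, 0, 4)

    # Named call: self-documenting and robust to such changes.
    train_one_iteration(dirname="exp/chain/tdnn", iteration=10, srand=0,
                        num_jobs=4)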

@danpovey (Contributor) commented Nov 3, 2016

@vijayaditya, merge this when you think it's ready.

@vijayaditya (Contributor):

@freewym I am assuming you took care of the BLSTM scripts in local/ for all the egs setups. I will merge once you rebase.

@danpovey (Contributor) commented Nov 3, 2016

Vijay, lately I have been using the "squash and merge" button. You can edit the list of commit names in a little pop-up box if there are objectionable things there.


@freewym (Contributor Author) commented Nov 3, 2016

Right now the fixed BLSTM on swbd is 0.3 worse in WER. I am testing whether extending the backpropagation over a few more frames would help.

@danpovey (Contributor) commented Nov 3, 2016

Remember to look at the WERs on all of eval2000 (subset numbers add no value) and also on train_dev. It's the sum of those two WER differences that is the most meaningful number (lowest variance, if you want to get technical). What were the WER differences on those two test sets?


@vijayaditya (Contributor):

I am comfortable using squash and merge if the branch is up to date, but in this case I am concerned about the staleness of the branch. I sometimes find that auto-merges can mess up the logic, so I usually recommend that developers run their unit tests once they rebase.
What would you suggest?

@danpovey (Contributor) commented Nov 3, 2016

OK I guess.


@freewym (Contributor Author) commented Nov 3, 2016

They are both worse, by 0.3 (eval2000) and <0.18 (train_dev) respectively:

%WER 15.2 | 4459 42989 | 86.4 9.2 4.4 1.6 15.2 51.5 | exp/nnet3/lstm_bidirectional_max_deriv_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 16.3 | 4459 42989 | 85.5 9.9 4.6 1.7 16.3 53.3 | exp/nnet3/lstm_bidirectional_max_deriv_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 14.09 [ 6828 / 48460, 745 ins, 1794 del, 4289 sub ] exp/nnet3/lstm_bidirectional_max_deriv_sp/decode_train_dev_sw1_tg//wer_10_0.0
%WER 13.23 [ 6409 / 48460, 676 ins, 1807 del, 3926 sub ] exp/nnet3/lstm_bidirectional_max_deriv_sp/decode_train_dev_sw1_fsh_fg//wer_11_0.0

%WER 14.9 | 4459 42989 | 86.7 9.1 4.2 1.6 14.9 50.7 | exp/nnet3/lstm_bidirectional_adversary0.0_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 16.0 | 4459 42989 | 85.7 9.8 4.5 1.7 16.0 52.7 | exp/nnet3/lstm_bidirectional_adversary0.0_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 13.91 [ 6739 / 48460, 730 ins, 1790 del, 4219 sub ] exp/nnet3/lstm_bidirectional_adversary0.0_sp/decode_train_dev_sw1_tg//wer_10_0.0
%WER 13.19 [ 6394 / 48460, 718 ins, 1768 del, 3908 sub ] exp/nnet3/lstm_bidirectional_adversary0.0_sp/decode_train_dev_sw1_fsh_fg//wer_10_0.0

@danpovey (Contributor) commented Nov 3, 2016

Was the objective function worse than the baseline?
You can run local/info/nnet3_dir_info.pl, which shows the objfs very compactly.

@freewym (Contributor Author) commented Nov 3, 2016

Yes, also a little worse in objf:
exp/nnet3/lstm_bidirectional_max_deriv_sp:
loglike:train/valid[454,683,combined]=(-0.77,-0.63,-0.61/-0.94,-0.91,-0.90)

exp/nnet3/lstm_bidirectional_adversary0.0_sp:
loglike:train/valid[454,683,combined]=(-0.76,-0.61,-0.60/-0.95,-0.89,-0.88)

@freewym force-pushed the max_deriv_time branch 2 times, most recently from adb336a to bb54853 on November 4, 2016 03:09
@danpovey (Contributor) commented Nov 5, 2016

@freewym, have the experiments with margin=5 finished?

@freewym (Contributor Author) commented Nov 5, 2016

Its WER is worse by 0.2 (for eval2000) and 0.13 (for train_dev) using the BLSTM chain model on swbd. I am increasing the margin to 20.

@danpovey (Contributor) commented Nov 5, 2016

It's possible that it's just random noise; you might want to rerun the baseline with a different srand seed. And check that nothing changed regarding, for instance, per-component max change.


@danpovey (Contributor):

@freewym, have you got any further results on this?

@freewym (Contributor Author) commented Nov 13, 2016

On AMI IHM, the WER ordering is margin=10 < margin=5 < the "old" setup, using the BLSTM+xent model, which shows the fix can at least achieve the same performance on this data. I am now testing on sdm1.

@danpovey (Contributor):

@freewym, let me know when you think this is ready to merge.

-        deriv_time_opts += " --optimization.min-deriv-time={0}".format(left_deriv_truncate)
-    if right_deriv_truncate is not None:
-        deriv_time_opts += " --optimization.max-deriv-time={0}".format(int(chunk-width-right_deriv_truncate))
+    if not left_deriv_truncate is None:
Contributor Author:

@danpovey do you think it would be better to pass in {min|max}_deriv_time instead of {left|right}_deriv_truncate? That way: 1) we don't need to pass the argument chunk_width all the way down; 2) we can compute the deriv times in a much more outer function, like Train(); 3) it is consistent with train_rnn.py.
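
A self-contained sketch of that alternative (lowercase names are illustrative, not the actual script functions):

    def train_new_models(min_deriv_time, max_deriv_time):
        # chunk_width is no longer needed here; the caller resolved it.
        opts = ""
        if min_deriv_time is not None:
            opts += " --optimization.min-deriv-time={0}".format(min_deriv_time)
        if max_deriv_time is not None:
            opts += " --optimization.max-deriv-time={0}".format(max_deriv_time)
        return opts

    def train(deriv_truncate_margin, chunk_width):
        # Compute the deriv times once, in the outer function.
        if deriv_truncate_margin is None:
            return train_new_models(None, None)
        return train_new_models(-deriv_truncate_margin,
                                chunk_width - 1 + deriv_truncate_margin)

    # train(5, 150) yields:
    # " --optimization.min-deriv-time=-5 --optimization.max-deriv-time=154"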

@danpovey (Contributor):

That would be fine with me.


@freewym (Contributor Author) commented Nov 17, 2016

@danpovey BLSTM+xent on sdm1 with a margin of 10 improves WER by 0.6 on dev and 0.2 on eval, respectively. All of those tests used the old ClipGradientComponent. I think it is ready to merge. Perhaps I need to further tune the zeroing threshold in BackpropTruncationComponent with this fix.

@danpovey (Contributor):

I'll merge this now; you can make a separate simple commit to change the zeroing threshold.


@danpovey merged commit 5874bc4 into kaldi-asr:master on Nov 17, 2016
@freewym (Contributor Author) commented Nov 17, 2016

@vimalmanohar You may have to make the changes in #1066

@freewym deleted the max_deriv_time branch on November 18, 2016 18:27