Max change per component #918
Conversation
OK will do.

Starting the review.
    %WER 16.0 | 4459 42989 | 85.6 9.9 4.5 1.6 16.0 52.7 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys
    %WER 19.6 | 2628 21594 | 82.5 12.1 5.5 2.1 19.6 54.8 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
    %WER 20.7 | 2628 21594 | 81.4 12.9 5.7 2.2 20.7 56.8 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
    # bidirectional LSTM with the same configuration as the above experiment, with self-repair of all nonlinearities and clipgradient, and max-change-per-component activated
Please preserve these results somewhere, maybe in the actual script next to the max-change option or at the bottom of the results file.
Force-pushed from c13e553 to 675d726, then from 675d726 to e417fce.
src/nnet3/nnet-training.cc (outdated):

              << " change too big: " << std::sqrt(dot_prod) << " > "
              << "--max-change=" << max_param_change_per_comp
              << ", scaling by " << scale_factors(i);
      } else
The Google style guide mandates braces on the else if the 'if' had braces.
@freewym, what is the status of this commit? Are we still waiting on experiments?

Yes. Once the babel jobs are complete, I will resume the experiments.
@freewym Is this ready for review?

Yes, but I am still running experiments on different datasets trying to …
I have tested max-change-per-component with the current thresholds (1.5 for the final layer, 0.75 for the others) on swbd, AMI IHM, tedlium, and Babel Georgian using a BLSTM xent model. The WERs of all of these experiments improve over the global-max-change baseline or remain the same, although the log-likelihoods get a little worse. Specifically:

    on swbd: %WER 14.9 | 4459 42989 | 86.7 9.1 4.3 1.6 14.9 50.6 | exp/nnet3/lstm_bidirectional_maxchange_test_nomax_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.filt.sys
    on AMI IHM: %WER 22.6 | 13098 94486 | 80.6 11.8 7.6 3.2 22.6 55.8 | -0.592 | exp/ihm/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_dev/ascore_10/dev_hires.ctm.filt.sys
    on tedlium: %WER 11.5 | 507 17783 | 90.2 7.1 2.8 1.7 11.5 79.9 | -0.317 | exp/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_dev/score_10_0.0/ctm.filt.filt.sys
    on Babel Georgian: %WER 46.2 | 19252 60586 | 57.9 32.4 9.7 4.2 46.2 31.6 | -1.241 | exp/nnet3/lstm_bidirectional_sp/decode_dev10h.pem/score_12/penalty_0.5/dev10h.pem.ctm.sys

Once the changes to nnet-training.cc are reviewed and confirmed, I will apply similar changes to nnet-chain-training.cc.
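For reference, the mechanism being tested can be sketched as follows: each updatable component's parameter delta is scaled down if its 2-norm exceeds that component's limit, and a global limit is then applied to the overall change. This is a simplified Python sketch under the thresholds quoted above, not the actual Kaldi C++ implementation (which operates on nnet3 components and GPU matrices); the function name is hypothetical:

```python
import math

def apply_max_changes(deltas, per_comp_limits, global_limit):
    """deltas: {component_name: list of proposed parameter changes}.
    First scales each component's delta so its 2-norm does not exceed
    that component's limit, then applies one further global scale so the
    overall change does not exceed global_limit. A limit of 0 means
    'disabled'. Returns the scaled deltas."""
    scales = {}
    for name, d in deltas.items():
        norm = math.sqrt(sum(x * x for x in d))
        limit = per_comp_limits.get(name, 0.0)
        scales[name] = limit / norm if 0.0 < limit < norm else 1.0
    # Overall parameter change after per-component scaling:
    total = math.sqrt(sum(
        sum((scales[n] * x) ** 2 for x in d) for n, d in deltas.items()))
    g = global_limit / total if 0.0 < global_limit < total else 1.0
    return {n: [g * scales[n] * x for x in d] for n, d in deltas.items()}
```

For example, a "final" component whose delta has norm 5.0 against a limit of 1.5 would be rescaled by 0.3 before the global limit is considered.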
Do you have an idea how this affects TDNN or any other feed-forward network?

I only did a TDNN comparison on swbd; max-change-per-component gets a 0.1-0.2 improvement.
Could you check what changes when moving from global-max-change … --Vijay

Usually BLstm{1|2|3}_{forward,backward}_W_c-xr hit the upper bound of max-change-per-component and get rescaled. Hard to tell from clipped-proportion.
There are conflicts in this branch.

Will resolve it.
Force-pushed from 118f18a to 59fb914.
src/nnet3/nnet-simple-component.cc (outdated):

    BaseFloat clipped_proportion =
        (count_ > 0 ? static_cast<BaseFloat>(num_clipped_) / count_ : 0);
    BaseFloat clipped_proportion = (count_ > 0 ?
do you mean to_update->count_ here?
And I don't see how this change is related to the rest of this PR.
It's irrelevant. I will restore this part in this PR and make it in a separate one.
src/nnet3/nnet-training.cc (outdated):

    bool is_gradient = false;  // setting this to true would disable the
                               // natural-gradient updates.
    SetZero(is_gradient, delta_nnet_);
    const int32 num_ucs = NumUpdatableComponents(*delta_nnet_);
call this num_updatable
src/nnet3/nnet-training.cc (outdated):

        }
      }
      KALDI_ASSERT(i == scale_factors.Dim());
      param_delta = std::sqrt(param_delta);
It's not good practice to change variables in-place like this, because it confuses the meaning of a variable.
Better to rename param_delta above this to param_delta_squared, and have BaseFloat param_delta = std::sqrt(param_delta_squared);
      param_delta *= scale;
      if (param_delta > config_.max_param_change) {
        if (param_delta - param_delta != 0.0) {
          KALDI_WARN << "Infinite parameter change, will not apply.";
might be good to mention the component name here.
It is the "global delta".
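The `param_delta - param_delta != 0.0` test in the diff above is a self-comparison idiom for detecting non-finite values: for any finite x, x - x is exactly 0, while for an infinity or a NaN it is NaN, which compares unequal to everything. A small Python illustration of the same IEEE-754 behavior the C++ code relies on (function name hypothetical):

```python
def is_nonfinite(x):
    # For any finite x, x - x == 0.0; for inf or nan, x - x is NaN,
    # and NaN compares unequal to everything, including 0.0.
    return x - x != 0.0
```

Note that this trick is only reliable when the compiler does not assume finite math (e.g. it can break under -ffast-math).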
src/nnet3/nnet-training.cc (outdated):

      } else {
        scale *= config_.max_param_change / param_delta;
        num_max_changes_global_applied_++;
        KALDI_LOG << "Parameters change too big: " << param_delta << " > "
I think you should rework how the log messages are printed. Here you print out a log message per minibatch if you apply the global max change, but not if you apply the per-component max change. If the quota of verboseness is one log message of reasonable length per minibatch, then you could do a better job of compressing more information in that. How about you print out one message if any max-change was applied, and find a way to neatly summarize how many components had a per-component max-change limit applied (and maybe print the component-name of the one that had the smallest scale applied, together with its max-change value), and also say what the global max-change limit was, if any, and what the global max-change was.
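The kind of compressed per-minibatch summary being suggested could look something like the following sketch (names and message format hypothetical, in Python rather than the actual C++ logging code):

```python
def summarize_max_change(per_comp, global_scale, global_limit):
    """Build a single per-minibatch log line.
    per_comp: list of (component_name, scale, limit) tuples for components
    whose per-component max-change was applied this minibatch.
    global_scale < 1.0 means the global max-change was applied too."""
    parts = []
    if per_comp:
        # Report the count, plus the component that was scaled down the most.
        name, scale, limit = min(per_comp, key=lambda t: t[1])
        parts.append("per-component max-change applied to %d component(s); "
                     "smallest scale %.2f on %s (max-change=%.2f)"
                     % (len(per_comp), scale, name, limit))
    if global_scale < 1.0:
        parts.append("global max-change=%.2f applied, scale=%.2f"
                     % (global_limit, global_scale))
    return "; ".join(parts)
```

This keeps the output to at most one message per minibatch while still naming the most-affected component.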
src/nnet3/nnet-training.cc (outdated):

      ans = ans || info.PrintTotalStats(name);
    }
    if (delta_nnet_ != NULL)
      PrintMaxChangesStats();
rename to PrintMaxChangeStats()
src/nnet3/nnet-training.h (outdated):

    /// Applies per-component max-changes and global max-change to all updatable
    /// components in *delta_nnet_, and use *delta_nnet_ to update parameters
    /// in *nnet_.
    void UpdateParamsWithMaxChanges();
Rename Changes() -> Change()
src/nnet3/nnet-training.h (outdated):

    int32 num_minibatches_processed_;

    // stats for max-changes.
    std::vector<int32> num_max_changes_per_component_applied_;
rename changes->change in these 2 variables
Force-pushed from b2671d0 to e3687dc.
      KALDI_LOG << "Parameters change too big: " << param_delta << " > "
                << "--max-param-change=" << config_.max_param_change
                << ", scaling by " << config_.max_param_change / param_delta;
    }
@danpovey This is now how the log message prints out. I decided to split it into two lines since a single line seems too long to include this info. What do you think of the log message?
appied -> applied. How about this: … -- Dan
Force-pushed from e3687dc to a63dd7a, then from 73ce4c9 to 7c9eeb7.
I tested max-change-per-component on chain training using TDNN and BLSTM on swbd, and they gave almost the same results as global max-change.
egs/swbd/s5c/RESULTS (outdated):

    # bidirectional LSTM with the same configuration as the above experiment, plus self-repair of all nonlinearities and clipgradient activated
    # bidirectional LSTM with the same configuration as the above experiment, with self-repair of all nonlinearities and clipgradient, and max-change-per-component activated
    %WER 10.2 | 1831 21395 | 90.8 6.1 3.2 1.0 10.2 44.4 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_fsh_fg/score_11_0.0/eval2000_hires.ctm.swbd.filt.sys
Please make sure the results are in the same order as those above, for easy comparison (i.e. whole-test-set first).
    --recurrent-projection-dim 256 \
    --non-recurrent-projection-dim 256 \
    --label-delay $label_delay \
    --self-repair-scale-clipgradient 1.0 \
Is this the default, and how does this relate to your other changes?
1.0 is the default value set for self-repair-scale-clipgradient in make_configs.py. Here I just explicitly tell the user there is such an option.
egs/wsj/s5/steps/nnet3/components.py (outdated):

        'dimension': input['dimension']}

    def AddAffineLayer(config_lines, name, input, output_dim, ng_affine_options = ""):
    def AddAffineLayer(config_lines, name, input, output_dim, ng_affine_options = "", max_change_per_component = 0.25):
I notice that in different parts of the code, for different layer types, you have different values for max_change_per_component. Can you give me some idea of how you tuned this? Was this based just on WER, or did you also look at the diagnostics?
All max-change values are 0.75 for non-final layers and 1.5 for the final layer. I need to make those defaults in components.py the same as those in make_configs.py.
All max-change-per-component values have been set to 0.75 (for non-final layers) / 1.5 (for final layers).
I'm close to merging this, but there is one small thing to fix. In the training setups there is code like this: … and this wouldn't do the right thing if the global max-change is zero but the individual max-changes are nonzero.
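The convention throughout this PR is that a max-change value of zero disables that limit, so any guard has to check the limit is positive before comparing against it. A minimal sketch of a guard that handles the disabled case correctly (hypothetical helper, not the actual Kaldi code):

```python
def max_change_scale(delta_norm, max_change):
    """Return the factor to scale a parameter delta by so that its norm
    does not exceed max_change. A max_change of 0 means 'no limit',
    following this PR's convention, so the check must be conditional
    on max_change being positive."""
    if max_change > 0.0 and delta_norm > max_change:
        return max_change / delta_norm
    return 1.0
```

A guard that only tests `delta_norm > max_change` would wrongly scale everything to zero when the limit is disabled.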
Will squash after the last commit is reviewed and confirmed.
Looks OK, but no need to squash, I am using GitHub's 'squash and merge'.
Yiming, actually before I merge this I want you to provide a mechanism whereby people can make it equivalent to the old models, in case they are in the middle of experiments and want it back-compatible. You can add --max-change-per-component and --max-change-per-component-final options to the top-level make_configs.py scripts. Then people can set these to zero if they want to disable the max-change.
According to the commit:
9569e57