@freewym
Contributor

@freewym freewym commented Jul 25, 2016

According to the commit:
9569e57

@danpovey
Contributor

Vijay, please review this.

You can view, comment on, or merge this pull request online at:

#918
Commit Summary

  • add back max-change-per-component
  • update blstm results on swbd


@vijayaditya
Contributor

OK will do.

@vijayaditya
Contributor

Starting the review.

%WER 16.0 | 4459 42989 | 85.6 9.9 4.5 1.6 16.0 52.7 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 19.6 | 2628 21594 | 82.5 12.1 5.5 2.1 19.6 54.8 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
%WER 20.7 | 2628 21594 | 81.4 12.9 5.7 2.2 20.7 56.8 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.callhm.filt.sys
# bidirectional LSTM with the same configuration as the above experiment, with self-repair of all nonlinearities and clipgradient, and max-change-per-component activated
Contributor

Please preserve these results somewhere, maybe in the actual script next to the max-change option or at the bottom of the results file.

@freewym freewym force-pushed the max_change_per_component branch 2 times, most recently from c13e553 to 675d726 Compare August 15, 2016 19:55
@freewym freewym force-pushed the max_change_per_component branch from 675d726 to e417fce Compare August 26, 2016 19:42
<< " change too big: " << std::sqrt(dot_prod) << " > "
<< "--max-change=" << max_param_change_per_comp
<< ", scaling by " << scale_factors(i);
} else
Contributor

The Google style guide mandates braces on the else if the 'if' had braces.

@danpovey
Contributor

@freewym, what is the status of this commit? Are we still waiting on experiments?

@freewym
Contributor Author

freewym commented Aug 28, 2016

Yes. Once the babel jobs are complete, I will resume the experiments.

@vijayaditya
Contributor

@freewym Is this ready for review?

@freewym
Contributor Author

freewym commented Sep 15, 2016

Yes, but I am still running experiments on different datasets to figure out the best max-change values. The current values work fine for the experiments I have run so far.


@freewym
Contributor Author

freewym commented Sep 19, 2016

I have tested per-component max-change with the current thresholds (1.5 for the final layer and 0.75 for the others) on swbd, ami ihm, tedlium, and babel Georgian using BLSTM xent models. In every experiment the WER either improves over the global-max-change baseline or stays the same, although the log-likelihoods get slightly worse. Specifically:
on swbd:
%WER 14.6 | 4459 42989 | 86.9 8.9 4.2 1.5 14.6 50.5 | exp/nnet3/lstm_bidirectional_maxchange0.75_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 15.7 | 4459 42989 | 85.9 9.7 4.4 1.6 15.7 52.2 | exp/nnet3/lstm_bidirectional_maxchange0.75_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys

%WER 14.9 | 4459 42989 | 86.7 9.1 4.3 1.6 14.9 50.6 | exp/nnet3/lstm_bidirectional_maxchange_test_nomax_sp/decode_eval2000_sw1_fsh_fg/score_10_0.0/eval2000_hires.ctm.filt.sys
%WER 16.0 | 4459 42989 | 85.7 9.8 4.5 1.7 16.0 52.5 | exp/nnet3/lstm_bidirectional_maxchange_test_nomax_sp/decode_eval2000_sw1_tg/score_10_0.0/eval2000_hires.ctm.filt.sys

on ami ihm:
%WER 22.6 | 13098 94494 | 80.6 11.7 7.6 3.2 22.6 55.5 | -0.589 | exp/ihm/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_dev/ascore_10/dev_hires.ctm.filt.sys
%WER 22.5 | 12643 89986 | 80.1 12.6 7.3 2.7 22.5 53.6 | -0.484 | exp/ihm/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_eval/ascore_10/eval_hires.ctm.filt.sys

%WER 22.6 | 13098 94486 | 80.6 11.8 7.6 3.2 22.6 55.8 | -0.592 | exp/ihm/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_dev/ascore_10/dev_hires.ctm.filt.sys
%WER 22.6 | 12643 89987 | 80.1 12.5 7.4 2.7 22.6 53.8 | -0.480 | exp/ihm/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_eval/ascore_10/eval_hires.ctm.filt.sys

on tedlium:
%WER 11.1 | 507 17783 | 90.5 6.8 2.7 1.6 11.1 80.7 | -0.251 | exp/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 10.6 | 507 17783 | 91.0 6.5 2.5 1.6 10.6 79.3 | -0.275 | exp/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.2 | 1155 27500 | 91.0 6.4 2.6 1.2 10.2 75.5 | -0.278 | exp/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 9.9 | 1155 27500 | 91.3 6.1 2.6 1.2 9.9 74.1 | -0.306 | exp/nnet3_cleaned/lstm_bidirectional_maxchange0.75_sp/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

%WER 11.5 | 507 17783 | 90.2 7.1 2.8 1.7 11.5 79.9 | -0.317 | exp/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_dev/score_10_0.0/ctm.filt.filt.sys
%WER 11.2 | 507 17783 | 90.5 6.8 2.7 1.7 11.2 78.9 | -0.318 | exp/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_dev_rescore/score_10_0.0/ctm.filt.filt.sys
%WER 10.4 | 1155 27500 | 90.7 6.5 2.7 1.2 10.4 76.7 | -0.312 | exp/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_test/score_10_0.0/ctm.filt.filt.sys
%WER 10.1 | 1155 27500 | 91.1 6.2 2.7 1.2 10.1 75.3 | -0.337 | exp/nnet3_cleaned/lstm_bidirectional_maxchange_nomax_sp/decode_test_rescore/score_10_0.0/ctm.filt.filt.sys

on babel Georgian:
%WER 46.3 | 19252 60586 | 57.4 32.1 10.6 3.7 46.3 31.7 | -1.297 | exp/nnet3/lstm_bidirectional_maxchange0.75_sp/decode_dev10h.pem/score_12/penalty_1.0/dev10h.pem.ctm.sys

%WER 46.2 | 19252 60586 | 57.9 32.4 9.7 4.2 46.2 31.6 | -1.241 | exp/nnet3/lstm_bidirectional_sp/decode_dev10h.pem/score_12/penalty_0.5/dev10h.pem.ctm.sys

After the changes to nnet-training.cc are reviewed and confirmed, I will apply similar changes to nnet-chain-training.cc.
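
For readers following the thread, the scaling being tested can be sketched roughly as below. This is a minimal Python illustration, not the Kaldi code under review: the function name `max_change_scales` and the list-of-lists representation of the parameter deltas are assumptions made for the example. Only the idea of a per-component limit followed by a global limit, and the 0.75 / 1.5 thresholds used in the usage note, come from the discussion.

```python
import math


def max_change_scales(deltas, per_component_max, global_max):
    """Return (per-component scale factors, global scale) for one minibatch.

    deltas: one flat list of parameter changes per updatable component
        (a hypothetical stand-in for the per-component update).
    per_component_max: one limit per component; 0.0 disables that limit.
    global_max: limit on the norm of the whole update; 0.0 disables it.
    """
    scale_factors = []
    param_delta_squared = 0.0
    for delta, max_change in zip(deltas, per_component_max):
        dot_prod = sum(x * x for x in delta)
        norm = math.sqrt(dot_prod)
        factor = 1.0
        # Shrink this component only if its change exceeds its own limit.
        if max_change != 0.0 and norm > max_change:
            factor = max_change / norm
        scale_factors.append(factor)
        # Accumulate the squared norm of the already-limited update.
        param_delta_squared += factor * factor * dot_prod

    param_delta = math.sqrt(param_delta_squared)
    # Assumption: this sketch rejects non-finite updates unconditionally.
    if not math.isfinite(param_delta):
        raise ValueError("infinite parameter change, will not apply")

    global_scale = 1.0
    if global_max != 0.0 and param_delta > global_max:
        global_scale = global_max / param_delta
    return scale_factors, global_scale
```

With limits of 0.75 for a hidden component and 1.5 for the final one, a hidden-layer change of norm 5.0 would be scaled by 0.15, while a final-layer change of norm 0.5 would be left untouched.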

@vijayaditya
Contributor

Do you have an idea how this affects TDNNs or other feed-forward networks?


@freewym
Contributor Author

freewym commented Sep 19, 2016

I only ran the TDNN comparison on swbd; max-change-per-component gives a 0.1-0.2 improvement.

@vijayaditya
Contributor

Could you check what changes when moving from global-max-change to max-change-per-component? For example, you could look at how the gradient scaling coefficients and the clipping proportions change.

--Vijay


@freewym
Contributor Author

freewym commented Sep 19, 2016

Usually BLstm{1|2|3}_{forward,backward}_W_c-xr hit the upper bound of max-change-per-component and get rescaled. It is hard to tell anything from the clipped proportion.

@danpovey
Contributor

danpovey commented Oct 8, 2016

There are conflicts in this branch.
I'd like to get this max-change thing checked in sooner rather than later, as it seems like it will benefit stability as well as improving results slightly.

@freewym
Contributor Author

freewym commented Oct 8, 2016

Will resolve it.

@freewym freewym force-pushed the max_change_per_component branch from 118f18a to 59fb914 Compare October 9, 2016 20:31

BaseFloat clipped_proportion =
(count_ > 0 ? static_cast<BaseFloat>(num_clipped_) / count_ : 0);
BaseFloat clipped_proportion = (count_ > 0 ?
Contributor

Do you mean to_update->count_ here?
Also, I don't see how this change is related to the rest of this PR.

@freewym
Contributor Author

freewym commented Oct 11, 2016

It's unrelated. I will revert this part in this PR and submit it as a separate PR.

On Tue, Oct 11, 2016 at 2:40 PM, Daniel Povey wrote:

@danpovey commented on this pull request, in src/nnet3/nnet-simple-component.cc:

do you mean to_update->count_ here?
And I don't see how this change is related to the rest of this PR.

Yiming Wang
Department of Computer Science
The Johns Hopkins University
3400 N. Charles St.
Baltimore, MD 21218

bool is_gradient = false; // setting this to true would disable the
// natural-gradient updates.
SetZero(is_gradient, delta_nnet_);
const int32 num_ucs = NumUpdatableComponents(*delta_nnet_);
Contributor

call this num_updatable

}
}
KALDI_ASSERT(i == scale_factors.Dim());
param_delta = std::sqrt(param_delta);
Contributor

It's not good practice to change variables in place like this, because it confuses the meaning of the variable.
Better to rename param_delta above this to param_delta_squared, and have BaseFloat param_delta = std::sqrt(param_delta_squared);

param_delta *= scale;
if (param_delta > config_.max_param_change) {
if (param_delta - param_delta != 0.0) {
KALDI_WARN << "Infinite parameter change, will not apply.";
Contributor

might be good to mention the component name here.

Contributor Author

It is the "global delta".

} else {
scale *= config_.max_param_change / param_delta;
num_max_changes_global_applied_++;
KALDI_LOG << "Parameters change too big: " << param_delta << " > "
Contributor

I think you should rework how the log messages are printed. Here you print out a log message per minibatch if you apply the global max change, but not if you apply the per-component max change. If the quota of verboseness is one log message of reasonable length per minibatch, then you could do a better job of compressing more information in that. How about you print out one message if any max-change was applied, and find a way to neatly summarize how many components had a per-component max-change limit applied (and maybe print the component-name of the one that had the smallest scale applied, together with its max-change value), and also say what the global max-change limit was, if any, and what the global max-change was.

ans = ans || info.PrintTotalStats(name);
}
if (delta_nnet_ != NULL)
PrintMaxChangesStats();
Contributor

rename to PrintMaxChangeStats()

/// Applies per-component max-changes and global max-change to all updatable
/// components in *delta_nnet_, and use *delta_nnet_ to update parameters
/// in *nnet_.
void UpdateParamsWithMaxChanges();
Contributor

Rename Changes() -> Change()

int32 num_minibatches_processed_;

// stats for max-changes.
std::vector<int32> num_max_changes_per_component_applied_;
Contributor

rename changes->change in these 2 variables

@freewym freewym force-pushed the max_change_per_component branch 2 times, most recently from b2671d0 to e3687dc Compare October 12, 2016 01:05
KALDI_LOG << "Parameters change too big: " << param_delta << " > "
<< "--max-param-change=" << config_.max_param_change
<< ", scaling by " << config_.max_param_change / param_delta;
}
Contributor Author

@danpovey This is how the log message is printed now. I decided to split it into two lines, since a single line seems too long to hold all this information. What do you think of the log message?

@danpovey
Contributor

appied->applied.

How about this:
If both types are applied, say:
  Per-component max-change active on 5 / 25 updatable components (smallest factor = 0.943 on Lstm.forward_c_x with max-change=1.0); global max-change factor was 0.82 with max-change=2.0.
If just the global one was applied, then just print:
  Global max-change factor was 0.82 with max-change=2.0.
and if just the per-component one, then:
  Per-component max-change active on 5 / 25 updatable components (smallest factor = 0.943 on Lstm.forward_c_x with max-change=1.0)
I know it looks almost the same as printing two lines, but remember that the preamble with the line number is quite long, so it makes a difference to the log file size.

Dan
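
The message layout proposed above could be assembled along these lines. This is a hedged Python sketch for illustration only: `max_change_message` and its parameter names are hypothetical, and the actual change would be made in the C++ trainer with KALDI_LOG rather than in Python.

```python
def max_change_message(num_limited, num_updatable, min_factor, min_name,
                       min_max_change, global_factor, global_max_change):
    """Build at most one summary line per minibatch.

    Returns an empty string when neither limit was applied, so the caller
    can skip logging entirely. All names here are illustrative.
    """
    per_component = ""
    if num_limited > 0:
        per_component = (
            "Per-component max-change active on %d / %d updatable components "
            "(smallest factor = %.3g on %s with max-change=%s)"
            % (num_limited, num_updatable, min_factor, min_name,
               min_max_change))

    global_part = ""
    if global_factor < 1.0:
        global_part = ("global max-change factor was %.3g with max-change=%s"
                       % (global_factor, global_max_change))

    if per_component and global_part:
        return per_component + "; " + global_part + "."
    if global_part:
        # Capitalize when the global part starts the line.
        return global_part[0].upper() + global_part[1:] + "."
    return per_component
```

Returning a single string keeps the per-line log preamble from being repeated, which is the log-size concern raised in the comment.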

On Tue, Oct 11, 2016 at 9:18 PM, Yiming Wang wrote:

@freewym commented on this pull request, in src/nnet3/nnet-training.cc:

  if ((config_.max_param_change != 0.0 &&
       param_delta > config_.max_param_change &&
       param_delta - param_delta == 0.0) || min_scale < 1.0) {
    if (min_scale < 1.0)
      KALDI_LOG << "Per-component max-change is applied on "
                << num_max_change_per_component_applied_per_minibatch
                << "/" << scale_factors.Dim() << " Updatable Components. "
                << "The smallest scaling factor " << min_scale << " is appied on "
                << component_name_with_min_scale
                << " with max-change=" << max_change_with_min_scale;
    if (param_delta > config_.max_param_change)
      KALDI_LOG << "Parameters change too big: " << param_delta << " > "
                << "--max-param-change=" << config_.max_param_change
                << ", scaling by " << config_.max_param_change / param_delta;
  }

@freewym freewym force-pushed the max_change_per_component branch from e3687dc to a63dd7a Compare October 12, 2016 01:55
@freewym freewym force-pushed the max_change_per_component branch from 73ce4c9 to 7c9eeb7 Compare October 17, 2016 18:20
@freewym
Contributor Author

freewym commented Oct 17, 2016

I tested max-change-per-component in chain training with TDNN and BLSTM models on swbd. Both gave almost the same results as the global max-change.


# bidirectional LSTM with the same configuration as the above experiment, plus self-repair of all nonlinearities and clipgradient activated
# bidirectional LSTM with the same configuration as the above experiment, with self-repair of all nonlinearities and clipgradient, and max-change-per-component activated
%WER 10.2 | 1831 21395 | 90.8 6.1 3.2 1.0 10.2 44.4 | exp/nnet3/lstm_bidirectional_sp/decode_eval2000_sw1_fsh_fg/score_11_0.0/eval2000_hires.ctm.swbd.filt.sys
Contributor

Please make sure the results are in the same order as those above, for easy comparison (i.e. whole-test-set first).

--recurrent-projection-dim 256 \
--non-recurrent-projection-dim 256 \
--label-delay $label_delay \
--self-repair-scale-clipgradient 1.0 \
Contributor

Is this the default, and how does this relate to your other changes?

Contributor Author

1.0 is the default value of self-repair-scale-clipgradient in make_configs.py. I set it explicitly here just to show users that the option exists.

'dimension': input['dimension']}

def AddAffineLayer(config_lines, name, input, output_dim, ng_affine_options = ""):
def AddAffineLayer(config_lines, name, input, output_dim, ng_affine_options = "", max_change_per_component = 0.25):
Contributor

I notice that in different parts of the code, for different layer types, you have different values for max_change_per_component. Can you give me some idea of how you tuned this? Was this based just on WER, or did you also look at the diagnostics?

Contributor Author

The max-change value is 0.75 for all non-final layers and 1.5 for the final layer. I need to change the defaults in components.py to match those in make_configs.py.

Contributor Author

All max-change-per-component values have been set to 0.75 (for non-final layers) / 1.5 (for final layers).

@danpovey
Contributor

I'm close to merging this, but there is one small thing to fix. In the training setups there is code like this:

  if (opts.nnet_config.momentum == 0.0 &&
      opts.nnet_config.max_param_change == 0.0) {
    delta_nnet_= NULL;
  } else {

and this wouldn't do the right thing if the global max-change is zero but the individual max-changes are nonzero.
In any case the original code (before this PR) wouldn't have worked right for things like BLSTMs if we set delta_nnet_ to NULL, because the parameters change and then you do the backward pass. I think it's best to make both training setups so that they always set delta_nnet_. You can then remove all checks like if (delta_nnet_ != NULL) in the code. Once you do this I'll merge.
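
The gap described above can be illustrated with a small sketch. It is written in Python for brevity; `old_needs_delta_nnet`, `per_component_limits_active`, and `limits_silently_skipped` are hypothetical helpers that only mirror the quoted condition and are not functions from Kaldi.

```python
def old_needs_delta_nnet(momentum, max_param_change):
    """Mirror the quoted condition: delta_nnet_ is NULL only when both
    the momentum and the global max-change are zero."""
    return not (momentum == 0.0 and max_param_change == 0.0)


def per_component_limits_active(per_component_max_changes):
    """True if any component has a nonzero per-component max-change."""
    return any(m != 0.0 for m in per_component_max_changes)


def limits_silently_skipped(momentum, max_param_change,
                            per_component_max_changes):
    """True in the problematic case: per-component limits are configured,
    but the old condition would not allocate delta_nnet_ to apply them."""
    return (not old_needs_delta_nnet(momentum, max_param_change)
            and per_component_limits_active(per_component_max_changes))
```

Always allocating delta_nnet_, as suggested, removes this case, because the decision no longer depends on which kinds of limit happen to be configured.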

@freewym
Contributor Author

freewym commented Oct 26, 2016

Will squash after the last commit is reviewed and confirmed.

@danpovey
Contributor

Looks OK, but there is no need to squash; I am using GitHub's 'squash and merge' button.
I will merge tomorrow, though.


@danpovey
Contributor

Yiming, actually before I merge this I want you to provide a mechanism whereby people can make it equivalent to the old models, in case they are in the middle of experiments and want it back-compatible. You can add a --max-change-per-component and --max-change-per-component-final option to the top-level make_configs.py scripts. Then people can set these to zero if they want to disable the max-change.
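
One way the requested options might look in a Python config script is sketched below with argparse. The option names and the 0.75 / 1.5 values come from this thread; `build_parser` and the help strings are assumptions, and the actual make_configs.py scripts may wire these options differently.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the two options requested above."""
    parser = argparse.ArgumentParser(
        description="Sketch of max-change options for a config generator.")
    parser.add_argument(
        "--max-change-per-component", type=float, default=0.75,
        help="Per-component max-change for non-final layers; "
             "0 disables the limit.")
    parser.add_argument(
        "--max-change-per-component-final", type=float, default=1.5,
        help="Per-component max-change for the final layer; "
             "0 disables the limit.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.max_change_per_component, args.max_change_per_component_final)
```

Passing `--max-change-per-component 0 --max-change-per-component-final 0` would then correspond to the back-compatible behavior described in the comment, assuming the generated configs treat zero as "no limit".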

@danpovey danpovey merged commit 6d66e72 into kaldi-asr:master Oct 26, 2016
@freewym freewym deleted the max_change_per_component branch November 1, 2016 23:36