@freewym freewym commented Dec 2, 2016

…nent to 15; updated the results on AMI

- BaseFloat clipping_threshold = 15.0;
- BaseFloat zeroing_threshold = 2.0;
+ BaseFloat clipping_threshold = 30.0;
+ BaseFloat zeroing_threshold = 15.0;

Larger values of these quantities are more dangerous, i.e. more likely to lead to instability.
I don't think it's sufficient to just test this on one setup, because it's the potential for divergence that this is supposed to guard against. Have you done any other tests?
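
For readers unfamiliar with the quantity under discussion, norm-based gradient clipping can be sketched roughly as below. This is a generic illustration, not the code touched by this PR: `RowNorm` and `ClipRowNorm` are hypothetical names, and the sketch ignores zeroing and any recurrence-specific details.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not part of Kaldi): L2 norm of one gradient row.
float RowNorm(const std::vector<float> &row) {
  double sum_sq = 0.0;
  for (float v : row) sum_sq += static_cast<double>(v) * v;
  return static_cast<float>(std::sqrt(sum_sq));
}

// Hypothetical helper: rescales `row` in place so that its L2 norm does not
// exceed `clipping_threshold`. Returns true if the row was actually rescaled.
bool ClipRowNorm(std::vector<float> *row, float clipping_threshold) {
  assert(row != nullptr && clipping_threshold > 0.0f);
  float norm = RowNorm(*row);
  if (norm <= clipping_threshold) return false;  // small enough, leave as-is
  float scale = clipping_threshold / norm;
  for (float &v : *row) v *= scale;
  return true;
}
```

Under this simplified rule, a larger threshold lets larger gradients through unchanged, which is why a larger value offers less protection against divergence.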

danpovey commented Dec 2, 2016 via email

freewym commented Dec 2, 2016

This PR is already on top of fast_lstm.

The old WERs reported in RESULTS were obtained without zeroing (i.e. using ClipGradientComponent, as the comment says). The results of tuning zeroing-threshold on AMI ihm are:
threshold  WER (dev|eval)
4.0        22.7|22.8
5.0        22.4|22.6
6.0        22.4|22.7
10.0       22.5|22.6
15.0       22.4|22.4

I have not tuned it on sdm1.

After the max-deriv-time fix, gradient explosion no longer happened, even on the babel georgian multicondition data (which had the most severe problem before the fix): with zeroing disabled, the clipped-proportion is at most ~0.004.

I also tuned the zeroing-threshold on swbd blstm_6i:
threshold  WER (fg|tg)
5.0        14.5|15.8
6.0        14.4|15.8
10.0       14.4|15.8
15.0       14.2|15.6

I chose 15.0 as the threshold rather than 5 or 10 mainly based on the WER on swbd. I'm not sure how much variation there could be between different runs with the same settings.
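
To put the size of the swbd difference in perspective, the relative WER reduction can be computed as in the small sketch below. `RelativeWerReduction` is a hypothetical helper introduced only for illustration. Going from 14.5 (threshold 5.0) to 14.2 (threshold 15.0) on fg is roughly a 2% relative reduction, which may or may not exceed run-to-run variation.

```cpp
#include <cassert>

// Hypothetical helper: relative reduction of `new_wer` with respect to
// `old_wer`, returned as a fraction (0.02 means 2% relative).
double RelativeWerReduction(double old_wer, double new_wer) {
  assert(old_wer > 0.0);
  return (old_wer - new_wer) / old_wer;
}
```

For example, `RelativeWerReduction(14.5, 14.2)` gives the fg figure and `RelativeWerReduction(15.8, 15.6)` the tg figure.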

danpovey commented Dec 2, 2016

OK I will think about it.

freewym commented Dec 2, 2016

FYI, I added the zeroed-proportion stats of the 1st layer at the last iteration, which show how often zeroing was activated:

ami ihm
threshold  zeroed-prop
4.0        0.06
5.0        0.045
6.0        0.035
10.0       0.015
15.0       0.006

swbd
threshold  zeroed-prop
5.0        0.018
6.0        0.014
10.0       0.003
15.0       0.0007
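
A statistic like zeroed-prop can be thought of as the fraction of observed gradient norms that exceed the threshold. The sketch below assumes exactly that simplified definition. `ProportionAbove` and the per-item `norms` input are hypothetical, and the way the actual training code accumulates this statistic may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: fraction of the observed norms that are strictly
// greater than `threshold`. Returns 0 when there are no observations.
double ProportionAbove(const std::vector<float> &norms, float threshold) {
  assert(threshold > 0.0f);
  if (norms.empty()) return 0.0;
  std::size_t count = 0;
  for (float n : norms) {
    if (n > threshold) ++count;
  }
  return static_cast<double>(count) / static_cast<double>(norms.size());
}
```

Under this definition, raising the threshold can only lower the proportion (or leave it unchanged), which matches the trend in both tables above.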

danpovey commented Dec 2, 2016

OK, I'll merge this.

@danpovey danpovey merged commit 1cab8bd into kaldi-asr:fast_lstm Dec 2, 2016
@freewym freewym deleted the fast_lstm branch December 2, 2016 20:06