
[src,scripts,egs] Implement and tune resnets. #1620

Merged
danpovey merged 1 commit into kaldi-asr:kaldi_52 from danpovey:kaldi_52_resnet_final on May 13, 2017

Conversation

@danpovey
Contributor

No description provided.

@danpovey
Contributor Author

@freewym, this (run_resnet_1b.sh) is probably the setup we want to use to test out backstitch, since its results are quite competitive: we are below 5% error (4.8%) on CIFAR-10 with only 1.3 million parameters and 16 convolutional layers, and I'm not aware of any other system below 5% error that is anywhere near as small or as shallow. The CIFAR-100 error is 24%.
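
For reference, a minimal sketch of how backstitch might be switched on for this recipe. It assumes the run script calls steps/nnet3/train_raw_dnn.py and that the backstitch options are exposed under the names below; both are assumptions to verify against the branch that adds backstitch.

# Hedged sketch only: flag names are assumptions to check against
# steps/nnet3/train_raw_dnn.py in the backstitch branch.
backstitch_scale=0.3       # hypothetical value to tune
backstitch_interval=1      # apply backstitch on every minibatch

# In a copy of local/nnet3/run_resnet_1b.sh, the existing trainer call would
# gain the two extra options, roughly:
steps/nnet3/train_raw_dnn.py \
  --trainer.optimization.backstitch-training-scale $backstitch_scale \
  --trainer.optimization.backstitch-training-interval $backstitch_interval \
  --dir exp/resnet1b_backstitch_cifar10   # plus the recipe's existing options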

Interestingly, in our setup I did not see any benefit from per-channel mean subtraction, which is normally done on CIFAR. Perhaps the natural gradient makes it unnecessary.
In PR #1613 there is a binary, 'image-preprocess.cc' (https://github.com/kaldi-asr/kaldi/pull/1613/files#diff-3c933de8236d0eab9bcfc480733bc2a1), which can do various types of preprocessing, including this one, and the version of get_egs.sh there is modified to support its use; but I'm not checking that in here, as I didn't see any benefit from any form of preprocessing.

@freewym
Contributor

freewym commented May 13, 2017

Great! I will test based on this PR.

@danpovey
Contributor Author

I'm going to merge this now, since the checks completed.

@hhadian, there are some things that might improve this, and it would be helpful if you could test them out. But start from 1a for most tuning experiments, as it's faster than 1b (it has fewer epochs). A rough sketch of where these knobs live follows the list.

  • I'm pretty sure a larger model would be better. E.g. try increasing $nf1 from 48 to 64. A much larger model would probably be quite a bit better, but let's not go overboard.
  • The --proportional-shrink value, which acts like l2 regularization, is a critical tuning parameter, and the current 50.0 is only tuned to within a factor of 2: 25.0 was worse, and 100.0 was a lot worse. So maybe try 40.0 and see if it's better.
  • I'd like to confirm that the natural gradient helps, so try disabling it.
  • I suspect that setting alpha-out=2.0 (the default is 4.0) for the natural gradient would be better; can you please try that? It might affect the optimization speed more than the final results.
  • The config generation doesn't support this yet, but IIRC in the wide-resnet paper they put dropout in the middle of the res-block, with probability 0.3, and it was helpful. So that might be worth a try.
  • Can you please try a similar setup on SVHN?
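
As mentioned above, here is a rough sketch of where these knobs live, assuming the layout of local/nnet3/run_resnet_1a.sh; the variable and option names are from memory and should be checked against the actual script.

# Hedged sketch of the tuning changes suggested above; names are assumptions.
# Near the top of a copy of local/nnet3/run_resnet_1a.sh:
nf1=64                       # was 48: more filters in the first block group
proportional_shrink=40.0     # was 50.0; acts like l2 regularization

# The xconfig heredoc uses $nf1 in lines like
#   conv-relu-batchnorm-layer name=conv1 ... num-filters-out=$nf1
# and the trainer call passes the shrink value along the lines of
#   --trainer.optimization.proportional-shrink $proportional_shrink

# Dropout inside the res-block is not supported by the config generation yet;
# a purely illustrative form of what such an option might look like:
#   res-block name=res2 num-filters=$nf1 ... dropout-proportion=0.3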

danpovey merged commit d2d0738 into kaldi-asr:kaldi_52 on May 13, 2017
@danpovey
Contributor Author

Oh, @hhadian, one more thing.
In this CIFAR setup, the final model combination seems to be important: it gives a couple of percent improvement. This is surprising, as for speech tasks it was never really critical. Because there are a lot of coefficients to estimate (15 * num-layers), the training error is very small, and we estimate the coefficients on the training data, I'm concerned that there may be a lot of noise in the combination-parameter estimates, which could be degrading results a little. So can you please:

  • Modify image/nnet3/get_egs.sh so that it takes an option combine_subset_egs, defaulting to 5k, and, instead of making the combine subset the same as the train-diagnostic subset, take it from the tail rather than the head of the list, so that combine.egs uses different data (this will make the training diagnostics more meaningful after combination). A sketch of this change follows the list.
  • Run an experiment with a larger --combine-subset-egs value, e.g. 25k. It's already very slow with 5k (takes at least half an hour with the resnets I'm currently using), so this will be extremely slow, but it's just to see if it helps.
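
A minimal sketch of the intended head/tail change; the list names and directory below are hypothetical, and the real get_egs.sh builds its subsets differently, so treat this only as an illustration of the idea.

# Hedged sketch: take the combine subset from the tail of the shuffled
# training-subset list rather than reusing the head (which feeds the
# train-diagnostic egs).  All names here are illustrative, not the script's.
dir=exp/cifar10_egs              # hypothetical egs directory
combine_subset_egs=5000          # new option, e.g. --combine-subset-egs 5000

# train_subset_list should hold at least 2*combine_subset_egs entries so the
# head and tail below do not overlap.
head -n $combine_subset_egs $dir/train_subset_list > $dir/train_diagnostic_list
tail -n $combine_subset_egs $dir/train_subset_list > $dir/combine_list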

I am thinking of using importance sampling and per-example weighting to speed up both the training and the final combination, since I believe the derivative magnitudes of different examples will be extremely different, depending on how easy each example is to classify. But that is something we can definitely look at later on, after the NIPS deadline, not now.

@hhadian
Contributor

hhadian commented May 13, 2017

Will do all

@danpovey
Contributor Author

And one more thing... it would be nice to know whether the self-repair makes any difference in this setup. So please try setting the self-repair-scale to 0.0 and see if it affects performance.
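
A hedged example of what that toggle might look like; whether every layer type in this config accepts a self-repair-scale option is an assumption to check against the xconfig layer code.

# Hedged sketch: add self-repair-scale=0.0 to the nonlinearity-bearing layers
# in the xconfig heredoc of a copied run script, e.g.
#   conv-relu-batchnorm-layer name=conv1 num-filters-out=$nf1 ... self-repair-scale=0.0
# One mechanical (untested) way to apply it to every such line:
sed 's/^\( *conv-relu-batchnorm-layer .*\)$/\1 self-repair-scale=0.0/' \
    local/nnet3/run_resnet_1a.sh > local/nnet3/run_resnet_norepair.sh
chmod +x local/nnet3/run_resnet_norepair.sh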

@hhadian
Contributor

hhadian commented May 13, 2017

OK will try it

@hhadian
Contributor

hhadian commented May 17, 2017

These are the results:

# System                 baseline(1a)    bigger_1stlayer bigger_2nd      pshrink40       pshrink55       noNG_2jobs      alpha2.0
# final test accuracy:   0.949           0.9481          0.9469          0.9441          0.9455          0.9305          0.947
# final train accuracy:  0.9992          0.9984          0.999           0.9994          0.9964          0.9802          0.999
# final test objf:      -0.169885       -0.170357       -0.170604       -0.176587       -0.166264       -0.200331       -0.172659
# final train objf:     -0.0117902      -0.0107631      -0.00934661     -0.00685971     -0.0177978      -0.071727       -0.0106905
# num-parameters:        1322730         1401578         1668490         1322730         1322730         1322730         1322730

@hhadian
Contributor

hhadian commented May 17, 2017

And the results with a separate combine subset (taken from the tail):

# System                 baseline(1a)  tail_combineset_5k  tail_combineset_25k
# final test accuracy:        0.949          0.9467          0.9458
# final train accuracy:       0.9992          0.9962         0.9976
# final test objf:         -0.169885       -0.172149      -0.169397
# final train objf:       -0.0117902       -0.015221      -0.0132601
# num-parameters:           1322730         1322730         1322730

The accuracies before the final combination step are almost the same as the baseline's.
Baseline exp dir: /home/hhadian/xent/egs/cifar/v1/exp/resnet1a_cifar10/
And the exp dirs for the 5k and 25k experiments:
/home/hhadian/xent/egs/cifar/v1/exp/resnet1g5k_cifar10/
/home/hhadian/xent/egs/cifar/v1/exp/resnet1g25k_cifar10/

@hhadian
Contributor

hhadian commented May 19, 2017

# System                baseline(1a) dropout_in_resblock no_selfrepair noNG_1job
# final test accuracy:        0.949      0.9441      0.9469      0.9328
# final train accuracy:       0.9992      0.9944       0.998      0.9934
# final test objf:         -0.169885   -0.167914    -0.17233    -0.20722
# final train objf:       -0.0117902  -0.0247912  -0.0125956   -0.028442
# num-parameters:           1322730     1322730     1322730     1322730

I think it might be worth trying dropout_in_resblock on CIFAR-100, as it is the only change that improved the objective function.

@danpovey
Contributor Author

danpovey commented May 19, 2017 via email

@YiwenShaoStephen
Contributor

@hhadian I've recently also been working on tuning the number of filters (nf1, nf2, nf3). Previously I only tried increasing nf3. I see you showed results for "bigger_1stlayer" and "bigger_2nd". Can you give me more detail about these experiments, such as the values of nf1 and nf2?

@hhadian
Contributor

hhadian commented Nov 22, 2017

@YiwenShaoStephen, for the bigger 1st layer I used 64 filters instead of 48, and for the bigger 2nd layer I used 128 instead of 96.
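
For anyone reproducing these runs, a hedged sketch of where those values live near the top of local/nnet3/run_resnet_1a.sh; the exact names and the third value are assumptions to check against the script.

# Hedged sketch; the baseline values are as described above, the rest is assumed.
nf1=48   # filters in the first group of res-blocks ("bigger_1stlayer" used 64)
nf2=96   # filters in the second group ("bigger_2nd" used 128)
# nf3 (third group) was left at its checked-in value in those experiments.
# These variables feed the xconfig heredoc via options like num-filters-out=$nf1.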
