[src,scripts,egs] Implement and tune resnets. #1620
danpovey merged 1 commit into kaldi-asr:kaldi_52
Conversation
|
@freewym, this (run_resnet_1b.sh) is probably the setup we want to use to test out backstitch, since it has quite competitive results: we are below 5% error (4.8%) in CIFAR-10 with only 1.3 million parameters and 16 convolutional layers, and I'm not aware of any other systems below 5% error that are anywhere near so small or so shallow. The CIFAR-100 error is 24%. Interestingly, in our setup I did not see any benefit from per-channel mean subtraction, which is normally done on CIFAR. Perhaps the natural gradient makes it unnecessary. |
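For reference, the per-channel mean subtraction mentioned above usually amounts to something like the following minimal numpy sketch (not part of the recipe; the array shape and function name are assumptions for illustration):

```python
# Minimal sketch of the per-channel mean subtraction conventionally applied to CIFAR
# (the preprocessing the comment above found unnecessary). Assumes images are a numpy
# array of shape (N, 32, 32, 3) with RGB channels last; not taken from the recipe.
import numpy as np

def subtract_per_channel_mean(images):
    # Mean over all images and pixel positions, one value per colour channel.
    channel_mean = images.mean(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, 3)
    return images - channel_mean, channel_mean

# Usage: estimate the means on the training set, then apply the same means to test data.
# train_centered, mu = subtract_per_channel_mean(train_images.astype(np.float32))
# test_centered = test_images.astype(np.float32) - mu
```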
|
Great! I will test based on this PR. |
|
I'm going to merge this now, since the checks completed. @hhadian, there are some things that might improve this, and it would be helpful if you could test them out. But start from 1a for most tuning experiments, as it's faster than 1b (it has fewer epochs).
|
|
Oh, @hhadian, one more thing.
I am thinking of using importance sampling and per-example weighting to speed up both the training and the final combination, since I believe the derivative magnitudes in different examples will be extremely different, based on how easy different examples are to classify. But that is something we can definitely look at later on, not now-- i.e. after the NIPS deadline. |
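As a rough sketch of the general importance-sampling idea (not the planned kaldi implementation; the `scores` measure, the probability floor and the function name are all assumptions for illustration): sample examples with probability proportional to some per-example difficulty measure, and reweight each sampled example by the inverse of its sampling probability so the expected gradient is unchanged.

```python
# Toy illustration of importance sampling with per-example weights.
# `scores` is a hypothetical nonnegative per-example difficulty measure,
# e.g. a running estimate of each example's loss or gradient magnitude.
import numpy as np

def sample_weighted_minibatch(scores, batch_size, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # Sampling probability proportional to the score, with a small floor so that
    # easy examples are still visited occasionally.
    probs = np.maximum(scores, 1e-8)
    probs = probs / probs.sum()
    idx = rng.choice(len(scores), size=batch_size, replace=True, p=probs)
    # Weight each sampled example by 1 / (N * p_i) so the minibatch average of
    # weighted gradients is an unbiased estimate of the full-data average gradient.
    weights = 1.0 / (len(scores) * probs[idx])
    return idx, weights
```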
|
Will do all |
|
And one more thing... it would be nice to know whether the self-repair makes any difference in this setup. So please try setting the self-repair-scale to 0.0 and see if it affects performance. |
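For readers unfamiliar with it, the rough idea of self-repair is sketched below as a toy numpy function; this is only an illustration of the general mechanism, not kaldi's nnet3 code, and the thresholds, argument names and sign convention are assumptions. Units whose long-term activity drifts outside a target range get a small extra term added during backprop, scaled by the self-repair scale, so setting that scale to 0.0 disables the correction entirely.

```python
# Toy illustration of the self-repair idea being tested (not kaldi's implementation).
import numpy as np

def relu_backprop_with_self_repair(grad_out, pre_activation, avg_active_fraction,
                                   self_repair_scale=1e-5, lo=0.05, hi=0.95):
    grad_in = grad_out * (pre_activation > 0)        # ordinary ReLU backprop
    repair = np.zeros_like(avg_active_fraction)
    repair[avg_active_fraction < lo] = 1.0           # mostly-dead units: nudge one way
    repair[avg_active_fraction > hi] = -1.0          # mostly-saturated units: nudge the other
    # With self_repair_scale = 0.0 this reduces to plain ReLU backprop.
    return grad_in + self_repair_scale * repair
```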
|
OK will try it |
|
These are the results: |
|
And the results for a separate combine subset (taken from the tail): the accuracies before the final step are almost the same as the baseline's. |
I think it might be worth it to try dropout_in_resblock on CIFAR-100, as it is the only variant which improved the objective function.
|
Sure, makes sense. It could also be that dropout will be more helpful when
the models are larger.
On Fri, May 19, 2017 at 2:39 PM, Hossein Hadian wrote:
# System                 baseline(1a)  dropout_in_resblock  no_selfrepair  noNG_1job
# final test accuracy:   0.949         0.9441               0.9469         0.9328
# final train accuracy:  0.9992        0.9944               0.998          0.9934
# final test objf:       -0.169885     -0.167914            -0.17233       -0.20722
# final train objf:      -0.0117902    -0.0247912           -0.0125956     -0.028442
# num-parameters:        1322730       1322730              1322730        1322730
I think it might be worth it to try dropout_in_resblock on CIFAR100 as it
is the only one which has improved the objective function.
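For clarity, "dropout in the res-block" amounts to something like the following PyTorch-style illustration (not the kaldi xconfig actually used in these scripts; the class name, dropout probability and channel handling are assumptions): a dropout layer placed between the two convolutions inside each residual block, with the skip connection left untouched.

```python
# Illustration only of a residual block with internal dropout.
import torch.nn as nn
import torch.nn.functional as F

class ResBlockWithDropout(nn.Module):
    def __init__(self, channels, dropout_p=0.2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.dropout = nn.Dropout(dropout_p)          # the extra component under test
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.dropout(out)
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                        # residual (skip) connection
```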
|
|
@hhadian I've recently also been working on tuning the number of filters (nf1, nf2, nf3). Previously I only tried increasing nf3. I see you showed results for "bigger_1stlayer" and "bigger_2nd". Can you give me more detail about these experiments, e.g. the values of nf1 and nf2? |
|
@YiwenShaoStephen, for the bigger 1st layer I used 64 filters instead of 48, and for the bigger 2nd layer I used 128 instead of 96. |