[RFC] Batch normalization + LeNet with batchnorm example #140
Conversation
- add LeNet + batchnorm example
When it comes to modules, I don't know if you'd be interested in what I have been playing with here: https://github.com/jgbos/KFuddles. I go a little further with the module to allow the macro
That is some really nice work! As much as I love the transparency of Knet, I would also like for it to offer that kind of higher-level interface, as most deep learning frameworks do. Maybe @denizyuret already has some plans for that, or would be interested in that sort of design discussion?
Yes, I think there is a case for building the weights and functions without high-level interfaces. But I also utilize the
Also, working with layer types, which can keep an internal state (as is needed, for instance, in batch normalization), makes for cleaner code where you do not have to pass parameters all around, as I do with
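To illustrate the point about internal state, here is a minimal sketch in plain Julia (hypothetical names, not Knet's or KFuddles' actual API) of a layer type that keeps its normalization statistics inside the layer, so the moments never have to be passed around explicitly:

```julia
# Minimal sketch of a stateful batchnorm layer (hypothetical names).
using Statistics

mutable struct BatchNormLayer
    gamma::Vector{Float64}         # learnable scale
    beta::Vector{Float64}          # learnable shift
    running_mean::Vector{Float64}  # internal state, updated during training
    running_var::Vector{Float64}   # internal state, updated during training
    momentum::Float64
    training::Bool
end

BatchNormLayer(n; momentum=0.1) =
    BatchNormLayer(ones(n), zeros(n), zeros(n), ones(n), momentum, true)

# x is a (features, batchsize) matrix; statistics are per feature (row).
function (bn::BatchNormLayer)(x; eps=1e-5)
    if bn.training
        mu = vec(mean(x, dims=2))
        v  = vec(var(x, dims=2, corrected=false))
        # update the internal state in place (exponential moving average)
        bn.running_mean .= (1 - bn.momentum) .* bn.running_mean .+ bn.momentum .* mu
        bn.running_var  .= (1 - bn.momentum) .* bn.running_var  .+ bn.momentum .* v
    else
        mu, v = bn.running_mean, bn.running_var
    end
    xhat = (x .- mu) ./ sqrt.(v .+ eps)
    return bn.gamma .* xhat .+ bn.beta
end
```

At test time the layer falls back to its stored running statistics instead of the batch statistics; that piece of state is exactly what makes a purely functional, stateless interface awkward for batch normalization.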
Yeah, batch normalization was the main reason I started my code, the same reason you built
I plan to work on a standard module interface à la #152 for 0.8.6. I will look at this in the context of the whole modular style.
I decided to work on better CUDNN integration for 0.8.6. In particular, the general CNN/RNN speed went up considerably. Please see #193 for @cgumeli's integration of the CUDNN batchnorm; we are still ironing out the interface. I have not benchmarked it, but I suspect it will beat manual implementations. We will also look at dropout and softmax from CUDNN and replace the existing implementations if they offer a significant performance boost.
I implemented both the version which keeps the moments of each batch in the training set and the one with an exponentially decaying average (now commented out), following some indications from the discussion in #139.
In my experiments on CPU with the LeNet example, the extra operations needed by batchnorm increase the computational time per epoch, with respect to the lenet.jl example, by
+160% for the decaying average
+130% for "keep each batch's moments"
With the decaying average we save on memory, which can have some impact on really large training sets with small batch sizes. Which one should I keep?
@denizyuret @ilkerkesen @AStupidBear @mambuDL any comments?
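To make the two options concrete, here is a minimal sketch in plain Julia (hypothetical names, not the actual code in this PR) of both moment-tracking schemes:

```julia
# Sketch of the two moment-tracking schemes discussed above (hypothetical names).
using Statistics

# (a) keep the moments of every batch seen during training, average at the end:
#     exact over the training set, but memory grows with the number of batches.
struct StoredMoments
    means::Vector{Vector{Float64}}
    vars::Vector{Vector{Float64}}
end
StoredMoments() = StoredMoments(Vector{Float64}[], Vector{Float64}[])

function update!(m::StoredMoments, x)
    push!(m.means, vec(mean(x, dims=2)))
    push!(m.vars,  vec(var(x, dims=2, corrected=false)))
end
test_moments(m::StoredMoments) = (mean(m.means), mean(m.vars))

# (b) exponentially decaying average: constant memory, approximate.
mutable struct DecayingMoments
    mean::Vector{Float64}
    var::Vector{Float64}
    momentum::Float64
end
DecayingMoments(n; momentum=0.1) = DecayingMoments(zeros(n), ones(n), momentum)

function update!(m::DecayingMoments, x)
    mu, v = vec(mean(x, dims=2)), vec(var(x, dims=2, corrected=false))
    m.mean .= (1 - m.momentum) .* m.mean .+ m.momentum .* mu
    m.var  .= (1 - m.momentum) .* m.var  .+ m.momentum .* v
end
test_moments(m::DecayingMoments) = (m.mean, m.var)
```

Scheme (a) needs memory proportional to the number of batches for the stored per-batch moments, while scheme (b) keeps only one mean and one variance vector per layer, which is where the memory saving mentioned above comes from.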
Exports the batchnorm function and the BatchMoments type. Fixes #139