
[RFC] Batch normalization + LeNet with batchnorm example #140

Closed
CarloLucibello wants to merge 2 commits

Conversation

CarloLucibello
Collaborator

I implemented both the version which keeps the moments of each batch in the training set and the one with an exponentially decaying average (now commented out), following some of the indications in the discussion in #139.
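
For concreteness, here is a minimal sketch of the two strategies for tracking the normalization statistics. This is plain Julia for illustration only, not the PR code; `BatchMomentsStore`, `RunningMoments`, and `momentum` are made-up names.

```julia
using Statistics

# Illustrative sketch; `x` is features × batchsize.

# Version 1: keep the moments of every batch (more memory, exact per-batch stats).
mutable struct BatchMomentsStore
    means::Vector{Vector{Float64}}
    vars::Vector{Vector{Float64}}
end
BatchMomentsStore() = BatchMomentsStore(Vector{Float64}[], Vector{Float64}[])

function update!(s::BatchMomentsStore, x::AbstractMatrix)
    μ  = vec(mean(x, dims=2))                 # per-feature mean of this batch
    σ² = vec(var(x, dims=2, corrected=false)) # per-feature variance of this batch
    push!(s.means, μ); push!(s.vars, σ²)
    return μ, σ²
end

# Version 2: exponentially decaying running average (constant memory).
mutable struct RunningMoments
    μ::Vector{Float64}
    σ²::Vector{Float64}
    momentum::Float64
end
RunningMoments(n; momentum=0.9) = RunningMoments(zeros(n), ones(n), momentum)

function update!(r::RunningMoments, x::AbstractMatrix)
    μ  = vec(mean(x, dims=2))
    σ² = vec(var(x, dims=2, corrected=false))
    r.μ  .= r.momentum .* r.μ  .+ (1 - r.momentum) .* μ
    r.σ² .= r.momentum .* r.σ² .+ (1 - r.momentum) .* σ²
    return μ, σ²
end
```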

In my experiments on CPU with the LeNet example, the extra operations needed by batchnorm increase the per-epoch computation time, with respect to the lenet.jl example, by about:
+160% for the decaying average
+130% for "keep each batch's moments"
With the decaying average we save memory, which can have some impact on really large training sets with small batch sizes. Which one should I keep?

@denizyuret @ilkerkesen @AStupidBear @mambuDL any comments?

Exports the batchnorm function and the BatchMoments type. fixes #139

@jgbos

jgbos commented Jul 1, 2017

When it comes to modules, I don't know if you'd be interested in what I have been playing with here:

https://github.com/jgbos/KFuddles

I go a little further with the module, allowing the @sequence macro to automatically build the vector of weights and the naming of parameters for saving. The way I deal with indexing the weights is a bit hacky, though. Feel free to use any of the code if it's useful.
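
For illustration only, and not the actual KFuddles API: roughly, "building the vector of weights and naming the parameters for saving" amounts to something like the following, where the layer names and fields are made up.

```julia
# Hypothetical sketch: flatten per-layer parameters into one vector with unique names.
layers = [
    (name = "conv1", w = randn(5, 5, 1, 20), b = zeros(20)),
    (name = "fc1",   w = randn(10, 500),     b = zeros(10)),
]

weights = Any[]     # flat parameter vector, in the form an optimizer expects
pnames  = String[]  # matching unique names, e.g. for saving to disk
for layer in layers
    push!(weights, layer.w); push!(pnames, "$(layer.name)_w")
    push!(weights, layer.b); push!(pnames, "$(layer.name)_b")
end
# weights[i] is now addressed by pnames[i]: "conv1_w", "conv1_b", "fc1_w", "fc1_b"
```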

@CarloLucibello
Collaborator Author

CarloLucibello commented Jul 1, 2017

> When it comes to modules, I don't know if you'd be interested in what I have been playing with here:
>
> https://github.com/jgbos/KFuddles

That is some really nice work! As much as I love the transparency of Knet, I would also like it to offer that kind of higher-level interface, as most deep learning frameworks do. Maybe @denizyuret already has some plans for that, or would be interested in that sort of design discussion?

@jgbos

jgbos commented Jul 1, 2017

Yes, I think there is a case for building the weights and functions without high-level interfaces. But I also use Sequential in PyTorch all the time, so it's good to have both, I think?

@CarloLucibello
Collaborator Author

Also, working with layer types, which can keep internal state (as is needed, for instance, in batch normalization), makes for cleaner code where you do not have to pass parameters all around, as I do with bmom in this PR.
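
A stateful layer, just to illustrate the point. This is a hypothetical sketch in plain Julia, not this PR's (or KFuddles') API; `BatchNormLayer` and its fields are made up.

```julia
using Statistics

# A layer type that keeps its running moments as internal state,
# so no `bmom`-style argument has to be threaded through every call.
mutable struct BatchNormLayer
    γ::Vector{Float64}   # learnable scale
    β::Vector{Float64}   # learnable shift
    μ::Vector{Float64}   # running mean (internal state)
    σ²::Vector{Float64}  # running variance (internal state)
    momentum::Float64
end
BatchNormLayer(n; momentum=0.9) =
    BatchNormLayer(ones(n), zeros(n), zeros(n), ones(n), momentum)

function (bn::BatchNormLayer)(x::AbstractMatrix; training::Bool=true)
    if training
        μ  = vec(mean(x, dims=2))
        σ² = vec(var(x, dims=2, corrected=false))
        # the layer updates its own state; the caller never sees the moments
        bn.μ  .= bn.momentum .* bn.μ  .+ (1 - bn.momentum) .* μ
        bn.σ² .= bn.momentum .* bn.σ² .+ (1 - bn.momentum) .* σ²
    else
        μ, σ² = bn.μ, bn.σ²
    end
    return bn.γ .* (x .- μ) ./ sqrt.(σ² .+ 1e-5) .+ bn.β
end
```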

@jgbos

jgbos commented Jul 1, 2017

Yeah, batch normalization was the main reason I started my code, the same reason you built BatchMoments. I also wanted a way to automatically build the vector of weights and to save parameters with unique names.

@denizyuret
Owner

I plan to work on a standard module interface à la #152 for 0.8.6. I will look at this in the context of the whole modular style.

@denizyuret
Owner

I decided to work on better CUDNN integration for 0.8.6; in particular, the general CNN/RNN speed went up considerably. Please see #193 for @cgumeli's integration of the CUDNN batchnorm; we are still ironing out the interface. I have not benchmarked it yet, but I suspect it will beat manual implementations. We will also look at dropout and softmax from CUDNN and replace the existing implementations if they offer a significant performance boost.
