Porting to Cython #7

Open
mritunjaymusale opened this issue Mar 3, 2020 · 6 comments

@mritunjaymusale

Is it possible for you to port this to Cython and release it as a PyPI package?
It would make it easy for existing DL users (TensorFlow and PyTorch users) to use it natively in their code.

@keroro824 (Owner)

Thanks for your suggestion. Let us make this the priority! We'll @ you when it is done.

@rahulunair (Contributor)

Thank you @keroro824!

@wrathematics (Contributor)

Sort of related, but I've been building R bindings.

@keroro824 (Owner)

@wrathematics Thanks for contributing 👍

@its-sandy

Hi, are there any updates on this?

@nomadbl commented Nov 6, 2021

> Is it possible for you to port this to Cython and release it as a PyPI package? It would make it easy for existing DL users (TensorFlow and PyTorch users) to use it natively in their code.

I'm also interested in implementing such a thing, but it seems to me the way to do this would be to implement custom layers instead of using the built-in ones. That could be added to the main codebase once it is tested, rather than shipped as a separate package.

For example, in PyTorch you would first subclass `torch.autograd.Function` to implement forward and backward operations that perform the hashing and take the resulting neuron selection into account in both forward and back propagation. Cython might not even be needed; you might be able to use Numba and get better performance more easily.
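As a rough sketch of what I have in mind (the class name, the `active_idx` argument, and the dense-tensor bookkeeping are placeholders of my own, and the LSH query itself is stubbed out), assuming the hash tables hand you the indices of the active output neurons for a batch:

```python
import torch

class SparseActiveLinear(torch.autograd.Function):
    """Sketch of a linear layer that only evaluates the output neurons
    selected by an LSH lookup, and only propagates gradients to them."""

    @staticmethod
    def forward(ctx, x, weight, bias, active_idx):
        # active_idx: LongTensor of output-neuron indices chosen by the hash
        # tables for this batch (the LSH query itself is not shown here).
        w_active = weight[active_idx]                 # (k, in_features)
        y = x.new_zeros(x.shape[0], weight.shape[0])  # (batch, out_features)
        y[:, active_idx] = x @ w_active.t() + bias[active_idx]
        ctx.save_for_backward(x, weight, active_idx)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        x, weight, active_idx = ctx.saved_tensors
        g = grad_out[:, active_idx]                   # (batch, k)
        w_active = weight[active_idx]                 # (k, in_features)
        grad_x = g @ w_active                         # only active rows contribute
        grad_w = torch.zeros_like(weight)
        grad_w[active_idx] = g.t() @ x                # gradient only for active rows
        grad_b = weight.new_zeros(weight.shape[0])
        grad_b[active_idx] = g.sum(dim=0)
        return grad_x, grad_w, grad_b, None           # no gradient for active_idx
```

A wrapper `nn.Module` would own the weights and the hash tables, query them to get `active_idx`, and call `SparseActiveLinear.apply(x, self.weight, self.bias, active_idx)` in its `forward`, presumably rebuilding the tables periodically as the weights change.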

@keroro824 I've actually started doing what I described. I have a question:
Do you have some justification for only propagating the gradient to active neurons? It's not obvious to me why this would be a good approximation of the true gradient.
There is another method the math would suggest. For a linear layer (repeated indices indicate a sum)

y_i = W_{ij} x_j + b_i

the gradient with respect to the input is

dx_k = dy_i W_{ik}

So we could use LSH for the backprop as well, but we would need more hash tables than the paper suggests: the multiplications in the backward pass are against the columns of the weight matrix, while the forward pass multiplies by its rows. Did you try something like this?

I would be very interested in implementing this.
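To make the row/column point concrete, here is a minimal sketch (the `active_rows` / `active_cols` index sets are stand-ins for real LSH queries, which would need separate tables built over the rows and over the columns of W):

```python
import numpy as np

rng = np.random.default_rng(0)
out_f, in_f = 8, 5
W = rng.standard_normal((out_f, in_f))
x = rng.standard_normal(in_f)
dy = rng.standard_normal(out_f)       # gradient flowing in from the layer above

# Forward: each y_i is a dot product of x with a *row* of W (bias omitted),
# so the forward-pass hash tables are built over the rows of W.
active_rows = np.array([1, 4, 6])     # stand-in for an LSH query over rows
y_active = W[active_rows] @ x         # only the selected outputs are computed

# Backward w.r.t. the input: dx_k = dy_i W_{ik}, i.e. each dx_k is a dot
# product of dy with a *column* of W, so approximating it the same way needs
# a second set of hash tables built over the columns of W.
active_cols = np.array([0, 2, 3])     # stand-in for an LSH query over columns
dx = np.zeros(in_f)
dx[active_cols] = dy @ W[:, active_cols]
```

The extra cost is that the column tables index the same W but keyed by its columns, so they would have to be built and refreshed alongside the row tables.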
