
Feature request - einsum #10840

Closed

jaanli opened this issue May 7, 2018 · 13 comments

Comments

jaanli commented May 7, 2018

Useful for writing models: https://rockt.github.io/2018/04/30/einsum
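For context, the requested operator follows numpy.einsum semantics; below is a small NumPy-only sketch (not from this thread) of the kind of expressions the linked post describes:

import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# Matrix multiplication written as an Einstein summation: sum over the shared index j.
C = np.einsum('ij,jk->ik', A, B)
assert np.allclose(C, A @ B)

# A batched bilinear pattern common in models: contract the feature axis z,
# keep the batch axis a and the two position axes i and j.
x = np.random.rand(2, 8, 16)
y = np.random.rand(2, 8, 16)
scores = np.einsum('aiz,ajz->aij', x, y)  # shape (2, 8, 8)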

roywei (Member) commented May 9, 2018

@sandeep-krishnamurthy could you help add the "Feature Request" label? Thanks!

@jasonyu1996 (Contributor)

I am interested in implementing this. Has anyone already started working on this?

@sandeep-krishnamurthy (Contributor)

@jasonyu1996 - Thanks for looking into this issue. As far as I know, nobody is working on this feature; contributions are welcome. Let us know if you need any help.

@jasonyu1996 (Contributor)

@sandeep-krishnamurthy It seems somewhat complicated to implement einsum in the backend, but I would like to give it a try first. Thanks!

jasonyu1996 (Contributor) commented Sep 4, 2018

@sandeep-krishnamurthy I am not sure whether this should be implemented in the backend. It would need to be built on several existing operators, since implementing the forward and backward passes directly as a single monolithic operator would be complicated and inefficient (especially the backward pass). In my opinion a HybridBlock would be a good place to hold the implementation, but there are two problems:

  1. Firstly, implementing it as a HybridBlock would make the code language-dependent, so supporting all of MXNet's language bindings would be error-prone and labour-intensive.
  2. Secondly, this might stretch the definition of a Block, because the docs describe Block as the 'base class for neural network layers and models', and its subclasses live mostly in the gluon.nn package. Whether einsum fits that description is debatable: it is ambiguous what a 'neural network layer' actually means, and there is hardly any difference between a 'neural network layer' and an operator, especially a complex one. (Whether it is parameterized does not settle the question, because many layers in gluon.nn have no parameters, and some are simply thin wrappers around the corresponding operators.)

Could you point me to where the implementation should live if it needs to compose other operators? Alternatively, is there a way of storing temporary data in the forward pass for reuse in the backward pass? Thanks!
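For illustration (not from the thread), here is a minimal sketch of the HybridBlock-composition approach being debated, assuming the MXNet 1.x Gluon API; it covers only the single pattern 'bij,bjk->bik' via the existing batch_dot operator, not general einsum:

import mxnet as mx
from mxnet import gluon

class BatchedMatmul(gluon.HybridBlock):
    """Covers only the einsum pattern 'bij,bjk->bik', built from an existing operator."""

    def hybrid_forward(self, F, lhs, rhs):
        # F is mx.nd in imperative mode and mx.sym after hybridize(),
        # so one implementation serves both NDArray and Symbol inputs.
        return F.batch_dot(lhs, rhs)

block = BatchedMatmul()
block.hybridize()
out = block(mx.nd.ones((2, 3, 4)), mx.nd.ones((2, 4, 5)))  # shape (2, 3, 5)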

@sandeep-krishnamurthy (Contributor)

Operators are stateless, but I recall there is an optimization switch that enables saving data from the forward pass to be reused in the backward pass to make the computation faster.
@azai91 - Can you please help here?

@jasonyu1996 (Contributor)

Would it be good then to implement it as a HybridBlock?

jasonyu1996 (Contributor) commented Sep 6, 2018

I have almost finished the implementation, apart from some polishing and testing. The current solution is not in the backend but lives alongside Block. To let one piece of code support both ndarray and symbol, it uses the same idea as HybridBlock. The difference is that a HybridBlock may hold parameters and must be instantiated (and possibly hybridized) before use, whereas an operator exposes its interface in mxnet.ndarray and mxnet.symbol and should look the same as the other operators 'imported' from the backend.

I think this is really a design decision, as it amounts to a new interface for developing operators that are relatively complex and high-level and should be implemented in terms of existing ones.
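For illustration, a sketch of the operator-style alternative described above, assuming the MXNet 1.x namespaces; einsum_bij_bjk is a hypothetical helper name, and only one contraction pattern is covered:

import mxnet as mx

def einsum_bij_bjk(lhs, rhs):
    # Hypothetical helper covering only the 'bij,bjk->bik' contraction.
    # Pick the namespace from the input type, mirroring the F argument a
    # HybridBlock receives, but exposed as an ordinary function so it can
    # sit next to the operators in mxnet.ndarray and mxnet.symbol.
    F = mx.sym if isinstance(lhs, mx.sym.Symbol) else mx.nd
    return F.batch_dot(lhs, rhs)

# Imperative (NDArray) use:
y = einsum_bij_bjk(mx.nd.ones((2, 3, 4)), mx.nd.ones((2, 4, 5)))

# Symbolic (Symbol) use:
a, b = mx.sym.Variable('a'), mx.sym.Variable('b')
s = einsum_bij_bjk(a, b)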

yifeim (Contributor) commented Nov 29, 2018

May I follow up on the current state of einsum? It is convenient at times, and perhaps most useful for its autograd support.

@jasonyu1996 (Contributor)

@yifeim I have actually implemented one based on the high-level interfaces. However, it supports only the Gluon interface. I guess that to add support for Symbol I would have to move to a lower abstraction layer, or rely on some high-level interfaces that are still missing (#12484, for example).

@altosaar @roywei @sandeep-krishnamurthy I would be grateful if you could help.

yifeim (Contributor) commented Nov 30, 2018

@jasonyu1996 Could you point me to the Gluon interface? A hacky way to get a symbol is to export the Gluon model and load it back as a symbol.

yifeim (Contributor) commented Nov 30, 2018

See #13244 for an example of how to export Gluon models.
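For illustration, a sketch of that export-and-reload workaround, assuming the MXNet 1.x Gluon API; 'mymodel' is an arbitrary prefix, and the file names are those HybridBlock.export writes by default:

import mxnet as mx
from mxnet import gluon

net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(10))
net.initialize()
net.hybridize()
net(mx.nd.ones((1, 4)))  # run one forward pass so the graph gets traced

# Writes mymodel-symbol.json and mymodel-0000.params to disk.
net.export('mymodel', epoch=0)

# Load the traced graph back as a Symbol...
sym = mx.sym.load('mymodel-symbol.json')

# ...or re-import it as a Gluon block bound to that symbol.
net2 = gluon.SymbolBlock.imports('mymodel-symbol.json', ['data'], 'mymodel-0000.params')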

sxjscience (Member) commented Apr 13, 2020

We now have einsum in the NumPy interface of MXNet, so I am closing this issue. You may try:

import mxnet as mx
import numpy as np

# Switch MXNet to NumPy-compatible semantics.
mx.npx.set_np()

# Two random (64, 8, 128, 512) tensors on the GPU (requires a GPU context).
lhs = mx.np.array(np.random.normal(0, 1, (64, 8, 128, 512)), dtype=np.float32, ctx=mx.gpu())
rhs = mx.np.array(np.random.normal(0, 1, (64, 8, 128, 512)), dtype=np.float32, ctx=mx.gpu())
mx.npx.waitall()

# Contract the last axis (z), keeping the batch axes a and b: result is (64, 8, 128, 128).
gt = mx.np.einsum('abiz,abjz->abij', lhs, rhs)
gt_np = gt.asnumpy()
