[Feature Request] support of diag for N-d arrays #12327
Hi @jasonyu1996. Thanks for reporting this. I've been looking to implement a trace operator in mxnet per this request #10500. In preparing to implement trace (which is really just summing the diagonal of the matrix), I also noticed the limited implementation of the diag operator. Given that many MXNet users have data in the form of WxHxC (width, height, channel), and then add a 4th dimension for number/batch, what are your thoughts on a general N-dimensionality approach for this operator? Is it necessary to support the general case? And how about the implementation of the diag operator? As you mentioned, there are already existing, high-performance implementations for each sub-computation required. Do you think there is opportunity for further performance improvement (memory, time, etc.) by fusing these together? Or do you think it would be best to just implement diag by calling these sub-computations separately (inside the diag operator)? Let me know if you have thoughts on this. I would be interested in working with you to implement this as well. Here's the original diag issue: #9253
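The trace/diag relationship mentioned above can be sketched in numpy (the array values here are just an illustration, not from the thread):

```python
import numpy as np

# trace(A) is just the sum of the main diagonal of A, so a trace
# operator can be built directly on top of a diag operator.
A = np.arange(9).reshape(3, 3)   # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# Main diagonal is [0, 4, 8]; its sum equals the trace.
assert A.diagonal().sum() == np.trace(A)
```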
Hi! Thank you for your response! I just paid a visit to the numpy interfaces for computing the diagonal, and noticed that besides … As for the implementation details, I have to admit that I am not familiar with this, and am therefore not sure about the possibility of further improving the performance by implementing it in ways other than simply fusing some high-level function calls together. I think it is probably necessary to refer to the implementation of the 2-d case, which it seems does not depend on other high-level function calls.
@jasonyu1996 I have been prototyping a general diagonal operator based on the numpy implementation. The issue I see is that numpy uses views, so the diagonal operator simply returns a 'diagonal view' onto the original data layout in memory, whereas we want an actual copy of the data in the diagonal operator in mxnet. Seemingly we should be able to just traverse the view to copy the resulting diagonal values. I'm not a deep learning expert, so I don't understand the need/usage of a diagonal operator in a network. Can you provide some motivation or use-cases for where a diagonal operator might be needed in a model?
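The view behaviour described above can be demonstrated in numpy (a small sketch; note this is version-dependent, as `diagonal()` only became a read-only view in numpy 1.9):

```python
import numpy as np

A = np.zeros((3, 3))
d = A.diagonal()       # a view onto A's memory, not a copy

A[0, 0] = 7.0          # writing to the original array...
assert d[0] == 7.0     # ...is visible through the diagonal view

# The view is read-only, so it cannot be used to write the diagonal.
assert not d.flags.writeable
```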
@samskalicky Yes. Actually I have already implemented one version in this way (#12430), simply traversing and copying the data. However, I actually think it would be more efficient to just generate a new view, as is the case in numpy. The problem here is that MXNet seems to lack good support for this, as I have reported here. As for the possible usage of a diagonal operator, as far as I know, it is sometimes necessary to compute a regularizer based on the diagonal of a matrix (or the diagonals of a batch of matrices).
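A minimal sketch of the regularizer use-case in numpy (the batch shape and the L2 penalty are my own illustration, not from the thread):

```python
import numpy as np

# A batch of 4 matrices of shape 5x5, e.g. per-example score matrices.
batch = np.random.rand(4, 5, 5)

# numpy already supports the batched case: taking the diagonal over
# the last two axes yields one diagonal per matrix, shape (4, 5).
diags = batch.diagonal(axis1=-2, axis2=-1)
assert diags.shape == (4, 5)

# One possible regularizer: an L2 penalty on the diagonals.
reg = (diags ** 2).sum()
```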
Thanks @jasonyu1996! I'll review your PR tomorrow and see if we can't get it merged. Thanks for the explanation of regularization. Can you provide an example model using the diag/trace operator in the layer calculation, to further motivate this contribution?
@samskalicky Thanks! I have just thought of two models (both for relation extraction/classification) that would be more convenient to implement with the help of the diag/trace operator:
Thanks @jasonyu1996 for your contribution. Resolving. |
Hi!

I just found that the `diag` operator does not support N-d arrays where `N > 2`. In my experience, it could be made more useful if the `N > 2` cases were properly designed. For example, I find it troublesome to take the diagonals of several matrices of the same shape at the same time. I know this task could be accomplished with a combination of `arange`, `tile` and `pick`, but that would be very complicated, confusing and error-prone. To support this, the behaviour when `N > 2` could be designed as taking the diagonal of the last two axes, i.e., when fed with an array of shape `[d1, d2, d3, ..., dn-2, dn-1, dn]`, where the diagonal of `[dn-1, dn]` is of length `k`, `diag` would return an array of shape `[d1, d2, d3, ..., dn-2, k]`. Of course, this could be designed to be more flexible (allowing specifying the axes to reduce, for example).

PyTorch provides a `diag` operator that behaves in the same way. Tensorflow actually splits it into two operators, `diag` and `diag_part`: the former constructs diagonal matrices and the latter takes diagonals from matrices. They are designed to support `N > 2`, but not in a way I find useful or flexible.

On the MXNet forum: https://discuss.mxnet.io/t/diag-for-n-d-arrays/1707
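The proposed semantics can be sketched with numpy's existing batched `diagonal` (the concrete shapes below are my own illustration):

```python
import numpy as np

# An N-d array with n = 4 axes: [d1, d2, dn-1, dn] = [2, 3, 4, 6].
x = np.arange(2 * 3 * 4 * 6).reshape(2, 3, 4, 6)

# Take the diagonal over the last two axes. The diagonal of a (4, 6)
# matrix has length k = min(4, 6) = 4, so the result has shape
# (2, 3, 4), i.e. [d1, ..., dn-2, k] as proposed above.
d = x.diagonal(axis1=-2, axis2=-1)
assert d.shape == (2, 3, 4)

# Spot-check one entry: d[i, j, t] == x[i, j, t, t].
assert d[1, 2, 3] == x[1, 2, 3, 3]
```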