Add matrix determinant operator in linalg #15007
Conversation
@mxnet-label-bot add[Operator, pr-work-in-progress]
@eric-haibin-lin @reminisce @apeforest @anirudh2290
cc @asmushetzel |
Great work! A few comments on improving shape/type inference logic.
src/operator/tensor/la_op.cc
Outdated
Examples::

// Single matrix inversion
I don't think // is needed here, since this is not a piece of C++ code.
inversion -> determinant
src/operator/tensor/la_op.cc
Outdated
A = [[1., 4.], [2., 3.]]
det(A) = [-5.]

// Batch matrix inversion
Same as above.
                      mxnet::ShapeVector* out_attrs) {
  CHECK_EQ(in_attrs->size(), 1);
  CHECK_EQ(out_attrs->size(), onum + 2);
  const mxnet::TShape& in = (*in_attrs)[0];
Please add this line of code here:
if (!ndim_is_known(in)) return false;
src/operator/tensor/la_op.h
Outdated
  }
  SHAPE_ASSIGN_CHECK(*out_attrs, onum, in);  /* LU */
  SHAPE_ASSIGN_CHECK(*out_attrs, onum + 1, mxnet::TShape(in.begin(), in.end() - 1));  /* pivot */
  return true;
replace this with
return shape_is_known(in);
src/operator/tensor/la_op.h
Outdated
  int dtype = (*in_type)[0];
  CHECK_NE(dtype, -1) << "Input must have specified type";

  out_type->clear();
Please use TYPE_ASSIGN_CHECK for every output type assignment.
src/operator/tensor/la_op.h
Outdated
  using namespace mshadow;
  CHECK_EQ(in_type->size(), 1U);
  int dtype = (*in_type)[0];
  CHECK_NE(dtype, -1) << "Input must have specified type";
change this to
if (dtype == -1) return false;
src/operator/tensor/la_op.h
Outdated
                       std::vector<int>* out_type) {
  using namespace mshadow;
  CHECK_EQ(in_type->size(), 1U);
  int dtype = (*in_type)[0];
const int dtype
src/operator/tensor/la_op.h
Outdated
  CHECK_EQ(in_type->size(), 1U);
  int dtype = (*in_type)[0];
  CHECK_NE(dtype, -1) << "Input must have specified type";
If only fp32/fp64 are supported, please add one line here checking whether dtype equals kFloat32 or kFloat64.
All updated.
Regarding this point:
I've looked into the PyTorch code more carefully, and I think their implementation is wrong. A simple example to show this:
Since in actual computation it's very hard to hit det == 0 with floats, is it really necessary to implement the non-invertible case? Also, I haven't thought of a good method yet; any suggestions are welcome.
@@ -939,5 +939,153 @@ NNVM_REGISTER_OP(_backward_linalg_inverse)
.set_attr<nnvm::TIsBackward>("TIsBackward", true)
.set_attr<FCompute>("FCompute<cpu>", LaOpBackward<cpu, 2, 2, 2, 1, inverse_backward>);

NNVM_REGISTER_OP(_linalg_det)
I'm not convinced that we should add all these variants like "det", "logdet" etc. Conceptually there is no end to that (why not have logdetminus1?). It also doesn't save any compute time to provide them, as the overwhelming majority of the compute is in the core determinant computation and not in a subsequent log or exp. And we end up with a lot of duplicated, redundant code.
So I would propose providing one generic method, most likely a signed logdet(), and leaving it to the user to apply additional operators when she/he needs some variant. You can even make the return of "sign" optional via a parameter.
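To make the proposal concrete, a minimal sketch of how users could recover the variants themselves (hypothetical helper names; it assumes a slogdet-style operator that, like numpy.linalg.slogdet, returns the sign and log|det|):

import mxnet as mx

def det_from_slogdet(F, A):
    # F is mx.nd or mx.sym; assumes F.linalg.slogdet returns (sign, logabsdet)
    sign, logabsdet = F.linalg.slogdet(A)
    return sign * F.exp(logabsdet)

def logdet_from_slogdet(F, A):
    # log(det(A)) for matrices known to have a positive determinant
    _, logabsdet = F.linalg.slogdet(A)
    return logabsdet

# e.g. det_from_slogdet(mx.nd, mx.nd.array([[1., 4.], [2., 3.]]))  ->  [-5.]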
Good point. If we only want to keep one method, then slogdet is the best choice.
Since the community is now working on providing a numpy experience in mxnet, how about we follow the numpy.linalg package and keep det and slogdet?
Hello,
I developed the linalg operators with @asmushetzel.
I feel it is not very elegant to introduce a lot of MXNet operators that are essentially just built by sticking existing operators together. It would be a lot cleaner to provide Python functions for this (using F in {mx.nd, mx.sym} as the first arg).
Note we ourselves made this mistake with sumlogdiag, which is ugly and should not really be there (we could be excused, since diag wasn't there back then).
For example, if you really want logdet, you get it from an LQ decomposition (linalg.gelqf), followed by log.sum over the diagonal of L; we have ops for all of this. In fact, you probably want log(abs(det)), because the determinant could be negative. You could return the sign as well.
I also don't understand why a det(.) op is needed, given there is logdet(.) with sign. You can get one from the other. Also, det(.) for large matrices is prone to over- or underflow anyway.
It is also somewhat dangerous to offer such operators, because they end up recomputing the underlying factorizations every time. For example, to evaluate the log likelihood of a Gaussian, you need logdet and backsubstitution. You compute the Cholesky decomposition once, then use it twice. With your logdet, I cannot do that: it computes the factorization inside, but does not return it.
Finally, I also find it dangerous to offer inverse. The matrix inverse is almost never needed, but people who lack numerical-maths knowledge use it. They should not; it leads to bad code. They should use matrix factorizations, like Cholesky or LQ (i.e. QR), or SVD.
So, if you really want to do something useful, think about a set of Python functions for derived operators. An example:
def linalg_logdet_from_chol(F, lmat):
    return F.sum(F.log(F.abs(F.diag(lmat))))

If you really want:

def linalg_logdet(F, amat):
    return linalg_logdet_from_chol(F, F.potrf(amat))
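A sketch of the reuse argument above (hypothetical function name; it relies on the existing linalg.potrf, linalg.sumlogdiag and linalg.trsm operators, so the Cholesky factor is computed once and used for both the log-determinant and the back-substitution):

import math
import mxnet as mx

def gaussian_log_likelihood(F, x, mean, cov, n):
    # log N(x | mean, cov); F is mx.nd or mx.sym, x and mean have shape (n, 1),
    # cov has shape (n, n). A sketch only: no batching, no jitter on the diagonal.
    chol = F.linalg.potrf(cov)                 # cov = L L^T, computed once
    logdet = 2.0 * F.linalg.sumlogdiag(chol)   # log|cov| = 2 * sum(log(diag(L)))
    z = F.linalg.trsm(chol, x - mean)          # solve L z = x - mean, reusing L
    maha = F.sum(F.square(z))                  # (x - mean)^T cov^{-1} (x - mean)
    return -0.5 * (maha + logdet + n * math.log(2.0 * math.pi))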
src/operator/tensor/la_op.cc
Outdated
Examples::

Single matrix inversion
I suppose this and similar other comments will be updated (here referring to "matrix inversion").
src/operator/tensor/la_op.cc
Outdated
Examples::

Single matrix inversion
"Matrix inversion"
Updated.
src/operator/tensor/la_op-inl.h
Outdated
  }
};

// partial pivoting LU decomposition: A = PLU, so det(A) = det(P)det(L)det(U)
Misleading comment. This is the final computation of logdet and sign based on an existing LU decomposition, not the LU decomposition itself.
I wanted to explain the computation method used in det; I'll change the comment to make it clearer.
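For reference, the identities behind the kernel (standard facts about LU with partial pivoting, not code from this PR): with A = P L U, where P is a permutation matrix, L is unit lower triangular and U is upper triangular,

\[ \det(A) = \det(P)\,\det(L)\,\det(U) = (-1)^{\#\text{row swaps}} \prod_i u_{ii}, \]
\[ \log\lvert\det(A)\rvert = \sum_i \log\lvert u_{ii}\rvert, \qquad \operatorname{sign}\bigl(\det(A)\bigr) = (-1)^{\#\text{row swaps}} \prod_i \operatorname{sign}(u_{ii}), \]

where the number of row swaps is read off the pivot vector returned by getrf.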
src/operator/tensor/la_op.h
Outdated
}

// Type inference function for linalg_inverse
inline bool InverseType(const nnvm::NodeAttrs& attrs,
Is this a combined diff with #14963? The two PRs should either be joined into one PR or be clearly separated in terms of code changes.
They're separate PRs; I'll remove the changes not related to this one.
src/operator/tensor/la_op.cc
Outdated
@@ -920,7 +920,7 @@ Examples::
.set_attr<nnvm::FListInputNames>("FListInputNames", [](const NodeAttrs& attrs)
  { return std::vector<std::string>{"A"}; } )
.set_attr<mxnet::FInferShape>("FInferShape", InverseShape)
.set_attr<nnvm::FInferType>("FInferType", ElemwiseType<1, 1>)
.set_attr<nnvm::FInferType>("FInferType", InverseType)
Why do we need an "InverseType"?
I want to add type checks at layer-setup time. Right now many size or type checks in the linalg operators are carried out in the forward pass, which is not the common practice in mxnet.
But this change is not related to this PR, so I'll remove it.
src/operator/tensor/la_op-inl.h
Outdated
  using namespace mshadow::expr;
  Kernel<SignedLogDet, xpu>::Launch(s, pivot.size(0), pivot.size(1), pivot.dptr_,
                                    LU.dptr_, sign.dptr_, logdet.dptr_);
  const_cast<Tensor<xpu, 1, DType>&>(logdet) = F<mshadow_op::log>(sign) + logdet;
This doesn't make much sense: if sign == 1 then log(1) = 0 (so no effect), and if sign is -1 then this crashes anyway. See my other comments about providing only one variant of det() and leaving any other one to the user (by applying additional operators).
I'll remove logdet.
src/operator/tensor/la_op-inl.h
Outdated
@@ -825,6 +901,100 @@ struct inverse_backward {
  }
};

// Here we set grad to zero if det = 0 as a temporary method
We should not add "temporary" methods. Either this is the way we want to handle the case det() == 0, or it isn't. It is fully OK to define that the gradient is assumed to be zero in this case.
I'm not very sure how to handle the det = 0 case. If it's OK to use zero here, I'll change the comment and add a note to the operator docs.
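For context, the gradient identity the backward pass is built on (a standard result, not taken verbatim from this PR's code):

\[ \frac{\partial \det(A)}{\partial A} = \det(A)\, A^{-\top} = \operatorname{adj}(A)^{\top}. \]

The left-hand form goes through A^{-1} and is therefore undefined when det(A) = 0, even though the cofactor form on the right still exists; assigning a zero gradient in that case is the simplification under discussion.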
                             Shape3(det.size(0), 1, 1)), mxnet::TShape(LU.shape_)) * \
      transpose(LU, Shape3(0, 2, 1));
  Stream<xpu> *s = ctx.get_stream<xpu>();
  // stop grad for zero det temporarily
"temporarily"
src/operator/linalg.h
Outdated
// Compute matrix inversion with LU and pivot using temp workspace,
// the result stores back to LU
template<typename xpu, typename DType>
void linalg_batch_det_helper(const Tensor<xpu, 3, DType>& LU,
Misleading comment and naming. It looks to me that this helper is not needed to compute the determinant; it is only used for the backward pass, and it assumes the determinant has already been computed and is available.
Updated.
Updates:
@mseeger I think many ordinary users are not experts in linear algebra, so providing a set of basic BLAS/LAPACK routines and some advanced operators at the same time is the best way to serve all levels of users. My motivation to improve the linalg package comes from my daily usage and from users' feature requests. I agree that using the basic operators is more efficient, but sometimes ease of use is more important, and it doesn't matter if the computation is not the fastest as long as I can finish the implementation quickly.
Operators like inverse, det and slogdet are implemented in numpy, tensorflow and pytorch; it's really annoying that there are no mxnet equivalents when "translating" implementations from other platforms. (Actually, these related PRs come from a project of mine that involves translating a thin plate spline algorithm.)
As for using Python to create derived operators, that is actually what I wanted to do in the first place, but it is restricted by mxnet's weak support for registering backward functions. Using fine-grained operators to mimic a high-level operation is fine for the forward pass, but the backward pass will be terrible because a simple combined gradient computation gets split into the backward passes of the basic operators. Right now the only way to override a backward function in mxnet is a Custom operator, which is not elegant. In pytorch this is more user- and developer-friendly; the following is the backward of matrix inverse as registered in the yaml file:
I think a good linalg package looks like scipy.linalg, in which all BLAS and LAPACK routines are exposed and there are also a lot of high-level functions.
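As an illustration of the Custom-operator point above, a hedged sketch of what overriding a backward currently takes in MXNet (hypothetical op name logabsdet; assumes a batched input of shape (b, n, n) and a numpy fallback for the actual math):

import mxnet as mx
import numpy as np

class LogAbsDet(mx.operator.CustomOp):
    # log|det(A)| with a hand-written backward, done the CustomOp way
    def forward(self, is_train, req, in_data, out_data, aux):
        a = in_data[0].asnumpy()                       # shape (b, n, n)
        _, logabsdet = np.linalg.slogdet(a)
        self.assign(out_data[0], req[0], mx.nd.array(logabsdet, ctx=in_data[0].context))

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        a = in_data[0].asnumpy()
        grad = np.linalg.inv(a).swapaxes(-1, -2)       # d log|det(A)| / dA = A^{-T}
        grad *= out_grad[0].asnumpy()[:, None, None]   # chain rule with incoming grad
        self.assign(in_grad[0], req[0], mx.nd.array(grad, ctx=in_data[0].context))

@mx.operator.register("logabsdet")
class LogAbsDetProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(LogAbsDetProp, self).__init__(need_top_grad=True)

    def list_arguments(self):
        return ['A']

    def infer_shape(self, in_shape):
        return in_shape, [in_shape[0][:-2]], []        # (b, n, n) -> (b,)

    def create_operator(self, ctx, shapes, dtypes):
        return LogAbsDet()

# Usage: y = mx.nd.Custom(A, op_type='logabsdet') with A of shape (b, n, n).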
@arcadiaphy What's the path forward on this PR? :)
@piyushghai Two paths forward for linalg:
This PR is for the 2nd option. If it's merged, the third part of linalg improvement is to add |
I tend to agree with @arcadiaphy concerning adding these operators, though conceptually you can do the same more efficiently by explicitly handling the matrix factorizations etc. on your own. Providing a more seamless experience when porting models from one framework to another is important for adoption. No objections from my side to merging this PR.
@szha How about we merge this PR and move forward?
@szha Is this PR good to go for merge? Thanks!
@szha Gentle ping for review. |
We would like to merge this PR so that we can push forward adding more ops to match |
* add backbone
* cpu forward det
* refactor for gpu forward det
* fix
* register gpu det forward
* add gpu det backward
* register gpu det backward
* fix
* add logdet slogdet backward
* stop grad for zero det
* fix
* fix
* reduce grad transfer
* fix docs
* update comments
* fix docs
* fix lint
* add test
* update docs
* add operator
* update test
* trigger CI
* remove slash
* update comments and docs
* update det helper function
* update operator check
* remove logdet
* add no grad when det = 0
* update comments and docs
* remove remaining logdet
Description
The second PR on linalg enhancement. Three operators for the matrix determinant are added: det, logdet, slogdet.
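A minimal usage sketch, reusing the example from the operator docs (it assumes the operators are exposed under mx.nd.linalg with numpy-style outputs, which may differ in detail from the final API):

import mxnet as mx

A = mx.nd.array([[1., 4.], [2., 3.]])        # det(A) = 1*3 - 4*2 = -5
d = mx.nd.linalg.det(A)                       # -> [-5.]
sign, logabsdet = mx.nd.linalg.slogdet(A)     # -> [-1.], [log(5)]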
Something worth mentioning in this PR:
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments