This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Numpy] Differentiable svd #15795

Merged · 4 commits · Sep 19, 2019

Conversation

@hzfan (Contributor) commented Aug 8, 2019

Description

Differentiable svd

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • add np.linalg.svd (a usage sketch follows after this list)
  • add forward and backward tests
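Not part of the PR: a minimal sketch of how the new operator might be exercised with autograd, assuming the (ut, l, v) return order and the UT·diag(L)·V reconstruction convention described in the backend comments further down; shapes are illustrative.

    from mxnet import autograd, np, npx
    npx.set_np()

    a = np.random.uniform(size=(4, 6))
    a.attach_grad()
    with autograd.record():
        ut, l, v = np.linalg.svd(a)   # assumed return order: UT, singular values, V
        loss = (l * l).sum()          # any scalar function of the outputs
    loss.backward()
    print(a.grad)                     # gradient flows back through the SVD

    # Assumed reconstruction convention, matching the backend comment (UT, L, V) = gesvd(A):
    print(np.abs(np.matmul(ut * l, v) - a).max())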

Comments

Thanks to @reminisce and @haojin2 for review and guidance.

@hzfan hzfan requested a review from szha as a code owner August 8, 2019 06:50
@hzfan hzfan changed the title Differentiable svd [Numpy] Differentiable svd Aug 8, 2019
@reminisce reminisce added the Numpy label Aug 8, 2019
@@ -242,6 +249,20 @@ inline void flip(int m, int n, DType *b, int ldb, DType *a, int lda) {
#define MXNET_LAPACK_sgetrf LAPACKE_sgetrf
#define MXNET_LAPACK_dgetrf LAPACKE_dgetrf

#define MXNET_LAPACK_CWRAP_GESVD(prefix, dtype) \
inline int MXNET_LAPACK_##prefix##gesvd(int matrix_layout, int m, int n, dtype* ut, \
Contributor:

Not sure I understand this case. Maybe add a comment?

Contributor Author:

The LAPACK_gesvd function interface differs in signature from the MXNET_LAPACK signature and has to be wrapped (as stated here). So this is basically a wrapper of LAPACK_gesvd.

I added some comments about how to use LAPACK_gesvd. Its official documentation can be found here.
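Not part of the PR: for readers who want to exercise the underlying LAPACK routine from Python, SciPy exposes the same gesvd driver; the array shapes here are illustrative.

    import numpy as np
    from scipy.linalg import svd

    a = np.random.rand(3, 5)
    # SciPy defaults to the gesdd driver; select gesvd explicitly, the routine
    # wrapped by MXNET_LAPACK_*gesvd above.
    u, s, vh = svd(a, full_matrices=False, lapack_driver='gesvd')
    assert np.allclose(u @ np.diag(s) @ vh, a)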

@@ -361,6 +382,26 @@ inline void flip(int m, int n, DType *b, int ldb, DType *a, int lda) {
MXNET_LAPACK_CWRAP_SYEVD(ssyevd, float)
MXNET_LAPACK_CWRAP_SYEVD(dsyevd, double)

#define MXNET_LAPACK_CWRAP_GESVD(func, dtype) \
Contributor:

Add a comment that, due to the row-major interface and internal column-major layout, the arguments are flipped and transposed, and m and n are swapped as well.

Contributor Author:

Added
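Not from the PR: a small NumPy illustration of the flip being documented. Computing the SVD of the transposed (column-major-viewed) matrix returns the same singular values, with the roles of the two singular-vector factors swapped, up to per-column signs.

    import numpy as np

    m, n = 3, 5
    a = np.random.rand(m, n)

    u, s, vt = np.linalg.svd(a, full_matrices=False)         # A   = U  diag(s)  Vt
    u2, s2, vt2 = np.linalg.svd(a.T, full_matrices=False)    # A^T = U' diag(s') Vt'

    # Same singular values; the left/right singular-vector factors swap roles
    # (individual columns may differ in sign, hence the abs()).
    assert np.allclose(s, s2)
    assert np.allclose(np.abs(u2), np.abs(vt.T))
    assert np.allclose(np.abs(vt2), np.abs(u.T))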

// CPU/GPU-versions of LAPACK function "gesvd". Please refer to the
// LAPACK documentation for further details.
// Note:
// - V is input and output parameter (overwritten by A)
Contributor:

V is input and output parameter (it overwrites A)

Contributor Author:

Fixed

Stream<cpu> *s) { \
check_gesvd(UT, L, V); \
DType lwork(0); \
MXNET_LAPACK_##fname(MXNET_LAPACK_ROW_MAJOR, V.size(0), V.size(1), \
Contributor:

I think this is wrong. You must have done the workspace query before (calling the other function), so the size of work will be fine to pass for lwork, right? So no need to do the workspace query again here.

Contributor:

The reason I called the workspace query again in the syevd implementation is that there, work consists of two different workspaces. But here, I think you can just pass work.size(0) for lwork and don't have to do the query again.

Contributor Author:

Yes, you are right. I have removed the query.

LINALG_CPU_GESVD(sgesvd, float)
LINALG_CPU_GESVD(dgesvd, double)

// Mangle temp storage requirements for DType and int into a single
Contributor:

Remove this comment; it only applies to syevd. See my comment above: you have a single workspace here.

Contributor Author:

Removed


// (UT, L, V) = gesvd(A) [singular value decomposition]
// - V can overwrite A
// - Needs workspace (both DType and int), size of which is determined by a
Contributor:

Only one workspace (DType) needed here

Contributor Author:

Fixed

});
}

// Helper for gesvd_backward. See technical report for details
Contributor:

Is the report cited somewhere?

Contributor Author:

Not yet. The public technical report (https://arxiv.org/pdf/1710.08717.pdf) does not include details about svd.

Contributor:

Ah, you are right. I will have the new version of the report uploaded.

Contributor:

Could you cite the arxiv paper in the code? The new version with SVD will be uploaded in the next few days, way before this CR will get merged. Thanks.

Contributor Author:

Yes. Cited.

return 1e-100;
}

struct GesvdBackHelper_dV {
Contributor:

Comment: dA overwritten by L^-1 dA

Contributor Author:

Comment added
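Not the kernel itself: a NumPy rendering of the operation the requested comment describes (scale row i of dA by 1/L[i]); the eps floor is illustrative.

    import numpy as np

    def scale_rows_by_inv_l(dA, L, eps=1e-30):
        # dA: (m, n), L: (m,) singular values; row i of dA is scaled by 1/L[i].
        # eps is an illustrative floor guarding against (near-)zero singular values.
        return dA / np.maximum(L, eps)[:, None]

    dA = np.arange(6.0).reshape(2, 3)
    L = np.array([2.0, 4.0])
    print(scale_rows_by_inv_l(dA, L))   # row 0 divided by 2, row 1 divided by 4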

}
};

struct GesvdBackHelper_G1 {
Contributor:

Comment: X (square) overwritten by L X

Contributor Author:

imho, X (square) overwritten by X L

Contributor:

Yes, it is all transposed because of the row/column-major issue.

Contributor Author:

Comment added
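Not from the PR: a quick NumPy check of the point above. Left-multiplication by the diagonal in the column-major derivation corresponds to right-multiplication (column scaling) in the row-major code, because everything is transposed.

    import numpy as np

    m = 3
    X = np.random.rand(m, m)
    L = np.random.rand(m)

    # Column-major derivation: L X (row scaling).  Row-major code: X L (column scaling).
    # The two are transposes of each other.
    assert np.allclose((np.diag(L) @ X.T).T, X @ np.diag(L))
    assert np.allclose(X @ np.diag(L), X * L)   # right-multiplying by diag(L) scales columns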


// G1:
// This copy is just to make sure there are no invalid values (NaN, infinity) in tempM
Copy(tempMs, dUT, s);
Contributor:

Good! I hope our old code does that as well

Contributor Author:

Our old code does that. I learned about this from our old code (syevd).

// G1:
// This copy is just to make sure there are no invalid values (NaN, infinity) in tempM
Copy(tempMs, dUT, s);
Copy(tempMr, dA, s);
@mseeger (Contributor) commented Aug 13, 2019

Damn, you are right, we need this temp space, because we cannot left-multiply with UT in place.

This could be a real problem for big matrices. It can be circumvented by implementing the left-multiplication with a small square matrix in-place. This is extra work, but it would be needed to avoid the large extra temp space.

}
for (int i = 0; i < m; ++i) {
elem = DType(0.0);
for (int j = 0; j < n; ++j) {
Contributor:

This is not needed. You have already computed this using gemm; just pass in the diagonal.
Also, the function does not need dA and V as input args.

Contributor Author:

Yes, fixed.

// This copy is just to make sure there are no invalid values (NaN, infinity) in tempM
Copy(tempMs, dUT, s);
Copy(tempMr, dA, s);
gemm::op(dA, V, tempMs, DType(1.0), DType(0.0), false, true, s);
Contributor:

After this line, you extract the diagonal of this matrix and then pass it to GesvdBackHelper_G2

Contributor Author:

Note that tempMs cannot be used to store the extracted diagonal (I have used tempMs to store G1). Do we need extra temp space (of size m) to store the diagonal?

Contributor:

Yes, true, but that is really a small amount of extra space. It does have to be allocated as well, though. In return, we can get rid of tempMr (below).
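A NumPy sketch (not the PR's mshadow code) of the reuse being discussed: the diagonal needed later is already present in the product computed for G1, so only an (m,)-sized buffer is required instead of recomputing it elementwise.

    import numpy as np

    m, n = 3, 5
    dA = np.random.rand(m, n)   # stands in for the gradient buffer at this point
    V = np.random.rand(m, n)

    G1 = dA @ V.T                     # (m, m) product already computed via gemm
    diag = np.diagonal(G1).copy()     # reuse its diagonal: extra space is only (m,)

    # Identical to recomputing the diagonal elementwise, which is no longer needed:
    assert np.allclose(diag, np.einsum('ij,ij->i', dA, V))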

if (dA.dptr_ != dV.dptr_) {
Copy(dA, dV, s);
}
// From here on, we work on dA only
Contributor:

Could you assign k = dA.size(0), m = dA.size(1), n = dA.size(2) here, and use them below?

Contributor Author:

Fixed

gemm::op(dUT, UT, tempMs, DType(1.0), DType(1.0), true, false, s);

// G2:
Kernel<GesvdBackHelper_G2, xpu>::Launch
Contributor:

Pass in the diagonal extracted above, and do not pass dA, V.

Contributor Author:

Fixed


// G3:
gemm::op(tempMs, V, dA, DType(1.0), DType(1.0), false, false, s);
gemm::op(UT, dA, tempMr, DType(1.0), DType(0.0), false, false, s);
Contributor:

It is very annoying that we need this large temp space, because we don't have an in-place left-multiply with a square matrix. The drawback is that for large matrices (large n), this needs a lot of temporary memory; it may be worth avoiding that.

const Tensor<xpu, 3, DType>& V,
const Tensor<xpu, 3, DType>& dA,
const Tensor<xpu, 3, DType>& tempMs,
const Tensor<xpu, 3, DType>& tempMr,
Contributor:

As mentioned below, this large temp space could be avoided with some extra implementation work.

@hzfan (Contributor Author) commented Aug 13, 2019

Could you elaborate on how to code up "in-place" dA <- dot(UT, dA) using a temp space of shape (m, m)?
The method I have thought about is:
Write dA as blocks, dA = [dA1, dA2, ..., dAx], where each dAi is of shape (m, m) and x = ceil(n / m),
so dot(UT, dA) = [dot(UT, dA1), ..., dot(UT, dAx)], and each dot can be computed with a temp space of shape (m, m).
I don't know whether I have understood correctly.

Contributor:

Yes, exactly; something like what you are saying. Essentially you slice the large matrix up into (m, m) blocks, and then you can even just call gemm in a loop. This may in fact be the easiest. What I mean is: you mask out (m, m) blocks of dA and iterate over them, always replacing one block B with dot(UT, B). For that, you need an (m, m) temp space, which you already have.

Contributor:

Please ask if this is still unclear (but your comment is what I have in mind). You can mask out blocks of dA simply by moving the pointer; everything else (stride, etc.) remains the same, because the n-axis is the contiguous one. The final block will be (m, m2), where m2 <= m, but that should not be a problem either.

@hzfan (Contributor Author) commented Aug 14, 2019

It's clear now. See my comment below to check whether my implementation is correct.

@@ -0,0 +1,131 @@
# Licensed to the Apache Software Foundation (ASF) under one
Contributor:

Very nice test!

@mseeger (Contributor) left a comment

The main point to change is extracting the diag in backward, instead of recomputing it.

Another point could be to avoid the large temp space by coding up "in-place"
dA <- dot(UT, dA)
using a temp space of shape (m, m). You can reuse the one you already have, since at this point it is not needed anymore. This could be worth it when the operator is used on large matrices.

DType gesvd_back_helper_eps(DType* X);

template<>
MSHADOW_XINLINE float gesvd_back_helper_eps(float* X) {
Contributor:

It would be cleaner to make these constants depend on values from std::numeric_limits, so something along the lines of
std::numeric_limits<DType>::epsilon() * 10 (or whatever factor you need to be on the safe side).
That also avoids the template specialization here.

Contributor Author:

I tried it out, but I found that std::numeric_limits<DType>::epsilon() cannot be accessed in CUDA. So I will stick with the original implementation for now.
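For reference only (the PR keeps the hard-coded per-dtype constants, since numeric_limits is not usable from the CUDA code here): the reviewer's suggestion expressed with NumPy's dtype metadata, with the factor of 10 as the illustrative safety margin mentioned above.

    import numpy as np

    # Analogue of std::numeric_limits<DType>::epsilon() * 10 for each dtype.
    for dtype in (np.float32, np.float64):
        print(dtype.__name__, np.finfo(dtype).eps * 10)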


// G1:
// This copy is just to make sure there are no invalid values (NaN, infinity) in tempM
Copy(tempMs, dUT, s);
Contributor:

It may be cleaner to fill with zeros instead of copying some arbitrary data. I think there is a specific method in mshadow or elsewhere to fill a tensor with such values (if not, copying like this is likely the best way).

Contributor Author:

The only way I know to fill a tensor with zeros is:

    tempMs.FlatTo1D() = 0;
    tempMr.FlatTo1D() = 0;

But our old code (syevd) sticks to copying. Which do you think is better?

Contributor Author:

The copy in syevd is here

Contributor:

Aha, so it seems I am to blame for that. But it is fine

Contributor Author:

I changed the copying to filling with zeros.

@hzfan hzfan force-pushed the svd_pr branch 2 times, most recently from da6d472 to 67191c4 on August 14, 2019 09:02
@hzfan (Contributor Author) commented Aug 14, 2019

Just updated the code.
The two main points updated:

  • reuse the diagonal of (L^-1 dV VT) with an additional temp space of shape (m,)
  • avoid tempMr (which was of shape (m, n)) by splitting the large dot into per-block dots (see the sketch below).

Now the total temp space used is (k, m, m) + (k, m).
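A NumPy sketch (not the PR's mshadow/gemm code) of the second point: dA <- dot(UT, dA) computed block by block so that only a small temporary is ever needed; the helper name is invented for illustration.

    import numpy as np

    def left_mul_in_blocks(UT, dA):
        """Overwrite dA with dot(UT, dA), looping over (m, m) column blocks.
        The last block may be narrower (the m2 <= m case discussed above)."""
        m, n = dA.shape
        for i in range(0, n, m):
            block = dA[:, i:i + m]     # a view into dA, shape (m, ncols) with ncols <= m
            block[:] = UT @ block      # only an (m, ncols)-sized temporary is formed
        return dA

    m, n = 4, 10
    UT = np.random.rand(m, m)
    dA = np.random.rand(m, n)
    expected = UT @ dA
    left_mul_in_blocks(UT, dA)
    assert np.allclose(dA, expected)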


// G3:
gemm::op(tempM, V, dA, DType(1.0), DType(1.0), false, false, s);
for (int i = 0; i < n; i += m) {
@hzfan (Contributor Author) commented Aug 14, 2019

Here I iterate over the blocks dAi = dA(0:m, i:i+m). Other tensor attributes like stride_ and stream_ are not changed, as you said. ncols is the m2 you mentioned; it is used for the last block.


Contributor:

Add comment:
dA <- dot(UT, dA). Loop over (k, m, m) blocks to avoid large temporary memory

Contributor Author:

Comment added.

@mseeger (Contributor) left a comment

Very nice. Just one more note about a missing comment, and then this is ready to go as far as I am concerned. Just make sure your test works both on CPU and GPU.

@hzfan (Contributor Author) commented Aug 15, 2019

Thanks to @mseeger and @asmushetzel for guidance and review.

@reminisce reminisce merged commit 6247dc8 into apache:master Sep 19, 2019
drivanov pushed a commit to drivanov/incubator-mxnet that referenced this pull request Sep 26, 2019
* use (m, m) temp space

* add technical report citation

* add comments for the tricky block matrix multiplication

* differentiable svd
larroy pushed a commit to larroy/mxnet that referenced this pull request Sep 28, 2019
* use (m, m) temp space

* add technical report citation

* add comments for the tricky block matrix multiplication

* differentiable svd