
[MKLDNN] Independent gradients requests check with respect to weights and bias of convolution #15497

Merged
merged 13 commits into apache:master on Jul 16, 2019

Conversation

zixuanweeei
Contributor

@zixuanweeei zixuanweeei commented Jul 9, 2019

Description

As described in #15464, MXNet with MKL-DNN gives a wrong gradient of a convolution with respect to its bias unless the gradient with respect to its weights is also requested. In the MKL-DNN implementation of convolution, only the request for the weights gradient is checked; the request for the bias gradient should be checked independently.
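For context, the failing case can be reproduced with a script along the following lines (a minimal sketch in the style of the script posted further down this thread; shapes and values are illustrative). With only the bias gradient requested, the MKL-DNN path returned a wrong value for grad['b']; the expected value here is 9.0, the sum of the 3x3 output gradient of ones:

import mxnet as mx

sym = mx.sym.Convolution(
    mx.sym.Variable('in'),
    mx.sym.Variable('w'),
    mx.sym.Variable('b'),
    kernel=(1, 1),
    num_filter=1
)
args = {
    'in': mx.nd.ones([1, 1, 3, 3]),
    'w': mx.nd.ones([1, 1, 1, 1]),
    'b': mx.nd.ones([1]),
}
grad = {
    'in': mx.nd.zeros([1, 1, 3, 3]),
    'w': mx.nd.zeros([1, 1, 1, 1]),
    'b': mx.nd.zeros([1]),
}
# Request only the bias gradient; the weights gradient stays 'null'.
req = {'in': 'null', 'w': 'null', 'b': 'write'}

ex = sym.bind(mx.cpu(), args, args_grad=grad, grad_req=req)
ex.forward(is_train=True)
ex.backward(out_grads=mx.nd.ones([1, 1, 3, 3]))
mx.nd.waitall()
print(grad['b'])  # expected: [9.], the sum of the output gradient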

Changes

  • Independently check the requests for gradients with respect to the weights and bias of convolution.
  • Convolution can now give the gradient with respect to either the weights or the bias alone.

Comments

  • No comments.

@zixuanweeei
Contributor Author

@pengzhao-intel @ciyongch @TaoLv Please help review this PR. Thanks. 😃

@pengzhao-intel
Copy link
Contributor

Could you try to add a UT for this case?

@zixuanweeei
Contributor Author

zixuanweeei commented Jul 9, 2019

Could you try to add a UT for this case?

Sure. The existing UTs pass with this PR in a local test. I will add UTs to check the correctness of the results for the various combinations of gradient requests.

Member

@TaoLv TaoLv left a comment


Thanks for the fix. Just one minor comment. The CI seems stuck; please try to re-trigger it.

MKLDNNStream::Get()->RegisterPrim(convBwdWeight.GetBwdWeights());
CommitOutput(in_grad[conv::kBias], in_grad_bias);
} else {
Member

I suggest checking req[conv::kWeight] here.

Contributor Author

Sure, I see. Without this check there is an unnecessary primitive registration. Thanks.

@TaoLv
Member

TaoLv commented Jul 10, 2019

@matteosal Could you help verify this PR with the test case in your project?

}
CommitOutput(in_grad[conv::kWeight], in_grad_weight);
if (req[conv::kWeight]) CommitOutput(in_grad[conv::kWeight], in_grad_weight);
Contributor

What's the behavior of req[conv::kBias]?

Contributor Author

@zixuanweeei zixuanweeei Jul 11, 2019


It has the same behavior as req[conv::kWeight]. Both hold the operation request type (OpReqType) passed to Forward and Backward. We can use it to control how the result memory is handled, e.g. adding or copying the result into the gradient output memory, or doing nothing at all.
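For illustration only (a rough summary of the above, with the mapping assumed from how grad_req is used in the scripts in this thread): the Python-level grad_req strings correspond to OpReqType values and control what is done with each computed gradient.

# Illustrative mapping between Python-level grad_req strings and OpReqType:
#   'null'  -> kNullOp   : do not compute / do not write the gradient
#   'write' -> kWriteTo  : overwrite the gradient buffer
#   'add'   -> kAddTo    : accumulate into the existing gradient buffer
req = {'in': 'write', 'w': 'add', 'b': 'null'}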

Contributor

@pengzhao-intel pengzhao-intel left a comment


I suggest reorganizing the code logic to avoid the multiple if/if-else structures, which hurt readability.

@pengzhao-intel pengzhao-intel changed the title Independent gradients requests check with respect to weights and bias of convolution [MKLDNN] Independent gradients requests check with respect to weights and bias of convolution Jul 11, 2019
@matteosal
Contributor

The example from #15464 is fixed here, but I see a failure with this one, where the weights gradient is requested in isolation (the opposite of #15464):

import mxnet as mx

sym = mx.sym.Convolution(
	mx.sym.Variable('in'), 
	mx.sym.Variable('w'), 
	mx.sym.Variable('b'),
	kernel=(1, 1), 
	num_filter=1
)
args = {
	'in': mx.nd.ones([1, 1, 3, 3]),
	'w': mx.nd.ones([1, 1, 1, 1]),
	'b': mx.nd.ones([1]),
}
grad = {
	'in': mx.nd.zeros([1, 1, 3, 3]),
	'w': mx.nd.zeros([1, 1, 1, 1]),
	'b': mx.nd.zeros([1]),
}
req = {'in': 'null', 'w': 'write', 'b': 'null'}
outgrad = mx.nd.ones([1, 1, 3, 3])

ex = sym.bind(mx.cpu(), args, args_grad=grad, grad_req=req)

ex.forward(True);
ex.backward(out_grads=outgrad);
mx.ndarray.waitall()

This is what gets printed to the command line:

Traceback (most recent call last):
  File "script2.py", line 27, in <module>
    mx.ndarray.waitall()
  File "/home/matteo/Git/mxnet/python/mxnet/ndarray/ndarray.py", line 166, in waitall
    check_call(_LIB.MXNDArrayWaitAll())
  File "/home/matteo/Git/mxnet/python/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: std::exception

It doesn't fail on master.

@zixuanweeei
Contributor Author

zixuanweeei commented Jul 12, 2019

@matteosal I tested your example with commit 9ca0428, and it ran successfully without any exception. Could you test it again with that commit?

@matteosal
Contributor

@matteosal I tested your example with commit 9ca0428, and it ran successfully without any exception. Could you test it again with that commit?

Oops, sorry, I missed that commit. Yes, it works with that commit.

@zixuanweeei
Contributor Author

@matteosal Thanks, that's great. I am working on a unit test for this feature. Then we can merge it into master after further review and verification.

@zixuanweeei
Contributor Author

@pengzhao-intel @TaoLv Please re-review on this PR. It should be noted that the new unit test function will be unevaluated in context of GPU because of the possible precision degradation resulted from the autotuned cudnn convolution. From a local test, the autotuned convolution has no more than 1.0% mismatches compared to a non-autotuned one, when any of the gradient request is set to be null (atol=1e-3, rtol=1e-3).

Contributor

@pengzhao-intel pengzhao-intel left a comment


LGTM

@pengzhao-intel
Contributor

pengzhao-intel commented Jul 14, 2019

the possible precision degradation from the autotuned cuDNN convolution.

How much degradation on GPU? Could we set a lower bar for GPU?

@zixuanweeei
Contributor Author

I will run some tests to see whether a lower bar works.

@pengzhao-intel
Contributor

pengzhao-intel commented Jul 15, 2019

@TaoLv @ciyongch please review as well.

@karan6181
Contributor

@mxnet-label-bot add [MKLDNN, pr-awaiting-review]

@marcoabreu marcoabreu added the MKLDNN and pr-awaiting-review labels Jul 15, 2019
Contributor

@ciyongch ciyongch left a comment


Overall looks good :)
One more question: does the GPU precision drop happen in both forward and backward?

for var_name in var_names:
    if var_name == "b" and no_bias:
        continue
    if grad_req2[var_name] == "null":
Contributor

We don't have such a case?

Contributor Author

Yup. Requesting only the gradient with respect to the bias is a rare corner case.

@zixuanweeei
Contributor Author

@ciyongch The equality assertion error occurred on the outputs of the convolution forward pass when any of the gradient requests (x, w, b) was null. I am not sure whether the backward pass also loses numerical precision.

Contributor

@ciyongch ciyongch left a comment


Then it's fine to keep the same tolerance for both forward and backward outputs.

@pengzhao-intel
Contributor

Thanks for your contribution. Merging now :)

@pengzhao-intel pengzhao-intel merged commit 1b725c3 into apache:master Jul 16, 2019
juliusshufan pushed a commit to juliusshufan/incubator-mxnet that referenced this pull request Aug 8, 2019
… and bias of convolution (apache#15497)

* Independent req[kBias] and req[kWeight] check

* Add UT for independent conv gradient requests

* Update conv independent grad UT with no_bias enabled

* Check req[kWeight] for avoiding unnecessary prim registration

* Check `OpReqTpye` in CommitOutput automatically

* Lock cudnn autotune for accurate conv output

* Ignore independent gradients test on GPU

* Trigger CI

* Sets a low bar for autotuned cudnn convolution
TaoLv pushed a commit that referenced this pull request Aug 13, 2019
…o weights… (#15805)

* [MKLDNN] Independent gradients requests check with respect to weights and bias of convolution (#15497)

* Independent req[kBias] and req[kWeight] check

* Add UT for independent conv gradient requests

* Update conv independent grad UT with no_bias enabled

* Check req[kWeight] for avoiding unnecessary prim registration

* Check `OpReqTpye` in CommitOutput automatically

* Lock cudnn autotune for accurate conv output

* Ignore independent gradients test on GPU

* Trigger CI

* Sets a low bar for autotuned cudnn convolution

* [Flaky test] Skip test_operator_gpu.test_convolution_independent_gradients (#15631)

* Skip test_convolution_independent_gradirents

* Add an issue link

* Fix inconsistent context of input array and binding op

* Trigger CI

* Retrigger CI
Labels: MKLDNN, pr-awaiting-review