[MKLDNN] Independent gradients requests check with respect to weights and bias of convolution #15497
Conversation
@pengzhao-intel @ciyongch @TaoLv Please help review this PR. Thanks. 😃
Could you try to add a UT for this case?
Sure. The existing UTs pass with this PR in a local test. I will add some UTs to check the correctness of the results for the different combinations of gradient requests.
Thanks for the fix. Just one minor comment. The CI seems to be stuck; please try to re-trigger it.
MKLDNNStream::Get()->RegisterPrim(convBwdWeight.GetBwdWeights());
CommitOutput(in_grad[conv::kBias], in_grad_bias);
} else {
I suggest checking req[conv::kWeight] here.
Sure, I see. Without the check, there is an unnecessary primitive registration. Thanks.
@matteosal Could you help verify this PR with the test case in your project?
}
CommitOutput(in_grad[conv::kWeight], in_grad_weight);
if (req[conv::kWeight]) CommitOutput(in_grad[conv::kWeight], in_grad_weight);
What's the behavior of req[conv::kBias]?
It has the same behavior as req[kWeight]. Both of them carry the operation request type (OpReqType) passed to Forward and Backward. We can use it to control how the memory of a result is handled, e.g. add or copy the result back to the gradient memory, or do nothing with it at all.
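For illustration, here is how the Python-side grad_req setting corresponds to those request types (a hypothetical sketch, not code from this PR):

```python
# Each grad_req entry becomes the OpReqType that the backward kernel receives:
grad_req = {
    'x': 'write',  # kWriteTo - overwrite the gradient array
    'w': 'add',    # kAddTo   - accumulate into the existing gradient array
    'b': 'null',   # kNullOp  - skip this gradient entirely
}
```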
I suggest reorganizing the code logic to avoid the nested if/if-else structure, which hurts readability.
The example from #15464 is fixed here, but I see a failure with this one, where the weights gradient is requested in isolation (the opposite of #15464):
This is what gets printed to the command line:
It doesn't fail on master.
@matteosal I tested your example with commit 9ca0428, and it ran successfully without any exception. Could you test it again with that commit?
Oops, sorry, I missed that commit. Yes, it works with it.
@matteosal Thanks. That's great. I am working on a unit test for this feature. Then we can merge it to master after further review and verification.
@pengzhao-intel @TaoLv Please re-review this PR. Note that the new unit test function will not be evaluated on GPU because of the possible precision degradation caused by the autotuned cuDNN convolution. In a local test, the autotuned convolution had no more than 1.0% mismatches compared to a non-autotuned one when any of the gradient requests was set to null (atol=1e-3, rtol=1e-3).
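For reference, a minimal sketch of how cuDNN autotuning can be locked for such a comparison, using the standard MXNET_CUDNN_AUTOTUNE_DEFAULT environment variable (an illustration, not necessarily how the test itself does it):

```python
import os

# Disable cuDNN convolution autotuning so that repeated runs pick the same
# algorithm and the outputs stay comparable across executors.
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

import mxnet as mx  # set the variable before importing mxnet to be safe
```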
LGTM
How much degradation on GPU? Could we set a lower bar for GPU?
I will run some tests to see whether a lower bar works.
@mxnet-label-bot add [MKLDNN, pr-awaiting-review]
Overall looks good :)
One more question: does the GPU precision drop happen in both forward and backward?
for var_name in var_names:
    if var_name == "b" and no_bias:
        continue
    if grad_req2[var_name] == "null":
We don't have such a case?
Yup. Requesting only the gradient with respect to the bias is a rare corner case.
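As an illustration of the coverage being discussed, a hypothetical sketch of such a test could sweep the weight/bias request combinations and compare each requested gradient against a reference executor that requests everything (names and shapes chosen for illustration, not the PR's actual unit test):

```python
import itertools
import mxnet as mx
import numpy as np

x, w, b = (mx.sym.Variable(n) for n in ('x', 'w', 'b'))
conv = mx.sym.Convolution(x, weight=w, bias=b, kernel=(3, 3), num_filter=4)
shapes = {'x': (2, 3, 8, 8), 'w': (4, 3, 3, 3), 'b': (4,)}
inputs = {k: mx.nd.random.uniform(shape=v) for k, v in shapes.items()}

# Reference executor: request every gradient.
ref = conv.simple_bind(mx.cpu(), x=shapes['x'], grad_req='write')
for name, arr in inputs.items():
    arr.copyto(ref.arg_dict[name])
ref.forward(is_train=True)
ref.backward(mx.nd.ones(ref.outputs[0].shape))

# Sweep every combination of weight/bias gradient requests.
for w_req, b_req in itertools.product(['write', 'null'], repeat=2):
    exe = conv.simple_bind(mx.cpu(), x=shapes['x'],
                           grad_req={'x': 'write', 'w': w_req, 'b': b_req})
    for name, arr in inputs.items():
        arr.copyto(exe.arg_dict[name])
    exe.forward(is_train=True)
    exe.backward(mx.nd.ones(exe.outputs[0].shape))
    # Every gradient that was requested must match the reference.
    for name, req in (('w', w_req), ('b', b_req)):
        if req != 'null':
            np.testing.assert_allclose(exe.grad_dict[name].asnumpy(),
                                       ref.grad_dict[name].asnumpy(),
                                       rtol=1e-3, atol=1e-3)
```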
@ciyongch The equality assertion error happened with the outputs of the convolution forward pass when any of the gradient requests (
Then it's fine to keep the same tolerance for both forward and backward outputs.
Thanks for your contribution. Merging now :)
… and bias of convolution (apache#15497)
* Independent req[kBias] and req[kWeight] check
* Add UT for independent conv gradient requests
* Update conv independent grad UT with no_bias enabled
* Check req[kWeight] for avoiding unnecessary prim registration
* Check `OpReqType` in CommitOutput automatically
* Lock cudnn autotune for accurate conv output
* Ignore independent gradients test on GPU
* Trigger CI
* Sets a low bar for autotuned cudnn convolution
…o weights… (#15805)
* [MKLDNN] Independent gradients requests check with respect to weights and bias of convolution (#15497)
* Independent req[kBias] and req[kWeight] check
* Add UT for independent conv gradient requests
* Update conv independent grad UT with no_bias enabled
* Check req[kWeight] for avoiding unnecessary prim registration
* Check `OpReqType` in CommitOutput automatically
* Lock cudnn autotune for accurate conv output
* Ignore independent gradients test on GPU
* Trigger CI
* Sets a low bar for autotuned cudnn convolution
* [Flaky test] Skip test_operator_gpu.test_convolution_independent_gradients (#15631)
* Skip test_convolution_independent_gradients
* Add an issue link
* Fix inconsistent context of input array and binding op
* Trigger CI
* Retrigger CI
Description
As described in #15464, MXNet with MKL-DNN gives a wrong gradient of a convolution with respect to its bias unless the gradient with respect to its weights is also requested. In the MKL-DNN implementation of convolution, only the request for the gradient with respect to the weights is checked. The request should be checked independently for the bias.
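A minimal, hypothetical reproduction of that failure mode (shapes and names chosen for illustration): bind a convolution with only the bias gradient requested and check it against its analytic value, which for an all-ones output gradient is simply the number of summed output positions per channel.

```python
import mxnet as mx
import numpy as np

x, w, b = (mx.sym.Variable(n) for n in ('x', 'w', 'b'))
conv = mx.sym.Convolution(x, weight=w, bias=b, kernel=(3, 3), num_filter=4)

# Request only the bias gradient; before this fix, MKL-DNN gave a wrong value
# here unless the weight gradient was requested as well.
exe = conv.simple_bind(mx.cpu(), x=(2, 3, 8, 8),
                       grad_req={'x': 'null', 'w': 'null', 'b': 'write'})
exe.arg_dict['x'][:] = mx.nd.random.uniform(shape=(2, 3, 8, 8))
exe.arg_dict['w'][:] = mx.nd.random.uniform(shape=(4, 3, 3, 3))
exe.arg_dict['b'][:] = 0
exe.forward(is_train=True)
exe.backward(mx.nd.ones(exe.outputs[0].shape))

# d(loss)/d(bias) with an all-ones output gradient: 2 batches * 6 * 6 positions.
np.testing.assert_allclose(exe.grad_dict['b'].asnumpy(), np.full(4, 72.0))
```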