[MXNET-978] Support higher order gradient for log, log2, log10. #14992
Conversation
Force-pushed from 16e1815 to 49e59f0
I don't know much about this library, but I believe it would be better to have gradients defined for the existing backward, instead of a differentiable gradient (relying on autograd machinery), at least [...].

Note: Can anyone please point out how I can have 1/log(2.0) and 1/log(10.0) multiplied with the gradient for log2 and log10?
@mxnet-label-bot add [Operator, pr-awaiting-review]
@kshitij12345 Thanks for your contribution. I agree with you that it would be better to have gradients defined for the existing backward operators. I do not fully understand your question about 1/log(2.0) and 1/log(10.0) multiplied with the gradient for log2 and log10. Could you please elaborate?
Reading it again, I phrased it poorly. Sorry. So actually, the plan was to update [...]. Note: this is not needed for this PR, but I am curious to know. Thank you.
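For reference, the standard relation between these gradients (just the chain rule): since log_b(x) = ln(x) / ln(b),

$$\frac{d}{dx}\log_b x = \frac{1}{\ln b \cdot x}, \qquad \frac{d^{2}}{dx^{2}}\log_b x = -\frac{1}{\ln b \cdot x^{2}},$$

so the log2 and log10 gradients are the natural-log gradients scaled by the constants 1/ln(2) and 1/ln(10) respectively.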
Force-pushed from 2c05d17 to 0165565
@larroy @apeforest Have updated as per #14095.
The failure looks unrelated to this PR.
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
  // For g(x) -> g = log
  // g''(x) = -1 * (g'(x) * g'(x))
  auto gx = nnvm::NodeEntry{n, 0, 0};
Oh. Missed that. Thank You.
Nice PR, thanks a lot for this. Just a couple of questions.
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], gx}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_inp",
Why are we returning two gradients? Isn't it a unary function with just one input?
log is unary, but _backward_log isn't.
https://github.com/apache/incubator-mxnet/blob/016556553ced0c9c9400efec2f08a3c39cbb00ba/src/operator/tensor/elemwise_unary_op_basic.cc#L1071
It takes as input the output gradient and the input to log.
https://github.com/apache/incubator-mxnet/blob/016556553ced0c9c9400efec2f08a3c39cbb00ba/src/operator/tensor/elemwise_unary_op_basic.cc#L1045
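To make the two-input point concrete, here is a minimal NumPy sketch of what _backward_log conceptually computes; it is not MXNet's actual kernel, only an illustration of the signature described above:

```python
import numpy as np

def backward_log(ograd, x):
    """Sketch of _backward_log: the backward node needs two inputs,
    the incoming (output) gradient and the original input x."""
    # d/dx log(x) = 1/x, scaled by the incoming gradient
    return ograd * (1.0 / x)

x = np.array([0.5, 1.0, 2.0])
ograd = np.ones_like(x)
print(backward_log(ograd, x))  # [2.  1.  0.5]
```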
std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
Same comment as above.
* simplify NodeEntry creation.
After reviewing your code, I had a better understanding of what you meant. I think you can do an elemwise_mul operator with Op('log') and a vector filled with 2.0. There may be other ways to optimize the graph representation, but I think this should actually work.
unary_bwd<mshadow_op::log_grad>)
.set_attr<nnvm::FGradient>("FGradient",
  [](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
    // For g(x) -> g = log
nit: It's very nice to see a comment here. The g(x) is actually a function of x. It might be easily confused with the variable gx two lines below. Maybe use f(x) in the comment here?
Ah sure, that makes sense. Thank you.
@kshitij12345 The CI failure in unix-GPU was due to a flaky test for TensorRT: #14978. The issue has been fixed by #15014. Please re-trigger CI again. Thanks!
* update comment to avoid confusion.
LGTM! Thanks a lot for your contribution.
LGTM
// For f(x) -> f = log10
// f'(x) = 1 / (log(10) * x)
// f''(x) = -1 * (f'(x) * 1/x)
auto gx = nnvm::NodeEntry{n, 0, 0};
Why don't we follow the same pattern as in the natural logarithm?
For natural log, we have with us in the gradient function gx, i.e. 1/x, as well as x. Since the second derivative of log is -(gx * gx) = -1/(x^2), we use that pattern.

Considering log2 (similar case for log10), we have with us gx, i.e. 1/(log(2) * x), as well as x. Since the second derivative is -1/(log(2) * x * x), we get it in the code using negative(gx * reciprocal(x)), where gx = 1/(log(2) * x). Another way to get that would be negative(gx * gx * log(2.0)).
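A quick numerical check of the equivalence described above (a standalone NumPy snippet, not part of the PR itself):

```python
import numpy as np

# gx     = f'(x)  = 1 / (ln(2) * x)
# f''(x) = -1 / (ln(2) * x * x)
#        = -(gx * 1/x)          # negative(gx * reciprocal(x)) as in the PR
#        = -(gx * gx * ln(2))   # the alternative mentioned above
x = np.array([0.5, 1.0, 3.0])
gx = 1.0 / (np.log(2.0) * x)

f2_direct = -1.0 / (np.log(2.0) * x * x)
f2_via_reciprocal = -(gx * (1.0 / x))
f2_via_gx_squared = -(gx * gx * np.log(2.0))

np.testing.assert_allclose(f2_direct, f2_via_reciprocal)
np.testing.assert_allclose(f2_direct, f2_via_gx_squared)
print("all three formulations agree")
```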
@larroy Thanks for pointing this out; going through it again made me realise that there is a problem with the implementation of log. We multiply [...]. I have validated the expected results with the following script:

    from mxnet import nd, autograd
    import numpy
    import math

    grad_grad_op = lambda x: (-1 / x**2)

    x = nd.random.normal(0, 1, (3, 3))
    x.attach_grad()
    with autograd.record():
        y = nd.log(x)
    y_grad = autograd.grad(y, x, head_grads=nd.ones_like(y) * 0.5, create_graph=True, retain_graph=True)[0]
    y_grad.backward(nd.ones_like(y_grad) * 0.6)
    numpy.testing.assert_allclose(x.grad.asnumpy(), (grad_grad_op(x) * 0.5 * 0.6).asnumpy(), rtol=1e-7, atol=1e-7)

This fails with the current code. I have confirmed the behaviour with PyTorch as well:

    import torch
    import numpy
    import math

    grad_grad_op = lambda x: (-1 / x**2)

    x = torch.randn(2, 3)
    x.requires_grad = True
    y = torch.log(x)
    y_grad = torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y) * 0.5, create_graph=True, retain_graph=True)[0]
    y_grad.backward(torch.ones_like(y_grad) * 0.6)
    numpy.testing.assert_allclose(x.grad.detach().numpy(), (grad_grad_op(x) * 0.5 * 0.6).detach().numpy(), rtol=1e-7, atol=1e-7)
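As a sanity check on the expected value used in both scripts (just the chain rule, with f(x) = log x): the first call returns y_grad = 0.5 · f'(x), and backpropagating a head gradient of 0.6 through it gives

$$x.\mathrm{grad} = 0.6 \cdot \frac{\partial}{\partial x}\bigl(0.5 \cdot f'(x)\bigr) = 0.5 \cdot 0.6 \cdot f''(x) = 0.5 \cdot 0.6 \cdot \left(-\frac{1}{x^{2}}\right),$$

which is exactly what both asserts compare against.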
Hi @kshitij12345, sorry, we missed that. We should have reviewed it more carefully. Please submit another PR to fix this issue. I will also update #14613 accordingly. Thanks!
std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], gx}, nullptr, &n));
Shouldn't this be {ograds[0], g_lx} instead? Isn't dL/dy_grad = d^2L/dx^2 * f'(x)?
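For context on this question, a sketch of the expected outputs (one reading of the math, not necessarily the final implementation): _backward_log computes y_grad = ograd / x from its two inputs (ograd, x), so its own FGradient should return the incoming ograds[0] times the partial derivative with respect to each of those inputs:

$$\mathrm{ograds}[0]\cdot\frac{\partial y_{\mathrm{grad}}}{\partial\,\mathrm{ograd}} = \mathrm{ograds}[0]\cdot\frac{1}{x}, \qquad \mathrm{ograds}[0]\cdot\frac{\partial y_{\mathrm{grad}}}{\partial x} = \mathrm{ograds}[0]\cdot\left(-\frac{\mathrm{ograd}}{x^{2}}\right).$$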
I have created a new PR for the same: #15120. Please review.
* add higher order gradient support for log, log10, log2
* add tests
* address comments
* simplify NodeEntry creation.
* address comments
* update comment to avoid confusion.
Description
With reference to #14613 and #10002, this PR intends to add support for higher order gradients for log (and ideally for log2 and log10 as well). Tests are based entirely on #14613.
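For reference, the second derivatives targeted here (standard calculus):

$$\frac{d^{2}}{dx^{2}}\log x = -\frac{1}{x^{2}}, \qquad \frac{d^{2}}{dx^{2}}\log_{2} x = -\frac{1}{\ln 2\cdot x^{2}}, \qquad \frac{d^{2}}{dx^{2}}\log_{10} x = -\frac{1}{\ln 10\cdot x^{2}}.$$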
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
* Higher order gradient support for log, log2, log10.