[MXNET-978] Higher Order Gradient Support reciprocal, abs. #15413
Conversation

This PR adds support for higher order gradients for reciprocal and abs.
```cpp
const std::unordered_map<std::string, std::string> args = {{"scalar", "-2.0"}};

auto dydx_mul_dldy = nnvm::NodeEntry{n};  // f'(x) * head_grads
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",
```
Do we need to divide explicitly here? I think the final _backward_grad_grad_input will also carry the head_grads term in its output, so we may not need this extra node?
Now I see that you need this node for the first output "_backward_grad_grad"
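For concreteness, here is a minimal numpy sketch of the identity this elemwise_div relies on (values are illustrative, not MXNet internals):

```python
import numpy as np

# For y = 1/x, the first backward emits one fused node computing
# f'(x) * head_grads, so an elementwise division by head_grads
# recovers the bare f'(x).
x = np.array([0.5, 1.0, 2.0])
head_grads = np.array([0.1, 0.2, 0.3])        # dL/dy
dydx_mul_dldy = (-1.0 / x**2) * head_grads    # output of _backward_reciprocal
dydx = dydx_mul_dldy / head_grads             # what the elemwise_div node computes
assert np.allclose(dydx, -1.0 / x**2)
```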
```python
    return nd.reciprocal(x)

def grad_grad_op(x):
    return 2/x**3
```
Add spaces around the `/`.
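As a quick sanity check of the closed form used by grad_grad_op, a sympy one-liner (not part of the PR's test code):

```python
import sympy as sp

# Verify d^2/dx^2 (1/x) = 2 / x**3.
x = sp.symbols('x')
assert sp.simplify(sp.diff(1 / x, x, 2) - 2 / x**3) == 0
```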
@larroy Please help to review.
@mxnet-label-bot add [pr-awaiting-review]
```python
shape = rand_shape_nd(dim)
array = random_arrays(shape)
check_second_order_unary(array, abs, grad_grad_op)

```
nit: please remove extra line
Two blank lines between functions, as per PEP 8: https://stackoverflow.com/questions/2953250/python-pep8-blank-lines-convention
It is actually fixed. I guess I removed the line below, so it is not showing up here.
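For context, a hedged sketch of how a second-order check along the lines of check_second_order_unary can be written with mx.autograd; the real helper lives in the test suite and may differ in detail:

```python
from mxnet import autograd, nd

def check_second_order_unary_sketch(array, op, grad_grad_op):
    # Hedged sketch, not the actual helper.
    x = nd.array(array)
    expected = grad_grad_op(x)
    x.attach_grad()
    with autograd.record():
        y = op(x)
        # First backward, kept differentiable so we can backprop again.
        x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
    x_grad.backward()   # second backward; x.grad now holds the 2nd-order grad
    assert nd.norm(x.grad - expected).asscalar() < 1e-4
```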
```cpp
[](const nnvm::NodePtr& n, const std::vector<nnvm::NodeEntry>& ograds) {
  // ograds[0]: dL/dxgrad
  // inputs[0]: dL/dy
  // inputs[1]: y
```
Shouldn't this term be x? _backward_abs is using ElemwiseGradUseIn.
Fixed.
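A tiny numpy illustration of why the gradient of abs needs the input x rather than the output y:

```python
import numpy as np

# _backward_abs is built with ElemwiseGradUseIn, so its second input is
# the forward *input* x, and d|x|/dx = sign(x). The output y = |x| has
# already lost the sign information.
x = np.array([-2.0, -0.5, 0.5, 2.0])
y = np.abs(x)
assert np.allclose(np.sign(x), [-1.0, -1.0, 1.0, 1.0])
assert np.allclose(np.sign(y), [1.0, 1.0, 1.0, 1.0])  # wrong gradient if y were used
```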
* fix extra line in tests.
* fix missing space.
* fix incorrect comment.
…to develop/add-higher-order/reciprocal-abs
```cpp
auto dydx_mul_dldy = nnvm::NodeEntry{n};  // f'(x) * head_grads
auto dydx = MakeNode("elemwise_div", n->attrs.name + "_dydx",
                     {dydx_mul_dldy, n->inputs[0]}, nullptr, &n);
auto fx = MakeNode("reciprocal", n->attrs.name + "_fx",
```
Small thing: could we get fx from the first backward (node->inputs) if we used ElemwiseGradUseInOut? I guess we would avoid additional divisions if so.
I don't think we can use it: for that to work, our _backward_reciprocal, which is binary, would have to support 3 inputs.
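For what it's worth, the algebra behind the suggestion, as a hedged numpy sketch (not the PR's code): with the saved output y = 1/x, both derivatives become multiplications.

```python
import numpy as np

# If y = 1/x were available from the first backward, no division is needed.
x = np.array([0.5, 1.0, 2.0])
y = 1.0 / x
assert np.allclose(-y**2, -1.0 / x**2)     # dy/dx
assert np.allclose(2 * y**3, 2.0 / x**3)   # d2y/dx2
```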
```cpp
std::vector<nnvm::NodeEntry> ret;

ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
```
Maybe a comment would help here. This one is the output corresponding to dL/dy from the first backward, right? Since the previous PRs, I'm still unclear on what dL/dxgrad * dy/dx represents. To me it is not obvious, even after spending more than half an hour thinking about it.
Even I am not sure of its significance in the literature. But if you look at dL/dx = dL/dy * dy/dx as just c = a * b, then dc/da = b while dc/db = a. So that is all I am thinking of: how does dL/dy affect our dL/dx?
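Spelled out as a two-line sympy check (illustrative only):

```python
import sympy as sp

# The product view from the comment: treat dL/dx = dL/dy * dy/dx as c = a * b.
a, b = sp.symbols('a b')   # a = dL/dy, b = dy/dx
c = a * b                  # c = dL/dx
assert sp.diff(c, a) == b  # dc/da = b: the sensitivity of dL/dx to dL/dy
assert sp.diff(c, b) == a  # dc/db = a
```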
This term will be useful when you calculate the third order (and above) gradient.
```cpp
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], nnvm::NodeEntry{dydx}}, nullptr, &n));
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad_inp",
```
This seems ok.
```cpp
std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
                          {ograds[0], nnvm::NodeEntry(dydx)}, nullptr, &n));
ret.emplace_back(MakeNode("zeros_like", n->attrs.name + "_backward_grad_grad_in",
```
Ok.
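A small numpy check of why zeros_like is the right x-gradient for abs, away from the kink at zero:

```python
import numpy as np

# Away from x == 0, d|x|/dx = sign(x) is locally constant, so the second
# derivative is identically zero, which is what zeros_like encodes.
x = np.array([-2.0, -0.5, 0.5, 2.0])   # keep away from the kink at 0
eps = 1e-3
second = (np.sign(x + eps) - np.sign(x - eps)) / (2 * eps)
assert np.allclose(second, 0.0)
```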
```cpp
                     {nnvm::NodeEntry{n}, n->inputs[0]}, nullptr, &n);

std::vector<nnvm::NodeEntry> ret;
ret.emplace_back(MakeNode("elemwise_mul", n->attrs.name + "_backward_grad_grad",
```
Same question as above.
I don't get the first output, but the result for x_grad_grad looks fine to me.
Sorry that I've been busy this week (for the upcoming conference). I'll take a look next week.
Sure. No worries. Good luck!
LGTM
Description

This PR intends to add support for higher order gradients for reciprocal and abs.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes

* Higher order gradient support for reciprocal and abs.
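A hedged usage sketch of what this enables: a differentiable second backward pass through reciprocal (values illustrative):

```python
from mxnet import autograd, nd

x = nd.array([0.5, 1.0, 2.0])
x.attach_grad()
with autograd.record():
    y = nd.reciprocal(x)
    # First backward, kept in the graph so it can be differentiated again.
    x_grad = autograd.grad(y, x, create_graph=True, retain_graph=True)[0]
x_grad.backward()
print(x.grad)      # expected to match 2 / x**3
print(2 / x**3)
```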