This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-978] Higher order gradient for sigmoid #15288

Merged: 44 commits merged into apache:master on Jul 3, 2019

Conversation

@apeforest (Contributor) commented Jun 20, 2019:

Description

This PR adds support for the higher order gradient of the sigmoid operator.
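For context, a minimal sketch of how the new second-order gradient can be exercised (assuming the mxnet.autograd API used in the tests discussed below; this is an illustration, not code taken from this PR):

from mxnet import nd, autograd

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()

with autograd.record():
    y = nd.sigmoid(x)
    # First-order gradient, kept in the graph so it can be differentiated again.
    x_grad = autograd.grad(heads=y, variables=x, head_grads=nd.ones_like(y),
                           create_graph=True, retain_graph=True)[0]
# Second-order gradient accumulates into x.grad.
x_grad.backward()
# With unit head grads this should equal sigmoid(x) * (1 - sigmoid(x)) * (1 - 2 * sigmoid(x)).
print(x.grad.asnumpy())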

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Changes

  • Higher order (second order) gradient for the sigmoid operator, with unit tests
  • A rudimentary computation-graph dump in imperative mode (gated by MXNET_MEM_PLAN_VERBOSE_LOGGING) to help debugging

Comments

  • If this is a backward-incompatible change, explain why it must be made.
  • Interesting edge cases to note here

sxjscience and others added 30 commits October 14, 2018 14:56
@apeforest (Contributor, Author):

@kshitij12345 I have figured out how backward works when one of the inputs is an output of the forward node. Please review this PR. Thanks!

@apeforest (Contributor, Author):

@larroy @sxjscience Please help review this PR. Thanks!

@apeforest (Contributor, Author) commented Jun 20, 2019:

@larroy I also added a method to dump the computation graph in imperative mode, since it will be very useful for debugging. However, it is still very rudimentary, and we will need your help to implement a more elegant way of printing out the graph info. Thanks!

auto grad_grad_mid = MakeNode("elemwise_mul", n->attrs.name + "_grad_mul",
                              {n->inputs[0], nnvm::NodeEntry{one_minus_two_y}}, nullptr, &n);
// when building the gradient graph, the backward node of n->inputs[1] will be
// added to the graph again, therefore f'(x) will be multiplied
Contributor:

Doesn't this behaviour seem a bit odd? Is this actually the expected behaviour? What would have happened if this were a split-like function with many outputs; would the backwards of all the outputs be added? Actually, I am confused by this behaviour.

Contributor Author:

Yes, this is actually the expected behavior. The nnvm graph performs a DFS traversal when building the backward pass. Since the input to this backward_sigmoid node is an output of another node (sigmoid), during the backward pass the gradient function of sigmoid will be invoked.

See:

This is to collect dependent nodes in the graph during RecordOp:
https://github.com/apache/incubator-mxnet/blob/master/src/imperative/imperative.cc#L180

This is to actually perform the backward pass:
https://github.com/dmlc/tvm/blob/21935dcbf56ad3bd66ebff9891a6bc3865b8106d/nnvm/src/pass/gradient.cc#L126

https://github.com/dmlc/tvm/blob/21935dcbf56ad3bd66ebff9891a6bc3865b8106d/nnvm/src/pass/gradient.cc#L190
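Put differently (my reconstruction of the argument above, using the notation from the code comments rather than a verbatim trace of the implementation):

    grad w.r.t. n->inputs[1] built here ≈ ograds[0] * y_grad * (1 - 2*f(x))
    n->inputs[1] = f(x) is itself the output of the sigmoid node, so its backward
    is added to the gradient graph again and contributes one more factor
    f'(x) = f(x) * (1 - f(x)), giving
    grad w.r.t. x = ograds[0] * y_grad * (1 - 2*f(x)) * f'(x)
                  = ograds[0] * y_grad * f''(x)

which is exactly the expected_grad_grad = grad_grad_x * head_grad_grads * y_grad checked in the tests later in this thread.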

Contributor:

I also didn't get that. A drawing might help. I would expect that you multiply with the backward node "n" itself.

Contributor:

I guess that is what he meant, as in:

      // n->inputs[0] : y_grad
      // n->inputs[1] : f(x) = sigmoid(x)
      // ograds[0] : head_grads
      // f''(x) = f'(x) * (1 - 2*f(x))

The backward of node n->inputs[1] is the node n itself.

+1 for a visual drawing/graph of what is actually happening.

Contributor Author:

[Attached image IMG-3494: hand-drawn graph of the backward computation]

Contributor:

Thanks, I'll look at it today.

@kshitij12345 (Contributor) commented Jun 22, 2019:

I kind of see, from the graph you have drawn and the dumped graph, what is happening. I still strongly feel that it is incorrect behaviour.

To check that, I modified the test code slightly to also compute and validate the 3rd-order gradient, which should come for free since the second-order gradients are composed of differentiable functions.

I have tested sin, cos, log and sigmoid. Of these, only sigmoid fails for the third order.

I have also attached my hand computation of the third order for sigmoid; please verify it just to make sure that it is not incorrect.

For some reason it is rotated.
[Attached image 3_rd_ord: hand computation of the third-order sigmoid gradient]
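For reference, the chain being checked works out as follows (reconstructed from the grad_grad_grad_op used in the test code below, not transcribed from the attachment):

    f(x)    = sigmoid(x)
    f'(x)   = f(x) * (1 - f(x))
    f''(x)  = f'(x) * (1 - 2*f(x))
    f'''(x) = f''(x) * (1 - 2*f(x)) - 2 * f'(x)^2
            = f''(x) - 2 * (f'(x)^2 + f''(x) * f(x))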

import math
from mxnet import nd, autograd
from mxnet.test_utils import assert_almost_equal, random_arrays, rand_shape_nd
from common import with_seed


@with_seed()
def test_sin():
    def sin(x):
        return nd.sin(x)

    def grad_grad_op(x):
        return -nd.sin(x)
    
    def grad_grad_grad_op(x):
        return -nd.cos(x)

    for dim in range(1, 5):
        shape = rand_shape_nd(dim)
        array = random_arrays(shape)
        check_second_order_unary(array, sin, grad_grad_op, grad_grad_grad_op)


@with_seed()
def test_cos():
    def cos(x):
        return nd.cos(x)

    def grad_grad_op(x):
        return -nd.cos(x)
    
    def grad_grad_grad_op(x):
        return nd.sin(x)

    for dim in range(1, 5):
        shape = rand_shape_nd(dim)
        array = random_arrays(shape)
        check_second_order_unary(array, cos, grad_grad_op, grad_grad_grad_op)

@with_seed()
def test_log():
    def log(x):
        return nd.log(x)

    def grad_grad_op(x):
        return -1/(x**2)

    def grad_grad_grad_op(x):
        return 2/(x**3)

    for dim in range(1, 5):
        shape = rand_shape_nd(dim)
        array = random_arrays(shape)
        check_second_order_unary(array, log, grad_grad_op, grad_grad_grad_op)


@with_seed()
def test_sigmoid():
    def sigmoid(x):
        return nd.sigmoid(x)

    def grad_op(x):
        return sigmoid(x) * (1 - sigmoid(x))

    def grad_grad_op(x):
        return grad_op(x) * (1 - 2 * sigmoid(x))

    def grad_grad_grad_op(x):
        return grad_grad_op(x) - 2 * ( grad_op(x)**2 + grad_grad_op(x) * sigmoid(x))

    for dim in range(1, 5):
        shape = rand_shape_nd(dim)
        array = random_arrays(shape)
        check_second_order_unary(array, sigmoid, grad_grad_op, grad_grad_grad_op)


def check_second_order_unary(x, op, grad_grad_op, grad_grad_grad_op):
    x = nd.array(x)
    grad_grad_x = grad_grad_op(x)
    grad_grad_grad_x = grad_grad_grad_op(x)
    x.attach_grad()

    # Manual head_grads.
    y_grad = nd.random.normal(shape=x.shape)
    head_grad_grads = nd.random.normal(shape=x.shape)

    # Perform compute.
    with autograd.record():
        y = op(x)
        x_grad = autograd.grad(heads=y, variables=x, head_grads=y_grad,
                               create_graph=True, retain_graph=True)[0]
        
        x_grad_grad = autograd.grad(heads=x_grad, variables=x, head_grads=head_grad_grads,
                                create_graph=True, retain_graph=True)[0]
    x_grad_grad.backward()

    # Compute expected values.
    expected_grad_grad = grad_grad_x.asnumpy() * head_grad_grads.asnumpy() * \
        y_grad.asnumpy()

    expected_grad_grad_grad = grad_grad_grad_x.asnumpy() * head_grad_grads.asnumpy() * \
        y_grad.asnumpy()

    # Validate the gradients.
    assert_almost_equal(expected_grad_grad, x_grad_grad.asnumpy())
    assert_almost_equal(expected_grad_grad_grad, x.grad.asnumpy())

if __name__ == '__main__':
    import nose
    nose.runmodule()

Contributor Author:

I suspect the discrepancy in the 3rd-order gradient is not because of an error in my implementation of grad_grad_input, but because I did not return the first grad_grad correctly. After all, the first output may be useful in calculating higher orders even if it is not a visible output at the Python level. I will look into it.

Contributor Author:

@kshitij12345 I have fixed the issue. The result passes your test now. Please review again. Thanks!

Contributor:

LGTM.

// n->inputs[0] : y_grad
// n->inputs[1] : f(x) = sigmoid(x)
// ograds[0] : head_grads
// f''(x) = f'(x) * (1 - 2*f(x))
Contributor:

ok

// n->inputs[0] : y_grad
// n->inputs[1] : f(x) = sigmoid(x)
// ograds[0] : head_grads
// f''(x) = f'(x) * (1 - 2*f(x))
Contributor:

ok

@@ -106,6 +106,23 @@ def grad_grad_op(x):
check_second_order_unary(array, log10, grad_grad_op)


@with_seed()
def test_sigmoid():
Contributor:

Looks correct to me.

@@ -501,6 +501,10 @@ std::vector<NDArray*> Imperative::Backward(
}
}

if (dmlc::GetEnv("MXNET_MEM_PLAN_VERBOSE_LOGGING", false)) {
Contributor:

nice hack :)

Contributor:

Can you explain how logging of static memory helps here?

Contributor:

He wants to dump the graph. Maybe this should be a separate PR?

Contributor Author:

It's actually dumping out the computation graph.
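A minimal sketch of how one might turn it on (assuming the flag is read via dmlc::GetEnv exactly as in the diff above, so setting the environment variable to 1 before the backward pass enables the dump):

import os
# Enable the verbose graph/memory-plan dump before triggering a backward pass.
os.environ['MXNET_MEM_PLAN_VERBOSE_LOGGING'] = '1'
import mxnet as mx
# ... build a graph and call autograd.grad / backward as usual; the dump is
# printed from Imperative::Backward when the flag is set.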

@larroy (Contributor) commented Jun 21, 2019:

@apeforest Isn't that what I wrote? Could you answer regarding separating this into a different PR? Also, MXNET_MEM_PLAN_VERBOSE_LOGGING is not documented in faq/env_var.md.

Contributor Author:

@larroy I did not see your comment before I refreshed the page. Our minds think alike :)
Sure, I can separate this into a different PR. I added it here only to help @kshitij12345 dump out the graph and understand the backward pass better.

Contributor:

The PR is great; it's up to you whether to separate it, not a big deal. I'm not a radical in my reviews.

Contributor:

@apeforest Really like the dump; it's a great help to actually see the graph. Thank you.

@Roshrini added the pr-awaiting-review (PR is waiting for code review) label on Jun 23, 2019
@larroy (Contributor) commented Jun 27, 2019:

Can we merge this?

@apeforest (Contributor, Author):

I verified the result is the same as PyTorch's:

import torch
import numpy as np
import math

op = lambda x: torch.sigmoid(x)
grad_op = lambda x: op(x) * (1 - op(x))
grad_grad_op = lambda x: grad_op(x) * (1 - 2 * op(x))
grad_grad_grad_op = lambda x: grad_grad_op(x) - 2 * ( grad_op(x)**2 + grad_grad_op(x) * op(x))

x = torch.tensor(np.array([1, 2, 3]), dtype=torch.float32)
head_grads = torch.tensor(np.array([1, 1, 1]), dtype=torch.float32) * 0.5
head_grad_grads = torch.tensor(np.array([1, 1, 1]), dtype=torch.float32) * 0.6
head_grad_grad_grads = torch.tensor(np.array([1, 1, 1]), dtype=torch.float32) * 0.7
x.requires_grad = True
head_grads.requires_grad = True

y = op(x)
x_grad = torch.autograd.grad(y, x, grad_outputs=head_grads, create_graph=True, retain_graph=True)[0]
expected_grad_x = (grad_op(x) * head_grads).detach().numpy()
print('expected_grad_x = {}'.format(expected_grad_x))
print('grad_x          = {}'.format(x_grad.detach().numpy()))
x_grad_grad = torch.autograd.grad(x_grad, x, grad_outputs=head_grad_grads, create_graph=True, retain_graph=True)[0]
x_grad_grad.backward(head_grad_grad_grads)

expected_grad_grad_x = (grad_grad_op(x) * head_grads * head_grad_grads).detach().numpy()
expected_head_grad = (grad_op(x) * head_grad_grads).detach().numpy()
expected_grad_grad_grad_x = (grad_grad_grad_op(x) * head_grads * head_grad_grads * head_grad_grad_grads).detach().numpy()

print('expected_grad_grad_x = {}'.format(expected_grad_grad_x))
print('grad_grad_x          = {}'.format(x_grad_grad.detach().numpy()))
print('expected_grad_grad_grad_x = {}'.format(expected_grad_grad_grad_x))
print('grad_grad_grad_x          = {}'.format(x.grad.detach().numpy()))

@apeforest (Contributor, Author):

@sxjscience Please help review this PR. Thanks!

kshitij12345 added a commit to kshitij12345/incubator-mxnet that referenced this pull request Jul 2, 2019
@apeforest (Contributor, Author):

@kshitij12345 Could you approve the PR if everything looks good to you now? Thanks!

@apeforest merged commit 6a8d9eb into apache:master on Jul 3, 2019
apeforest pushed a commit that referenced this pull request Jul 29, 2019
* init to reset

* issue: higher order backward sigmoid

* update gradient code.

update code as per #15288.

* undo changes

* relax tolerance of gradient mismatch for tanh

* update comments

* update comments
anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this pull request Aug 20, 2019
…5253)
