[RELAY] Add primal gradients for Relay operators. #2562
Comments
Does this mean we need to write all gradient ops in TOPI?
To ease the work of implementing so many gradient expressions, I think we can take advantage of PR #2498 for simple operators and attach appropriate schedules. For complicated operators such as convolution, we will probably need to implement the gradient expressions manually.
We think that a portion of the above operations may indeed be handled by #2498. We will test tensor-level AD for compatibility with the listed operations and publish the results. Meanwhile, we are working on integrating AD with Relay. We plan to provide a layer similar in spirit to our NNVM draft https://github.com/sgrechanik-h/tvm/blob/87d6f319f74360b9dfd0578b68214d1309b208fe/nnvm/src/top/tensor/gradient.cc .
@jroesch given how many of these are either simple elementwise ops (log, etc.) or reductions (broadcast, etc.), would it be possible for you (or someone familiar with how you want this work done) to first implement one of them as a template, i.e. showing the desired code location (alongside the op or in a separate file?), the primal grad registration, and direct + gradient checking in unit tests? That would allow others to efficiently use it as a template for similar work.
@ajtulloch yes, there are a few basic ones committed to the repo; I will try to open a PR with multiple examples from level 1 this week. I've been busy prototyping other Relay features for training and execution, which I hope to RFC in the coming weeks. @reminisce @grwlf I think it would be great if we could get default behavior for Relay, and if a generated gradient's performance isn't sufficient we can hand-implement it. @tqchen what do you think about this approach?
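For reference, here is a minimal sketch of what one of these level-1 primal-gradient registrations could look like on the Python side, assuming the `register_gradient` hook in `tvm.relay.op`; the exact in-tree definitions and signatures may differ:

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch only: primal gradient for log. `orig` is the original call node and
# `grad` is the gradient flowing back from the output; the function returns one
# gradient expression (built from ordinary Relay ops) per input of the call.
@register_gradient("log", level=11)  # a higher level so the sketch can override an existing registration
def log_grad(orig, grad):
    x = orig.args[0]
    # d/dx log(x) = 1 / x, scaled by the incoming gradient
    return [grad * relay.ones_like(x) / x]
```

As far as I can tell, the returned list pairs up positionally with `orig.args`, which is the convention the existing basic registrations seem to follow.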
@jroesch, dear all. We made a quick check of AD-Relay compatibility: for every Relay operation from the above list, we (a) looked at its …
Additional notes:
P.S. We are thinking about writing a TVM Python codegen to pretty-print TVM IR code. Is anybody working on it?
While it is great to have tensor expression gradient support, I recommend we provide the primal gradients in the form of Relay operators at this moment. The main reason is that a Relay->Relay transformation makes it easier to do follow-up analysis and transformations in Relay; it also makes sure that each op can easily generate different variants (Winograd, spatial pack for conv2d). This does not eliminate the value of expression-level gradients, though: they could be a nice complement when a user defines a custom op, and a topic of research in the long run, if integrated properly with Relay.
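To make the Relay->Relay point concrete, here is a rough sketch of driving the AD pass once primal gradients are registered; the helper names used here (`relay.transform.gradient`, `run_infer_type`, `create_executor`) reflect my reading of the Python API and may differ between versions:

```python
import numpy as np
from tvm import relay
from tvm.relay.testing import run_infer_type

# Forward function: y = log(x)
x = relay.var("x", shape=(3,), dtype="float32")
fwd = run_infer_type(relay.Function([x], relay.log(x)))

# The gradient pass returns another Relay function that computes
# (forward value, (gradient w.r.t. each input,)). Because the result is plain
# Relay, follow-up passes (fusion, layout, per-op variants) apply to it as usual.
bwd = run_infer_type(relay.transform.gradient(fwd, mode="first_order"))

data = np.array([1.0, 2.0, 4.0], dtype="float32")
out, (dx,) = relay.create_executor(kind="debug").evaluate(bwd)(data)
print(out, dx)  # dx should be elementwise 1 / x
```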
Expressing gradients in Relay would be a good design test. My thoughts regarding this design choice are as follows:
I am working on adding gradient definitions for some level 1/2 operators; see #2633 for details.
I'm interested in helping contribute gradient implementations, but I'm finding it a bit difficult to understand what orientation the original op arguments are in, and what role … plays. As an example, by trial and error I arrived at the following for …
I'm verifying this by checking gradient values numerically against a toy TensorFlow model with a dense layer that I converted. I would not have expected to need the outer …
Would it be possible to provide a more detailed tutorial about how to translate a known mathematical form of a gradient into a Relay implementation, to make it easier for the community to contribute some of these implementations?
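(For reference, a sketch of the kind of dense gradient being discussed here, written against the Python API as I understand it; this is an illustration rather than the exact snippet referred to above or the in-tree definition.)

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch only. nn.dense computes out = data @ weight.T, so
#   d(loss)/d(data)   = grad @ weight   = dense(grad, weight.T)
#   d(loss)/d(weight) = grad.T @ data   = dense(grad.T, data.T)
# collapse_sum_like folds each result back to the corresponding argument's
# shape in case broadcasting happened in the forward pass.
@register_gradient("nn.dense", level=11)  # higher level so the sketch can override an existing registration
def dense_grad(orig, grad):
    data, weight = orig.args
    ddata = relay.nn.dense(grad, relay.transpose(weight))
    dweight = relay.nn.dense(relay.transpose(grad), relay.transpose(data))
    return [relay.collapse_sum_like(ddata, data),
            relay.collapse_sum_like(dweight, weight)]
```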
@SWu this is an issue that I've run into as well. I believe the specific documentation issue you ran into is indeed a copy-paste error, which we should fix. Overall, though, the documentation is lacking, as @jroesch said, and we (who implement more grads) should definitely update it with better descriptions as we work through them.
For … For example, if …
We also need to think about the best way to verify the correctness of these implementations, since currently the numerical tests in TVM are somewhat arbitrary. Your approach seems solid for ensuring correct behavior with respect to existing frameworks. This problem is more general than just gradients, though, and I think we should have a TVM-wide discussion.
As for your last point, I think this would be a good idea. I'll try to type up a tutorial of sorts walking through my implementation of softmax once I'm done with my current work.
I don't want to write too much more here (and maybe this is already too much), but hopefully this helped. I'll make a more comprehensive post once the PR is ready.
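To illustrate the `collapse_sum_like` point with a concrete case, here is a sketch for a broadcasting `add`; again this is written against the Python API as I understand it and is not necessarily the in-tree definition:

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch: for z = x + y the local derivative w.r.t. each input is 1, so the
# incoming gradient passes straight through. However, if x or y was broadcast
# in the forward pass, the gradient (which has z's shape) must be summed back
# down to that input's shape; collapse_sum_like does exactly that.
@register_gradient("add", level=11)  # higher level so the sketch can override an existing registration
def add_grad(orig, grad):
    x, y = orig.args
    return [relay.collapse_sum_like(grad, x),
            relay.collapse_sum_like(grad, y)]
```

For example, if `x` has shape (3, 4) and `y` has shape (4,), the gradient w.r.t. `y` is `grad` summed over axis 0, which is what `collapse_sum_like(grad, y)` produces.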
Closing for now due to inactivity; let us open a new thread for new gradient TODOs.
Relay's automatic differentiation is still missing primal gradients. It would be interesting to integrate with the tensor-level AD at some point, but for the time being we should focus on adding primal gradients. I will open a PR adding to the basic set, but we should work towards completion for Relay operators. Help from those with expertise in the less straightforward gradient computations would be appreciated.
The gradients should be implemented in C++ and come with tests; see below for the complete list (a sketch of one possible numerical check follows the list).
Level 1
Level 2
Level 3
Level 4
Level 5
Level 10
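As a rough sketch of the kind of numerical check the accompanying tests could perform, comparing a registered gradient against finite differences; the helper names (`relay.transform.gradient`, `run_infer_type`, `create_executor`) are assumptions about the Python API and may differ:

```python
import numpy as np
from tvm import relay
from tvm.relay.testing import run_infer_type

def check_gradient(op, shape=(3, 4), eps=1e-3, rtol=1e-2, atol=1e-2):
    """Compare the registered primal gradient of a unary op against central
    finite differences of sum(op(x)). Sketch only; exact APIs may differ."""
    x = relay.var("x", shape=shape, dtype="float64")
    fwd = run_infer_type(relay.Function([x], op(x)))
    # The AD pass seeds the output gradient with ones, so the returned gradient
    # should correspond to d(sum(op(x)))/dx.
    bwd = run_infer_type(relay.transform.gradient(fwd, mode="first_order"))

    data = np.random.uniform(1.0, 2.0, size=shape)
    ex = relay.create_executor(kind="debug")
    _, (grad,) = ex.evaluate(bwd)(data)

    def f(arr):
        return ex.evaluate(fwd)(arr).numpy().sum()

    # Central finite differences, one element at a time.
    fd = np.zeros_like(data)
    for idx in np.ndindex(*shape):
        up, down = data.copy(), data.copy()
        up[idx] += eps
        down[idx] -= eps
        fd[idx] = (f(up) - f(down)) / (2 * eps)

    np.testing.assert_allclose(grad.numpy(), fd, rtol=rtol, atol=atol)

# e.g. check_gradient(relay.log)
```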