[RELAY] Add primal gradients for Relay operators. #2562
Comments
Does this mean we need to write all gradient ops in TOPI?
To ease the work of implementing so many gradient expressions, I think we can take advantage of PR #2498 for simple operators and attach appropriate schedules. For complicated operators such as convolution, we will probably need to implement the gradient expressions manually.
We think that a portion of the above operations may indeed be handled by #2498. We will test tensor-level AD for compatibility with the listed operations and publish the results. Meanwhile, we are working on integrating AD with Relay. We plan to provide a layer similar in spirit to our NNVM draft https://github.com/sgrechanik-h/tvm/blob/87d6f319f74360b9dfd0578b68214d1309b208fe/nnvm/src/top/tensor/gradient.cc .
@jroesch given how many of these are either simple elementwise ops (log, etc.) or reductions (broadcast, etc.), would it be possible for you (or someone familiar with how you want this work done) to first implement one of them as a template, i.e. showing the desired code location (alongside the op or in a separate file?), the primal grad registration, and direct + gradient checking in unit tests? That would allow others to efficiently use it as a template for similar work.
@ajtulloch yes, there are a few basic ones committed to the repo; I will try to open a PR with multiple examples from level 1 this week. I've been busy prototyping other Relay features for training and execution, which I hope to RFC in the coming weeks. @reminisce @grwlf I think it would be great if we could get default behavior for Relay, and if a generated gradient's performance isn't sufficient we can hand-implement it. @tqchen what do you think about this approach?
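For reference, here is a minimal sketch of what one of these level-1 primal-gradient registrations could look like on the Python side, assuming the `register_gradient` hook in `tvm.relay.op`; the exact in-tree definitions and signatures may differ:

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch only: primal gradient for log. `orig` is the original call node and
# `grad` is the gradient flowing back from the output; the function returns one
# gradient expression (built from ordinary Relay ops) per input of the call.
@register_gradient("log", level=11)  # a higher level so the sketch can override an existing registration
def log_grad(orig, grad):
    x = orig.args[0]
    # d/dx log(x) = 1 / x, scaled by the incoming gradient
    return [grad * relay.ones_like(x) / x]
```

As far as I can tell, the returned list pairs up positionally with `orig.args`, which is the convention the existing basic registrations seem to follow.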
@jroesch, dear all. We made a quick check of AD-Relay compatibility: for every Relay operation from the above list, we (a) looked at its …
Additional notes:
P.S. We are thinking about writing a TVM Python codegen to pretty-print TVM IR code. Is anybody working on it?
While it is great to have tensor expression gradient support, I recommend we provide the primal gradients in the form of Relay operators at this moment. The main reason is that a Relay->Relay transformation makes it easier to do follow-up analysis and transformations in Relay; it also makes sure that each op can easily generate different variants (Winograd, spatial pack for conv2d). This does not eliminate the value of expression-level gradients, though: they could be a nice complement when a user defines a custom op, and a topic of research in the long run, if integrated properly with Relay.
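To make the Relay->Relay point concrete, here is a rough sketch of driving the AD pass once primal gradients are registered; the helper names used here (`relay.transform.gradient`, `run_infer_type`, `create_executor`) reflect my reading of the Python API and may differ between versions:

```python
import numpy as np
from tvm import relay
from tvm.relay.testing import run_infer_type

# Forward function: y = log(x)
x = relay.var("x", shape=(3,), dtype="float32")
fwd = run_infer_type(relay.Function([x], relay.log(x)))

# The gradient pass returns another Relay function that computes
# (forward value, (gradient w.r.t. each input,)). Because the result is plain
# Relay, follow-up passes (fusion, layout, per-op variants) apply to it as usual.
bwd = run_infer_type(relay.transform.gradient(fwd, mode="first_order"))

data = np.array([1.0, 2.0, 4.0], dtype="float32")
out, (dx,) = relay.create_executor(kind="debug").evaluate(bwd)(data)
print(out, dx)  # dx should be elementwise 1 / x
```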
Expressing gradients in Relay would be a good design test. My thoughts regarding this design choice are as follows:
I am working on adding gradient definitions for some level 1/2 operators; see #2633 for details.
I'm interested in helping contribute gradient implementations, but I'm finding it a bit difficult to understand what orientation the original op arguments are in, and what role … plays. As an example, by trial and error I arrived at the following for …
I'm verifying this by checking gradient values numerically against a toy TensorFlow model with a dense layer that I converted. I would not have expected to need the outer …
Would it be possible to provide a more detailed tutorial about how to translate a known mathematical form of a gradient into a Relay implementation, to make it easier for the community to contribute some of these implementations?
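(For reference, a sketch of the kind of dense gradient being discussed here, written against the Python API as I understand it; this is an illustration rather than the exact snippet referred to above or the in-tree definition.)

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch only. nn.dense computes out = data @ weight.T, so
#   d(loss)/d(data)   = grad @ weight   = dense(grad, weight.T)
#   d(loss)/d(weight) = grad.T @ data   = dense(grad.T, data.T)
# collapse_sum_like folds each result back to the corresponding argument's
# shape in case broadcasting happened in the forward pass.
@register_gradient("nn.dense", level=11)  # higher level so the sketch can override an existing registration
def dense_grad(orig, grad):
    data, weight = orig.args
    ddata = relay.nn.dense(grad, relay.transpose(weight))
    dweight = relay.nn.dense(relay.transpose(grad), relay.transpose(data))
    return [relay.collapse_sum_like(ddata, data),
            relay.collapse_sum_like(dweight, weight)]
```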
@SWu this is an issue that I've run into as well. I believe the specific documentation issue you ran into is indeed a copy-paste error, which we should fix. Overall, though, the documentation is lacking, as @jroesch said, and we (who implement more grads) should definitely update it with better descriptions as we work through them.
For … For example, if …
We also need to think about the best way to verify the correctness of these implementations, since currently the numerical tests in TVM are somewhat arbitrary. Your approach seems solid for ensuring correct behavior with respect to existing frameworks. This problem is more general than just gradients, though, and I think we should have a TVM-wide discussion.
As for your last point, I think this would be a good idea. I'll try to type up a tutorial of sorts walking through my implementation of softmax once I'm done with my current work.
I don't want to write too much more here (and maybe this is already too much), but hopefully this helped. I'll make a more comprehensive post once the PR is ready.
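To illustrate the `collapse_sum_like` point with a concrete case, here is a sketch for a broadcasting `add`; again this is written against the Python API as I understand it and is not necessarily the in-tree definition:

```python
from tvm import relay
from tvm.relay.op import register_gradient

# Sketch: for z = x + y the local derivative w.r.t. each input is 1, so the
# incoming gradient passes straight through. However, if x or y was broadcast
# in the forward pass, the gradient (which has z's shape) must be summed back
# down to that input's shape; collapse_sum_like does exactly that.
@register_gradient("add", level=11)  # higher level so the sketch can override an existing registration
def add_grad(orig, grad):
    x, y = orig.args
    return [relay.collapse_sum_like(grad, x),
            relay.collapse_sum_like(grad, y)]
```

For example, if `x` has shape (3, 4) and `y` has shape (4,), the gradient w.r.t. `y` is `grad` summed over axis 0, which is what `collapse_sum_like(grad, y)` produces.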
Closing for now due to inactivity; let us open a new thread for new gradient TODOs.
Relay's automatic differentiation is still missing primal gradients. It would be interesting to integrate with the tensor-level AD at some point, but for the time being we should focus on adding primal gradients. I will open a PR adding to the basic set, but we should work towards completion for Relay operators. Help from those with expertise in the less straightforward gradient computations would be appreciated.
The gradients should be implemented in C++ and come with tests; see below for the complete list (a sketch of one possible numerical check follows the list).
Level 1
Level 2
Level 3
Level 4
Level 5
Level 10
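As a rough sketch of the kind of numerical check the accompanying tests could perform, comparing a registered gradient against finite differences; the helper names (`relay.transform.gradient`, `run_infer_type`, `create_executor`) are assumptions about the Python API and may differ:

```python
import numpy as np
from tvm import relay
from tvm.relay.testing import run_infer_type

def check_gradient(op, shape=(3, 4), eps=1e-3, rtol=1e-2, atol=1e-2):
    """Compare the registered primal gradient of a unary op against central
    finite differences of sum(op(x)). Sketch only; exact APIs may differ."""
    x = relay.var("x", shape=shape, dtype="float64")
    fwd = run_infer_type(relay.Function([x], op(x)))
    # The AD pass seeds the output gradient with ones, so the returned gradient
    # should correspond to d(sum(op(x)))/dx.
    bwd = run_infer_type(relay.transform.gradient(fwd, mode="first_order"))

    data = np.random.uniform(1.0, 2.0, size=shape)
    ex = relay.create_executor(kind="debug")
    _, (grad,) = ex.evaluate(bwd)(data)

    def f(arr):
        return ex.evaluate(fwd)(arr).numpy().sum()

    # Central finite differences, one element at a time.
    fd = np.zeros_like(data)
    for idx in np.ndindex(*shape):
        up, down = data.copy(), data.copy()
        up[idx] += eps
        down[idx] -= eps
        fd[idx] = (f(up) - f(down)) / (2 * eps)

    np.testing.assert_allclose(grad.numpy(), fd, rtol=rtol, atol=atol)

# e.g. check_gradient(relay.log)
```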