Wrong gradients with C API? #20293
Comments
This seems a problem specific with the |
Any help on this? |
Analysis: The issue is solved mainly by comparing
Result: The issue is solved!
Further investigation: This result shows that CloneGradient is the most likely cause of this issue. So the remaining check is
It is possible that the "_copy" operation triggers this issue in CloneGradient. This helps to understand why it fails.
It is possible they have a similar bug. |
@KexinFeng thanks for looking into this!
There must be other factors besides this because the equivalent python example in my original report works fine and it also has the |
I see.
Actually I notice a discrepancy between the python call and cpp call. In python code, it calls Update /* Create cached op */ I'm wondering if dynamic allocation is applicable in your usage? |
Unfortunately dynamic allocation is not an option for me, at least for now. Anyway, I have verified that Thanks for submitting the PR. |
@szha @leezu @sandeep-krishnamurthy Bringing this issue to your attention - this issue is most probably a bug in the handling of reqs in |
I don't think the root cause is in CachedOp. As I was debugging this issue, I found that elemwise_add uses CloneGradient, which means it copies the ograds multiple times for the inputs. For cached_op, if static_alloc is on, it will construct the backward graph with grad_graph outputs From my point of view, the solution to this bug is to change the elemwise_add gradient function to this
@KexinFeng FYI |
The graph is exactly the same, whether you have static or dynamic alloc. The problem I believe lies here: especially lines 992 and 1002 - it sets the reqs for The proper fix therefore would be to change the logic in the |
Also, you definitely do not want to make gradient of add be |
The following program performs a forward + backward pass on the sum of two length-1 vectors (via the `elemwise_add` operator)
It behaves as expected, producing
Switching off the gradient request for the second input, i.e. defining the grad_req arrays as
also behaves as expected, producing 0 for `GRAD 2` (gradient arrays are initialized with 0):
But switching off the first gradient with
seems to switch off both `GRAD 1` and `GRAD 2`!
I must be doing something wrong here, because the equivalent Python example works just fine with gradient requests 0 and 1 (`'null'` and `'write'`):
Can anyone spot the problem with the C program with gradient requests 0 and 1?