Failure of MKL-DNN Convolution from C API #16143
Comments
Hey, this is the MXNet Label Bot. |
As the error message suggests, you need to call Reorder2Default to convert the MKL-DNN internal layout to the MXNet default layout before touching the data. Is it possible to do that with an NDArrayHandle? |
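For readers unfamiliar with why a reorder is needed at all, here is a rough, self-contained sketch of the idea in C. It is not MXNet's Reorder2Default and does not use the MXNet or MKL-DNN APIs. It assumes a channel-blocked internal layout in the style of MKL-DNN's nChw8c format, with a hypothetical block size of 8, and shows that the same logical element sits at a different offset in the blocked buffer than in a plain NCHW buffer. All function names below (idx_nchw, idx_blocked, blocked_size, reorder_to_default) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel block size (assumption; the real value depends on the CPU). */
#define BLOCK 8

/* Offset of element (n, c, h, w) in a plain NCHW buffer. */
size_t idx_nchw(int n, int c, int h, int w, int C, int H, int W) {
    return (((size_t)n * C + c) * H + h) * W + w;
}

/* Offset of element (n, c, h, w) in a channel-blocked buffer:
   channels are grouped into blocks of BLOCK, and the position
   inside the block is the innermost dimension. */
size_t idx_blocked(int n, int c, int h, int w, int C, int H, int W) {
    int cblocks = (C + BLOCK - 1) / BLOCK; /* channels padded up to a full block */
    return ((((size_t)n * cblocks + c / BLOCK) * H + h) * W + w) * BLOCK + c % BLOCK;
}

/* Number of floats the blocked buffer needs, including channel padding. */
size_t blocked_size(int N, int C, int H, int W) {
    int cblocks = (C + BLOCK - 1) / BLOCK;
    return (size_t)N * cblocks * H * W * BLOCK;
}

/* Copy data from the blocked layout into the plain NCHW layout.
   Reading the blocked buffer as if it were NCHW, without this step,
   would return elements in the wrong order. */
void reorder_to_default(const float *blocked, float *plain,
                        int N, int C, int H, int W) {
    assert(blocked != NULL && plain != NULL);
    for (int n = 0; n < N; ++n)
        for (int c = 0; c < C; ++c)
            for (int h = 0; h < H; ++h)
                for (int w = 0; w < W; ++w)
                    plain[idx_nchw(n, c, h, w, C, H, W)] =
                        blocked[idx_blocked(n, c, h, w, C, H, W)];
}
```

The point of the sketch is only that code reading the raw buffer has to know which layout it is in, which is why the conversion has to happen somewhere before the data is handed to the caller.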
Looks like the NDArray method Or am I missing something? |
Thank you @matteosal. Sure, I will consider your suggestion; I personally lean toward hiding the layout conversion from front-end users. Here is a temporary fix. I would highly appreciate it if you could try it in your model and report any further issues you observe. Thank you. |
Yes, that patch fixes the issue. Do you have an approximate ETA for a permanent fix? Thanks! |
I've discovered an example that is not fixed by the above patch. This time it involves a more complex symbol with multiple ops, and I couldn't reproduce it with a simpler one. Again, this doesn't happen in Python:
It fails with the same error, but at
|
@matteosal Thank you for reporting that and I'm really sorry for the inconvenience. Here is another patch: TaoLv@893c596. Would you mind sharing the Python scripts doing the same thing? I'm trying to understand the differences between C API and Python for this deconv issue. |
This second patch fixes the last example, thanks.
As for the second example, I've managed to reproduce it with this script:
This one fails in e87995d, but is fixed by your second patch. |
Any news about this? |
Sorry for the delay, @matteosal. I got tied up with other things this week. I will look into the Python script and get back to you next week. Thanks for your patience. |
@marcoabreu Do you have any suggestions about including @matteosal's demo case as a unit test in MXNet? Where should I put the cpp code? |
Description
With MKL-DNN, getting the output of a Convolution operator using the C API can trigger this error:
Environment info (Required)
Package used: C API
Build info
Compiler: gcc
MXNet commit hash: e87995d
Build config: plain config.mk with
USE_OPENCV=0
Error Message:
Minimum reproducible example
Steps to reproduce
Running the above standalone C program triggers the mentioned error. The error is not triggered if the output has fewer than 40 channels, or if the line
MXNDArrayWaitToRead(out_arr);
is commented out. I haven't been able to reproduce this error with the Python interface.