This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Failure of MKL-DNN Convolution from C API #16143

Closed
matteosal opened this issue Sep 11, 2019 · 11 comments · Fixed by #16265
@matteosal
Contributor

matteosal commented Sep 11, 2019

Description

With MKL-DNN, getting the output of a Convolution operator using the C API can trigger this error:

[14:52:08] src/ndarray/ndarray.cc:757: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first

Environment info (Required)

----------Python Info----------
Version      : 3.7.2
Compiler     : GCC 7.3.0
Build        : ('default', 'Dec 29 2018 06:19:36')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.0.1
Directory    : /opt/Anaconda/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/matteo/Git/mxnet/python/mxnet
Commit hash file "/home/matteo/Git/mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library      : ['/home/matteo/Git/mxnet/python/mxnet/../../lib/libmxnet.so']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✖ OPENMP
✖ SSE
✔ F16C
✔ JEMALLOC
✖ BLAS_OPEN
✔ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✖ LAPACK
✔ MKLDNN
✖ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✖ SIGNAL_HANDLER
✖ DEBUG
----------System Info----------
Platform     : Linux-4.15.0-55-generic-x86_64-with-debian-buster-sid
system       : Linux
node         : mongolius
release      : 4.15.0-55-generic
version      : #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping:            3
CPU MHz:             2700.253
CPU max MHz:         3500,0000
CPU min MHz:         800,0000
BogoMIPS:            5184.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            6144K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0117 sec, LOAD: 0.8935 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0599 sec, LOAD: 2.1901 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1028 sec, LOAD: 0.9832 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0657 sec, LOAD: 1.2597 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0380 sec, LOAD: 0.8543 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0395 sec, LOAD: 0.4625 sec.

Package used: C API

Build info

Compiler: gcc

MXNet commit hash: e87995d

Build config: plain config.mk with USE_OPENCV=0

Error Message:

[15:00:11] src/ndarray/ndarray.cc:757: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first
Stack trace:
  [bt] (0) libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f34bcc3dac3]
  [bt] (1) libmxnet.so(mxnet::NDArray::SetTBlob() const+0x2fc) [0x7f34bf350f4c]
  [bt] (2) libmxnet.so(MXNDArrayGetData+0x2d) [0x7f34bfaa208d]
  [bt] (3) ./tblob(+0xe65) [0x55e8d2164e65]
  [bt] (4) /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f34bc294b97]
  [bt] (5) ./tblob(+0xa3a) [0x55e8d2164a3a]

Minimum reproducible example

#include <stdio.h>

#include "mxnet/c_api.h"
#include "nnvm/c_api.h"

int main() {

  /* Create symbol variables */
  SymbolHandle in_sym;
  SymbolHandle w_sym;
  SymbolHandle b_sym;
  MXSymbolCreateVariable("in", &in_sym);
  MXSymbolCreateVariable("w", &w_sym);
  MXSymbolCreateVariable("b", &b_sym);

  /* Create convolution op */
  OpHandle op;
  NNGetOpHandle("Convolution", &op);
  SymbolHandle sym;
  const char *keys1[2] = {"kernel", "num_filter"};
  const char *vals[2] = {"(1,1)", "40"};
  MXSymbolCreateAtomicSymbol(op, 2, keys1, vals, &sym);

  /* Compose op and variables */
  const char **keys2 = NULL;
  SymbolHandle vars[3] = {in_sym, w_sym, b_sym};

  MXSymbolCompose(sym, "Conv", 3, keys2, vars);

  /* Create NDArrays for arguments */
  int dev_type = 1;
  int dev_id = 0; 

  mx_uint in_shape[4] = {1, 3, 30, 30};
  NDArrayHandle in_arg_arr;
  MXNDArrayCreateEx(in_shape, 4, dev_type, dev_id, 0, 0, &in_arg_arr);
  mx_uint w_shape[4] = {40, 3, 1, 1};
  NDArrayHandle w_arg_arr;
  MXNDArrayCreateEx(w_shape, 4, dev_type, dev_id, 0, 0, &w_arg_arr);
  mx_uint b_shape[1] = {40};
  NDArrayHandle b_arg_arr;
  MXNDArrayCreateEx(b_shape, 1, dev_type, dev_id, 0, 0, &b_arg_arr);

  /* Create and bind executor */
  ExecutorHandle ex;
  NDArrayHandle arg[3] = {in_arg_arr, w_arg_arr, b_arg_arr};
  NDArrayHandle grad[3] = {NULL, NULL, NULL};
  NDArrayHandle *aux = NULL;
  mx_uint req[3] = {1, 1, 1};
  MXExecutorBind(sym, dev_type, dev_id, 3, arg, grad, req, 0, aux, &ex);

  /* Get executor output handle */
  mx_uint out_size;
  NDArrayHandle *out_arr_p;
  MXExecutorOutputs(ex, &out_size, &out_arr_p);
  NDArrayHandle out_arr = *out_arr_p;

  /* Forward */
  MXExecutorForward(ex, 0);

  /* Read output */
  MXNDArrayWaitToRead(out_arr);
  void *data;
  if(MXNDArrayGetData(out_arr, &data) != 0)
    printf("%s\n", MXGetLastError());
  else
    printf("Ok!\n");

  return 0;
}
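For reference, the repro can be built as a standalone program. Below is a minimal build sketch; the checkout path, the bundled nnvm header location, and the lib directory are assumptions that will differ per setup (the binary name `tblob` matches the stack trace above):

```shell
# Hypothetical build command for the repro above, saved as tblob.c.
# MXNET_ROOT, the nnvm include path, and the lib directory are assumptions.
MXNET_ROOT="$HOME/Git/mxnet"
gcc tblob.c -o tblob \
    -I "$MXNET_ROOT/include" \
    -I "$MXNET_ROOT/3rdparty/tvm/nnvm/include" \
    -L "$MXNET_ROOT/lib" -lmxnet \
    -Wl,-rpath,"$MXNET_ROOT/lib"
./tblob
```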

Steps to reproduce

Running the above standalone C program triggers the mentioned error. The error is not triggered if the output has fewer than 40 channels, or if the line MXNDArrayWaitToRead(out_arr); is commented out.
I haven't been able to reproduce this error with the Python interface.

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Feature

@TaoLv
Member

TaoLv commented Sep 11, 2019

As suggested by the error message, you need to call Reorder2Default to convert the MKL-DNN internal layout to the MXNet default layout before touching the data. Is it possible to do that with an NDArrayHandle?
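To illustrate what that conversion means: MKL-DNN can keep activations in a channel-blocked layout (e.g. nChw8c) rather than plain NCHW, and Reorder2Default() rewrites the buffer back to the default layout. Here is a plain-Python sketch of such a reorder — a toy model only, not MXNet/MKL-DNN code; the block size of 8 is an assumption matching the nChw8c format:

```python
# Plain-Python sketch (not MXNet/MKL-DNN code) of the kind of conversion
# Reorder2Default() performs: turning a channel-blocked layout (a toy
# "Chw8c": channels grouped in blocks of 8) back into plain CHW.

BLOCK = 8  # assumed channel block size, as in MKL-DNN's nChw8c format

def chw_to_blocked(data, c, h, w):
    """Reorder a flat CHW buffer into (C/BLOCK, H, W, BLOCK) order."""
    out = []
    for cb in range(c // BLOCK):          # channel block index
        for y in range(h):
            for x in range(w):
                for ci in range(BLOCK):   # channel within the block
                    ch = cb * BLOCK + ci
                    out.append(data[(ch * h + y) * w + x])
    return out

def blocked_to_chw(data, c, h, w):
    """Inverse reorder: back to plain CHW (the 'default' layout)."""
    out = [0] * (c * h * w)
    i = 0
    for cb in range(c // BLOCK):
        for y in range(h):
            for x in range(w):
                for ci in range(BLOCK):
                    ch = cb * BLOCK + ci
                    out[(ch * h + y) * w + x] = data[i]
                    i += 1
    return out

c, h, w = 40, 2, 2                        # 40 channels, as in the repro
chw = list(range(c * h * w))
blocked = chw_to_blocked(chw, c, h, w)
assert blocked != chw                     # the layouts genuinely differ
assert blocked_to_chw(blocked, c, h, w) == chw
print("round trip ok")
```

That 40 is a multiple of the block size fits the observation above that the failure depends on the channel count, though the exact format-selection rule inside MKL-DNN is not shown here.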

@matteosal
Contributor Author

Looks like the NDArray method Reorder2Default() is not exposed by the C API.
If that's intended, then the user is never supposed to call it, and MXNet should use it automatically when needed; in that case I'd say this is a genuine bug.
Otherwise, if the user is actually expected to call it, it should be exposed.

Or am I missing something?

@TaoLv TaoLv self-assigned this Sep 11, 2019
@TaoLv
Member

TaoLv commented Sep 11, 2019

Thank you @matteosal. Sure, I will consider your suggestion; I personally tend to hide the layout conversion from front-end users. Here is a temporary fix. It would be highly appreciated if you could try it with your model and share whether you observe any further issues. Thank you.
TaoLv@ffc8459

@matteosal
Contributor Author

Yes, that patch fixes the issue. Do you have an approximate ETA for a permanent fix? Thanks!

@matteosal
Contributor Author

matteosal commented Sep 12, 2019

I've discovered an example which is not fixed by the above patch. This time it involves a more complex symbol with multiple ops, and it's not reproducible with a simpler one. Again, this doesn't happen in Python:

#include <stdio.h>

#include "mxnet/c_api.h"
#include "nnvm/c_api.h"

int main() {

  SymbolHandle sym;
  char json[] = 
  "{\"nodes\":[{\"op\":\"null\",\"name\":\"input\",\"inputs\":[]},{\"op\
\":\"null\",\"name\":\"w1\",\"inputs\":[]},{\"op\":\"null\",\"name\":\
\"b1\",\"inputs\":[]},{\"op\":\"Convolution\",\"name\":\"conv1\",\"\
attrs\":{\"cudnn_off\":\"0\",\"dilate\":\"(1, 1)\",\"kernel\":\"(1, \
1)\",\"layout\":\"None\",\"no_bias\":\"False\",\"num_filter\":\"1\",\"\
num_group\":\"1\",\"pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[0,0,0],[1,0,0],[2,0,0]]},{\"op\":\"null\",\"name\":\
\"w2\",\"inputs\":[]},{\"op\":\"null\",\"name\":\"b2\",\"inputs\":[]},\
{\"op\":\"Deconvolution\",\"name\":\"deconv\",\"attrs\":{\"dilate\":\"\
(1, 1)\",\"kernel\":\"(1, \
1)\",\"no_bias\":\"False\",\"num_filter\":\"8\",\"num_group\":\"1\",\"\
pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[3,0,0],[4,0,0],[5,0,0]]},{\"op\":\"null\",\"name\":\
\"w3\",\"inputs\":[]},{\"op\":\"null\",\"name\":\"b3\",\"inputs\":[]},\
{\"op\":\"Convolution\",\"name\":\"conv2\",\"attrs\":{\"cudnn_off\":\"\
0\",\"dilate\":\"(1, 1)\",\"kernel\":\"(1, \
1)\",\"layout\":\"None\",\"no_bias\":\"False\",\"num_filter\":\"8\",\"\
num_group\":\"1\",\"pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[6,0,0],[7,0,0],[8,0,0]]},{\"op\":\"_copy\",\"name\"\
:\"out\",\"inputs\":[[9,0,0]]}],\"arg_nodes\":[0,1,2,4,5,7,8],\"node_\
row_ptr\":[0,1,2,3,4,5,6,7,8,9,10,11],\"heads\":[[10,0,0]],\"attrs\":{\
\"mxnet_version\":[\"int\",10500]}}";
  
  MXSymbolCreateFromJSON(json, &sym);
  
  /* Create NDArrays for arguments */
  int dev_type = 1;
  int dev_id = 0; 

  mx_uint in_shape[4] = {1, 3, 10, 10};
  NDArrayHandle in_arg_arr;
  MXNDArrayCreateEx(in_shape, 4, dev_type, dev_id, 0, 0, &in_arg_arr);
  mx_uint w1_shape[4] = {1, 3, 1, 1};
  NDArrayHandle w1_arg_arr, w1_grad_arr;
  MXNDArrayCreateEx(w1_shape, 4, dev_type, dev_id, 0, 0, &w1_arg_arr);
  MXNDArrayCreateEx(w1_shape, 4, dev_type, dev_id, 0, 0, &w1_grad_arr);
  mx_uint b1_shape[1] = {1};
  NDArrayHandle b1_arg_arr;
  MXNDArrayCreateEx(b1_shape, 1, dev_type, dev_id, 0, 0, &b1_arg_arr);
  mx_uint w2_shape[4] = {1, 8, 1, 1};
  NDArrayHandle w2_arg_arr;
  MXNDArrayCreateEx(w2_shape, 4, dev_type, dev_id, 0, 0, &w2_arg_arr);
  mx_uint b2_shape[1] = {8};
  NDArrayHandle b2_arg_arr;
  MXNDArrayCreateEx(b2_shape, 1, dev_type, dev_id, 0, 0, &b2_arg_arr);
  mx_uint w3_shape[4] = {8, 8, 1, 1};
  NDArrayHandle w3_arg_arr;
  MXNDArrayCreateEx(w3_shape, 4, dev_type, dev_id, 0, 0, &w3_arg_arr);
  mx_uint b3_shape[1] = {8};
  NDArrayHandle b3_arg_arr;
  MXNDArrayCreateEx(b3_shape, 1, dev_type, dev_id, 0, 0, &b3_arg_arr);

  mx_uint outgrad_shape[4] = {1, 8, 10, 10};
  NDArrayHandle outgrad_arr;
  MXNDArrayCreateEx(outgrad_shape, 4, dev_type, dev_id, 0, 0, &outgrad_arr);

  /* Create and bind executor */
  ExecutorHandle ex;
  NDArrayHandle arg[7] = {in_arg_arr, w1_arg_arr, b1_arg_arr, w2_arg_arr, 
    b2_arg_arr, w3_arg_arr, b3_arg_arr};
  NDArrayHandle grad[7] = {NULL, w1_grad_arr, NULL, NULL, NULL, NULL,NULL};
  NDArrayHandle *aux = NULL;
  mx_uint req[7] = {0, 1, 0, 0, 0, 0, 0};
  MXExecutorBind(sym, dev_type, dev_id, 7, arg, grad, req, 0, aux, &ex);
  
  /* Forward, backward */
  NDArrayHandle outgrad_vec[1] = {outgrad_arr};
  MXExecutorForward(ex, 1);
  MXExecutorBackward(ex, 1, outgrad_vec);
  
  /* Read output */
  void *data;
  if(MXNDArrayWaitToRead(w1_grad_arr) != 0)
    printf("%s\n", MXGetLastError());
  else
    printf("Ok!\n");
  return 0;
}

It fails with the same error, but at MXNDArrayWaitToRead instead of MXNDArrayGetData. Complete error message:

[18:21:33] src/ndarray/ndarray.cc:757: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first
Stack trace:
  [bt] (0) libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x43) [0x7f3feefb6ac3]
  [bt] (1) libmxnet.so(mxnet::NDArray::SetTBlob() const+0x2fc) [0x7f3ff16c9f4c]
  [bt] (2) libmxnet.so(mxnet::op::MKLDNNDeconvolutionBackward(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> > const&)+0x5e7) [0x7f3fef0a9b67]
  [bt] (3) libmxnet.so(+0x26f2022) [0x7f3ff10cf022]
  [bt] (4) libmxnet.so(mxnet::exec::FComputeExExecutor::Run(mxnet::RunContext, bool)+0x2d1) [0x7f3ff151f4d1]
  [bt] (5) libmxnet.so(+0x2aff246) [0x7f3ff14dc246]
  [bt] (6) libmxnet.so(+0x2aff33f) [0x7f3ff14dc33f]
  [bt] (7) libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x585) [0x7f3ff1d8cce5]
  [bt] (8) libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>&&)+0x147) [0x7f3ff1da0117]
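For readability, the escaped JSON string in the repro encodes a small graph. Parsing it with Python's json module (per-op attrs and other non-essential fields elided here for brevity) shows the op chain, with the Deconvolution whose backward path fires the check sitting between the two convolutions:

```python
import json

# The repro's symbol JSON with "attrs", "node_row_ptr" and version info
# elided for brevity; only the graph topology is shown here.
graph = json.loads("""
{"nodes":[
  {"op":"null","name":"input","inputs":[]},
  {"op":"null","name":"w1","inputs":[]},
  {"op":"null","name":"b1","inputs":[]},
  {"op":"Convolution","name":"conv1","inputs":[[0,0,0],[1,0,0],[2,0,0]]},
  {"op":"null","name":"w2","inputs":[]},
  {"op":"null","name":"b2","inputs":[]},
  {"op":"Deconvolution","name":"deconv","inputs":[[3,0,0],[4,0,0],[5,0,0]]},
  {"op":"null","name":"w3","inputs":[]},
  {"op":"null","name":"b3","inputs":[]},
  {"op":"Convolution","name":"conv2","inputs":[[6,0,0],[7,0,0],[8,0,0]]},
  {"op":"_copy","name":"out","inputs":[[9,0,0]]}],
 "arg_nodes":[0,1,2,4,5,7,8],
 "heads":[[10,0,0]]}
""")

# Filter out the variable ("null") nodes to see the actual op chain.
ops = [(n["name"], n["op"]) for n in graph["nodes"] if n["op"] != "null"]
print(ops)
# -> [('conv1', 'Convolution'), ('deconv', 'Deconvolution'),
#     ('conv2', 'Convolution'), ('out', '_copy')]
```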

@TaoLv
Member

TaoLv commented Sep 13, 2019

@matteosal Thank you for reporting that, and I'm really sorry for the inconvenience. Here is another patch: TaoLv@893c596. Would you mind sharing the Python scripts that do the same thing? I'm trying to understand the differences between the C API and Python for this deconv issue.

@matteosal
Contributor Author

matteosal commented Sep 14, 2019

This second patch fixes the last example, thanks.
As for the Python scripts, this one should match the first C example (except for a different grad_req, which doesn't make any difference), and it doesn't fail:

import mxnet as mx

sym = mx.sym.Convolution(
	mx.symbol.Variable('in'),
	mx.symbol.Variable('w'),
	mx.symbol.Variable('b'),
	kernel = (1, 1),
	num_filter = 40
)

exe = sym.bind(mx.cpu(), 
	{
		'in': mx.nd.ones((1, 3, 30, 30)),
		'w': mx.nd.ones((40, 3, 1, 1)),
		'b': mx.nd.ones((40))
	},
	grad_req = 'write'
)
out = exe.outputs[0]
exe.forward()
out.wait_to_read()
print(out.asnumpy().shape)

As for the second example, I've managed to reproduce it with this script:

import mxnet as mx

json = '{\"nodes\":[{\"op\":\"null\",\"name\":\"input\",\"inputs\":[]},{\"op\
\":\"null\",\"name\":\"w1\",\"inputs\":[]},{\"op\":\"null\",\"name\":\
\"b1\",\"inputs\":[]},{\"op\":\"Convolution\",\"name\":\"conv1\",\"\
attrs\":{\"cudnn_off\":\"0\",\"dilate\":\"(1, 1)\",\"kernel\":\"(1, \
1)\",\"layout\":\"None\",\"no_bias\":\"False\",\"num_filter\":\"1\",\"\
num_group\":\"1\",\"pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[0,0,0],[1,0,0],[2,0,0]]},{\"op\":\"null\",\"name\":\
\"w2\",\"inputs\":[]},{\"op\":\"null\",\"name\":\"b2\",\"inputs\":[]},\
{\"op\":\"Deconvolution\",\"name\":\"deconv\",\"attrs\":{\"dilate\":\"\
(1, 1)\",\"kernel\":\"(1, \
1)\",\"no_bias\":\"False\",\"num_filter\":\"8\",\"num_group\":\"1\",\"\
pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[3,0,0],[4,0,0],[5,0,0]]},{\"op\":\"null\",\"name\":\
\"w3\",\"inputs\":[]},{\"op\":\"null\",\"name\":\"b3\",\"inputs\":[]},\
{\"op\":\"Convolution\",\"name\":\"conv2\",\"attrs\":{\"cudnn_off\":\"\
0\",\"dilate\":\"(1, 1)\",\"kernel\":\"(1, \
1)\",\"layout\":\"None\",\"no_bias\":\"False\",\"num_filter\":\"8\",\"\
num_group\":\"1\",\"pad\":\"(0, 0)\",\"stride\":\"(1, \
1)\"},\"inputs\":[[6,0,0],[7,0,0],[8,0,0]]},{\"op\":\"_copy\",\"name\"\
:\"out\",\"inputs\":[[9,0,0]]}],\"arg_nodes\":[0,1,2,4,5,7,8],\"node_\
row_ptr\":[0,1,2,3,4,5,6,7,8,9,10,11],\"heads\":[[10,0,0]],\"attrs\":{\
\"mxnet_version\":[\"int\",10500]}}'

sym = mx.symbol.load_json(json)
exe = sym.bind(mx.cpu(), 
	{
		'input': mx.nd.ones((1, 3, 10, 10)),
		'w1': mx.nd.ones((1, 3, 1, 1)),
		'b1': mx.nd.ones((1)),
		'w2': mx.nd.ones((1, 8, 1, 1)),
		'b2': mx.nd.ones((8)),
		'w3': mx.nd.ones((8, 8, 1, 1)),
		'b3': mx.nd.ones((8))
	},
	args_grad = {'w1': mx.nd.ones((1, 3, 1, 1))},
	grad_req = {'w1': 'write'}
)
exe.forward(is_train = True)
exe.backward(out_grads = [mx.nd.ones((1, 8, 10, 10))])
grad = exe.grad_arrays[1]
grad.wait_to_read()
print(grad.asnumpy().shape)

This one fails at commit e87995d, but is fixed by your second patch.

@matteosal
Contributor Author

matteosal commented Sep 20, 2019

Any news about this?
I'm not sure it should be labeled C API anymore, because one of the two examples (the second one above) can also be reproduced from Python.

@TaoLv
Member

TaoLv commented Sep 20, 2019

Sorry for the delay @matteosal . I got trapped by other stuff this week. Will look into the python script and get back to you next week. Thanks for your patience.

@TaoLv
Member

TaoLv commented Sep 23, 2019

@marcoabreu Do you have any suggestions about including @matteosal's demo cases as unit tests in MXNet? Where should I put the cpp code?
