Check failed: format != mkl_mem_->GetFormat() (5 vs. 5) #10809
Comments
@dwSun Thanks for reporting this. I will take a look and get back to you soon.
@dwSun, Tao's PR is merged.
tested with mxnet-mkl-1.2.0b20180508, fashion.py in this issue works well.
I am using Ubuntu 18.04 and CUDA 9.1, installed with:
sudo aptitude install nvidia-cuda-toolkit --without-recommends
Should I start a new issue?
@sandeep-krishnamurthy could you help add the MKL label? Thanks
@dwSun I suggest starting a new thread. From the log, I don't see anything related to MKL-DNN.
tested with mxnet-cu91mkl (1.2.0) from pypi, fashion.py in this issue works well.
[23:18:18] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 28672 bytes with malloc directly
[23:18:18] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[23:18:18] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 86016 bytes with malloc directly
[23:18:19] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 28672 bytes with malloc directly
[23:18:19] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 4096 bytes with malloc directly
[23:18:19] src/operator/nn/mkldnn/mkldnn_base.cc:60: Allocate 86016 bytes with malloc directly
Traceback (most recent call last):
File "fashion.py", line 71, in <module>
valid_loss = cumulative_valid_loss.asscalar()/valid_samples
File "/home/david/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1894, in asscalar
return self.asnumpy()[0]
File "/home/david/.local/lib/python3.6/site-packages/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/david/.local/lib/python3.6/site-packages/mxnet/base.py", line 149, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [23:18:19] src/ndarray/ndarray.cc:721: Check failed: !IsMKLDNNData() We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first
Stack trace returned 10 entries:
[bt] (0) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17ec9d) [0x7fcb3104dc9d]
[bt] (1) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x17f068) [0x7fcb3104e068]
[bt] (2) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x293d945) [0x7fcb3380c945]
[bt] (3) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3e7b73) [0x7fcb312b6b73]
[bt] (4) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x3e9cd8) [0x7fcb312b8cd8]
[bt] (5) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x28174e6) [0x7fcb336e64e6]
[bt] (6) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27a3ac2) [0x7fcb33672ac2]
[bt] (7) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27a3ac2) [0x7fcb33672ac2]
[bt] (8) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27a3ac2) [0x7fcb33672ac2]
[bt] (9) /home/david/.local/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x27a3ac2) [0x7fcb33672ac2]
This issue is weird...
fashion.py works well with the master branch, which is built with:
@dwSun I tried mxnet-mkl 1.2.0 and it does seem to have some issues (though the error message on my side is not the same as yours). PR #11212 is trying to push some MKL-DNN related fixes into the 1.2.0 branch, and I find that fashion.py works well with the code from #11212. I hope it can be merged soon so you can try it out. Sorry for the inconvenience.
The fix is merged. I think this bug can be closed. @dwSun
Description
Crashed when training a model.
Using code from this tutorial, I tried to train my own model with MobileNetV2, but it crashed with mxnet-mkl-1.2.0b20180503 from pypi.
On mxnet-mkl-1.1.0 from pypi, this code works.
Batch sizes 32 and 16 reproduce this error; others, like 8 or 32, do not seem to. A smaller network does not reproduce it either.
I am not sure whether this error is related to PR #10317.
It may also be the same error as issue #10807.
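Since the crash depends on batch size, one way to narrow it down is a small sweep that records which sizes fail. The sketch below is only an illustration: `sweep_batch_sizes` and `run_once` are hypothetical names, and `run_once` stands in for one short training run of the model in crash.zip. Because MXNet executes asynchronously, `run_once` would presumably need to end with a blocking call such as `asscalar()` so that the error is raised inside it.

```python
from typing import Callable, Dict, Iterable, Optional


def sweep_batch_sizes(
    run_once: Callable[[int], None],
    batch_sizes: Iterable[int],
) -> Dict[int, Optional[str]]:
    """Call run_once(batch_size) for each size and record any error message.

    run_once is a hypothetical stand-in for one short training run.
    """
    results: Dict[int, Optional[str]] = {}
    for size in batch_sizes:
        try:
            run_once(size)
            results[size] = None  # completed without raising
        except Exception as exc:
            # Keep only the first line; MXNet errors carry long stack traces.
            message = str(exc)
            results[size] = message.splitlines()[0] if message else type(exc).__name__
    return results


if __name__ == "__main__":
    # Placeholder run that fails for 16 and 32 only, to show the output shape.
    def fake_run(batch_size: int) -> None:
        if batch_size in (16, 32):
            raise RuntimeError("Check failed: format != mkl_mem_->GetFormat()")

    for size, error in sweep_batch_sizes(fake_run, [8, 16, 32, 64]).items():
        print(size, "ok" if error is None else f"failed: {error}")
```

Replacing `fake_run` with a real training function would show at a glance which batch sizes hit the check failure on a given mxnet-mkl build.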
Environment info (Required)
This is the code
crash.zip
Run with
Package used (Python/R/Scala/Julia):
Error Message: