Error Training DeepLab on mxnet-mkl #1368
Comments
You can uninstall the MKL build of mxnet and install the OpenBLAS build of mxnet by |
@zhreshold |
@roy6324 Understood. The suggestion is to make sure the non-MKL version works before we continue to locate the root cause. If the error is only reproducible in MKL versions, we can ping a specialist to handle it. Also, I think Intel's team has added multiple bug fixes in the latest mxnet versions, so you have a good chance of bypassing it in mxnet 1.7.0, for example. |
@zhreshold I've already tried and tested multiple solutions before posting this issue. mxnet works fine, it's just mxnet-mkl that causes the error. |
@xinyu-intel @wuxun-zhang Do you guys happen to know the issue? |
I think mxnet-mkl is no longer being updated for now. Please try to use mxnet directly since mkl-dnn is enabled by default. |
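One way to check whether a given mxnet install was built with mkl-dnn is to query its runtime feature flags. This is a minimal sketch, not an official recipe: it assumes an mxnet release that ships `mxnet.runtime.Features` and that the feature is named `MKLDNN` in that release (newer builds may use a different name). The helper name `mkldnn_enabled` is made up for illustration.

```python
def mkldnn_enabled():
    """Return True/False if mxnet reports the MKLDNN feature, or None if unknown.

    Hypothetical helper for illustration; assumes mxnet.runtime.Features exists
    in the installed release and that the feature is called "MKLDNN" there.
    """
    try:
        # Imported lazily so the function still works when mxnet is absent
        # or fails to import in the current environment.
        from mxnet.runtime import Features
    except Exception:
        return None
    try:
        return bool(Features().is_enabled("MKLDNN"))
    except RuntimeError:
        # Raised when the build does not know a feature with this name.
        return None


print("MKLDNN enabled:", mkldnn_enabled())
```

If this reports `False` or `None` for the plain `mxnet` package, the speed difference mentioned in this thread may come from the build not having mkl-dnn turned on.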
@wuxun-zhang mxnet-mkl is way faster than plain mxnet. Is there a way to force plain mxnet to use mkl-dnn? |
|
@wuxun-zhang @zhreshold I tried cloning and building from master with mkl-dnn, as recommended in the mkldnn readme there. But there is a version mismatch between the latest version of gluoncv and mxnet master: mxnet 1.6.0 works fine, but the master version causes errors when importing gluoncv (for example, mxnet doesn't have an attribute called metric). |
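A quick way to see this kind of mismatch before importing gluoncv is to inspect the mxnet module itself. The sketch below is only illustrative: `describe_mxnet` is a made-up helper, and the `SimpleNamespace` object with version string `"2.0.0"` is a hypothetical stand-in for a build that lacks a top-level `metric` attribute, not the real master build.

```python
import types


def describe_mxnet(mx):
    """Summarize the properties of an mxnet-like module relevant to this thread.

    Hypothetical helper: reports the version string and whether the module
    exposes a top-level `metric` attribute, which gluoncv expected here.
    """
    return {
        "version": getattr(mx, "__version__", "unknown"),
        "has_top_level_metric": hasattr(mx, "metric"),
    }


# Stand-in module object so the sketch runs without mxnet installed.
fake_build = types.SimpleNamespace(__version__="2.0.0")
print(describe_mxnet(fake_build))
```

With a real install, the same helper can be called as `describe_mxnet(mxnet)` after `import mxnet`.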
Please try the mxnet 1.7.0 rc release :)
Thanks
Xinyu
…________________________________
From: Roy Anwar <[email protected]>
Sent: Tuesday, July 21, 2020 6:09:47 PM
To: dmlc/gluon-cv <[email protected]>
Cc: Chen, Xinyu1 <[email protected]>; Mention <[email protected]>
Subject: Re: [dmlc/gluon-cv] Error Training DeepLab on mxnet-mkl (#1368)
|
@wuxun-zhang @xinyu-intel
|
@roy6324 Thanks for trying. Can you please also try https://github.com/apache/incubator-mxnet/tree/v1.x? If you get the same error there as well, I will take a look at this bug. Thanks :) |
@xinyu-intel Tried it, same error. Thanks :) |
OK, we will take a look at it. |
@roy6324 Please try the above fix and see if your problem is resolved. |
@wuxun-zhang @xinyu-intel @zhreshold Thanks for the fix, the problem is solved. |
@zhreshold
I've encountered an error while trying to train DeepLab on CPU using mxnet-mkl.
Steps to reproduce:
pip3 install mxnet-mkl
python3 gluoncv_test.py --dataset pascal_aug --model-zoo deeplab_resnet101_coco --aux --lr 0.001 --checkname res101 --no-cuda
mxnet.base.MXNetError: [15:37:48] src/ndarray/ndarray.cc:757: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first