Error Training DeepLab on mxnet-mkl #1368

Closed
roy6324 opened this issue Jul 14, 2020 · 16 comments

@roy6324

roy6324 commented Jul 14, 2020

@zhreshold
I've encountered an error trying to train deeplab on cpu using mxnet-mkl.

Steps to reproduce:

  • Install mxnet-mkl:
    pip3 install mxnet-mkl
  • Use gluoncv's train.py and run the following command:
    python3 gluoncv_test.py --dataset pascal_aug --model-zoo deeplab_resnet101_coco --aux --lr 0.001 --checkname res101 --no-cuda
  • This causes the following error:
    mxnet.base.MXNetError: [15:37:48] src/ndarray/ndarray.cc:757: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first
@zhreshold
Member

You can uninstall the MKL build of mxnet and install the OpenBLAS build instead: run pip3 uninstall mxnet-mkl, then pip3 install mxnet.

@roy6324
Author

roy6324 commented Jul 19, 2020

@zhreshold
MXNet with MKL-DNN is much faster. I was looking for a way to use MKL with the DeepLab network, not just to uninstall it.

@zhreshold
Member

@roy6324 Understood. The suggestion is to make sure the non-MKL version works before we try to locate the root cause. If the error is only reproducible in MKL builds, we can ping a specialist to handle it. Also, I think Intel's team has added multiple bug fixes in the latest mxnet versions, so there is a good chance the error no longer occurs in mxnet 1.7.0, for example.

@roy6324
Author

roy6324 commented Jul 20, 2020

@zhreshold I had already tried and tested multiple solutions before posting this issue. Plain mxnet works fine; only mxnet-mkl causes the error.

@zhreshold zhreshold reopened this Jul 20, 2020
@zhreshold
Member

@xinyu-intel @wuxun-zhang Do you happen to know what is causing this issue?

@wuxun-zhang
Collaborator

wuxun-zhang commented Jul 21, 2020

I think mxnet-mkl is no longer being updated. Please try using mxnet directly, since mkl-dnn is enabled by default.
Try the nightly build: pip install --pre mxnet -f https://dist.mxnet.io/python/cpu

@roy6324
Author

roy6324 commented Jul 21, 2020

@wuxun-zhang mxnet-mkl is much faster than plain mxnet. Is there a way to force plain mxnet to use mkl-dnn?

@wuxun-zhang
Collaborator

mxnet-mkl means mxnet with mkl-dnn support. Currently, mxnet master also has mkl-dnn enabled by default. You can try building from source or using the nightly build.
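
One way to confirm whether a given mxnet install was built with mkl-dnn is to query its runtime feature list. The following is a minimal sketch, not something posted in this thread: it assumes an mxnet release that ships the mxnet.runtime module and that names the feature MKLDNN (later versions may use a different name). The helper name mkldnn_status and its return strings are made up for illustration.

```python
def mkldnn_status():
    """Report whether the installed mxnet build lists MKLDNN as enabled.

    Hypothetical helper; returns one of:
    "enabled", "disabled", "unknown", "mxnet-unavailable".
    """
    try:
        import mxnet as mx
    except Exception:
        # mxnet is not installed, or its native library failed to load.
        return "mxnet-unavailable"

    runtime = getattr(mx, "runtime", None)
    if runtime is None:
        # Assumption: older releases may not expose the runtime module.
        return "unknown"

    features = runtime.Features()
    try:
        return "enabled" if features.is_enabled("MKLDNN") else "disabled"
    except RuntimeError:
        # Raised when this build does not know a feature called MKLDNN.
        return "unknown"


if __name__ == "__main__":
    print("MKLDNN:", mkldnn_status())
```

If this reports "enabled" for a plain mxnet install, that build already uses mkl-dnn and no separate mxnet-mkl package should be needed.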

@roy6324
Author

roy6324 commented Jul 21, 2020

@wuxun-zhang @zhreshold I tried cloning and building from master with mkl-dnn, as recommended in the mkldnn readme there. But there is a version mismatch between the latest version of gluoncv and mxnet master: mxnet 1.6.0 works fine, but the master version causes errors when importing gluoncv (for example, mxnet doesn't have an attribute called metric).
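
A quick way to see this kind of mismatch before importing gluoncv is to inspect the installed mxnet directly. This is a rough sketch under the assumption that the import failure described above comes from the top-level mxnet.metric attribute being absent on master; the function name describe_mxnet and the dictionary keys are hypothetical.

```python
def describe_mxnet():
    """Summarize the installed mxnet: version and presence of mxnet.metric.

    Hypothetical helper for spotting a gluoncv/mxnet version mismatch.
    """
    try:
        import mxnet as mx
    except Exception:
        # mxnet is not installed, or its native library failed to load.
        return {"installed": False, "version": None, "has_metric": False}

    return {
        "installed": True,
        "version": getattr(mx, "__version__", None),
        # The comment above reports that gluoncv expects this attribute.
        "has_metric": hasattr(mx, "metric"),
    }


if __name__ == "__main__":
    info = describe_mxnet()
    print(info)
    if info["installed"] and not info["has_metric"]:
        print("mxnet.metric is missing; this gluoncv release may not import.")
```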

@xinyu-intel
Member

xinyu-intel commented Jul 21, 2020 via email

@roy6324
Author

roy6324 commented Jul 21, 2020

@wuxun-zhang @xinyu-intel
I tried branch 1.7.0.rc0 and got the same error with the deeplab network.

MXNetError: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first

@xinyu-intel
Member

@roy6324 Thanks for trying. Please also try https://github.com/apache/incubator-mxnet/tree/v1.x. If the error is the same, I will take a look at this bug. Thanks :)

@roy6324
Author

roy6324 commented Jul 21, 2020

@xinyu-intel Tried it, same error.
MXNetError: Check failed: !IsMKLDNNData(): We can't generate TBlob for MKLDNN data. Please use Reorder2Default() to generate a new NDArray first

Thanks :)

@xinyu-intel
Member

OK, we will take a look at it.

@wuxun-zhang
Collaborator

@roy6324 Please try the above fix and see if your problem is resolved.

@roy6324
Author

roy6324 commented Jul 27, 2020

@wuxun-zhang @xinyu-intel @zhreshold Thanks for the fix, the problem is solved.
