This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Able to reproduce on Ubuntu with gcc 5.4.0 and macOS 10.12.6 with Apple LLVM version 9.0.0 (clang-900.0.38), tested with pip-installed packages. However, unlike @juliusshufan's observation (from a source build, I guess), only the mxnet-mkl pip package triggers the issue for me.
Installed packages using:
pip install mxnet --pre
pip install mxnet-mkl --pre
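Since the result depends on which pip package is active, it can help to confirm what is actually installed before comparing runs. The following is a minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; the helper name `installed_mxnet_packages` and its `lookup` parameter are hypothetical and only there for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+

# The two distributions mentioned in this thread.
CANDIDATES = ("mxnet", "mxnet-mkl")


def installed_mxnet_packages(candidates=CANDIDATES, lookup=metadata.version):
    """Return {distribution name: version} for the candidates that are installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = lookup(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            pass
    return found


if __name__ == "__main__":
    packages = installed_mxnet_packages()
    if not packages:
        print("no MXNet pip package found")
    for name, version in sorted(packages.items()):
        print(name, version)
```

If both names show up, the two distributions likely share the same `mxnet` import name, so it is worth uninstalling one before drawing conclusions about which build fails.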
@roywei Thanks for your comments and for double-checking. I retried the non-MKLDNN build from source and the GPU build, and symbolic mode works well with both. I think this aligns with your observation.
Sorry for the confusion. I have modified the issue description accordingly.
@roywei I tried the PR from @zheng-da, #10651, which seems to address a similar issue, and it solves the issue I reported as well. @pengzhao-intel
As the corresponding PR is still pending merge, can we keep this issue open?
Description
example/gluon/image_classification.py fails on a CPU machine when run in "symbolic" mode,
i.e., with a command like the following:
python image_classification.py --model=alexnet --mode=symbolic --dataset=dummy
When the mode is NOT set to "symbolic", the script runs without problems.
Please note: a build without MKLDNN does not have this issue. (Build command: make -j$(nproc) USE_OPENCV=1 USE_BLAS=openblas)
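To compare modes side by side, the reproduction command above can be wrapped in a small driver. This is only a sketch: the helper names `build_command` and `run_mode` are hypothetical, and the extra modes (`imperative`, `hybrid`) are an assumption about the example script's `--mode` option rather than something stated in this report.

```python
import subprocess
import sys

# Modes to compare. Only "symbolic" is named in this report; the other two
# are assumed values for the example script's --mode option.
MODES = ("symbolic", "imperative", "hybrid")


def build_command(mode, model="alexnet", dataset="dummy",
                  script="image_classification.py"):
    """Return the argv list for one run of the example script."""
    return [sys.executable, script,
            "--model=" + model, "--mode=" + mode, "--dataset=" + dataset]


def run_mode(mode):
    """Run the example in one mode; return (exited cleanly, combined output)."""
    result = subprocess.run(build_command(mode),
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return result.returncode == 0, result.stdout.decode(errors="replace")
```

Calling `run_mode(mode)` for each entry in `MODES` from the `example/gluon` directory, and printing the tail of the output for any mode that fails, should show whether the `MXNetError` is specific to symbolic mode on a given build.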
Environment info (Required)
CentOS-7.2
Package used (Python/R/Scala/Julia):
Python
Build info (Required if built from source)
GCC 4.8.5
MXNet commit hash:
9f8f042
Build config:
Only the build with MKLDNN triggers this issue:
make -j$(nproc) USE_MKLDNN=1 USE_OPENCV=1 USE_BLAS=mkl
Error Message:
Traceback (most recent call last):
File "image_classification.py", line 290, in <module>
main()
File "image_classification.py", line 269, in main
initializer = mx.init.Xavier(magnitude=2))
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/module/base_module.py", line 520, in fit
self.update_metric(eval_metric, data_batch.label)
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/module/module.py", line 757, in update_metric
self.exec_group.update_metric(eval_metric, labels)
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/module/executor_group.py", line 616, in update_metric
eval_metric.update_dict(labels, preds)
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/metric.py", line 132, in update_dict
self.update(label, pred)
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/metric.py", line 418, in update
pred_label = pred_label.asnumpy().astype('int32')
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/ndarray/ndarray.py", line 1876, in asnumpy
ctypes.c_size_t(data.size)))
File "/ec/fm/disks/nrv_algo_home01/shufanwu/pythonenv/py2.7_1/lib/python3.4/site-packages/mxnet-1.2.0-py3.4.egg/mxnet/base.py", line 149, in check_call
raise MXNetError(py_str(LIB.MXGetLastError()))
mxnet.base.MXNetError: [07:30:00] src/operator/tensor/./././elemwise_unary_op.h:302: Check failed: inputs[0].dptr_ == outputs[0].dptr_ (0x7ff485a590c0 vs. 0x7ff485b79100)
Stack trace returned 10 entries:
[bt] (0) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(dmlc::StackTrace()+0x3f) [0x7ff591c05edf]
[bt] (1) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x21) [0x7ff591c062b1]
[bt] (2) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(void mxnet::op::UnaryOp::IdentityCompute<mshadow::cpu>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x940) [0x7ff59238b6f0]
[bt] (3) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(mxnet::exec::FComputeExecutor::Run(mxnet::RunContext, bool)+0xe9) [0x7ff5945baaf9]
[bt] (4) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(+0x2f40023) [0x7ff594584023]
[bt] (5) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(mxnet::engine::ThreadedEngine::ExecuteOprBlock(mxnet::RunContext, mxnet::engine::OprBlock*)+0x589) [0x7ff5945142a9]
[bt] (6) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(std::_Function_handler<void (std::shared_ptr<dmlc::ManualEvent>), mxnet::engine::ThreadedEnginePerDevice::PushToExecute(mxnet::engine::OprBlock*, bool)::{lambda()#1}::operator()() const::{lambda(std::shared_ptr<dmlc::ManualEvent>)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<dmlc::ManualEvent>)+0x92) [0x7ff594524e62]
[bt] (7) /nfs/site/home/shufanwu/workspace/mxnet/v2/lib/libmxnet.so(std::thread::_Impl<std::_Bind_simple<std::function<void (std::shared_ptr<dmlc::ManualEvent>)> (std::shared_ptr<dmlc::ManualEvent>)> >::_M_run()+0x44) [0x7ff594521e54]
[bt] (8) /lib64/libstdc++.so.6(+0xb52b0) [0x7ff636eb22b0]
[bt] (9) /lib64/libpthread.so.0(+0x7e25) [0x7ff64372ae25]
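The failing check compares two data pointers and aborts because they differ, which suggests the identity operator expected to run in place but was handed separate input and output buffers. As an aid for triaging similar logs, here is a minimal sketch that pulls the source location and both addresses out of such a message. The function name `parse_check_failure` and the regular expression are illustrative assumptions, not part of MXNet.

```python
import re

# Matches "<file>:<line>: Check failed: <condition> (<hex> vs. <hex>)".
CHECK_RE = re.compile(
    r"(?P<file>[\w./-]+):(?P<line>\d+): Check failed: (?P<cond>.+?) "
    r"\((?P<lhs>0x[0-9a-fA-F]+) vs\. (?P<rhs>0x[0-9a-fA-F]+)\)"
)


def parse_check_failure(message):
    """Extract location, condition and both compared values, or return None."""
    match = CHECK_RE.search(message)
    if match is None:
        return None
    lhs = int(match.group("lhs"), 16)
    rhs = int(match.group("rhs"), 16)
    return {
        "file": match.group("file"),
        "line": int(match.group("line")),
        "condition": match.group("cond"),
        "lhs": lhs,
        "rhs": rhs,
        "offset": rhs - lhs,  # distance between the two buffers in bytes
    }


if __name__ == "__main__":
    msg = ("mxnet.base.MXNetError: [07:30:00] "
           "src/operator/tensor/./././elemwise_unary_op.h:302: Check failed: "
           "inputs[0].dptr_ == outputs[0].dptr_ "
           "(0x7ff485a590c0 vs. 0x7ff485b79100)")
    info = parse_check_failure(msg)
    print(info["file"], info["line"], hex(info["offset"]))
```

A nonzero offset only confirms that the two buffers are distinct; it does not by itself say which component allocated the second buffer.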