This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

argmax causes python VM to crash #9118

Closed
marfago opened this issue Dec 18, 2017 · 7 comments

marfago commented Dec 18, 2017

The code

>>> import mxnet.ndarray as nd
>>> A=nd.array([0,1,2])
>>> nd.argmax(A)

causes python to crash with

[22:56:20] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [22:56:20] src/operator/tensor/./broadcast_reduce_op.h:395: Global reduction not supported yet

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x0000000109af78d8 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x0000000109af5499 _ZN4dmlc15LogMessageFatalD1Ev + 9
[bt] (2) 2   libmxnet.so                         0x0000000109c4e1f5 _ZN5mxnet2op17SearchAxisComputeIN7mshadow3cpuENS2_3red7maximumEEEvRKN4nnvm9NodeAttrsERKNS_9OpContextERKNSt3__16vectorINS_5TBlobENSD_9allocatorISF_EEEERKNSE_INS_9OpReqTypeENSG_ISL_EEEESK_ + 1941
@aodhan-domhnaill
Reproducible in version 0.11.1b20170929:

$ pip list | grep mxnet
mxnet (0.11.1b20170929)
$ python
Python 3.6.3 (default, Oct  4 2017, 06:09:15) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
>>> A=nd.array([0,1,2])
>>> nd.argmax(A)
[10:40:37] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:40:37] src/operator/tensor/./broadcast_reduce_op.h:352: Global reduction not supported yet

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x000000010d0f7408 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x000000010d0f4fc9 _ZN4dmlc15LogMessageFatalD1Ev + 9
[bt] (2) 2   libmxnet.so 

And in version 1.0.0.post1,

$ pip list | grep mxnet
mxnet (1.0.0.post1)
$ python
Python 3.6.3 (default, Oct  4 2017, 06:09:15) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
>>> A=nd.array([0,1,2])
>>> nd.argmax(A)
[10:45:02] /Users/travis/build/dmlc/mxnet-distro/mxnet-build/dmlc-core/include/dmlc/logging.h:308: [10:45:02] src/operator/tensor/./broadcast_reduce_op.h:396: Global reduction not supported yet

Stack trace returned 10 entries:
[bt] (0) 0   libmxnet.so                         0x0000000105b40898 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so                         0x0000000105b3e599 _ZN4dmlc15LogMessageFatalD1Ev + 9
[bt] (2) 2   libmxnet.so

Looking at the specified line of code broadcast_reduce_op.h:396,

  if (!param.axis) LOG(FATAL) << "Global reduction not supported yet";

This hints that specifying the axis explicitly works, which makes the failure trivially avoidable:

$ python
Python 3.6.3 (default, Oct  4 2017, 06:09:15) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
>>> A=nd.array([0,1,2])
>>> nd.argmax(A, axis=0)

[ 2.]
<NDArray 1 @cpu(0)>
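As an aside, the behavior users expect from a global argmax can be illustrated with NumPy as a stand-in (NumPy treats a missing axis as a reduction over the flattened array, which is exactly the case the MXNet operator rejects here):

```python
import numpy as np

A = np.array([0, 1, 2])

# Global argmax over the flattened array -- the case MXNet rejects.
print(np.argmax(A))

# Explicit axis, matching the nd.argmax(A, axis=0) workaround above.
print(np.argmax(A, axis=0))
```

Both calls return index 2, so for a 1-D array the `axis=0` workaround gives the same result the global reduction would have.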


nswamy commented Mar 20, 2018

tested on 1.1 and confirm that it still crashes

@anirudh2290

@nswamy The exception handling change went in after 1.1. Can you check with master?

@anirudh2290

Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> A=mx.nd.array([0,1,2])
>>> A

[ 0.  1.  2.]
<NDArray 3 @cpu(0)>
>>> mx.nd.argmax(A)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/ndarray/ndarray.py", line 189, in __repr__
    return '\n%s\n<%s %s @%s>' % (str(self.asnumpy()),
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/ndarray/ndarray.py", line 1826, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [22:31:57] ../src/operator/tensor/./broadcast_reduce_op.h:392: Global reduction not supported yet

Stack trace returned 10 entries:
[bt] (0) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x54) [0x7fbf70eb4337]
[bt] (1) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x2a) [0x7fbf70eb461e]
[bt] (2) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(void mxnet::op::SearchAxisCompute<mshadow::cpu, mshadow::red::maximum>(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0xcc) [0x7fbf74e57bb9]
[bt] (3) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&), void (*)(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>::_M_invoke(std::_Any_data const&, nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)+0x91) [0x7fbf70fe8b6c]
[bt] (4) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)>::operator()(nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&) const+0xa6) [0x7fbf755a6f6c]
[bt] (5) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0x203) [0x7fbf756f949d]
[bt] (6) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x44) [0x7fbf7570191c]
[bt] (7) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(std::function<void (mxnet::RunContext)>::operator()(mxnet::RunContext) const+0x56) [0x7fbf754b978c]
[bt] (8) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(+0x63c0ec9) [0x7fbf754d5ec9]
[bt] (9) /home/ubuntu/sparse_support/mxnet/python/mxnet/../../build/libmxnet.so(+0x63c2058) [0x7fbf754d7058]


>>> exit()
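The practical difference in the session above is that the invalid call now raises a catchable `mxnet.base.MXNetError` instead of aborting the Python VM. A minimal sketch of what that enables for calling code, using a hypothetical stand-in exception class rather than a real MXNet install:

```python
class MXNetError(Exception):
    """Stand-in for mxnet.base.MXNetError (illustration only)."""

def global_argmax(values):
    # Mimics the post-fix operator behavior shown above: a global
    # reduction raises a Python exception instead of crashing the VM.
    raise MXNetError("Global reduction not supported yet")

try:
    global_argmax([0, 1, 2])
except MXNetError as err:
    # Calling code can now recover, log, or fall back (e.g. pass an axis).
    print("caught:", err)
```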


nswamy commented Mar 20, 2018

@marfago closing this ticket as @anirudh2290 fixed the issue and the VM does not crash anymore.

@nswamy nswamy closed this as completed Mar 20, 2018
@laszukdawid

@nswamy @anirudh2290 what's the commit sha? I'd like to follow this ticket.

@anirudh2290

@laszukdawid Please see: #9681
