
Segfault on ndarray with negative dimension, e.g. mxnet.nd.zeros((-1,)) #9166

Closed
tsutton opened this issue Dec 21, 2017 · 7 comments · Fixed by #14362

Comments

@tsutton

tsutton commented Dec 21, 2017

Description

When trying to create an ndarray with a negative size in some dimension, I get a segmentation fault or a bad_alloc error. I would have expected an exception with a useful message instead. (When executing in a Python terminal I get the bad_alloc error; when executing in a Jupyter notebook it segfaults.)
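
For comparison, NumPy rejects a negative dimension up front with a clear error, which is roughly the behavior I would expect here:

>>> import numpy as np
>>> np.zeros((-1,))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: negative dimensions are not allowed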

Environment info (Required)

----------Python Info----------
Version : 3.6.3
Compiler : GCC 7.2.0
Build : ('default', 'Oct 3 2017 21:45:48')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 9.0.1
Directory : /usr/lib/python3/dist-packages/pip
----------MXNet Info-----------
Version : 1.0.0
Directory : /usr/local/lib/python3.6/dist-packages/mxnet
Commit Hash : 0f05c65
----------System Info----------
Platform : Linux-4.13.0-19-generic-x86_64-with-Ubuntu-17.10-artful
system : Linux
node : adams
release : 4.13.0-19-generic
version : #22-Ubuntu SMP Mon Dec 4 11:58:07 UTC 2017
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 78
Model name: Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
Stepping: 3
CPU MHz: 2800.000
CPU max MHz: 3400.0000
CPU min MHz: 400.0000
BogoMIPS: 5616.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 4096K
NUMA node0 CPU(s): 0-3
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0007 sec, LOAD: 0.6986 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0022 sec, LOAD: 0.0343 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0030 sec, LOAD: 0.0688 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0008 sec, LOAD: 0.2045 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0326 sec, LOAD: 0.0717 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0013 sec, LOAD: 0.0765 sec.

Build info (Required if built from source)

Downloaded via "pip3 install mxnet".

Minimum reproducible example

Python 3.6.3 (default, Oct  3 2017, 21:45:48) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from mxnet import nd
>>> nd.zeros((-1,))
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
@piiswrong piiswrong added the Bug label Jan 19, 2018
@Vikas-kum
Contributor

Is there any use case for creating an NDArray of negative size?

@apeforest
Contributor

@Vikas89 A shape value of -1 means MXNet will automatically infer that dimension.
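
For example, -1 is legitimately used for shape inference in reshape, where the missing dimension can be computed from the others; nd.zeros((-1,)) has no other dimension to infer from:

>>> from mxnet import nd
>>> nd.arange(6).reshape((-1, 2)).shape
(3, 2)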

@apeforest
Contributor

@nswamy Please add the label [NDArray]. Thanks!

@anirudhacharya
Member

@apeforest this is not a bug but a case of improper exception handling. Please update the labels.

@apeforest apeforest added the Bug label Mar 6, 2019
@apeforest
Contributor

@anirudhacharya This is causing a core dump. I think it's a bug that needs to be fixed.

@anirudhacharya
Member

anirudhacharya commented Mar 6, 2019

With the latest version of MXNet, the above command causes neither a core dump nor a segmentation fault.
When I pass -1 as the shape in Python 2 and Python 3, I get the following stack trace.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/ndarray/utils.py", line 67, in zeros
    return _zeros_ndarray(shape, ctx, dtype, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/ndarray/ndarray.py", line 3822, in zeros
    return _internal._zeros(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
  File "<string>", line 34, in _zeros
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/base.py", line 252, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [21:21:58] src/storage/./cpu_device_storage.h:74: Failed to allocate CPU Memory

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(dmlc::StackTrace[abi:cxx11]()+0x1bc) [0x7f761aea978c]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f761aeaab08]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(+0x396d71d) [0x7f761e26e71d]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(mxnet::storage::NaiveStorageManager<mxnet::storage::CPUDeviceStorage>::Alloc(mxnet::Storage::Handle*)+0xd) [0x7f761e26e74d]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(mxnet::StorageImpl::Alloc(mxnet::Storage::Handle*)+0x5b) [0x7f761e269efb]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(mxnet::NDArray::CheckAndAlloc() const+0x98d) [0x7f761aeab59d]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}::operator()(mxnet::RunContext) const+0xd88) [0x7f761db1a308]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext), mxnet::imperative::PushFCompute(std::function<void (nnvm::NodeAttrs const&, mxnet::OpContext const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, std::vector<mxnet::TBlob, std::allocator<mxnet::TBlob> > const&)> const&, nnvm::Op const*, nnvm::NodeAttrs const&, mxnet::Context const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::Resource, std::allocator<mxnet::Resource> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&)::{lambda(mxnet::RunContext)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&)+0x17) [0x7f761db1a9c7]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(std::_Function_handler<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete), mxnet::Engine::PushSync(std::function<void (mxnet::RunContext)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*)::{lambda(mxnet::RunContext, mxnet::engine::CallbackOnComplete)#1}>::_M_invoke(std::_Any_data const&, mxnet::RunContext&&, mxnet::engine::CallbackOnComplete&&)+0x5e) [0x7f761da9eb8e]
[bt] (9) /usr/local/lib/python2.7/dist-packages/mxnet-1.5.0-py2.7.egg/mxnet/libmxnet.so(mxnet::engine::NaiveEngine::PushAsync(std::function<void (mxnet::RunContext, mxnet::engine::CallbackOnComplete)>, mxnet::Context, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, std::vector<mxnet::engine::Var*, std::allocator<mxnet::engine::Var*> > const&, mxnet::FnProperty, int, char const*, bool)+0x20c) [0x7f761e24393c]

Based on this, I think this issue can be closed.
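
As an aside, a plausible explanation for both the original std::bad_alloc and the "Failed to allocate CPU Memory" error above (an assumption about the implementation, not verified against the MXNet source) is that the -1 dimension ends up stored in an unsigned 64-bit size, wraps around to an enormous value, and the resulting allocation request fails:

>>> import ctypes
>>> ctypes.c_uint64(-1).value
18446744073709551615
>>> ctypes.c_uint64(-1).value * 4 / 2**60  # bytes for float32, in EiB
64.0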

@apeforest
Contributor

@anirudhacharya Thanks for checking this. Could you please help update the exception with a more meaningful message? Ideally, the user should know that a negative number cannot be used as a dimension in this command.
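
A minimal sketch of the kind of front-end check that could produce such a message (the function name _validate_shape and its placement are hypothetical, not the actual fix that went into #14362):

def _validate_shape(shape):
    """Raise a descriptive error if any dimension is negative (illustrative only)."""
    if isinstance(shape, int):
        shape = (shape,)
    for axis, dim in enumerate(shape):
        if dim < 0:
            raise ValueError("negative dimension %d at axis %d; "
                             "dimensions must be non-negative" % (dim, axis))
    return tuple(shape)

Calling _validate_shape((-1,)) would then raise "ValueError: negative dimension -1 at axis 0; dimensions must be non-negative" instead of surfacing an allocation failure.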
