Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Out of bounds access in req vector in MKLDNN #15992

Closed
larroy opened this issue Aug 24, 2019 · 13 comments
Closed

Out of bounds access in req vector in MKLDNN #15992

larroy opened this issue Aug 24, 2019 · 13 comments

Comments

@larroy
Copy link
Contributor

larroy commented Aug 24, 2019

Request vector (SetupOpExec) might have only one element. (OpExecutor)

Description

req vector access is out of bounds. I wonder why this is not catch by CI.

Environment info (Required)

commit fbdfc78cabdeef341708514c6370db43b46fe31f (HEAD -> master, origin/master, origin/HEAD)
Author: Pedro Larroy <[email protected]>
Date:   Wed Aug 21 14:12:49 2019 -0700

    Add option to choose between OMP implementations
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             1831.720
CPU max MHz:         3000.0000
CPU min MHz:         1200.0000
BogoMIPS:            4600.06
Hypervisor vendor:   Xen
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-15,32-47
NUMA node1 CPU(s):   16-31,48-63
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
----------Python Info----------
Version      : 3.6.8
Compiler     : GCC 8.0.1 20180414 (experimental) [trunk revision 259383
Build        : ('default', 'Jan 14 2019 11:02:34')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.2.2
Directory    : /home/piotr/mxnet/py3_venv/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /home/piotr/mxnet/python/mxnet
Commit hash file "/home/piotr/mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library      : ['/home/piotr/mxnet/python/mxnet/../../build/libmxnet.so']
Build features:
✖ CUDA
✖ CUDNN
✖ NCCL
✖ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✔ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✔ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Linux-4.15.0-1045-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-21-122
release      : 4.15.0-1045-aws
version      : #47-Ubuntu SMP Fri Aug 2 13:50:30 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0009 sec, LOAD: 0.5070 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0004 sec, LOAD: 0.0661 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0004 sec, LOAD: 0.0474 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0003 sec, LOAD: 0.0720 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0003 sec, LOAD: 0.1958 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0003 sec, LOAD: 0.0469 sec.

183|       }
184|     }
185|
186|     fwd_.reset(new MKLDNNFullyConnectedForward(full_param_, ctx.is_train, data, weight,
187|       (has_bias ? &cached_bias_ : nullptr), out_md));
188|     initialized_ = true;
189|   }
190|   std::vector<NDArray> new_inputs;
191|   std::vector<OpReqType> new_req;
192|   if (has_bias) {
193|     new_inputs = {data, weight, cached_bias_};
194+>    new_req = {req[fullc::kData], req[fullc::kWeight], req[fullc::kBias]};
195|   } else {
196|     new_inputs = {data, weight};
197|     new_req = {req[fullc::kData], req[fullc::kWeight]};
198|   }
199|
200|   MKLDNNFCForwardFullFeature(full_param_, ctx, fwd_.get(), new_inputs, new_req, out_data);
201|
202|   if (mkldnn_param.quantized && !mkldnn_param.enable_float_output) {
203|     float *min_output_ptr = out_data[quantized_fullc::kOutMin].data().dptr<float>();
204|     float *max_output_ptr = out_data[quantized_fullc::kOutMax].data().dptr<float>();
205|     *min_output_ptr = cached_min_output_;
/home/piotr/mxnet/src/operator/subgraph/mkldnn/mkldnn_fc.cc
(gdb) down
#3  0x00007f58e8bd9a89 in std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >::operator[] (this=0x2732408, __n=1) at /usr/include/c++/7/bits/stl_vector.h:815
(gdb) up
#4  0x00007f58eb10e62f in mxnet::op::SgMKLDNNFCOp::Forward (this=0x2763b50, ctx=..., in_data=std::vector of length 3, capacity 4 = {...}, req=std::vector of length 1, capacity 1 = {...}, out_data=st
d::vector of length 1, capacity 1 = {...}) at ../src/operator/subgraph/mkldnn/mkldnn_fc.cc:194
(gdb) p req
$4 = std::vector of length 1, capacity 1 = {mxnet::kWriteTo}
(gdb) p req.size
$5 = {std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >::size_type (const std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > * const)} 0x7f58e8a73d80
     <std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >::size() const>
(gdb) bt 5
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f59943ae801 in __GI_abort () at abort.c:79
#2  0x00007f58e8a44e48 in std::__replacement_assert (__file=0x7f58ed5762a0 "/usr/include/c++/7/bits/stl_vector.h", __line=815, __function=0x7f58ed5775e0 <std::vector<mxnet::OpReqType, std::allocator
<mxnet::OpReqType> >::operator[](unsigned long) const::__PRETTY_FUNCTION__> "std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) const
 [with _Tp = mxnet::OpReqType; _Alloc = std::allocator<mxnet::OpReqType>; std::"..., __condition=0x7f58ed576270 "__builtin_expect(__n < this->size(), true)") at /usr/include/x86_64-linux-gnu/c++/7/b
its/c++config.h:472
#3  0x00007f58e8bd9a89 in std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> >::operator[] (this=0x2732408, __n=1) at /usr/include/c++/7/bits/stl_vector.h:815
#4  0x00007f58eb10e62f in mxnet::op::SgMKLDNNFCOp::Forward (this=0x2763b50, ctx=..., in_data=std::vector of length 3, capacity 4 = {...}, req=std::vector of length 1, capacity 1 = {...}, out_data=st
d::vector of length 1, capacity 1 = {...}) at ../src/operator/subgraph/mkldnn/mkldnn_fc.cc:194
(More stack frames follow...)
(gdb)
 ip-172-311:cgdb* 2:fish- 3:vi  4:vi

Repro

./dev_menu.py build
source py3_venv/bin/activate.fish
ulimit -c unlimited
nosetests -v -s --with-timer --timer-ok 1 --timer-warning 15 --timer-filter warning,error --with-xunit --xunit-file nosetests_unittest.xml tests/python/unittest/test_executor.py:test_reshape
cgdb (which python) core
@mxnet-label-bot
Copy link
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@larroy
Copy link
Contributor Author

larroy commented Aug 24, 2019

@pengzhao-intel @TaoLv

@larroy
Copy link
Contributor Author

larroy commented Aug 24, 2019

@mxnet-label-bot add [Bug, MKL]

@larroy
Copy link
Contributor Author

larroy commented Aug 24, 2019

@mxnet-label-bot add [breaking]

@pengzhao-intel
Copy link
Contributor

@ciyongch @wuxun-zhang please help take a look for this bug.

@ciyongch
Copy link
Contributor

Sure, I will take a look at this.

@ciyongch
Copy link
Contributor

@larroy I've created a PR #16000 to fix this bug. The reason of CI not catching this is: STL vector dosen't check for [] operator, while at() does.
I checked the new_req in the code, and seems it's not necessary, so it could be removed and use original req instead.

@larroy
Copy link
Contributor Author

larroy commented Aug 25, 2019

Thanks for looking into this. With cmake we are adding range check for stl containers in debug mode which also checks in operator []. See h
-D_GLIBCXX_ASSERTIONS define in cmake build.

@ciyongch
Copy link
Contributor

@larroy Thanks for point out this. Does the current CI enable debug mode?

@apeforest
Copy link
Contributor

Issue fixed by #16000

@larroy
Copy link
Contributor Author

larroy commented Aug 26, 2019

Seems that we are not building and testing with STL range checks and MKLDNN enabled, otherwise this would be caught by CI.

@ciyongch
Copy link
Contributor

@larroy Yeah, that make sense for not catching such error in current CI. So it's better to enable this check in CI in the future, right?

@larroy
Copy link
Contributor Author

larroy commented Aug 27, 2019

Yes indeed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants