This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Image augmentation crashes when MXNET_CPU_WORKER_NTHREADS is set greater than 3 #9520

Open
yuantangliang opened this issue Jan 22, 2018 · 2 comments

@yuantangliang

Note: Providing complete information in the most concise form is the best way to get help. This issue template serves as the checklist for essential information to most of the technical issues and bug reports. For non-technical issues and feature requests, feel free to present the information in what you believe is the best form.

For Q & A and discussion, please start a discussion thread at https://discuss.mxnet.io

Description

Image augmentation crashes when MXNET_CPU_WORKER_NTHREADS is set to a value greater than 3.

Environment info (Required)

----------Python Info----------
('Version :', '2.7.6')
('Compiler :', 'GCC 4.8.4')
('Build :', ('default', 'Oct 26 2016 20:30:19'))
('Arch :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version :', '9.0.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version :', '0.12.1')
('Directory :', '/usr/local/lib/python2.7/dist-packages/mxnet')
('Commit Hash :', 'e0c7906693f0c79b0ce34a4d777c26a6bf1903c1')
----------System Info----------
('Platform :', 'Linux-4.4.0-64-generic-x86_64-with-Ubuntu-14.04-trusty')
('system :', 'Linux')
('node :', 'meter')
('release :', '4.4.0-64-generic')
('version :', '#85~14.04.1-Ubuntu SMP Mon Feb 20 12:10:54 UTC 2017')
----------Hardware Info----------
('machine :', 'x86_64')
('processor :', 'x86_64')
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Stepping: 3
CPU MHz: 4200.000
BogoMIPS: 8016.71
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0072 sec, LOAD: 1.6354 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0060 sec, LOAD: 0.5680 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1315 sec, LOAD: 0.9401 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.1766 sec, LOAD: 1.0127 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0067 sec, LOAD: 0.3688 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [Errno 1] _ssl.c:510: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>, DNS finished in 0.277799129486 sec.

Package used (Python/R/Scala/Julia):
Python

Error Message:

BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
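
The message above is printed by OpenBLAS rather than by MXNet itself, and it is commonly associated with more threads calling into BLAS than the OpenBLAS build was compiled to serve. One way to test that hypothesis, as a sketch only, is to cap the OpenBLAS thread count before NumPy and MXNet are imported. `OPENBLAS_NUM_THREADS` is an OpenBLAS environment variable; the helper name `cap_blas_threads` and the value 1 are assumptions for illustration, and whether this avoids the crash has not been verified here.

```python
import os


def cap_blas_threads(n_threads=1):
    """Limit OpenBLAS threading through its environment variable.

    Call this before numpy or mxnet is imported, because OpenBLAS
    reads the variable when the library is loaded.
    """
    os.environ["OPENBLAS_NUM_THREADS"] = str(n_threads)
    return os.environ["OPENBLAS_NUM_THREADS"]


cap_blas_threads(1)
# Import numpy / mxnet only after this point.
```

If the crash disappears with the cap in place, that would point at the interaction between MXNet's CPU worker threads and OpenBLAS rather than at the augmenters themselves.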

Minimum reproducible example

import os

import numpy as np

# num_worker is not defined in the original report. 4 is an assumed value,
# chosen because the crash is reported for values greater than 3.
num_worker = 4
# Set before importing mxnet, in case the engine reads it at load time.
os.environ["MXNET_CPU_WORKER_NTHREADS"] = "%d" % num_worker

from mxnet.image import ImageDetIter

default_train_augument = {}

default_train_augument['mean'] = np.array([123.68, 116.28, 103.53])
default_train_augument['rand_crop'] = 1  # random crop only enlarges the image
default_train_augument['rand_mirror'] = 1
default_train_augument['rand_pad'] = 0.65  # padding makes the image smaller
default_train_augument['rand_gray'] = 0.2
default_train_augument['brightness'] = 0.7
default_train_augument['contrast'] = 0.7  # whether the image is clear
default_train_augument['saturation'] = 0.7
default_train_augument['pca_noise'] = 0.7
default_train_augument['hue'] = 0.7
default_train_augument['min_object_covered'] = 0
default_train_augument['aspect_ratio_range'] = (0.8, 1.22)
default_train_augument['area_range'] = (0.3, 2.4)
default_train_augument['min_eject_coverage'] = 0.4  # boxes that fail this condition are removed

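def create_mx_det_iter():
    # RECORD_PERSON_ROOPATH is not defined in the original report; it is the
    # directory that holds the author's .rec and .idx files.
    file_name = os.path.join(RECORD_PERSON_ROOPATH, 'person_test.rec')
    id_file_name = os.path.join(RECORD_PERSON_ROOPATH, 'person_test.idx')
    iter1 = ImageDetIter(20, (3, 480, 640), path_imgrec=file_name,
                         path_imgidx=id_file_name, **default_train_augument)
    return iter1

def record_iterator_test_all(det_iter, batch_size=32):
    import time
    i = 0
    det_iter.reset()
    tic = time.time()
    for batch in det_iter:
        i += 1
        # Running throughput in images per second.
        print(batch_size * i / (time.time() - tic))

det_iter = create_mx_det_iter()
# Pass 20 to match the batch size given to ImageDetIter, so the printed
# throughput is computed from the real number of images per batch.
record_iterator_test_all(det_iter, 20)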
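
To pin down the thread count at which the crash starts, the script above could be run in a fresh process for each value of `MXNET_CPU_WORKER_NTHREADS`. The following is a minimal sketch using only the standard library. The function names `run_with_threads` and `sweep`, and the file name `repro.py` in the usage comment, are hypothetical and not part of the original report.

```python
import os
import subprocess
import sys


def run_with_threads(n_threads, cmd):
    """Run cmd in a fresh process with MXNET_CPU_WORKER_NTHREADS set."""
    env = dict(os.environ)
    env["MXNET_CPU_WORKER_NTHREADS"] = str(n_threads)
    # A fresh process guarantees the variable is set before mxnet loads.
    return subprocess.call(cmd, env=env)


def sweep(cmd, max_threads=8):
    """Return {thread_count: exit_code} for 1..max_threads."""
    return {n: run_with_threads(n, cmd) for n in range(1, max_threads + 1)}


# Hypothetical usage, assuming the script above is saved as repro.py:
#     results = sweep([sys.executable, "repro.py"])
#     for n, code in sorted(results.items()):
#         print("threads=%d exit_code=%d" % (n, code))
```

A nonzero exit code for a given thread count marks a run that did not finish cleanly, which makes it easy to confirm whether the failure really begins above 3 threads on a given machine.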

@vdantu
Contributor

vdantu commented Feb 27, 2018

@sandeep-krishnamurthy: Label "BLAS" "Need Triaging"

@yuantangliang: Are you still seeing this issue? Could you give more information on how to reproduce it, especially in the "Minimum reproducible example" section? That would help in debugging it further. Also, have you started a discussion on discuss.mxnet.io? If so, could you post the link here for anyone who comes across this issue?

@szha
Member

szha commented Mar 24, 2018

@sandeep-krishnamurthy @vdantu please refrain from manually adding the "Need Triage" label while triaging. Thanks.

@szha szha removed the needs triage label Mar 24, 2018