This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Can't train CIFAR10 with debug flag using CMake / GPU / MKL #15790

Open
larroy opened this issue Aug 8, 2019 · 4 comments
Labels
Bug, Build, CMake (CMake related bugs/issues/improvements)

Comments

larroy commented Aug 8, 2019

Description

Training CIFAR10 (train_cifar10.py) makes no progress when MXNet is built with CMake and debug flags.

Environment info (Required)

The console is flooded with OMP assertion errors, which also interleave with the diagnose output below:

OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
Version      : 1.6.0
Directory    : /home/piotr/mxnet_master_cmake_debug/python/mxnet
Commit hash file "/home/piotr/mxnet_master_cmake_debug/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library      : ['/home/piotr/mxnet_master_cmake_debug/python/mxnet/../../build/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✖ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✔ JEMALLOC
✔ BLAS_OPEN
✖ BLAS_ATLAS
✖ BLAS_MKL
✖ BLAS_APPLE
✔ LAPACK
✔ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✔ SIGNAL_HANDLER
✔ DEBUG
✖ TVM_OP
----------System Info----------
Platform     : Linux-4.15.0-1044-aws-x86_64-with-Ubuntu-18.04-bionic
system       : Linux
node         : ip-172-31-21-194
release      : 4.15.0-1044-aws
version      : #46-Ubuntu SMP Thu Jul 4 13:38:28 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0107 sec, LOAD: 0.4519 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0877 sec, LOAD: 0.0986 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.2109 sec, LOAD: 0.2410 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0080 sec, LOAD: 0.0885 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0036 sec, LOAD: 0.3234 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0113 sec, LOAD: 0.0685 sec.
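
The build-feature list in the report above is easy to misread at a glance. As a rough sketch (the helper name `parse_build_features` is hypothetical, and the sample is an excerpt of the report, not the full list), the ✔/✖ lines can be turned into a dictionary and queried:

```python
def parse_build_features(report: str) -> dict:
    """Map each feature name to True (marked with a check) or False (marked with a cross)."""
    features = {}
    for line in report.splitlines():
        parts = line.split()
        # Feature lines look like "<mark> <NAME>"; everything else is ignored.
        if len(parts) == 2 and parts[0] in ("✔", "✖"):
            features[parts[1]] = parts[0] == "✔"
    return features


# Excerpt copied from the report in this issue.
SAMPLE = """\
Build features:
✔ CUDA
✔ CUDNN
✖ NCCL
✔ OPENMP
✔ BLAS_OPEN
✖ BLAS_MKL
✔ MKLDNN
✔ DEBUG
"""

features = parse_build_features(SAMPLE)
for name in ("CUDA", "OPENMP", "BLAS_OPEN", "BLAS_MKL", "MKLDNN", "DEBUG"):
    print(f"{name}: {'on' if features[name] else 'off'}")
```

Read this way, the report shows BLAS_MKL off while BLAS_OPEN and MKLDNN are on, which suggests that "MKL" in the title refers to MKL-DNN rather than MKL BLAS.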


Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash:
aadef2d
Build config (my cmake_options.yml compared with the repository's cmake/cmake_options.yml):

piotr@ip-172-31-21-194:0: ~/mxnet_master_cmake_debug [master]> diff cmake_options.yml cmake/cmake_options.yml 
19c19
< USE_CUDA: "ON" # Build with CUDA support
---
> USE_CUDA: "OFF" # Build with CUDA support

Error Message:

See above.
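
Since the OMP messages repeat and interleave with other output, it can help to collapse them before attaching a log. The following is a minimal sketch, assuming the log has been captured to a string; `summarize_assertions` is a hypothetical helper, and the sample lines are copied from the output above:

```python
import re
from collections import Counter

# Matches lines such as:
#   Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
ASSERTION = re.compile(r"Assertion failure at (\S+)\((\d+)\): (.+)")


def summarize_assertions(log: str) -> Counter:
    """Count distinct (file, line, condition) assertion failures in a log."""
    counts = Counter()
    for line in log.splitlines():
        # re.match anchors at the start of the line, so the "OMP: Error #13"
        # and "OMP: Hint" lines are skipped and each failure is counted once.
        m = ASSERTION.match(line)
        if m:
            key = (m.group(1), int(m.group(2)), m.group(3).rstrip("."))
            counts[key] += 1
    return counts


SAMPLE_LOG = """\
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
OMP: Hint: Please submit a bug report with this message.
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
OMP: Error #13: Assertion failure at kmp_runtime.cpp(6481).
Assertion failure at kmp_runtime.cpp(6481): __kmp_team_pool == __null.
"""

counts = summarize_assertions(SAMPLE_LOG)
for (source, lineno, condition), n in counts.items():
    print(f"{source}:{lineno} {condition} (seen {n} times)")
```

In this log every failure comes from the same assertion, so the summary reduces to a single entry with a count.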

Minimum reproducible example

Steps to reproduce

I'm using train_cifar10.py from GluonCV:

time python train_cifar10.py --num-epochs 10 --mode hybrid --num-gpus 1 -j 1 --batch-size 128 --wd 0.0001 --lr 0.1 --lr-decay 0.1 --lr-decay-epoch 80,160 --model cifar_resnet20_v1
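
To make the "no progress" symptom reproducible without watching the console, the training command can be wrapped in a deadline. This is only a sketch: `run_with_deadline` is a hypothetical helper, the commands below are stand-ins for the real training script, and a timeout is a crude proxy that cannot tell a slow run from a hung one.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it was killed after timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return None


# Stand-in commands: one finishes immediately, one sleeps past the deadline.
quick = run_with_deadline([sys.executable, "-c", "print('done')"], timeout_s=30)
stalled = run_with_deadline(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=1
)
print("quick run exit code:", quick)
print("stalled run exit code:", stalled)
```

To use it for this issue, replace the stand-in command with the train_cifar10.py invocation from the steps to reproduce and pick a deadline well above the time a healthy epoch takes on the same machine.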

What have you tried to solve it?

#10856

@mxnet-label-bot

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Build


larroy commented Aug 8, 2019

@mxnet-label-bot add [Bug, Build, CMake]

marcoabreu added the Bug, Build, and CMake (CMake related bugs/issues/improvements) labels on Aug 8, 2019

larroy commented Aug 8, 2019

This also happens in release mode: CIFAR10 training makes no progress.


larroy commented Aug 8, 2019

(py3_venv) piotr@ip-172-31-30-124:0:~/mxnet_master_cmake_rel (master)+$ diff cmake_options.yml cmake/cmake_options.yml
19c19
< USE_CUDA: "ON" # Build with CUDA support
---
> USE_CUDA: "OFF" # Build with CUDA support
50c50
< CMAKE_BUILD_TYPE: "Release"
---
> CMAKE_BUILD_TYPE: "Debug"
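
For reports like this, listing the changed options by name can be easier to read than raw diff output. The sketch below is an assumption-heavy illustration: it treats the options file as flat `KEY: "value" # comment` lines rather than parsing full YAML, the helper names are hypothetical, and `EXAMPLE_UNCHANGED` is a made-up key included only to show that identical entries are skipped.

```python
def parse_options(text: str) -> dict:
    """Parse flat 'KEY: "value" # comment' lines into a dict (not a full YAML parser)."""
    options = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        options[key.strip()] = value.strip().strip('"')
    return options


def changed_options(mine: str, reference: str) -> dict:
    """Return {key: (my value, reference value)} for every option that differs."""
    a, b = parse_options(mine), parse_options(reference)
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values for USE_CUDA and CMAKE_BUILD_TYPE are taken from the diff above.
MINE = """\
USE_CUDA: "ON" # Build with CUDA support
CMAKE_BUILD_TYPE: "Release"
EXAMPLE_UNCHANGED: "ON" # hypothetical key, not from the real file
"""
REFERENCE = """\
USE_CUDA: "OFF" # Build with CUDA support
CMAKE_BUILD_TYPE: "Debug"
EXAMPLE_UNCHANGED: "ON" # hypothetical key, not from the real file
"""

diff = changed_options(MINE, REFERENCE)
for key, (mine, reference) in diff.items():
    print(f"{key}: mine={mine} reference={reference}")
```

The splitting on `#` and `:` is deliberately naive and would break on values that contain either character; a real YAML parser would be the safer choice for anything beyond a quick check.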
