Remove conflicting llvm OpenMP from cmake builds #12160
Conversation
You may want to check this on mac, where the "special" clang doesn't provide the -fopenmp option.
Force-pushed from df646c3 to cc068d6 (Compare)
Is this PR good to go? #10951 is waiting on this.
LGTM. It might not work on older versions of MacOS as mentioned, but I think this is partially OK given it's broken already, and soon the current release and current release -1 should both support OpenMP from clang. Orthogonal to this PR, but @szha @lebeg: wouldn't we expect the CMake build to detect whether you have OpenMP and take the appropriate action without an error? Why do we even need to specify OPEN_MP=0/1? If we were setting this automatically we would not run into different versions of MacOS building/failing, right? What do you think?
@KellenSunderland Well, I think the option USE_OPENMP is good for controlling an explicit need for OpenMP (
I always compile with openmp=off on mac anyway.
@lebeg Yes, different PR for sure. In the future it'd be great if we had 'auto', 'on', and 'off', and the default was auto.
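A minimal sketch of what such an 'auto' default might look like, assuming a plain tri-state cache variable and the stock FindOpenMP module (none of this is in the PR itself, it only illustrates the idea):

```cmake
# Hypothetical tri-state option: ON / OFF / AUTO, defaulting to AUTO.
set(USE_OPENMP "AUTO" CACHE STRING "OpenMP support: ON, OFF or AUTO")

if(USE_OPENMP STREQUAL "AUTO")
  # Probe the compiler's own OpenMP support; silently fall back to a
  # non-OpenMP build if it is missing (e.g. Apple clang without -fopenmp).
  find_package(OpenMP)
  # OpenMP_FOUND is set by CMake >= 3.9; older versions use OPENMP_FOUND.
  set(USE_OPENMP ${OpenMP_FOUND})
elseif(USE_OPENMP)
  # The user explicitly asked for OpenMP, so a missing runtime is an error.
  find_package(OpenMP REQUIRED)
endif()

if(OpenMP_FOUND)
  set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
```

With a default of AUTO, a mac user whose clang lacks -fopenmp would get a working (if slower) build instead of a configure error.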
@larroy Actually, if you install clang via brew (
And it would use the OpenMP lib that we had in the submodule 😁
FYI, here's what the deps look like before / after this patch is applied. We're clearly moving from linking solely to the 3rdparty llvm openmp to the gomp version. https://gist.github.com/KellenSunderland/3e2852d929c444f6ba36c3c9b46194b8
Would you be able to check if this resolves #10856, Anton?
Yes, this certainly fixes #10856, since it's a kmp API (thread affinity) failure available only in llvm omp.
@@ -370,32 +370,12 @@ endif()
# ---[ OpenMP
if(USE_OPENMP)
  find_package(OpenMP REQUIRED)
  # This should build on Windows, but there's some problem and I don't have a Windows box, so
  # could a Windows user please fix?
  if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt
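The hunk is truncated here, but the branch being deleted is the one that, when the llvm OpenMP submodule is checked out, builds it and links MXNet against that bundled libomp instead of the compiler's own runtime. A rough, hedged reconstruction of that logic (variable and target names are assumptions, not the verbatim file):

```cmake
# Approximate shape of the removed branch; names are illustrative only.
if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt)
  # Build the bundled llvm OpenMP runtime and link against it, shadowing the
  # runtime the compiler would otherwise supply via -fopenmp.
  add_subdirectory(3rdparty/openmp)
  list(APPEND mxnet_LINKER_LIBS omp)
else()
  # Fall back to the compiler-provided OpenMP runtime.
  set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
```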
Are you high? Why would you remove this?
"Using a different OpenMP library from the one supplied by the compiler has side effects." <-- that doesn't really give any useful information Using the llvm OpenMP gives a significant performance improvement over gcc's libgomp. It's loaded anyway for many builds such as mkl. It should not be removed unless there is a VERY good reason to, and no other reasonable solution around the problem exists. |
Thanks for your review @cjolivier01 :) Actually, for MKL builds another OpenMP library is provided (from Intel), specifically for MKLML. And it is stated in the MKLDNN README that:
In MXNet exactly this happens, but with llvm openmp.
I consider improving stability and lowering the complexity of the build good enough reasons to make this change.
The build replaces libomp5 with this libomp when building with MKL, which is legal and essentially the same thing. I know of no stability changes caused by using IntelOMP (you'd have to call ALL MKL builds unstable in this case).
btw, I did a lot of work with mxnet and cython ( https://github.com/apache/incubator-mxnet/tree/cython ), and never had issues with this openmp library (always building with cmake). More likely the problems you see are symptoms of something else that can be adjusted. This OpenMP library has 66% less overhead than libgomp when running parallel OMP regions, which adds up to a lot of time quickly.
@cjolivier01 are these numbers available, or is there some script / instructions on how to reproduce the benchmarks / how they are conducted?
Will changes be made to this PR based on review comments?
I am about to get some performance numbers on how MXNet performs without the OpenMP that is included as a submodule. I do not think there is any change I can make based on the review comments.
+1 to @cjolivier01's ask to update the issue description to be more informative, and it would be great to see perf data.
I have updated the description.
I cross-post what I already sent to the @dev list: The issue was closed, but I firmly believe it's the right thing to do.
Background
If you want to use OpenMP pragmas in your code for parallelization, you supply a special flag to the compiler:
Each of the compilers will then enable the pragmas and link its own OpenMP runtime. Thus, to use the advantages of an OpenMP implementation, one has to compile the code with the corresponding compiler. Currently, in the MXNet CMake build scripts a bundled version of llvm OpenMP is used (here and here) to replace the OpenMP library supplied by the compiler.
I will quote here the README from MKL-DNN (Intel(R) Math Kernel Library for Deep Neural Networks): "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." (link) And: "Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application." (link)
As can be seen from ldd for MXNet:
Performance
The only performance data related to OpenMP in MXNet I was able to find is here:
Which, in my understanding, tests the impact of different environment variables on the same setup (using the same bundled OpenMP library). The libraries may differ in implementation, and the Thread Affinity Interface may have a significant impact on performance. All compilers support it:
I see no reason why this submodule should be kept.
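To make the quoted MKL-DNN warning concrete, here is a minimal sketch (purely illustrative; the target and source names are made up, and this is not taken from the MXNet build) of how two OpenMP runtimes can end up in the same binary: the compiler's -fopenmp pulls in its own runtime, while an extra explicit link against another runtime, such as Intel's libiomp5 or a bundled libomp, loads a second one.

```cmake
# Illustration only: a pattern that loads two OpenMP runtimes at once, which
# MKL-DNN's README calls out as undefined behavior.
cmake_minimum_required(VERSION 3.13)
project(omp_mix_demo CXX)

add_executable(omp_demo demo.cc)   # hypothetical source file

# 1) -fopenmp makes gcc compile the pragmas and implicitly link libgomp.
target_compile_options(omp_demo PRIVATE -fopenmp)
target_link_options(omp_demo PRIVATE -fopenmp)

# 2) Explicitly linking a second runtime (here Intel's libiomp5) means the
#    process now initializes two OpenMP implementations.
target_link_libraries(omp_demo PRIVATE iomp5)
```

Whether the second runtime is libiomp5 or the bundled llvm libomp, the failure mode is the same: two runtimes end up managing threads in one process.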
Thanks @lebeg, it's a long thread and I need to go through the change and the issue first.
Also @xinyu-intel @ZhennanQin: they have experience with different platforms (MacOS, Windows) and compiler-level knowledge.
Link to the dev list discussion.
Please reopen this PR given the data collected in the wiki:
@larroy Thanks for continuously watching this issue. My previous compiler-developer experience tells me that building a project with a certain version of a certain compiler component is not a good idea, because the compiler may have assumptions about the version of OMP being used. We should let the toolchain make the decision; this avoids a lot of unexpected behavior.
Yup.
Guys, I cannot reopen this PR - it was closed by a committer and I assume it takes one to open it again. The branch is still there; I can make any necessary changes if needed after it is reopened.
Is all this still related to an assertion? Why does it assert? You should also turn off operator tuning before running performance tests and include mnist.
Why does the assertion hit?
Also see the above comment.
This is not (only) about the assertion. Unit tests hang when compiling with OpenMP on Linux. I always compile without OpenMP for this reason. There are several related issues that we hope this PR will make go away: @cjolivier01 is suggesting to run the tests with MXNET_USE_OPERATOR_TUNING=0 so OMP is always used for operators, but is this a useful scenario? If with the provided OpenMP and tuning we get similar performance, why does it matter whether some operators are tuned or not? Please elaborate on why you are requesting additional work and benchmarks rather than just asking for open-ended work, which might discourage this contribution, opened more than half a year ago. I suggest providing a path to conclusion when asking for additional work: explain the reasons and limit the scope. I think running a couple of end-to-end models like mnist or resnet is a valid request. There are also the internal benchmarks, which could give us data on possible regressions, so the PR could be reverted if we observe regressions after a couple of days. This is not a one-way-door type of decision.
I spent some time debugging this assertion; it was long ago, but if I remember correctly it is related to the problematic use of pthread_atfork in https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L76
We have multiple issues caused by tweaking OpenMP.
For real-life payloads any potential overhead was proven to be insignificant.
@lebeg Can you rebase with the latest master branch?
MNIST results:
This PR:
Master 3b663ef
Gluon unit tests seem to run faster with the system omp:
Master:
My previous runs were made on a 72-core machine (c5d.18xlarge).
As you can see, standard MKL builds also use libomp.
ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
What do you mean by your latest comment, and what additional info are you requesting? What is your comment regarding the speed improvements and solving the issues with the bundled OpenMP? Please elaborate on your veto and how to move forward. Please provide a way forward; there is significant work providing data, made by project contributors, that you are dismissing by closing this PR without justification.
Description
Using a different OpenMP library from the one supplied by the compiler can lead to undefined behaviour. Multiple problems were reported by users of MXNet.
If there is a need to use the OpenMP library provided by llvm, one should compile MXNet with the latest Clang, which is now supported as a test stage in CI.
Checklist
Essentials
Changes
Comments
Fixes: