Remove conflicting llvm OpenMP from cmake builds #12160
Conversation
You may want to check this on mac, where the "special" clang doesn't provide the -fopenmp option.
Force-pushed from df646c3 to cc068d6 (Compare)
Is this PR good to go? #10951 is waiting on this.
LGTM. It might not work on older versions of MacOS as mentioned, but I think this is partially OK given it's broken already, and soon the current release and current release -1 should both support OpenMP from clang. Orthogonal to this PR, but @szha @lebeg: wouldn't we expect the CMake build to detect whether you have OpenMP and take the appropriate action without an error? Why do we even need to specify OPEN_MP=0/1? If we were setting this automatically we would not run into different versions of MacOS building/failing, right? What do you think?
@KellenSunderland Well, I think the option USE_OPENMP is good for controlling an explicit need for OpenMP (
I always compile with openmp=off on mac anyway.
@lebeg Yes, different PR for sure. In the future it'd be great if we had 'auto', 'on', and 'off', and the default was auto.
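A minimal sketch of what such an 'auto' default might look like, assuming a plain tri-state cache variable and the stock FindOpenMP module (none of this is in the PR itself, it only illustrates the idea):

```cmake
# Hypothetical tri-state option: ON / OFF / AUTO, defaulting to AUTO.
set(USE_OPENMP "AUTO" CACHE STRING "OpenMP support: ON, OFF or AUTO")

if(USE_OPENMP STREQUAL "AUTO")
  # Probe the compiler's own OpenMP support; silently fall back to a
  # non-OpenMP build if it is missing (e.g. Apple clang without -fopenmp).
  find_package(OpenMP)
  # OpenMP_FOUND is set by CMake >= 3.9; older versions use OPENMP_FOUND.
  set(USE_OPENMP ${OpenMP_FOUND})
elseif(USE_OPENMP)
  # The user explicitly asked for OpenMP, so a missing runtime is an error.
  find_package(OpenMP REQUIRED)
endif()

if(OpenMP_FOUND)
  set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
```

With a default of AUTO, a mac user whose clang lacks -fopenmp would get a working (if slower) build instead of a configure error.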
@larroy Actually, if you install clang via brew (
And it would use the OpenMP lib that we had in the submodule 😁
FYI, here's what the deps look like before / after this patch is applied. We're clearly moving from linking solely to the 3rdparty llvm openmp to the gomp version. https://gist.github.com/KellenSunderland/3e2852d929c444f6ba36c3c9b46194b8
Would you be able to check if this resolves #10856, Anton?
Yes, this certainly fixes #10856, since it's a kmp API (thread affinity) failure available only in llvm omp.
@@ -370,32 +370,12 @@ endif()
# ---[ OpenMP
if(USE_OPENMP)
  find_package(OpenMP REQUIRED)
  # This should build on Windows, but there's some problem and I don't have a Windows box, so
  # could a Windows user please fix?
  if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt
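The hunk is truncated here, but the branch being deleted is the one that, when the llvm OpenMP submodule is checked out, builds it and links MXNet against that bundled libomp instead of the compiler's own runtime. A rough, hedged reconstruction of that logic (variable and target names are assumptions, not the verbatim file):

```cmake
# Approximate shape of the removed branch; names are illustrative only.
if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt)
  # Build the bundled llvm OpenMP runtime and link against it, shadowing the
  # runtime the compiler would otherwise supply via -fopenmp.
  add_subdirectory(3rdparty/openmp)
  list(APPEND mxnet_LINKER_LIBS omp)
else()
  # Fall back to the compiler-provided OpenMP runtime.
  set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
endif()
```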
Are you high? Why would you remove this?
"Using a different OpenMP library from the one supplied by the compiler has side effects." <-- that doesn't really give any useful information Using the llvm OpenMP gives a significant performance improvement over gcc's libgomp. It's loaded anyway for many builds such as mkl. It should not be removed unless there is a VERY good reason to, and no other reasonable solution around the problem exists. |
Thanks for your review @cjolivier01 :) Actually, for MKL builds another OpenMP library is provided (from Intel), specifically for MKLML. And it is stated in the MKLDNN README that:
In MXNet exactly this happens, but with llvm openmp.
I consider improving stability and lowering the complexity of the build good enough reasons to make this change.
The build replaces libomp5 with this libomp when building with MKL, which is legal and essentially the same thing. I know of no stability changes caused by using IntelOMP (you'd have to call ALL MKL builds unstable in this case).
btw, I did a lot of work with mxnet and cython ( https://github.com/apache/incubator-mxnet/tree/cython ), and never had issues with this openmp library (always building with cmake). More likely the problems you see are symptoms of something else that can be adjusted. This OpenMP library has 66% less overhead than libgomp when running parallel OMP regions, which adds up to a lot of time quickly.
@cjolivier01 are these numbers available, or is there some script / instructions on how to reproduce the benchmarks / how they are conducted?
Will changes be made to this PR based on review comments?
I am about to get some performance numbers on how MXNet performs without the OpenMP that is included as a submodule. I do not think there is any change I can make based on the review comments.
+1 to @cjolivier01's ask to update the issue description to be more informative, and it would be great to see perf data.
I have updated the description.
I cross-post what I already sent to the @dev list: The issue was closed, but I firmly believe it's the right thing to do.
Background
If you want to use OpenMP pragmas in your code for parallelization, you supply a special flag to the compiler:
Each of the compilers will then enable the pragmas and link its own OpenMP runtime. Thus, to use the advantages of an OpenMP implementation, one has to compile the code with the corresponding compiler. Currently, in the MXNet CMake build scripts a bundled version of llvm OpenMP is used (here and here) to replace the OpenMP library supplied by the compiler.
I will quote here the README from MKL-DNN (Intel(R) Math Kernel Library for Deep Neural Networks): "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime library to work. As different OpenMP runtimes may not be binary compatible it's important to ensure that only one OpenMP runtime is used throughout the application. Having more than one OpenMP runtime initialized may lead to undefined behavior resulting in incorrect results or crashes." (link) And: "Using GNU compiler with -fopenmp and -liomp5 options will link the application with both Intel and GNU OpenMP runtime libraries. This will lead to undefined behavior of the application." (link)
As can be seen from ldd for MXNet:
Performance
The only performance data related to OpenMP in MXNet I was able to find is here:
Which, in my understanding, tests the impact of different environment variables on the same setup (using the same bundled OpenMP library). The libraries may differ in implementation, and the Thread Affinity Interface may have a significant impact on performance. All compilers support it:
I see no reason why this submodule should be kept.
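To make the quoted MKL-DNN warning concrete, here is a minimal sketch (purely illustrative; the target and source names are made up, and this is not taken from the MXNet build) of how two OpenMP runtimes can end up in the same binary: the compiler's -fopenmp pulls in its own runtime, while an extra explicit link against another runtime, such as Intel's libiomp5 or a bundled libomp, loads a second one.

```cmake
# Illustration only: a pattern that loads two OpenMP runtimes at once, which
# MKL-DNN's README calls out as undefined behavior.
cmake_minimum_required(VERSION 3.13)
project(omp_mix_demo CXX)

add_executable(omp_demo demo.cc)   # hypothetical source file

# 1) -fopenmp makes gcc compile the pragmas and implicitly link libgomp.
target_compile_options(omp_demo PRIVATE -fopenmp)
target_link_options(omp_demo PRIVATE -fopenmp)

# 2) Explicitly linking a second runtime (here Intel's libiomp5) means the
#    process now initializes two OpenMP implementations.
target_link_libraries(omp_demo PRIVATE iomp5)
```

Whether the second runtime is libiomp5 or the bundled llvm libomp, the failure mode is the same: two runtimes end up managing threads in one process.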
Thanks @lebeg, it's a long thread and I need to go through the change and the issue first.
Also @xinyu-intel @ZhennanQin: they have experience with different platforms (MacOS, Windows) and compiler-level knowledge.
Link to the dev list discussion.
Please reopen this PR given the data collected in the wiki:
@larroy Thanks for continuously watching this issue. My previous compiler-developer experience tells me that building a project with a certain version of a certain compiler component is not a good idea, because the compiler may have assumptions about the version of OMP being used. We should let the toolchain make the decision; this avoids a lot of unexpected behavior.
Yup.
Guys, I cannot reopen this PR - it was closed by a committer and I assume it takes one to open it again. The branch is still there; I can make any necessary changes if needed after it is reopened.
Is all this still related to an assertion? Why does it assert? You should also turn off operator tuning before running performance tests and include mnist.
Why does the assertion hit?
Also see the above comment.
This is not (only) about the assertion. Unit tests hang when compiling with OpenMP on Linux. I always compile without OpenMP for this reason. There are several related issues that we hope this PR will make go away: @cjolivier01 is suggesting to run the tests with MXNET_USE_OPERATOR_TUNING=0 so OMP is always used for operators, but is this a useful scenario? If with the provided OpenMP and tuning we get similar performance, why does it matter whether some operators are tuned or not? Please elaborate on why you are requesting additional work and benchmarks rather than just asking for open-ended work, which might discourage this contribution, opened more than half a year ago. I suggest providing a path to conclusion when asking for additional work: explain the reasons and limit the scope. I think running a couple of end-to-end models like mnist or resnet is a valid request. There are also the internal benchmarks, which could give us data on possible regressions, so the PR could be reverted if we observe regressions after a couple of days. This is not a one-way-door type of decision.
I spent some time debugging this assertion; it was long ago, but if I remember correctly it is related to the problematic use of pthread_atfork in https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L76
We have multiple issues caused by tweaking OpenMP.
For real-life payloads any potential overhead was proven to be insignificant.
@lebeg Can you rebase with the latest master branch?
MNIST results:
This PR:
Master 3b663ef
Gluon unit tests seem to run faster with the system omp:
Master:
My previous runs were made on a 72-core machine (c5d.18xlarge).
As you can see, standard MKL builds also use libomp.
ldd /usr/local/lib/python3.6/dist-packages/mxnet/libmxnet.so
What do you mean by your latest comment, and what additional info are you requesting? What is your comment regarding the speed improvements and solving the issues with the bundled OpenMP? Please elaborate on your veto and how to move forward. Please provide a way forward; there is significant work providing data, made by project contributors, that you are dismissing by closing this PR without justification.
Description
Using a different OpenMP library from the one supplied by the compiler can lead to undefined behaviour. Multiple problems were reported by users of MXNet.
If there is a need to use the OpenMP library provided by llvm, one should compile MXNet with the latest Clang, which is now supported as a test stage in CI.
Checklist
Essentials
Changes
Comments
Fixes: