Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add option to choose between OMP implementations #15808

Closed
wants to merge 4 commits into from

Conversation

larroy
Copy link
Contributor

@larroy larroy commented Aug 8, 2019

Description

Adds a build switch to choose OpenMP implementations:

Workaround for #15790

@karan6181

@larroy larroy requested a review from szha as a code owner August 8, 2019 20:15
@larroy
Copy link
Contributor Author

larroy commented Aug 8, 2019

@mxnet-label-bot add [Build, CMake]

@marcoabreu marcoabreu added Build CMake CMake related bugs/issues/improvements labels Aug 8, 2019
@larroy
Copy link
Contributor Author

larroy commented Aug 8, 2019

@mxnet-label-bot add [Breaking]

Copy link
Contributor

@ChaiBapchya ChaiBapchya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But I am not an expert in CMake so I can not really decide whether this is the most proper way. @marcoabreu @lebeg for review.

@lebeg
Copy link
Contributor

lebeg commented Aug 13, 2019

LGTM. But I am not an expert in CMake so I can not really decide whether this is the most proper way. @marcoabreu @lebeg for review.

I think what you were proposing would be a better way to handle this. Instead of ON/OFF I would define an order in which the Omp versions would be preferred. Like a list OMP_SYSTEM, OMP_BUNDLED, OMP_OFF.

For a case like OMP_SYSTEM if it's not found it would be an error.

@larroy
Copy link
Contributor Author

larroy commented Aug 13, 2019

@lebeg you might be right, but getting this working is urgent as we can't debug. I would propose doing it incrementally or getting your contribution in this direction.

@apeforest
Copy link
Contributor

@larroy What's the effort required to implement the multi-selection as you earlier proposed? If this only blocks debugging using CMake and not regular users or developers using Make, I am more leaning towards @lebeg suggestion.

@larroy
Copy link
Contributor Author

larroy commented Aug 13, 2019

CI is blocked now, and it's going to be blocked for a few days. There's problems with Windows and other things. If I make changes I'm afraid this won't be merged for a while. Up to you guys to decide. I think this is urgent enough to have it merged and then refine it. I would say this is a critical fix for being able to debug, as other people need to debug issues related to other tasks. (Make in debug requires updating the linker which is something I would prefer to avoid in my workstation).

The effort to implement your suggestion should be around 30 minutes.

@marcoabreu
Copy link
Contributor

I'd also prefer the multi selection.

@larroy
Copy link
Contributor Author

larroy commented Aug 23, 2019

Can you guys review again? I made the suggested changes.

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@apeforest apeforest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larroy I thought the suggested change is to use multi-selection for USE_OPENMP not adding additional variable:

It would look like:

USE_OPENMP:  "Configure Openmp support"    OMP_SYSTEM |  | OMP_BUNDLED | OMP_OFF)

@lebeg Please let me know if my understanding is correct. Thanks

@larroy
Copy link
Contributor Author

larroy commented Aug 24, 2019

@apeforest I think what you propose is not homogeneous with what we have. I think is better to leave an intitial boolean switch and a fine grained selector. It also keeps backwards compatibility. Hence my proposal in this PR.

@lebeg
Copy link
Contributor

lebeg commented Aug 26, 2019

@larroy I thought the suggested change is to use multi-selection for USE_OPENMP not adding additional variable:

It would look like:

USE_OPENMP:  "Configure Openmp support"    OMP_SYSTEM |  | OMP_BUNDLED | OMP_OFF)

@lebeg Please let me know if my understanding is correct. Thanks

Yes, this is more or less what I had in mind. For the sake of backwards compatibility (and maybe simplicity) keeping the option ON would default to OMP_SYSTEM.

@lebeg
Copy link
Contributor

lebeg commented Aug 30, 2019

There seem to be some unintended changes in this PR, otherwise it looks good

@larroy
Copy link
Contributor Author

larroy commented Aug 30, 2019

They are for CI, a change in dmlc is needed first:

dmlc/dmlc-core#558

@larroy
Copy link
Contributor Author

larroy commented Sep 6, 2019

@lebeg please approve then

@@ -23,7 +23,7 @@ mxnet_option(USE_CUDA "Build with CUDA support" ON)
mxnet_option(USE_OLDCMAKECUDA "Build with old cmake cuda" OFF)
mxnet_option(USE_NCCL "Use NVidia NCCL with CUDA" OFF)
mxnet_option(USE_OPENCV "Build with OpenCV support" ON)
mxnet_option(USE_OPENMP "Build with Openmp support" ON)
mxnet_option(USE_OPENMP "Build with Openmp support" ON) # OFF | ON | PLATFORM | BUNDLED
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between ON and the rest? Can you add some comments here?

if(OPENMP_FOUND)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
add_definitions(-DMXNET_USE_OPENMP=1)
endif()
else()
message(FATAL_ERROR "USE_OPENMP takes values [PLATFORM, BUNDLED, OFF]")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"USE_OPENMP takes values [PLATFORM, BUNDLED, OFF, ON]")?

@apeforest
Copy link
Contributor

They are for CI, a change in dmlc is needed first:

dmlc/dmlc-core#558

Why is it needed for this PR?

Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review comments. Also is the submodule update necessary

AND NOT MSVC
AND NOT CMAKE_CROSSCOMPILING)

if(USE_OPENMP STREQUAL "BUNDLED" AND EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/3rdparty/openmp/CMakeLists.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first check and the others should be separate to throw appropriate error messages

Copy link
Contributor Author

@larroy larroy Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. Can you explain what you mean exactly? A concrete request would help, as this is a followup on previous review requests that might be conflicting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone uses the option "bundled" but the "exists" check fails, they'd receive an error message that they selected an invalid mode.

@@ -452,14 +451,18 @@ if(USE_OPENMP)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
add_definitions(-DMXNET_USE_OPENMP=1)
else()
elseif(USE_OPENMP STREQUAL "PLATFORM" OR USE_OPENMP STREQUAL "ON")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between on and platform

Copy link
Contributor Author

@larroy larroy Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the same, for backwards compatibility, meaning if someone had USE_OPENMP=ON will continue working with the OMP implementation which has been shown to be most stable.

Copy link
Member

@cjolivier01 cjolivier01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dangerous

@cjolivier01 cjolivier01 closed this Sep 8, 2019
@szha
Copy link
Member

szha commented Sep 8, 2019

@cjolivier01 if you'd like your change request to be considered a veto, please provide a technical reason. You may refer to the Apache Voting Process for details.

Reopening given that @larroy didn't indicate that he'd like to abandon this change.

@szha szha reopened this Sep 8, 2019
@larroy
Copy link
Contributor Author

larroy commented Sep 8, 2019

Submodule update is necessary as build was failing, there's a fix in the submodule that should be merged soon. You can see the error message by pulling the PR in your local and trying to build. It's related to throw specification, for details see the line changed in the submodule.

@cjolivier01
Copy link
Member

this is a repeat of a previously vetoed proposal by Pedro and another by someone he tried to sneak a similar change in, just under a different guise. I really don’t have the energy to fight openly discuss this anymore after two years.

@cjolivier01 cjolivier01 closed this Sep 8, 2019
@cjolivier01
Copy link
Member

in addition, on a fundamental level, flippantly switching between omp libraries while taking nothing else into considerations is extraordinarily hazardous (ie what’s mkl bringing in?).

@larroy
Copy link
Contributor Author

larroy commented Sep 8, 2019

This has repeatedly caused crashes reported by several users which didn't happen when we used the platform OMP which is also the one shipped with binary releases. You are doing a disservice to the community and the project with this attitude, the users are free to choose the OMP implementation that they want. We need to reproduce the builds which use the builtin omp with the CMake build, and this change enables exactly this behaviour, You are free to compile with whatever OMP version you like, see the github issues for reports of crashes and problems with other OMP implementations. Intel already said they will remove the bundled iomp from MKL as it causes some crashes using multiprocessing which doesn't see to be caused by our code.

@szha
Copy link
Member

szha commented Sep 8, 2019

@cjolivier01 I didn't realize that. If this is indeed a repetition of a change proposal already vetoed, would you mind linking to it?

I only recall a case where @lebeg attempted to remove the bundled omp, which is different from what @larroy is proposing here. If this is what you're referring to, given the difference, would you mind providing a technical reason for rejecting this change?

@cjolivier01
Copy link
Member

cjolivier01 commented Sep 8, 2019

There is plenty of info out there already on this issue including discussions on dev. I will not be forever forced to re-argue this thing every time Pedro wants to sneak basically the exact same change in every few months for two years+ while never actually root-causing the issue (he’s been told what the issue was — not the OMP library’s fault) — it’s a big in mxnet — see the issue he is locked out of), but ignores repeatedly.

@larroy
Copy link
Contributor Author

larroy commented Sep 8, 2019

@cjolivier01 stop holding this project hostage. This will be reported to the relevant decision makers in Apache. I hope your powers are revoked due to your track record of destructive actions.

@marcoabreu
Copy link
Contributor

If the submodule update fixes the build, please put it in a separate PR as already stated.

@marcoabreu
Copy link
Contributor

While I try to not engage in the discussion above whether this is a match to the previously vetoed PR, I agree that I see some similarities.

With regards to #15790: This PR doesn't actually solve the problem (which appears to be the same as previously) but gives users a different OMP library choice. That effectively means that the default method we use is still broken and users would have to know that they have to use a different one. I might be no expert in that field, but I agree that to this day we only read something about some assertion being hit and nothing further. From what I can see, there has been no root causing (to the actual root) but a lot based on assumptions. This change seems quite massive (introducing a new OMP Backend choice) while not being based on real data. So i understand the frustration from both sides here: Chris' request for root causing is constantly being denied, while the root cause for pedro seems apparent. But since we are a community and we would like to make a well educated decision as a group, I'd agree with the general request for more root causing in order to create transparency.

Thinking about it, I agree that this is the same change in a different costume that already had been vetoed with proper technical reasons - but I understand that Chris is getting frustrated by constantly having to repeat his request.

So yeah, I agree with his veto that I'd like to see a deep dive. That should be collected somewhere and if we are at a crossroads, we should bring the topic up for a broader vote on dev@.

@marcoabreu
Copy link
Contributor

Another proposal to avoid the difficult personal situation is to hand this issues over to somebody else. Maybe another engineer at Amazon or somebody at Intel or Nvidia would like to be of assistance here. I'm certain that they are also quite capable of debugging concurrency issues. Wdyt?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Breaking Build CMake CMake related bugs/issues/improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants