-
-
Notifications
You must be signed in to change notification settings - Fork 827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for TBB multi-threading backend in addition to OpenMP #1236
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
p12tic
changed the title
Wrap OpenMP invocations in an interface to support other parallelization backends in the future
Wrap OpenMP invocations in an interface to support other multi-threading backends in the future
Sep 23, 2022
p12tic
force-pushed
the
wrap-openmp
branch
5 times, most recently
from
September 24, 2022 05:19
f903a85
to
54bfb78
Compare
need a rebase |
p12tic
force-pushed
the
wrap-openmp
branch
2 times, most recently
from
October 7, 2022 13:12
dc4760d
to
21e395e
Compare
@fabiencastan Rebased, thanks. |
p12tic
force-pushed
the
wrap-openmp
branch
3 times, most recently
from
October 11, 2022 04:15
4045093
to
8ade13b
Compare
p12tic
changed the title
Wrap OpenMP invocations in an interface to support other multi-threading backends in the future
Add support for TBB multi-threading backend in addition to OpenMP
Oct 11, 2022
@fabiencastan I've expanded the scope of this PR and it now includes full support for oneTBB multithreading backend. This will allow to run aliceVision with multithreading enabled on macOS. |
p12tic
force-pushed
the
wrap-openmp
branch
2 times, most recently
from
October 12, 2022 06:42
2bbcc95
to
5c5da60
Compare
The function spends more than 99.98% of its time in Estimate_T_triplet() even in tests which presumably operate on smaller datasets than what's in production. It's not worth to complicate code with per-thread accumulator in this case.
This commit is best reviewed with whitespace changes ignored.
This commit is best reviewed with whitespace changes ignored.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Currently AliceVision uses OpenMP as the only multi-threading backend. This is problematic due to multiple reasons.
OpenMP support is not uniform among compilers. In particular, Apple mobile platforms do not support OpenMP. As a result, the performance of the algorithms is as much as 8 times lower than possible.
OpenMP critical sections are global. That is, all
#pragma omp critical
lock the same global mutex. As a consequence it is inefficient to run multiple instances of the same parallelized aliceVision algorithm because these instances will share the same mutex even though data races are possible only among threads running single instance of the algorithm. Ideally each instance would have its own mutex.It is not possible to efficiently integrate third-party libraries that use another multi-threading framework because OpenMP assumes that it is the only user of the CPU. As a result, the CPU will be oversubscribed which leads to poor performance. Note that as currently used OpenMP will oversubscribe the CPU all by itself even right now if multiple instances of the same parallelized algorithm are invoked in parallel.
This PR takes inspiration from OpenCV to hide the usage of multi-threading framework behind an API. This will eventually allow supporting multiple multi-threading frameworks. For more details in how it works in OpenCV, see this document.
This PR implements the following:
boost::atomic_ref
(once we can use C++20 we can migrate tostd::atomic_ref
).pragma omp parallel
uses behind an interface exposingsystem::parallelFor
andsystem::parallelLoop
functions instead.As a result the OpenMP code can be converted as follows. For example:
Equivalent implementation of this loop using
system::parallelFor
is the following:As a result, the performance has been improved in several cases where libgomp implementation of OpenMP currently doesn't handle well (many requests to parallelize relatively small problems). This reduces the need for PRs like #1277 as the multithreading runtime handles a wider variety of tasks. For example, before #1277, the this PR made the following tests faster running on a machine with AMD 2990WX with disabled turbo boost:
test_voctree_vocabularyTreeBuild
: before ~5.1-7s (extremely variable), after ~1.4stest_voctree_kmeans
: before ~5.8-10s (extremely variable), after ~4.7stest_sfm_sequentialSfM
: before ~1.7s, after ~1.5sThe PR is split into a large number of commits to allow easy bisection in case a bug slips through. As a result the risk of the PR is low as any bugs will be easily diagnosed and fixed.