Optimize voctree build by up to 40 times for certain problems #1277

p12tic · 2022-10-10T02:17:25Z

This PR fixes three performance problems in voctree that make it inefficient on relatively small problems:

all center updates depend on a single lock during clustering
a thread pool is created on every iteration during clustering
a thread pool is created on every iteration during k-means initialization

This PR disables parallelism when the problem size is small and introduces one lock per cluster center to reduce contention when it is not. Additionally, k-means initialization now runs its trials in parallel, which reduces the thread fan-out when calculating the new distance sum.

I haven't tested this with large problems, but at least the vocabularyTreeBuild test became up to 40 times faster on a machine with an AMD 2990WX (~4s -> 0.1s). The runtime of another relevant test, voctree_kmeans, improved from ~3.7s to ~2.7s.

Note that the AMD 2990WX has 32 cores/64 threads, so any lock contention problems are exacerbated there. It's not a top-of-the-line machine though, as 128-core/256-thread machines have been available for several years already.

p12tic · 2022-10-10T04:02:53Z

A further optimization pushed the improvement from 10 to 40 times. I've edited the PR description to reflect this.

src/aliceVision/voctree/SimpleKmeans.hpp

p12tic · 2022-10-10T19:17:53Z

@simogasp I've addressed your comment.

Currently all updates to the new_centers array use a single lock defined by an OpenMP critical section. This introduces unnecessary lock contention, because updates to different centers are independent of each other. Reducing this contention improved the runtime of the voctree_vocabularyTreeBuild test on an AMD 2990WX by roughly 1.2x, from ~4.0s to ~3.5s.
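To illustrate the per-center locking idea, here is a minimal sketch, not the code from this PR. The function name `accumulateCenters` and the 1-D point layout are hypothetical, and it uses `std::mutex` so that it compiles with or without OpenMP enabled; the real change may well use OpenMP lock primitives instead.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical helper, not the AliceVision API: sums the 1-D points assigned
// to each center, guarding every center with its own lock instead of one
// global critical section.
std::vector<double> accumulateCenters(const std::vector<double>& points,
                                      const std::vector<int>& membership,
                                      std::size_t numCenters)
{
    assert(points.size() == membership.size());

    std::vector<double> sums(numCenters, 0.0);
    std::vector<std::mutex> locks(numCenters); // one lock per center

    const long n = static_cast<long>(points.size());

    // If OpenMP is not enabled, the pragma is ignored and the loop runs serially.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i)
    {
        const std::size_t c = static_cast<std::size_t>(membership[i]);
        // Only threads that update the same center contend for this lock.
        std::lock_guard<std::mutex> guard(locks[c]);
        sums[c] += points[i];
    }
    return sums;
}
```

With a single shared lock every update serializes; here two threads block each other only when they touch the same center.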

Thread setup dominates the runtime of clustering because it is done on every iteration, so the thread team is repeatedly set up and torn down. To avoid this problem, parallelism is disabled if the problem is relatively small. This improves the runtime of the voctree_vocabularyTreeBuild test on an AMD 2990WX by roughly 7-10 times, from ~3.5s to ~0.3-0.5s.
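One common way to disable parallelism for small inputs in OpenMP is the `if` clause on a parallel construct. The sketch below assumes that approach; the function `assignToNearest` and the cutoff `kMinParallelPoints` are hypothetical, and the threshold actually used in the PR is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cutoff for illustration only; the PR text does not give its value.
constexpr long kMinParallelPoints = 10000;

// Hypothetical helper: returns, for each 1-D point, the index of its nearest center.
std::vector<std::size_t> assignToNearest(const std::vector<double>& points,
                                         const std::vector<double>& centers)
{
    assert(!centers.empty());

    std::vector<std::size_t> nearest(points.size(), 0);
    const long n = static_cast<long>(points.size());

    // When the condition is false, the loop is executed by the calling thread
    // alone, so no thread team is started for small inputs.
    #pragma omp parallel for if(n >= kMinParallelPoints)
    for (long i = 0; i < n; ++i)
    {
        std::size_t best = 0;
        double bestDistSq = (points[i] - centers[0]) * (points[i] - centers[0]);
        for (std::size_t c = 1; c < centers.size(); ++c)
        {
            const double d = points[i] - centers[c];
            if (d * d < bestDistSq)
            {
                bestDistSq = d * d;
                best = c;
            }
        }
        nearest[i] = best;
    }
    return nearest;
}
```

The result is the same on both paths; only the threading overhead differs.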

Currently the k-means trials run sequentially, and each trial parallelizes its distance summation across all cores. This has higher overhead than running all trials in parallel and then parallelizing the summation, because the latter starts fewer threads. numTrials is currently 5, so the number of thread startups is up to 5 times lower than before.
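As a rough sketch of running trials in parallel, the example below evaluates every candidate center in a single parallel loop, with each trial summing its distances serially. This is one possible reading of the description rather than the PR's actual code; `pickBestCandidate`, the 1-D points, and the precomputed `closestDistSq` array are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: each trial proposes one candidate center. The trial
// whose candidate yields the smallest total squared distance wins.
// closestDistSq[i] is the squared distance from point i to its nearest
// already-chosen center.
std::size_t pickBestCandidate(const std::vector<double>& points,
                              const std::vector<double>& closestDistSq,
                              const std::vector<double>& candidates)
{
    assert(points.size() == closestDistSq.size());
    assert(!candidates.empty());

    std::vector<double> trialSums(candidates.size(), 0.0);
    const long numTrials = static_cast<long>(candidates.size());

    // One parallel loop over the trials instead of one parallel reduction
    // per trial. Each iteration writes only its own slot of trialSums.
    #pragma omp parallel for
    for (long t = 0; t < numTrials; ++t)
    {
        double sum = 0.0;
        for (std::size_t i = 0; i < points.size(); ++i)
        {
            const double d = points[i] - candidates[t];
            sum += std::min(closestDistSq[i], d * d);
        }
        trialSums[t] = sum;
    }

    const auto best = std::min_element(trialSums.begin(), trialSums.end());
    return static_cast<std::size_t>(best - trialSums.begin());
}
```

Because each trial writes to a separate element, no locking is needed, and the thread team is started once rather than once per trial.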

p12tic force-pushed the optimize-voctree branch from 085885f to 9b0aa70 Compare October 10, 2022 03:22

p12tic changed the title ~~Optimize voctree build by up to 10 times for certain problems~~ Optimize voctree build by up to 40 times for certain problems Oct 10, 2022

fabiencastan requested a review from simogasp October 10, 2022 09:52

simogasp added type:enhancement bugfix labels Oct 10, 2022

simogasp added this to the 2.5.0 milestone Oct 10, 2022

simogasp reviewed Oct 10, 2022

p12tic force-pushed the optimize-voctree branch from 9b0aa70 to fd55c80 Compare October 10, 2022 19:17

p12tic added 3 commits October 11, 2022 07:53

p12tic force-pushed the optimize-voctree branch from fd55c80 to ae283c2 Compare October 11, 2022 04:53

simogasp approved these changes Oct 11, 2022

fabiencastan merged commit ecbfe1b into alicevision:develop Oct 11, 2022

p12tic mentioned this pull request Oct 11, 2022

(trivial) Improve parallel ctest speed by 1.5x by splitting sfm_panorama test into several executables #1272

Merged

p12tic deleted the optimize-voctree branch October 11, 2022 14:01

p12tic mentioned this pull request Oct 11, 2022

Add support for TBB multi-threading backend in addition to OpenMP #1236

Closed
