
Clear candidates (suspects) in parallel: entanglement management perf improvement (and other fixes) #168

Merged
shwestrick merged 5 commits into performance-eval from par-clear-candidates on Oct 19, 2022

Conversation

shwestrick
Collaborator

Integrate with the scheduler to clear suspects in parallel. This patch subsumes #167, but I'll keep that one open for now for potential discussion.

Performance improvements. Big! On 72 procs, ms-queue is now at ~30x speedup over sequential (old: 19x), and linden-pq is now at ~36x speedup over sequential (old: ~30x).

Algorithm description. The algorithm in this patch is straightforward. Clearing suspects is a bucketed filter: some elements are eliminated, and some survive; each of the survivors is moved into one of $D$ buckets (where $D$ is the depth of the heap whose suspects are being cleared). To parallelize this, we (1) break up the input, (2) run multiple sequential filters in parallel, and then finally (3) merge the results.
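To make the shape concrete, here is a minimal sketch of the three phases in plain C with pthreads. Everything here (`Elem`, `survives`, the fixed worker and depth counts) is an illustrative assumption, not the runtime's actual code, which integrates with the scheduler rather than spawning its own threads:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

enum { P = 8, D = 32 };                 /* workers; heap depth (bucket count) */

typedef struct { int key; int depth; } Elem;   /* depth assumed < D */

typedef struct {
  const Elem *in; size_t n;             /* this worker's input slice */
  Elem *bucket[D]; size_t count[D];     /* its private output buckets */
} Grain;

static int survives(const Elem *e) { return e->key % 3 != 0; }  /* placeholder */

static void *run_grain(void *arg) {     /* phase 2: one sequential filter */
  Grain *g = arg;
  for (size_t i = 0; i < g->n; i++) {
    const Elem *e = &g->in[i];
    if (survives(e))
      g->bucket[e->depth][g->count[e->depth]++] = *e;
  }
  return NULL;
}

/* Caller passes the input already flattened into an array (phase 1) and
 * D output buffers, each with capacity n, to receive the merged buckets. */
void clear_suspects(const Elem *in, size_t n, Elem *out[D], size_t out_count[D]) {
  pthread_t tid[P];
  Grain g[P];
  size_t chunk = (n + P - 1) / P;
  for (int p = 0; p < P; p++) {         /* break up the input */
    size_t lo = p * chunk;
    size_t hi = lo + chunk < n ? lo + chunk : n;
    g[p].in = in + (lo < n ? lo : n);
    g[p].n = hi > lo ? hi - lo : 0;
    memset(g[p].count, 0, sizeof g[p].count);
    for (int d = 0; d < D; d++)
      g[p].bucket[d] = malloc((g[p].n ? g[p].n : 1) * sizeof(Elem));
    pthread_create(&tid[p], NULL, run_grain, &g[p]);
  }
  for (int p = 0; p < P; p++)           /* wait for all grains */
    pthread_join(tid[p], NULL);
  for (int d = 0; d < D; d++) {         /* phase 3: sequential merge */
    out_count[d] = 0;
    for (int p = 0; p < P; p++) {
      memcpy(out[d] + out_count[d], g[p].bucket[d], g[p].count[d] * sizeof(Elem));
      out_count[d] += g[p].count[d];
      free(g[p].bucket[d]);
    }
  }
}
```

Note the `malloc`/`free` of per-grain intermediate buckets here; that is exactly the traffic the first implementation optimization below would route through the block allocator instead.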

TODO: algorithmic improvements? This algorithm has three phases. The middle phase is highly parallel, but both the first and last phases are sequential. In the first phase, the algorithm converts the input list (of chunks) into an array (of chunks). In the last phase, it sequentially merges bucketed outputs. Probably neither of these is a performance bottleneck at the moment. But in the future, there are opportunities for more parallelism. To parallelize the first phase, we could maintain all suspect sets as balanced trees, to enable fast balanced splitting. To parallelize the last phase, we could merge results with a D&C reduction.
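For the last phase, the D&C reduction could look something like this sketch, where the made-up `Result` type stands in for a per-worker bucket table and `combine` for an associative merge; the two recursive halves are independent, so a parallel runtime could fork them for $O(\log P)$ span:

```c
/* Hypothetical sketch: merge P partial results by pairwise reduction
 * instead of a left-to-right fold.  Result and combine are stand-ins. */
typedef struct { long survivors; } Result;   /* stand-in for a bucket table */

static Result combine(Result a, Result b) {  /* assumed associative */
  return (Result){ a.survivors + b.survivors };
}

Result reduce(const Result *rs, int lo, int hi) {
  if (hi - lo == 1) return rs[lo];
  int mid = lo + (hi - lo) / 2;
  Result l = reduce(rs, lo, mid);   /* independent of the next call,     */
  Result r = reduce(rs, mid, hi);   /* so the two halves could be        */
  return combine(l, r);             /* forked in parallel                */
}
```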

TODO: implementation optimizations? This particular implementation could be further optimized in a few ways.

  1. Don't rely on malloc/free for intermediate data. Use the block allocator directly instead, to avoid the malloc/free bottleneck.
  2. Free old chunks in parallel. Currently we free these chunks sequentially in ES_deleteClearSet. But we could instead free them in ES_processClearSetGrain, eagerly freeing each processed chunk along the way (sketched below).
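A hypothetical illustration of item 2 (`Chunk` and `processChunk` are made-up stand-ins, not the runtime's API): rather than a second sequential pass that frees every chunk at deletion time, each grain frees its own chunks eagerly, right after processing them.

```c
#include <stdlib.h>

typedef struct Chunk { struct Chunk *next; /* ...suspect entries... */ } Chunk;

static void processChunk(Chunk *c) { (void)c; /* filter this chunk's suspects */ }

void processGrainEager(Chunk *c) {
  while (c != NULL) {
    Chunk *next = c->next;   /* save the link before freeing */
    processChunk(c);
    free(c);                 /* eager: freed here, in parallel, per grain */
    c = next;
  }
}
```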

@umutacar
Collaborator

why are we using list of chunks? why not array of chunks (possibly with doubling to grow)?

@shwestrick
Collaborator Author

Originally we chose list-of-chunks for simplicity. O(1) for both insertion and concatenate is really nice. We use both operations extensively... I don't think it would be easy to rework things to avoid concatenate.

Also, the amortization for doubling wouldn't play nice with span.

The ideal data structure would be non-amortized with:

  • O(1) insertion
  • O(1) concatenate
  • O(log n) split (or O(1), of course 😄)

IIRC there is a "bag" data structure with exactly these guarantees, but I can't remember the details off the top of my head. If we're willing to pay log-cost for concatenate, then there are lots of good options.
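For reference, the list-of-chunks shape we have now looks roughly like this (hypothetical names; the real suspect sets live in the runtime). Head and tail pointers make both insert and concatenate O(1) with no amortized doubling; what this representation lacks is the fast split.

```c
#include <stdlib.h>

#define CHUNK_CAP 64

typedef struct Chunk {
  struct Chunk *next;
  size_t len;
  void *elems[CHUNK_CAP];
} Chunk;

typedef struct { Chunk *head, *tail; } ChunkList;

void insert(ChunkList *l, void *x) {
  if (l->tail == NULL || l->tail->len == CHUNK_CAP) {
    Chunk *c = calloc(1, sizeof(Chunk)); /* fixed-size: O(1), no amortization */
    if (l->tail) l->tail->next = c; else l->head = c;
    l->tail = c;
  }
  l->tail->elems[l->tail->len++] = x;
}

void concat(ChunkList *a, ChunkList *b) { /* O(1): splice b's chain onto a */
  if (b->head == NULL) return;
  if (a->tail) a->tail->next = b->head; else a->head = b->head;
  a->tail = b->tail;
  b->head = b->tail = NULL;
}
```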

@umutacar
Collaborator

every concat corresponds to a join right?

@shwestrick
Collaborator Author

Essentially yes. Both logical joins and true joins.

@umutacar
Collaborator

cool, if so, would the following work:

  • base case: create array
  • at joins/concats: simply link up the arrays with a pointer, or possibly with a "node" that has them as its children

the length of such a thing should be bounded by span...
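Roughly, the proposed structure (with made-up names) would be arrays at the leaves, and each join just allocating a node pointing at its two children, so concatenation stays O(1):

```c
#include <stdlib.h>

typedef struct Node {
  int is_leaf;
  union {
    struct { void **elems; size_t len; } leaf;      /* base case: an array */
    struct { struct Node *left, *right; } children; /* a join node */
  } u;
} Node;

Node *concat_nodes(Node *l, Node *r) {  /* O(1) per join/concat */
  Node *n = malloc(sizeof *n);
  n->is_leaf = 0;
  n->u.children.left = l;
  n->u.children.right = r;
  return n;
}
```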

@shwestrick
Collaborator Author

Hmm, I'll have to think about this one. Seems like it could increase the overall span to $O(S^2)$ by paying $O(S)$ span per join.

A bound like $O(S \log C)$, where $C$ is the maximum number of candidates, might be better, and we can get this bound with a balanced tree.

@umutacar
Collaborator

ok let's discuss in person

Be warned! `#ifdef ASSERT` is true in all builds.

This was causing the debug version of `traverseAndCheck` to run
in all builds, with significant performance degradation in entangled
benchmarks.

I cleaned up the header and definition a little here, too.
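A minimal illustration of the pitfall (assuming the build always defines `ASSERT`, e.g. `-DASSERT=0` for release and `-DASSERT=1` for debug):

```c
/* #ifdef tests only that the macro is defined, so the guarded code runs
 * in every build; #if tests its value, which is what was intended. */
#include <stdio.h>

#define ASSERT 0   /* release configuration */

int main(void) {
#ifdef ASSERT
  printf("#ifdef ASSERT: runs even though ASSERT == 0\n");
#endif
#if ASSERT
  printf("#if ASSERT: correctly skipped when ASSERT == 0\n");
#endif
  return 0;
}
```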
@shwestrick shwestrick changed the title Clear candidates (suspects) in parallel: entanglement management performance improvement Clear candidates (suspects) in parallel: entanglement management perf improvement (and other fixes) Oct 19, 2022
@shwestrick
Collaborator Author

After discussion today:

Altogether, this patch implements three fixes / performance improvements:

  • Parallelize suspect clearing
  • Eliminate EBR overhead for read barriers
  • Fix an `#if` vs `#ifdef` typo

The results are Very Very Good™️

Merging now 🎉

@shwestrick shwestrick merged commit bb4eb70 into performance-eval Oct 19, 2022
@shwestrick shwestrick deleted the par-clear-candidates branch September 11, 2024 14:41