Add two-array radix and sample sorts and a distributed sort #13347

mppf · 2019-07-01T15:35:46Z

* Factors the bucketize process out into separate parts in the Sort module.
* Implements two-array radix and sample sorts.
* Implements a distributed sort.
* Adds a directory of sort tests in different languages for cross-language performance comparisons. (These are not run automatically.)
* Includes work towards a prototype implementation of an in-place parallel bucketizer from ips4o in Chapel (added as .notest just so we don't lose track of it).
* Does not currently change the sort code that is run for a normal sort called through Sort.sort.
* Changes test/library/standard/Random/stonea/fillRandom.chpl into a future because Sort now uses Random and that changes its behavior. See "Calling functions in a module without any use" (#13523).

Reviewed by @ben-albrecht - thanks!

full local testing

ben-albrecht

At a high level, the changes look good. I had a few trivial comments.

> Implements a distributed sort

🎉

ben-albrecht · 2019-07-23T19:57:50Z

modules/packages/Sort.chpl

+}
+   */
+  // TODO: These shallowCopy functions should handle Block,Cyclic arrays
+  inline proc shallowCopy(ref A, dst, src, nElts) {


Is this something that should be provided as a standalone module, or do we expect this to be handled at the language level relatively soon?

mppf · 2019-07-24

I don't actually know the future of it, presently...

ben-albrecht · 2019-07-24

Is there an issue that captures the desire for a better way to do this?

mppf · 2019-07-24

I'm not even sure what to put in an issue right now. For the moment, there is a TODO comment about moving it elsewhere.

ben-albrecht · 2019-07-23T20:00:26Z

modules/packages/Sort.chpl

+
+    proc getNumBuckets() {
+      return numBuckets * (1 + equalBuckets:int);
+    }


If we had a Chapel linter, it would probably complain to you about inconsistent spacing (newlines) between function definitions.

cassella · 2019-07-24T15:37:48Z

modules/packages/Sort.chpl

+
+    if A._instance.isDefaultRectangular() {
+      var size = (nElts:size_t)*c_sizeof(A.eltType);
+      c_memcpy(c_ptrTo(A[dst]), c_ptrTo(A[src]), size);


If there were a possibility of overlap, this should use memmove instead of memcpy. But AFAICT this shallowCopy is presently used only with size=1.

Similarly below, if DstA and SrcA might be the same array. (Or slices or views of the same array.)
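To make the overlap concern concrete, here is a minimal Python sketch (not the Chapel code from this PR). `forward_copy` stands in for a naive front-to-back copy loop and `overlap_safe_copy` for a memmove-style copy that picks its direction; both names, and `ranges_overlap`, are hypothetical helpers. In C, calling memcpy on overlapping ranges is undefined behavior, so the corruption shown here is only one possible outcome.

```python
def ranges_overlap(dst, src, n):
    """True if [dst, dst+n) and [src, src+n) share at least one index."""
    return n > 0 and abs(dst - src) < n


def forward_copy(buf, dst, src, n):
    """Copy n elements front to back, like a naive memcpy-style loop."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def overlap_safe_copy(buf, dst, src, n):
    """Choose the copy direction so overlapping ranges survive, like memmove."""
    if dst <= src:
        # Destination starts before the source: front to back is safe.
        for i in range(n):
            buf[dst + i] = buf[src + i]
    else:
        # Destination starts after the source: copy back to front.
        for i in reversed(range(n)):
            buf[dst + i] = buf[src + i]


a = [1, 2, 3, 4, 5]
forward_copy(a, 1, 0, 3)        # source elements are overwritten before use
b = [1, 2, 3, 4, 5]
overlap_safe_copy(b, 1, 0, 3)   # shifts [1, 2, 3] right by one position
```

With `n == 1` and `dst != src`, `ranges_overlap` is always false, which is consistent with the observation that the current single-element use is safe.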

cassella · 2019-07-24T15:45:34Z

modules/packages/Sort.chpl

+      var start = tid * blockSize;
+      var n = blockSize;
+      if start + n > nElts then
+        n = nElts - start;


If nTasks > nElts + 1, you'll end up with a negative n here -- is that what's wrong?
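The arithmetic behind this comment can be checked with a small Python sketch. It assumes `blockSize` is the ceiling of `nElts / nTasks`, which the quoted diff does not show; `task_ranges` mirrors the quoted lines, and `clamped_task_ranges` is one hypothetical way to keep `n` non-negative.

```python
def task_ranges(n_elts, n_tasks):
    """Per-task (start, n) pairs, computed the same way as the quoted diff."""
    block_size = -(-n_elts // n_tasks)  # ceiling division (assumed)
    ranges = []
    for tid in range(n_tasks):
        start = tid * block_size
        n = block_size
        if start + n > n_elts:
            n = n_elts - start  # goes negative once start > n_elts
        ranges.append((start, n))
    return ranges


def clamped_task_ranges(n_elts, n_tasks):
    """Same split, but surplus tasks get an empty range instead of a negative one."""
    block_size = -(-n_elts // n_tasks)
    ranges = []
    for tid in range(n_tasks):
        start = min(tid * block_size, n_elts)
        n = min(block_size, n_elts - start)
        ranges.append((start, n))
    return ranges


unclamped = task_ranges(2, 4)        # nTasks = 4 > nElts + 1 = 3
clamped = clamped_task_ranges(2, 4)
```

With 2 elements and 4 tasks the block size is 1, so the last task starts at index 3 and ends up with `n == -1`; with 3 tasks the smallest value of `n` is 0.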

@cassella

Two array zero-copy mergesort

[PR by @cassella - thanks! Reviewed/tested/merged by @mppf]

Each call to `_Merge` created two temporary arrays that together matched the portion of the Data array being processed. So each level of recursive call used its own set of arrays that combine to the size of Data.

Instead, create one Scratch array at the outset (as in #13347). Rather than copy our portion of Data into Scratch to start off each `_Merge()`, the recursive levels will alternate merging from Data into Scratch, and merging from Scratch into Data.

Performance is a little hard to get a feel for on my laptop, but I think I'm seeing 15-20% improvement in mergesort. The variability decreased a lot too.

I tested with array sizes 1000-1040 and 2020-2060 to hit different points near minlen, starting near an even and an odd power of 2. Those runs had a halt() at the "we'll never reach this point" comment, which was not hit.

@ronawho

Distributed sort improvements

`modules/packages/Sort.chpl` includes two implementations of distributed sorting:

* `Sort.TwoArrayRadixSort.twoArrayRadixSort`
* `Sort.TwoArraySampleSort.twoArraySampleSort`

Neither of these `Sort.TwoArray*` submodules is included by default (and so they have limited impact on compilation time). Additionally, they are not documented. However, they can provide more efficient sorting for Block-distributed arrays and they include code specifically for the distributed case.

These sorts were added in PR #13347. See also https://chapel-lang.org/CHIUW/2019/Ferguson.pdf for more information about these TwoArray sorts.

Note that twoArraySampleSort does not currently compile for distributed arrays.

This PR takes several steps to improve the performance of distributed two-array sorting:

* use two-array sorting for large subproblems within a locale (instead of calling msbRadixSort, which is in-place and less parallel)
* perform recursive sub-problems in the distributed sort in a distributed manner. Instead of having all locales participate in each distributed subproblem, assign each subproblem an "owner" and from there create tasks for all the locales involved in that subproblem. Since a given locale can be involved in two subproblems (when the bucket boundary for the subproblem is within the region of the array owned by that locale), we added state1 and state2 to keep separate counts in these cases. This change also involved storing the tasks (subproblems) in a nested structure: each subproblem contains a representation of which locales will be involved in it.
* optimized the all-to-all communication of the counts

This PR improves scalability of `twoArrayRadixSort` quite a bit.

Thanks to @ronawho for reviewing and gathering performance information!

- [x] full local testing
- [x] test/library/packages/Sort with `CHPL_COMM=gasnet`

cassella mentioned this pull request Jul 11, 2019

Two array zero-copy mergesort #13432

Merged

mppf force-pushed the distributed-radix-sort branch from 9939144 to a2c8e32 Compare July 17, 2019 13:01

mppf added 3 commits July 23, 2019 10:00

Add some cross-language random sort comparisons

65a0cb2

Two-array and distributed sorters

5776fae

Private use dependencies

60d1637

mppf force-pushed the distributed-radix-sort branch from ec8a087 to 60d1637 Compare July 23, 2019 14:15

mppf added 2 commits July 23, 2019 13:56

Future-ize fillRandom test, see 13523

5ec3f28

Add to the modules currently parsed

9e61ab6

ben-albrecht approved these changes Jul 23, 2019

mppf added 2 commits July 24, 2019 10:06

Add .numlocales files so these run multilocale

4539429

Tidy comments

ed1dbd3

mppf merged commit cacd1f4 into chapel-lang:master Jul 24, 2019

mppf deleted the distributed-radix-sort branch July 24, 2019 14:16

cassella reviewed Jul 24, 2019

This was referenced Jul 29, 2019

memory leak with coforall reducing an array #13569

Closed

Add .skipif for two-array-radix-sort-bug under valgrind #13567

Merged

vasslitvinov mentioned this pull request Aug 1, 2019

Non-nilable type checking meta-issue #13598

Closed

37 tasks

This was referenced Dec 15, 2020

Distributed sort improvements #14566

Closed

Distributed sort improvements #16871

Merged
