Skip to content

1.16.0 — Taxonomics

Latest
Compare
Choose a tag to compare
@Morwenn Morwenn released this 19 Jun 22:18
5f4ec36

DOI

Over the last few months I spent much more time looking at plants and animals around me than I spent working on sorting algorithms, or working on any computer-related hobby project really. While I was trying to cleanup and finish this small new version — be it only to release the little work that had been done months ago —, I started thinking about using hierarchies inspired by biological taxonomy to classify algorithms. Imagine giving binomial names to algorithms: we might have a Sorteriformes order, with the big Partitionaceae and Mergeaceae families among others, Quicksortum hoareum and Quicksortum lomuto species, with their various optimized subspecies. Would radix sorts be part of the same order, or are they a case of convergent evolution? What would the global cladistic picture look like?

Anyway, for all the fun it might me, I have zero will to actually classify algorithms. That's a task I happily leave to others. In the meantime, please enjoy cpp-sort 1.16.0 for what it brings!

Deprecations

counting_adapter is now deprecated, metrics::comparisons should be used instead.

Deprecated components will be removed in cpp-sort 2.0.0.

Deprecation warnings can be disabled by defining the preprocessor macro CPPSORT_DISABLE_DEPRECATION_WARNINGS.

New features

New metric: metrics::moves: it computes the number of moves performed when sorting a collection. The current implementation is fairly primitive with a number of limitations:

  • It allocates a contiguous memory buffer of a special type to work, so if a sorter behaves differently based on the iterator category, only its behavior over contiguous iterators will have its number of moves counted.
  • Sorters that call operations specific to some types might return a result that is not representative of how they actually perform: this is due to the inner wrapper type not benefiting from the specializations.
  • Projection support is mandatory: metrics::moves passes a projection to the sorter in order to convert the wrapper type to its underlying type.
  • Swap operations are not counted separately: the metric considers each swap operation to be three moves.

Some of these limits might be lifted or at least mitigated in the future.

Improvements

Algorithmic & speed improvements:

  • poplar_sort's sift function was changed to use cycles of moves instead of swaps, cutting roughly in half the number of moves performed by the algorithm in the average case. It makes little difference for primitive types from a speed point of view, but might be noticeable when types become more expensive to move.
    Bar plot showing the number of moves performed by poplar_sort against different patterns in cpp-sort 1.15.0 and 1.16.0

Other improvements:

Tooling

Documentation:

  • I recently realized that the implementation of partition for forward iterators used a Lomuto partition scheme, which runs in O(n) time. For some reason I had wrongly assumed that partition ran in O(n log n) time for forward iterators. The documentation for algorithms relying on partition over forward iterators was consequently updated as follows:
    • quick_merge_sorter runs in O(n log n) time on forward iterators instead of O(n log² n) time.
    • quick_sorter runs in O(n log n) time on forward iterators instead of O(n log² n) time.
  • Add complexity tables to the documentation of drop_merge_adapter and split_adapter.
  • Remove or replace references to deprecated features when possible.

Benchmarks:

  • The cpu_cycles metrics helper can now be converted to a function pointer when empty.
  • The inversions benchmark was moved to a presortedness directory, and tweaked with the goal of making it easier to use to see how different sorting algorithms fare when generating sequences with different kinds of presortedness.
  • A new script was added to that new presortedness directory to plot the accuracy of distributions that generate more or less disorder. knowing how a sorting algorithm behaves with regard to a specific measure of presortedness first requires being able to accurately generate a collection with the desired amount of disorder.
    Plot showing how good dist::inv is at producing Inv(X) disorder
    As we can see in the graph above, dist::inv now produces an amount of Inv(X) close to the required one, but it is not perfect either in that regard.
  • dist::inversions was renamed to dist::inv to better reflect that it is meant to create sequences with more or less Inv, and its implementation was changed to more accurately generate sequences with 0% to 100% inversions.
    heap_sort, drop_merge_adapter(heap_sort) and split_adapter(heap_sort) against Inv(X)=0%-100%
    The way that completion was implemented however shows in the graph above: the [0, 0.5] and [0.5, 1] ranges in dist::inv are implemented in such a way that Inv(X) keeps increasing (as per the previous bullet point), but they use two different algorithms and other measures of presortedness likely give a decreasing presortedness in the second half. If we plot insertion_sort against dist::inv, which is only Inv-adaptive, we can see that it gets slower as Inv increases in a pretty linear fashion:
    Same as the preceding plot, with insertion_sort added

Test suite:

  • Fix Clang -Winvalid-utf8 warnings.
  • Bump downloaded version of Catch2 to v3.6.0.

Miscellaneous:

  • Branches master and develop have been replaced with branches 1.x.y-stable and 1.x.y-develop respectively, paving the way for a future 2.0.0 release.
  • The conanfile.py that ships with the library now only works with Conan 2.0+. The packages available on Conan Center should still work with Conan 1.x.
  • Add a minimal .clang-tidy file to the repository. Some diagnostics still fire as configured, it is not a goal to fix all of them right away.
  • CI: make the communication with codecov more robust.
  • CI: update MacOS runner image to macos-12, the older one being now deprecated.

Known bugs

I didn't manage to fix every bug I could find since the previous release, so you might want to check the list of known bugs.