Over the last few months I spent much more time looking at plants and animals around me than I spent working on sorting algorithms, or working on any computer-related hobby project really. While I was trying to cleanup and finish this small new version — be it only to release the little work that had been done months ago —, I started thinking about using hierarchies inspired by biological taxonomy to classify algorithms. Imagine giving binomial names to algorithms: we might have a Sorteriformes order, with the big Partitionaceae and Mergeaceae families among others, Quicksortum hoareum and Quicksortum lomuto species, with their various optimized subspecies. Would radix sorts be part of the same order, or are they a case of convergent evolution? What would the global cladistic picture look like?
Anyway, for all the fun it might me, I have zero will to actually classify algorithms. That's a task I happily leave to others. In the meantime, please enjoy cpp-sort 1.16.0 for what it brings!
Deprecations
counting_adapter
is now deprecated, metrics::comparisons
should be used instead.
Deprecated components will be removed in cpp-sort 2.0.0.
Deprecation warnings can be disabled by defining the preprocessor macro CPPSORT_DISABLE_DEPRECATION_WARNINGS
.
New features
New metric: metrics::moves
: it computes the number of moves performed when sorting a collection. The current implementation is fairly primitive with a number of limitations:
- It allocates a contiguous memory buffer of a special type to work, so if a sorter behaves differently based on the iterator category, only its behavior over contiguous iterators will have its number of moves counted.
- Sorters that call operations specific to some types might return a result that is not representative of how they actually perform: this is due to the inner wrapper type not benefiting from the specializations.
- Projection support is mandatory:
metrics::moves
passes a projection to the sorter in order to convert the wrapper type to its underlying type. - Swap operations are not counted separately: the metric considers each swap operation to be three moves.
Some of these limits might be lifted or at least mitigated in the future.
Improvements
Algorithmic & speed improvements:
poplar_sort
'ssift
function was changed to use cycles of moves instead of swaps, cutting roughly in half the number of moves performed by the algorithm in the average case. It makes little difference for primitive types from a speed point of view, but might be noticeable when types become more expensive to move.
Other improvements:
- Bump downloaded version of libassert to v1.2.2.
metrics::comparisons
andmetrics::projections
now honouris_probably_branchless_comparison
andis_probably_branchless_projection
respectively: they pretend to be branchless when the wrapped callable is branchless, in order to better represent how many times said callable would be called — they will now report more accurate results for algorithms that change strategies based on branchlessness.
Tooling
Documentation:
- I recently realized that the implementation of
partition
for forward iterators used a Lomuto partition scheme, which runs in O(n) time. For some reason I had wrongly assumed thatpartition
ran in O(n log n) time for forward iterators. The documentation for algorithms relying onpartition
over forward iterators was consequently updated as follows:quick_merge_sorter
runs in O(n log n) time on forward iterators instead of O(n log² n) time.quick_sorter
runs in O(n log n) time on forward iterators instead of O(n log² n) time.
- Add complexity tables to the documentation of
drop_merge_adapter
andsplit_adapter
. - Remove or replace references to deprecated features when possible.
Benchmarks:
- The
cpu_cycles
metrics helper can now be converted to a function pointer when empty. - The inversions benchmark was moved to a presortedness directory, and tweaked with the goal of making it easier to use to see how different sorting algorithms fare when generating sequences with different kinds of presortedness.
- A new script was added to that new presortedness directory to plot the accuracy of distributions that generate more or less disorder. knowing how a sorting algorithm behaves with regard to a specific measure of presortedness first requires being able to accurately generate a collection with the desired amount of disorder.
As we can see in the graph above,dist::inv
now produces an amount of Inv(X) close to the required one, but it is not perfect either in that regard. dist::inversions
was renamed todist::inv
to better reflect that it is meant to create sequences with more or less Inv, and its implementation was changed to more accurately generate sequences with 0% to 100% inversions.
The way that completion was implemented however shows in the graph above: the [0, 0.5] and [0.5, 1] ranges indist::inv
are implemented in such a way that Inv(X) keeps increasing (as per the previous bullet point), but they use two different algorithms and other measures of presortedness likely give a decreasing presortedness in the second half. If we plotinsertion_sort
againstdist::inv
, which is only Inv-adaptive, we can see that it gets slower as Inv increases in a pretty linear fashion:
Test suite:
- Fix Clang
-Winvalid-utf8
warnings. - Bump downloaded version of Catch2 to v3.6.0.
Miscellaneous:
- Branches
master
anddevelop
have been replaced with branches1.x.y-stable
and1.x.y-develop
respectively, paving the way for a future 2.0.0 release. - The
conanfile.py
that ships with the library now only works with Conan 2.0+. The packages available on Conan Center should still work with Conan 1.x. - Add a minimal
.clang-tidy
file to the repository. Some diagnostics still fire as configured, it is not a goal to fix all of them right away. - CI: make the communication with codecov more robust.
- CI: update MacOS runner image to
macos-12
, the older one being now deprecated.
Known bugs
I didn't manage to fix every bug I could find since the previous release, so you might want to check the list of known bugs.