@@ -25,14 +25,14 @@ objects. `Func` needs to have the following signature:
2525
2626Note that the return type of the key ` type_t ` needs to be one of the following
2727: ` [float, uint32_t, int32_t, double, uint64_t, int64_t] ` . ` object_qsort ` has a
28- space complexity of ` O(N) ` . Specifically, it requires ` arrsize*(sizeof(type_t) `
29- \+ ` sizeof(uint32_t)) ` additional space. It allocates two ` std::vectors ` : one
30- for storing all the keys and another storing the indexes of the object array.
31- For performance reasons, we support ` object_qsort ` only when the array size
32- is less than or equal to ` UINT32_MAX ` . An example usage of ` object_qsort `
33- is provided in the [ examples] ( #Sort-an-array-of-Points-using-object_qsort )
34- section. Refer to [ section] ( #Performance-of-object_qsort ) to get a sense
35- of how fast this is relative to ` std::sort ` .
28+ space complexity of ` O(N) ` . Specifically, it requires `arrsize *
29+ sizeof(type_t)` bytes to store a vector with all the keys and an additional
30+ ` arrsize * sizeof(uint32_t) ` bytes to store the indexes of the object array.
31+ For performance reasons, we support ` object_qsort ` only when the array size is
32+ less than or equal to ` UINT32_MAX ` . An example usage of ` object_qsort ` is
33+ provided in the [ examples] ( #Sort-an-array-of-Points-using-object_qsort )
34+ section. Refer to [ section] ( #Performance-of-object_qsort ) to get a sense of
35+ how fast this is relative to ` std::sort ` .
3636
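To make the memory requirement above concrete, here is a minimal sketch of the key/index idea. It is not the library's implementation: `extra_bytes` and `sort_by_key_sketch` are hypothetical names introduced only for illustration, and `std::sort` stands in for the SIMD key-value sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <type_traits>
#include <utility>
#include <vector>

// Extra bytes taken by the two helper vectors described above:
// one key of type KeyT plus one 32-bit index per element.
template <typename KeyT>
constexpr std::size_t extra_bytes(std::size_t arrsize)
{
    return arrsize * (sizeof(KeyT) + sizeof(std::uint32_t));
}

// Hypothetical illustration of sorting objects through keys and indexes.
template <typename T, typename Func>
void sort_by_key_sketch(std::vector<T> &arr, Func key_func)
{
    // Indexes are stored as uint32_t, hence the array-size limit.
    assert(arr.size() <= UINT32_MAX);
    using KeyT = std::decay_t<decltype(key_func(arr[0]))>;

    // Vector 1: the key of every object. Vector 2: the object indexes.
    std::vector<KeyT> keys(arr.size());
    std::vector<std::uint32_t> idx(arr.size());
    std::transform(arr.begin(), arr.end(), keys.begin(), key_func);
    std::iota(idx.begin(), idx.end(), 0u);

    // Assumption: std::sort replaces the accelerated key-value sort here.
    std::sort(idx.begin(), idx.end(), [&](std::uint32_t a, std::uint32_t b) {
        return keys[a] < keys[b];
    });

    // Reorder the objects by the sorted indexes. This sketch uses a
    // temporary copy for simplicity; the library may do this differently.
    std::vector<T> sorted;
    sorted.reserve(arr.size());
    for (std::uint32_t i : idx) {
        sorted.push_back(arr[i]);
    }
    arr = std::move(sorted);
}
```

For example, `extra_bytes<double>(1000)` evaluates to 12000 bytes on platforms where `double` is 8 bytes: 8000 for the keys and 4000 for the indexes.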
3737## Sort an array of built-in integers and floats
3838``` cpp
@@ -143,23 +143,29 @@ array. You can read details of all the implementations
143143[here](https://github.com/intel/x86-simd-sort/blob/main/src/README.md).
144144
145145## Performance comparison on AVX-512: `object_qsort` v/s `std::sort`
146- `object_qsort` relies on key-value sort which is currently accelerated only on
147- AVX-512 (we plan to add AVX2 version soon). Benchmarks added in
148- [bench-objsort.hpp](./benchmarks/bench-objsort.hpp) measures performance of
149- `object_qsort` relative to `std::sort` when sorting an array of `struct Point
150- {double x, y, z;}` and `struct Point {float x, y, x;}` for various metrics:
146+ Performance of `object_qsort` can vary significantly depending on the definition
147+ of the custom class, and we highly recommend benchmarking before using it. For
148+ the sake of illustration, we provide a few examples in
149+ [./benchmarks/bench-objsort.hpp](./benchmarks/bench-objsort.hpp) which measure
150+ the performance of `object_qsort` relative to `std::sort` when sorting an array
151+ of points in Cartesian coordinates represented by the classes `struct Point
152+ {double x, y, z;}` and `struct Point {float x, y, z;}`. We sort these points
153+ based on several different metrics:
151154
152155+ sort by coordinate `x`
153156+ sort by Manhattan distance (relative to origin): `abs(x) + abs(y) + abs(z)`
154157+ sort by Euclidean distance (relative to origin): `sqrt(x*x + y*y + z*z)`
155158+ sort by Chebyshev distance (relative to origin): `max(x, y, z)`
156159
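The four metrics listed above can be sketched as plain key functions. The names `key_x`, `key_manhattan`, `key_euclidean`, `key_chebyshev` and `std_sort_by` are hypothetical and are not taken from the benchmark source. The Chebyshev key follows the formula exactly as written in the list, which matches the usual definition only when all coordinates are non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Same layout as the double-precision Point described above.
struct Point {
    double x, y, z;
};

// Sort by coordinate x.
double key_x(const Point &p) { return p.x; }

// Manhattan distance relative to the origin.
double key_manhattan(const Point &p)
{
    return std::abs(p.x) + std::abs(p.y) + std::abs(p.z);
}

// Euclidean distance relative to the origin.
double key_euclidean(const Point &p)
{
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
}

// Chebyshev metric as written in the list: max(x, y, z).
double key_chebyshev(const Point &p)
{
    return std::max({p.x, p.y, p.z});
}

// std::sort baseline: order the points by any of the keys above.
template <typename Func>
void std_sort_by(std::vector<Point> &pts, Func key)
{
    auto less = [&](const Point &a, const Point &b) { return key(a) < key(b); };
    std::sort(pts.begin(), pts.end(), less);
    assert(std::is_sorted(pts.begin(), pts.end(), less)); // debug-only check
}
```

Calling `std_sort_by(points, key_euclidean)` gives the kind of `std::sort` baseline the comparison refers to. A key function of this shape, taking an object and returning a `float` or `double`, is presumably what `object_qsort` expects, but check the signature documented earlier in the README before relying on that.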
157- The data was collected on a processor with AVX-512 and is shown in the plot
158- below. For the simplest of cases where we want to sort an array of struct by
159- one of its members, `object_qsort` can be up-to 5x faster for 32-bit data type
160- and about 4x for 64-bit data type. It tends to do better when the metric to
161- sort by gets more complicated. Sorting by Euclidean distance can be up-to 10x
162- faster.
160+ The performance data (shown in the plot below) can be collected by building the
161+ benchmark suite and running `./builddir/benchexe --benchmark_filter==*obj*`.
162+ The data in the plot was collected on a processor with AVX-512 because
163+ `object_qsort` is currently accelerated only on AVX-512 (we plan to add the
164+ AVX2 version soon). For the simplest of cases where we want to sort an array of
165+ structs by one of their members, `object_qsort` can be up to 5x faster for 32-bit
166+ data types and about 4x faster for 64-bit data types. It tends to do even better
167+ when the metric to sort by gets more complicated. Sorting by Euclidean distance
168+ can be up to 10x faster.
163169
164170
165171