Floating minmax: fix negative zero handling and dedicated test coverage for arrays of +0.0 and -0.0 only #4734

AlexGuteniev · 2024-06-18T20:08:43Z

Initially I thought that this could be fixed with a careful minmax implementation that correctly selects either the first or the last value when the comparands are equivalent.

I've learned the behavior of the [v]{min|max}{s|p}{s|d} instructions (thanks @statementreply and @Alcaro for enlightening me on that) and figured out that it is possible to control which of the equivalent values becomes the result. I also reported the compiler bug DevCom-10686775 and found a reliable workaround for it.

Unfortunately, controlling the result of a single minmax instruction is not enough. The whole value-based vectorization approach does not work well with ordering requirements for equivalent elements. Efficient vectorization requires vertical comparisons (the same lane across different vector values) to be performed first, and horizontal comparisons (different lanes within the same vector value) to be performed last.

With the index-based approach, as in minmax_element, the changed order is fine, because we're looking for the smallest/greatest index among equal elements.

As a result, we have to fall back to the minmax_element approach for floating minmax, unless /fp:fast is specified. This shouldn't be a big loss, though: the benchmark results in #4659 show that smaller types benefit a lot from the minmax approach, but floats benefit much less. It's definitely still way faster than scalar.

/fp:fast is still fine: the compiler takes advantage of not having to distinguish +0.0 and -0.0 and is able to emit vectorized minmax itself (see related issue #4453).

I decided to keep the comparison reordering for floats; it seems to improve the handling of NaN values. NaNs have been decided to be unsupported, but there's no reason to drop something that accidentally does things better.

⏱️ Benchmark results

C:\Project\STL>out\bench\benchmark-minmax_element.exe --benchmark_min_time=1s --benchmark_filter=(float^|double)
2024-06-22T13:46:09+03:00
Running out\bench\benchmark-minmax_element.exe
Run on (12 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x6)
  L1 Instruction 32 KiB (x6)
  L2 Unified 1280 KiB (x6)
  L3 Unified 12288 KiB (x1)

Benchmark	main	fix	fix + `/fp:fast`
bm<float, 8021, Op::Min>	1184 ns	1207 ns	1221 ns
bm<float, 8021, Op::Max>	1210 ns	1212 ns	1208 ns
bm<float, 8021, Op::Both>	1357 ns	1379 ns	1362 ns
bm<float, 8021, Op::Min_val>	891 ns	1211 ns	891 ns
bm<float, 8021, Op::Max_val>	915 ns	1222 ns	883 ns
bm<float, 8021, Op::Both_val>	955 ns	1378 ns	940 ns
bm<double, 8021, Op::Min>	2246 ns	2393 ns	2352 ns
bm<double, 8021, Op::Max>	2365 ns	2393 ns	2361 ns
bm<double, 8021, Op::Both>	2719 ns	2753 ns	2727 ns
bm<double, 8021, Op::Min_val>	1880 ns	2365 ns	1849 ns
bm<double, 8021, Op::Max_val>	1877 ns	2358 ns	1868 ns
bm<double, 8021, Op::Both_val>	1933 ns	2688 ns	1913 ns

The reordering of the _mm[256]_{min|max}_p{s|d} arguments seems slightly unfavorable for performance, but not by much; at least, the differences are within run-to-run variation.


tests/std/include/test_vector_algorithms_support.hpp

tests/std/tests/VSO_0000000_vector_algorithms_floats/env.lst

tests/std/tests/VSO_0000000_vector_algorithms/test.cpp

stl/inc/xutility

stl/src/vector_algorithms.cpp

StephanTLavavej · 2024-08-25T00:08:12Z

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

StephanTLavavej · 2024-08-25T02:50:53Z

I had to push an additional commit to drop my eternal nemeses:

C:\Temp>cl /EHsc /nologo /W4 /fp:strict meow.cpp
meow.cpp

C:\Temp>cl /clr /nologo /W4 /fp:strict meow.cpp
cl : Command line error D8016 : '/clr' and '/fp:strict' command-line options are incompatible

C:\Temp>cl /clr:pure /nologo /W4 /fp:strict meow.cpp
cl : Command line warning D9035 : option 'clr:pure' has been deprecated and will be removed in a future release
cl : Command line error D8016 : '/clr:pure' and '/fp:strict' command-line options are incompatible

AlexGuteniev · 2024-08-25T05:12:28Z

With #162, this could have been noticed in advance.

StephanTLavavej · 2024-08-25T17:31:38Z

Thanks for setting the maximum number of bugs in this area to negative zero! ➖ 0️⃣ 😹

Sedicated test coverage for floating minmax of +0.0 and -0.0 only

b1e4baf

AlexGuteniev requested a review from a team as a code owner June 18, 2024 20:08

AlexGuteniev changed the title ~~Sedicated test coverage for floating minmax of +0.0 and -0.0 only~~ Dedicated test coverage for floating minmax of +0.0 and -0.0 only Jun 18, 2024

StephanTLavavej added the test Related to test code label Jun 18, 2024

StephanTLavavej self-assigned this Jun 18, 2024

expand test

18cc8b1

This comment was marked as resolved.


AlexGuteniev marked this pull request as draft June 19, 2024 07:02

StephanTLavavej removed their assignment Jun 19, 2024

StephanTLavavej added the blocked Something is preventing work on this label Jun 19, 2024

This comment was marked as resolved.


AlexGuteniev added 2 commits June 20, 2024 19:13

expand test with canned simple case

fba7701

Fix the bug

f0b43fb

AlexGuteniev marked this pull request as ready for review June 20, 2024 16:15

AlexGuteniev added 2 commits June 20, 2024 19:17

Merge branch 'main' into float_zeros

52eda32

fix merge error

4040d48

AlexGuteniev changed the title ~~Dedicated test coverage for floating minmax of +0.0 and -0.0 only~~ Floating minmax: fix negative zero handling and dedicated test coverage for arrays of +0.0 and -0.0 only Jun 20, 2024

StephanTLavavej removed the blocked Something is preventing work on this label Jun 20, 2024

StephanTLavavej self-assigned this Jun 20, 2024

StephanTLavavej added bug Something isn't working and removed test Related to test code labels Jun 20, 2024

This comment was marked as resolved.


more interesting predefined cases

75acc88

AlexGuteniev mentioned this pull request Jun 20, 2024

vectorize min/max_element using SSE4.1 for floats #3928

Merged

AlexGuteniev marked this pull request as draft June 21, 2024 12:08

Even more coverage

48c23b5

AlexGuteniev marked this pull request as ready for review June 21, 2024 14:10

AlexGuteniev added 3 commits June 21, 2024 17:13

Implement minmax in terms of minmax_element

7d00070

Fix ascending order

4302235

tail correctness is not needed anymore

1f124c4

StephanTLavavej added 16 commits August 20, 2024 08:21

Include more headers.

437454e

* `<algorithm>` for `generate`
* `<climits>` for `CHAR_BIT` (pre-existing)
* `<cmath>` for `signbit`
* `<cstddef>` for `size_t`
* `<cstdint>` for `uint32_t`
* `<cstdio>` for `printf`
* `<functional>` for `ref`
* `<random>` for `mt19937_64`

Include fewer headers.

684cd27

Add std:: qualification.

206ef5c

Remove std:: qualification.

7a0d538

Drop unnecessary if constexpr suppression.

7891ef7

Take a function object instead of a function pointer.

158a007

Header-only functions should be inline.

1058249

Use consistent preprocessor guards.

1384cf8

Add new test to test.lst.

e50cf12

Fix code typo: -1, 0 => -1.0

129c8d7

Fix bug: -0 => -0.0

3ccad6c

Add other point-zeros for consistency.

Add missing quotes.

09450f6

Drop /fp options instead of whole lines.

b1fb57b

Adjust endif comment to match.

8d55da1

Drop duplicate comment in vector_algorithms.cpp.

2e947b6

Improve comments.

b7c840c

StephanTLavavej reviewed Aug 20, 2024


StephanTLavavej approved these changes Aug 20, 2024


StephanTLavavej removed their assignment Aug 20, 2024

StephanTLavavej mentioned this pull request Aug 20, 2024

Maintainer priorities #4700

Open

StephanTLavavej self-assigned this Aug 25, 2024

Drop /clr and /clr:pure lines.

e1c4d4f

StephanTLavavej approved these changes Aug 25, 2024


StephanTLavavej merged commit 3705e36 into microsoft:main Aug 25, 2024
39 checks passed

AlexGuteniev deleted the float_zeros branch August 25, 2024 18:26

AlexGuteniev mentioned this pull request Nov 3, 2024

Partition vector algorithms test: move out lex compare family #5063

Merged
