Prefer xor to sub for counting bits #4693

AZero13 · 2024-05-24T00:32:15Z

For subtracting from 63 or 31 alone, exclusive or is faster than sub because exclusive or doesn't set any flags, unlike sub.

AlexGuteniev · 2024-05-24T02:51:10Z

exclusive or doesn't set any flags, unlike sub.

This is not correct. On x86 and x86-64 xor does set flags.

Still, this is indeed a bit faster, because xor is commutative, the compiler reorders args, and the constant becomes immediate, which saves an instruction.

The previous attempt was rejected due to little gain for more complexity, it was suggested that the compiler should implement this optimization instead.

If we are doing this, should also change the occurrences in vector_algorithms.cpp. And the remaining occurrences in __msvc_bit_utils.hpp. Some may be spelled as _Digits - 1 - _Result or something.

AZero13 · 2024-05-24T14:44:50Z

The compiler doing this was actually rejected for multiple reasons.

AZero13 · 2024-05-24T14:48:46Z

It also is because this only works if the compiler knows bitscan returns a number between 0 and 31

Alcaro · 2024-05-24T15:27:46Z

The relevant previous discussion is #3925 (comment)

Clang can do this optimization, so in theory, it's possible. https://godbolt.org/z/hc8hrj8eo

But in practice, the definition of _BitScanReverse is more complex than __builtin_clz - latter has undefined behavior if input is zero, former returns an unspecified value that's not guaranteed to be in the range 0 to 31 (this optimization is only valid if rhs <= lhs; 31-32 != 31^32).

It is possible to model the builtin in a way that allows this optimization, but it's more complicated than simply 'the return value is always 0 to 31, and if it isn't, you dun goofed and it's not a compiler bug'.

Long-term, teaching the compiler to optimize that is the better solution. But I don't know how long term; it may be long enough that it's wiser to install a short-term solution as well (with a TRANSITION comment).

AlexGuteniev · 2024-05-28T16:04:51Z

Previous discussion was related specifically to vector_algoritmgs.cpp #3925 (comment)

The optimization in that place has less benefits and adds more confusion, so the outcome in general may be reconsidered.

StephanTLavavej · 2024-05-29T21:06:59Z

Thanks for looking into this. We talked about this in the weekly maintainer meeting, and have decided that the potential savings of less than a cycle are outweighed by the decrease in code clarity. We believe that the compiler should ultimately be responsible for emitting ideal codegen here, and DevCom-10487304 is still active, so we're going to close this PR without merging.

AZero13 requested a review from a team as a code owner May 24, 2024 00:32

Prefer xor to sub for counting bits

b5d9173

For subtracting from 63 or 31 alone, exclusive or is faster than sub because exclusive or doesn't set any flags, unlike sub.

AZero13 force-pushed the r branch from d28057c to b5d9173 Compare May 24, 2024 00:32

StephanTLavavej added performance Must go faster decision needed We need to choose something before working on this labels May 28, 2024

StephanTLavavej removed the decision needed We need to choose something before working on this label May 29, 2024

StephanTLavavej closed this May 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prefer xor to sub for counting bits #4693

Prefer xor to sub for counting bits #4693

AZero13 commented May 24, 2024 •

edited

Loading

AlexGuteniev commented May 24, 2024 •

edited

Loading

AZero13 commented May 24, 2024

AZero13 commented May 24, 2024

Alcaro commented May 24, 2024

AlexGuteniev commented May 28, 2024

StephanTLavavej commented May 29, 2024

Prefer xor to sub for counting bits #4693

Prefer xor to sub for counting bits #4693

Conversation

AZero13 commented May 24, 2024 • edited Loading

AlexGuteniev commented May 24, 2024 • edited Loading

AZero13 commented May 24, 2024

AZero13 commented May 24, 2024

Alcaro commented May 24, 2024

AlexGuteniev commented May 28, 2024

StephanTLavavej commented May 29, 2024

AZero13 commented May 24, 2024 •

edited

Loading

AlexGuteniev commented May 24, 2024 •

edited

Loading