-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prefer xor to sub for counting bits #4693
Conversation
For subtracting from 63 or 31 alone, exclusive or is faster than sub because exclusive or doesn't set any flags, unlike sub.
This is not correct. On x86 and x86-64 xor does set flags. Still, this is indeed a bit faster, because The previous attempt was rejected due to little gain for more complexity, it was suggested that the compiler should implement this optimization instead. If we are doing this, should also change the occurrences in |
The compiler doing this was actually rejected for multiple reasons. |
It also is because this only works if the compiler knows bitscan returns a number between 0 and 31 |
The relevant previous discussion is #3925 (comment) Clang can do this optimization, so in theory, it's possible. https://godbolt.org/z/hc8hrj8eo But in practice, the definition of _BitScanReverse is more complex than __builtin_clz - latter has undefined behavior if input is zero, former returns an unspecified value that's not guaranteed to be in the range 0 to 31 (this optimization is only valid if rhs <= lhs; 31-32 != 31^32). It is possible to model the builtin in a way that allows this optimization, but it's more complicated than simply 'the return value is always 0 to 31, and if it isn't, you dun goofed and it's not a compiler bug'. Long-term, teaching the compiler to optimize that is the better solution. But I don't know how long term; it may be long enough that it's wiser to install a short-term solution as well (with a TRANSITION comment). |
Previous discussion was related specifically to The optimization in that place has less benefits and adds more confusion, so the outcome in general may be reconsidered. |
Thanks for looking into this. We talked about this in the weekly maintainer meeting, and have decided that the potential savings of less than a cycle are outweighed by the decrease in code clarity. We believe that the compiler should ultimately be responsible for emitting ideal codegen here, and DevCom-10487304 is still active, so we're going to close this PR without merging. |
For subtracting from 63 or 31 alone, exclusive or is faster than sub because exclusive or doesn't set any flags, unlike sub.