Vectorize ranges::find_last #3925

Merged
merged 14 commits into from
Oct 20, 2023

AlexGuteniev
Contributor

@AlexGuteniev AlexGuteniev commented Aug 5, 2023

Resolves #3274

The implementation in vector_algorithm.cpp is similar to the forward find, except:

  • Advancing _Last in the negative direction
    • For the negative advance, _Rewind_bytes was introduced. It avoids casting size_t to ptrdiff_t, and the potential UB from overflow when the range is larger than half the addressable space
    • The existing _Advance_bytes was made a template, to fix a pre-existing conversion of a potentially large size_t to ptrdiff_t
  • Advancing before dereferencing, so that the scan starts with the past-the-end pointer and stops at the first element
  • Still returning the last pointer on failure, so it is saved for that case
  • _BitScanForward -> _BitScanReverse, _tzcnt_u32 -> _lzcnt_u32
    • Both _BitScanForward and _BitScanReverse index from the least significant bit, so no adjustment is needed there
    • _lzcnt_u32 counts from the most significant bit, so the index is reversed using 31 - x
    • AVX2 implies that _lzcnt_u32 is available, just as it implies _tzcnt_u32. We have precedent in <__msvc_bit_utils.hpp>

The integration in <algorithm> is similar to ranges::find, except that we don't support unsized ranges (in this regard, similar to ranges::count).

The test uses the same random data as the forward find test. I made sure it covers all branches of the algorithm.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner August 5, 2023 21:02
@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 6, 2023
@StephanTLavavej StephanTLavavej self-assigned this Aug 7, 2023
@StephanTLavavej StephanTLavavej removed their assignment Oct 7, 2023
@StephanTLavavej StephanTLavavej changed the title from "vectorize find_last()" to "Vectorize ranges::find_last" Oct 7, 2023
@StephanTLavavej StephanTLavavej added the ranges C++20/23 ranges label Oct 7, 2023
@CaseyCarter CaseyCarter self-assigned this Oct 7, 2023
@AlexGuteniev
Contributor Author

I see that #4004 makes a bulk change that applies here too; I'll apply it here once that PR lands.

It is only beneficial for sizeof(_Ty) == 1 and adds too much complexity.
@StephanTLavavej

This comment was marked as resolved.

@CaseyCarter CaseyCarter removed their assignment Oct 18, 2023
@StephanTLavavej StephanTLavavej self-assigned this Oct 19, 2023
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 408dd89 into microsoft:main Oct 20, 2023
@StephanTLavavej
Member

Thanks for optimizing this new algorithm! 🚀 🚀 🚀

Labels: performance (Must go faster), ranges (C++20/23 ranges)

Successfully merging this pull request may close these issues:
<algorithm>: find_last() could probably be vectorized

3 participants