Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize find and count for vector<bool> #1131

Merged
merged 11 commits into from
Jun 12, 2021

Conversation

miscco
Copy link
Contributor

@miscco miscco commented Aug 2, 2020

This builds uppon the recently merged specialization of fill for vector<bool>.

Note that the specializations require __cpp_lib_bitops

@miscco miscco requested a review from a team as a code owner August 2, 2020 17:15
@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 3, 2020
Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reviewed use of bit, and some on functionality. I need to become more familiar with how vector works before I can give more functional feedback, but this should outline next steps.

Sorry it took me so long to get to this.

stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@StephanTLavavej

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@miscco
Copy link
Contributor Author

miscco commented Oct 14, 2020

@barcharcraz can you give an update on the status of the builtins?

@barcharcraz
Copy link
Member

@barcharcraz can you give an update on the status of the builtins?

we have no intention of adding the builtins, and they have been removed. We found the current way we implement these functions to be better given how our compiler is put together.

You'll have to move popcount to some header that vector can include.

@miscco miscco force-pushed the vector_bool_count branch 2 times, most recently from 3e60dda to 149978e Compare November 28, 2020 09:30
@miscco
Copy link
Contributor Author

miscco commented Nov 28, 2020

Rebased and squashed down now using std::popcount

@miscco miscco force-pushed the vector_bool_count branch 3 times, most recently from cbfe024 to 5d3fbcd Compare November 28, 2020 12:10
@miscco
Copy link
Contributor Author

miscco commented Nov 28, 2020

@barcharcraz I believe this is ready for another round of review

@miscco miscco force-pushed the vector_bool_count branch from 52ed289 to 9192585 Compare December 3, 2020 11:43
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

I've pushed a merge with master. There was a merge conflict in <bit> caused by clang-format 11 adjusting whitespace in code you removed; I accepted your change.

@miscco
Copy link
Contributor Author

miscco commented Dec 15, 2020

Thanks a lot, I was 99% sure that i already merge master.
Seems like not pushing is a bad idea

@miscco
Copy link
Contributor Author

miscco commented Dec 15, 2020

Urgh, Seems I has an unclean state and the use cbegin for fill commit is still there 🤦

@StephanTLavavej

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@miscco
Copy link
Contributor Author

miscco commented May 3, 2021

@barcharcraz are there any further blockers here?

Copy link
Member

@barcharcraz barcharcraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with suggestions (it still needs final review). Also: before / after benchmarks would be awesome.

stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved

template <class _InIt, class _Ty>
_NODISCARD _CONSTEXPR20 _InIt _Find_vbool(_InIt _First, const _InIt _Last, const _Ty& _Val) {
// Find _Val in [_First, _Last)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not a super helpful comment and could be removed (we generally remove these sorts of comments when we see them, as there are many in older code). It doesn't need to block this PR though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it as we have it in all others

@miscco
Copy link
Contributor Author

miscco commented May 28, 2021

So I ran the itsy.bitsy benchmarks from the wonderfull @ThePhD

The relevant results are for current version:

Running C:\ws\itsy_bitsy\build\x64\bin\Release\itsy.bitsy.benchmarks.exe
Run on (8 X 1992 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
count_vector_bool                    95765 ns        96257 ns         7467
count_itsy_bitsy                       842 ns          854 ns       896000
find_vector_bool                     84993 ns        85449 ns         8960
find_itsy_bitsy                      49712 ns        50000 ns        10000

With this PR:

Running C:\ws\itsy_bitsy\build\x64\bin\Release\itsy.bitsy.benchmarks.exe
Run on (8 X 1992 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x4)
  L1 Instruction 32 KiB (x4)
  L2 Unified 256 KiB (x4)
  L3 Unified 8192 KiB (x1)
--------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations
--------------------------------------------------------------------------
count_vector_bool                     4666 ns         4353 ns       186667
count_itsy_bitsy                      1607 ns         1562 ns       640000
find_vector_bool                      3205 ns         3209 ns       224000
find_itsy_bitsy                      50459 ns        51562 ns        10000

So we are still losing out against itsy.bitsy, which is not unexpected. But the performance increase is

count:    95'000 ns -> 5'000ns -> 19x improvement
find:     85'000ns  -> 3'200ns -> 26x improvement

I believe this is no too shabby

@miscco
Copy link
Contributor Author

miscco commented May 28, 2021

@barcharcraz I added the optimizations to ranges::count and ranges::find

stl/inc/limits Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@miscco miscco force-pushed the vector_bool_count branch from ab9bf06 to e1cbee3 Compare June 4, 2021 06:23
@StephanTLavavej StephanTLavavej self-assigned this Jun 4, 2021
@StephanTLavavej StephanTLavavej removed their assignment Jun 10, 2021
@StephanTLavavej StephanTLavavej self-assigned this Jun 11, 2021
@StephanTLavavej
Copy link
Member

Internal testing revealed that /clr:pure rejects this intrinsic (as it does for almost all intrinsics). I belatedly realized that CUDA and the Intel C++ Compiler may also be incompatible. I've extracted these conditions into a macro and verified that /clr:pure is happy now.

_Countr_zero's usage of tzcnt is unaffected because it behaves differently - it uses the constexpr fallback for C++17 mode, and only for C++20 mode (when is_constant_evaluated() is available) does it consider activating the runtime optimization. This inherently excludes /clr:pure and also excludes CUDA for the time being (and possibly the Intel C++ Compiler).

@StephanTLavavej StephanTLavavej merged commit 264765b into microsoft:main Jun 12, 2021
@StephanTLavavej
Copy link
Member

Thanks for these order-of-magnitude speedups! 🚀 🚀 🚀

@miscco miscco deleted the vector_bool_count branch June 12, 2021 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants