Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specialize fill and fill_n for vector<bool> #879

Merged
merged 13 commits into from
Aug 1, 2020

Conversation

miscco
Copy link
Contributor

@miscco miscco commented Jun 5, 2020

This partially addresses #625 by specializing fill and fill_n

This is essentially a subpart of #750 with one of the algorithms where the implementation is both simple and a clear win.

@miscco miscco requested a review from a team as a code owner June 5, 2020 10:09
@miscco miscco force-pushed the vector_bool_fill branch 2 times, most recently from 1461cae to dbc59be Compare June 5, 2020 12:35
@StephanTLavavej StephanTLavavej added the performance Must go faster label Jun 5, 2020
@miscco miscco force-pushed the vector_bool_fill branch from dbc59be to 8bfcede Compare June 5, 2020 18:41
@miscco miscco force-pushed the vector_bool_fill branch from 8bfcede to f477816 Compare June 5, 2020 19:37
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@miscco miscco force-pushed the vector_bool_fill branch from 5144f95 to d9be996 Compare June 26, 2020 18:30
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@BillyONeal
Copy link
Member

BillyONeal commented Jul 21, 2020

I spent some time messing with this and ended up effectively identical to yours except the calculation of _LastSourceMask is guarded by _Last._Myoff != 0 which resolves the forbidden full shift issue.

template <class _FwdIt, class _Ty>
_CONSTEXPR20 void _Fill_vbool(_FwdIt _First, _FwdIt _Last, const _Ty& _Val) {
    // Set [_First, _Last) to _Val
    if (_First == _Last) {
        return;
    }

    const auto _FillVal         = static_cast<_Vbase>(_Val ? -1 : 0);
    _Vbase* _Vb_first            = const_cast<_Vbase*>(_First._Myptr);
    const _Vbase* const _Vb_last = const_cast<_Vbase*>(_Last._Myptr);

    _STL_INTERNAL_CHECK(_First._Myoff != _VBITS);
    const auto _FirstSourceMask = static_cast<_Vbase>(-1) << _First._Myoff;
    const auto _FirstDestMask   = static_cast<_Vbase>(~_FirstSourceMask);

    if (_Vb_first == _Vb_last) {
        // special case only one block
        _STL_INTERNAL_CHECK(_Last._Myoff != 0); // because we have at least one bit to set
        const auto _LastSourceMask  = static_cast<_Vbase>(-1) >> (_VBITS - _Last._Myoff);
        const auto _LastDestMask    = static_cast<_Vbase>(~_LastSourceMask);
        const auto _SourceMask = _FirstSourceMask & _LastSourceMask;
        const auto _DestMask   = _FirstDestMask | _LastDestMask;
        *_Vb_first               = (*_Vb_first & _DestMask) | (_FillVal & _SourceMask);
        return;
    }

    // fill in trailing bits of the first block and move _Vb_first up
    *_Vb_first               = (*_Vb_first & _FirstDestMask) | (_FillVal & _FirstSourceMask);
    ++_Vb_first;

#ifdef __cpp_lib_is_constant_evaluated
    if (_STD is_constant_evaluated()) {
        for (; _Vb_first != _Vb_last; ++_Vb_first) {
            *_Vb_first = _FillVal;
        }
    } else
#endif // __cpp_lib_is_constant_evaluated
    {
        const auto _Vb_first_ch = reinterpret_cast<const char*>(_Vb_first);
        const auto _Vb_last_ch  = reinterpret_cast<const char*>(_Vb_last);
        const auto _Count_ch  = static_cast<size_t>(_Vb_last_ch - _Vb_first_ch);
        const auto _ValChar   = static_cast<unsigned char>(_Val ? -1 : 0);
        _CSTD memset(_Vb_first, _ValChar, _Count_ch);
        _Vb_first += _Vb_last - _Vb_first;
    }

    if (_Last._Myoff != 0) {
        // fill in leading bits of the last block *_Vb_last
        const auto _LastSourceMask  = static_cast<_Vbase>(-1) >> (_VBITS - _Last._Myoff);
        const auto _LastDestMask    = static_cast<_Vbase>(~_LastSourceMask);
        *_Vb_last = (*_Vb_last & _LastDestMask) | (_FillVal & _LastSourceMask);
    }
}

does that seem OK?

Copy link
Member

@BillyONeal BillyONeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to not do the forbidden full shift; other than that my other comments were extreme nitpicks or wrong.

@miscco
Copy link
Contributor Author

miscco commented Jul 22, 2020

Thanks that looks really good. I believe the _STL_INTERNAL_CHECK(_Last._Myoff != 0); // because we have at least one bit to set in the special case is not needed as we have the _First == _Last early return at the beginning.

I agree that static casts are not safe per se. That said, integer promotions are notoriously tricky and who knows what happens if _Vbase changes some time in the future.

@BillyONeal
Copy link
Member

Thanks that looks really good. I believe the _STL_INTERNAL_CHECK(_Last._Myoff != 0); // because we have at least one bit to set in the special case is not needed as we have the _First == _Last early return at the beginning.

The comment follows exactly because of that check at the beginning; asserts are free for product code :)

@miscco
Copy link
Contributor Author

miscco commented Jul 23, 2020

Thanks a lot @BillyONeal

Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for optimizing vector<bool>! I found some issues, but the core approach appears to be solid.

stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej removed their assignment Jul 29, 2020
@miscco miscco force-pushed the vector_bool_fill branch from 40a3478 to 8493d52 Compare July 30, 2020 07:19
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, will validate and push minor changes for the last remaining issues. Thanks!

stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
stl/inc/vector Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej merged commit 1ff3d1e into microsoft:master Aug 1, 2020
@StephanTLavavej
Copy link
Member

Thanks for this significant perf improvement - it fills me with joy! 😹

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants