std::swap of arrays, why is there no specialization for trivial types #2683

monamimani · 2022-04-28T05:25:25Z

I was looking at what swap does for arrays, and in the MSVC STL it loops through the array and swaps each element. (libstdc++ and libc++ do the same thing.)

The thing is, it was an array of std::byte, and I was surprised that none of the standard library implementations have specializations for trivially copyable/movable types that would call memcpy. Now I figure this is maybe how the spec is. So my first question: is there something preventing those specializations from existing?

My second question, which is maybe off-topic here: even though clang and gcc also use a loop, I believe their auto-vectorizers see the pattern and emit SSE instructions, but MSVC doesn't; it still loops.
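For reference, the element-by-element loop described here can be sketched roughly as follows. This is a minimal illustration and not the actual library source; the name naive_array_swap is hypothetical.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not library code): swap two built-in arrays
// one element at a time, the way the loop described above does.
template <class T, std::size_t N>
void naive_array_swap(T (&a)[N], T (&b)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        std::swap(a[i], b[i]); // each element goes through its own temporary
    }
}
```

Whether this loop turns into SIMD instructions is left entirely to the compiler's auto-vectorizer.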

Here is a compiler explorer link
https://godbolt.org/z/W8GxPv1ov

Now this is maybe something more for the compiler team, I don't know.
In any case I saw the differences and I thought I would point it out.

Thank you

frederick-vs-ja · 2022-04-28T06:45:28Z

I think this is a compiler issue, as libstdc++'s implementation of std::swap is naive.

Perhaps we can use the same mechanism as that of std::swap_ranges for vectorization, but I don't know if it is suitable.

StephanTLavavej · 2022-04-28T21:53:35Z

@monamimani , note that we can't simply use memcpy/memmove as they don't swap, they overwrite (so we'd need temporary storage).

@frederick-vs-ja , I believe that the vectorized swap_ranges implementation is indeed applicable to std::swap() for built-in arrays; I also observe that std::array::swap already uses it:

STL/stl/inc/array

Lines 461 to 463 in 314f65f

    
    _CONSTEXPR20 void swap(array& _Other) noexcept(_Is_nothrow_swappable<_Ty>::value) {
        _Swap_ranges_unchecked(_Elems, _Elems + _Size, _Other._Elems);
    }

STL/stl/inc/xutility

Lines 6129 to 6143 in 314f65f

    
    template <class _FwdIt1, class _FwdIt2>
    _CONSTEXPR20 _FwdIt2 _Swap_ranges_unchecked(_FwdIt1 _First1, const _FwdIt1 _Last1, _FwdIt2 _First2) {
        // swap [_First1, _Last1) with [_First2, ...)
    #if _USE_STD_VECTOR_ALGORITHMS
        using _Elem1 = remove_reference_t<_Iter_ref_t<_FwdIt1>>;
        using _Elem2 = remove_reference_t<_Iter_ref_t<_FwdIt2>>;
        if constexpr (is_same_v<_Elem1, _Elem2> && _Is_trivially_swappable_v<_Elem1> //
                      && _Iterators_are_contiguous<_FwdIt1, _FwdIt2>) {
    #if _HAS_CXX20
            if (!_STD is_constant_evaluated())
    #endif // _HAS_CXX20
            {
                __std_swap_ranges_trivially_swappable_noalias(
                    _To_address(_First1), _To_address(_Last1), _To_address(_First2));
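
From the user's side, routing a built-in array swap through the swap_ranges machinery can be approximated with the public std::swap_ranges algorithm. The sketch below is only an illustration under that assumption; the wrapper name array_swap_via_swap_ranges is hypothetical, and whether the call reaches a vectorized path depends on the implementation and the element type.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical wrapper: swap two built-in arrays of equal extent by
// handing both ranges to the public std::swap_ranges algorithm.
template <class T, std::size_t N>
void array_swap_via_swap_ranges(T (&a)[N], T (&b)[N]) {
    std::swap_ranges(std::begin(a), std::end(a), std::begin(b));
}
```

The observable result is the same as std::swap(a, b) on the two arrays; only the code path inside the library may differ.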

monamimani · 2022-04-28T22:06:38Z

@StephanTLavavej, of course it would need temporary storage like swapping ints. Sorry that I wasn't clear enough. :)

Because I feel this is more of an issue with the vectorizer, I also opened an issue on the Developer Community.

StephanTLavavej · 2022-04-29T02:47:43Z

I can implement this - it just needs a bit of surgery in <xutility> and <utility>. PR coming soon...

frederick-vs-ja · 2022-04-29T05:21:28Z

> I can implement this - it just needs a bit of surgery in <xutility> and <utility>. PR coming soon...

Thank you!

But I found that <utility> is categorized as a core header in MSVC STL, while <tuple> is not. I'm afraid that it is possibly the vectorized algorithms declared in <xutility> that make <tuple> non-core. BTW, I think <tuple> should also be a core header.

monamimani · 2022-05-01T19:36:08Z

Great thanks!

StephanTLavavej · 2022-05-03T22:16:23Z

I need to revert my PR as this broke users of std::swap who were including only <utility>, see #2699. Reopening this issue as we may be able to solve this later, but we need to carefully think about users like ATL who are including <type_traits> and expecting no IDL mismatch pragmas to be dragged in. (We could possibly separate out the pragma that drags in the import lib / static lib from the rest of the <yvals.h> machinery.)

monamimani · 2022-05-04T16:45:53Z

Since I figured this might be a vectorizer issue at its core, I also opened an issue over at the Developer Community.

They have acknowledged the issue and are working on it, though it may take time. People who have access might be able to find out more. I at least wanted to let you know they are aware of this.

I don't know exactly what the std::swap_ranges mechanism does, but as a stopgap, is it possible to just add a specialization for trivial types, constrain it, and use memcpy? I bet the standard specifies more and would prevent that.

Why not do something stupid like this?

    std::byte storageTmp[Size];
    std::memcpy(storageTmp, arrayA, Size);
    std::memcpy(arrayA, arrayB, Size);
    std::memcpy(arrayB, storageTmp, Size);

I am pretty sure that something I don't know about makes this undesirable, but it should get vectorized.
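
A self-contained version of the three-memcpy idea above might look like the sketch below. It is an illustration, not a proposed library change: the name memcpy_array_swap is hypothetical, the trivially-copyable constraint is an assumption about what "trivial type" should mean here, and the temporary buffer lives on the stack, so large arrays would need a different strategy.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: swap two built-in arrays of a trivially copyable
// type through a temporary buffer, using three memcpy calls.
template <class T, std::size_t N>
void memcpy_array_swap(T (&a)[N], T (&b)[N]) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "memcpy is only valid for trivially copyable types");
    if (&a == &b) {
        return; // self-swap: memcpy requires non-overlapping ranges
    }
    alignas(T) unsigned char tmp[sizeof(T) * N]; // stack temporary, same size as one array
    std::memcpy(tmp, a, sizeof(a));
    std::memcpy(a, b, sizeof(a));
    std::memcpy(b, tmp, sizeof(a));
}
```

Calling memcpy_array_swap(x, y) on two int[8] or std::byte[64] arrays exchanges their contents; a type with a non-trivial copy constructor is rejected at compile time by the static_assert.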

monamimani · 2022-05-04T16:53:31Z

I might add that, from their own documentation page: the 13xx reason codes apply to the vectorizer.

    void code_1300(int *A, int *B)
    {
        // Code 1300 is emitted when the compiler detects that there is
        // no computation in the loop body.

        for (int i=0; i<1000; ++i)
        {
            A[i] = B[i]; // Do not vectorize, instead emit memcpy
        }
    }

It says it should emit memcpy, but it doesn't. :)

StephanTLavavej · 2022-05-04T21:22:05Z

Thanks - the autovectorizer may not be able to deal with the swap algorithm, but we do plan to properly fix this in the STL in the future. (Our vectorized algorithm is faster than memcpy, by the way.)

AlexGuteniev · 2024-09-24T10:10:23Z

It is possible to implement this in headers using memcpy with fixed power-of-two portions.

See how copying in 32-bit portions is vectorized using SSE2 and AVX2: https://godbolt.org/z/YcEPz848W

Note how, although the C++ source uses an intermediate buffer, the buffer does not appear in the assembly.

This can be done in two or more loops with descending portion sizes to handle the variety of array sizes.

The exact portion size and the way the tail is handled (memcpy of variable length vs. manual loop vs. unrolled loop) is a trade-off between code size and performance.
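
One way to read the suggestion above is sketched below: swap fixed-size portions through a small local buffer, then repeat with smaller portions for the tail. The names swap_portions and swap_bytes_chunked, and the portion sizes 32, 8, and 1, are assumptions chosen for illustration; this is not the code from the linked example or from any PR, and whether a given compiler vectorizes it is not guaranteed.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: swap as many Portion-byte blocks as fit, advancing
// the pointers and shrinking the remaining byte count as it goes.
template <std::size_t Portion>
void swap_portions(unsigned char*& a, unsigned char*& b, std::size_t& n) {
    while (n >= Portion) {
        unsigned char tmp[Portion]; // fixed size, so each memcpy has a constant length
        std::memcpy(tmp, a, Portion);
        std::memcpy(a, b, Portion);
        std::memcpy(b, tmp, Portion);
        a += Portion;
        b += Portion;
        n -= Portion;
    }
}

// Hypothetical entry point: swap n bytes between two non-overlapping
// buffers using descending power-of-two portion sizes.
inline void swap_bytes_chunked(void* pa, void* pb, std::size_t n) {
    auto* a = static_cast<unsigned char*>(pa);
    auto* b = static_cast<unsigned char*>(pb);
    swap_portions<32>(a, b, n); // main loop
    swap_portions<8>(a, b, n);  // medium tail
    swap_portions<1>(a, b, n);  // final bytes
}
```

Because every memcpy length is a compile-time constant, the compiler is free to keep the temporary in registers instead of memory; picking more or fewer portion sizes is the code-size versus performance trade-off mentioned above.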

monamimani added the question Further information is requested label Apr 28, 2022

StephanTLavavej added performance Must go faster and removed question Further information is requested labels Apr 28, 2022

StephanTLavavej mentioned this issue Apr 29, 2022

Vectorize swap and ranges::swap for arrays #2689

Merged

StephanTLavavej closed this as completed in #2689 May 1, 2022

StephanTLavavej added the fixed Something works now, yay! label May 1, 2022

StephanTLavavej reopened this May 3, 2022

StephanTLavavej removed the fixed Something works now, yay! label May 3, 2022

StephanTLavavej mentioned this issue May 3, 2022

Scalarize swap and ranges::swap for arrays #2700

Merged

frederick-vs-ja mentioned this issue Apr 12, 2024

Vectorize more algorithms for x86 / x64 using SSE4.2 and/or AVX2 #4415

Open

AlexGuteniev mentioned this issue Sep 29, 2024

Auto-vectorize arrays swap #4991

Merged

StephanTLavavej closed this as completed in #4991 Oct 24, 2024

StephanTLavavej added the fixed Something works now, yay! label Oct 24, 2024
