Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The last channel handling logic in the SSE version of ShuffleChannel cause buffer-overflow #5734

Open
junwha opened this issue Oct 14, 2024 · 0 comments · May be fixed by #5735
Open

The last channel handling logic in the SSE version of ShuffleChannel cause buffer-overflow #5734

junwha opened this issue Oct 14, 2024 · 0 comments · May be fixed by #5735

Comments

@junwha
Copy link

junwha commented Oct 14, 2024

context

In SSE version of ShuffleChannel, the last channel is shuffled from the offset with the half of the granularity.
Here, it causes buffer-overflow at the load of the ptr1 at the last iteration of the for loop.

For example, in the case of (elempack == 4) && (_group == 2 && channels % _group != 0) in AVX512 optimization,

The ptr1 initially can be accessed for the range [ptr1, ptr1+4*size),
and the range is reduced into [ptr1, ptr1+4*size-2) after ptr1 += 2;.
However, at the last iteration of the for loop, it loads [ptr1+4*size, ptr1+4*(size+1)) to _p1, which leads to buffer-overflow.

Since it causes both buffer-overflow read (ptr1) and buffer-overflow write (outptr), it could lead to incorrect result of the model.

{
      const float* ptr0 = bottom_blob.channel(channels_per_group);
      const float* ptr1 = bottom_blob.channel(channels_per_group * 2);
      float* outptr = top_blob.channel(channels_per_group * 2);

      ptr1 += 2;

      for (int i = 0; i < size; i++)
      {
          __m128 _p0 = _mm_loadu_ps(ptr0);
          __m128 _p1 = _mm_loadu_ps(ptr1);

          __m128 _lo = _mm_unpacklo_ps(_p0, _p1);

          _mm_storeu_ps(outptr, _lo);

          ptr0 += 4;
          ptr1 += 4;
          outptr += 4;
      }
  }

x86
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L117
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L373
https://github.com/Tencent/ncnn/blob/master/src/layer/x86/shufflechannel_x86.cpp#L608

arm
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L118
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L365
https://github.com/Tencent/ncnn/blob/master/src/layer/arm/shufflechannel_arm.cpp#L599

how to reproduce

  1. Build with SSE in x86 or arm
  2. ./test_shufflechannel

more

I will open a PR of the patch for this:)

junwha added a commit to junwha/ncnn that referenced this issue Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant