Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wip: trying to debug windows CI failures #128

Closed
wants to merge 4 commits into from

Conversation

Dr-Emann
Copy link
Member

No description provided.

@Dr-Emann Dr-Emann force-pushed the debug_win_ci branch 10 times, most recently from f00499a to bbbf3f2 Compare January 25, 2024 04:32
@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 25, 2024

@lemire, It does appear to reproduce with c only, building this file with msvc cl: see the "msvc c roaring" step of this job (which is set to continue on error for that step).

It's built with https://github.com/Dr-Emann/croaring-rs/blob/debug_win_ci/.github/workflows/rust.yml#L53-L60, based on the flags/environment rust is compiling the c code with

running
Hardware support: 0x1
Going to or many
-1073741795

(where -1073741795 is 0xC000001D (STATUS_ILLEGAL_INSTRUCTION)).

croaring_hardware_support() = 0x1 doesn't appear to indicate AVX512 is the problem.

The issues began (with I believe no meaningful changes in CRoaring/croaring-rs, this PR is right off master, with no changes to CRoaring/croaring-rs) sometime between 3 months ago and 2 weeks ago
image

I have no idea why we don't see any issues in CRoaring itself

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 25, 2024

@lemire
I was able to save the built executable from github actions (download link), and running it on my windows machine does error, and I'm able to open it in windbg.

Here's the assembly of (the first part of) the function crashing

    roaring_fast_or!avx2_harley_seal_popcount256:
00007ff7`fc9595e0 488bc4               mov     rax, rsp
00007ff7`fc9595e3 55                   push    rbp
00007ff7`fc9595e4 4881ec60010000       sub     rsp, 160h
00007ff7`fc9595eb 0f2970e8             movaps  xmmword ptr [rax-18h], xmm6
00007ff7`fc9595ef 0f2978d8             movaps  xmmword ptr [rax-28h], xmm7
00007ff7`fc9595f3 440f2940c8           movaps  xmmword ptr [rax-38h], xmm8
00007ff7`fc9595f8 440f2948b8           movaps  xmmword ptr [rax-48h], xmm9
00007ff7`fc9595fd 440f2950a8           movaps  xmmword ptr [rax-58h], xmm10
00007ff7`fc959602 440f295898           movaps  xmmword ptr [rax-68h], xmm11
00007ff7`fc959607 440f296088           movaps  xmmword ptr [rax-78h], xmm12
00007ff7`fc95960c 440f29a878ffffff     movaps  xmmword ptr [rax-88h], xmm13
00007ff7`fc959614 440f29b068ffffff     movaps  xmmword ptr [rax-98h], xmm14
00007ff7`fc95961c 440f29b858ffffff     movaps  xmmword ptr [rax-0A8h], xmm15
00007ff7`fc959624 488d6c2420           lea     rbp, [rsp+20h]
00007ff7`fc959629 4883e5e0             and     rbp, 0FFFFFFFFFFFFFFE0h
00007ff7`fc95962d 488bc2               mov     rax, size (rdx)
00007ff7`fc959630 4c8bc9               mov     r9, data (rcx)
00007ff7`fc959633 41b800000000         mov     r8d, 0
00007ff7`fc959639 c5c1efff             vpxor   xmm7, xmm7, xmm7
00007ff7`fc95963d c44139efc0           vpxor   xmm8, xmm8, xmm8
00007ff7`fc959642 62a15500efed         vpxord  xmm21, xmm21, xmm21 <- This instruction crashes
00007ff7`fc959648 c44131efc9           vpxor   xmm9, xmm9, xmm9
00007ff7`fc95964d c5fe7f7d00           vmovdqu ymmword ptr [rbp], ymm7
00007ff7`fc959652 c44109eff6           vpxor   xmm14, xmm14, xmm14
00007ff7`fc959657 c57e7f4520           vmovdqu ymmword ptr [rbp+20h], ymm8

Call stack:

[0x0]   roaring_fast_or!avx2_harley_seal_popcount256+0x62   0xc576affba0   0x7ff7fc980e0c   
[0x1]   roaring_fast_or!container_repair_after_lazy+0x78   0xc576affd10   0x7ff7fc974b69   
[0x2]   roaring_fast_or!roaring_bitmap_repair_after_lazy+0xbc   0xc576affd10   0x7ff7fc974b69   
[0x3]   roaring_fast_or!roaring_bitmap_or_many+0x19   0xc576affd60   0x7ff7fc98b3b8   
[0x4]   roaring_fast_or!main+0xe9   0xc576affd60   0x7ff7fc98b3b8   

Source code line according to debug info (may be wrong?):

inline static uint64_t avx2_harley_seal_popcount256(const __m256i *data,
                                                    const uint64_t size) {
    __m256i total = _mm256_setzero_si256();
    __m256i ones = _mm256_setzero_si256();
    __m256i twos = _mm256_setzero_si256();
    __m256i fours = _mm256_setzero_si256();
    __m256i eights = _mm256_setzero_si256();
    __m256i sixteens = _mm256_setzero_si256();
    __m256i twosA, twosB, foursA, foursB, eightsA, eightsB;

    const uint64_t limit = size - size % 16;
    uint64_t i = 0;  // <- This line is shown as where the crash occurs

    for (; i < limit; i += 16) {
        CSA(&twosA, &ones, ones, _mm256_lddqu_si256(data + i),
            _mm256_lddqu_si256(data + i + 1));
        CSA(&twosB, &ones, ones, _mm256_lddqu_si256(data + i + 2),
            _mm256_lddqu_si256(data + i + 3));

@Dr-Emann
Copy link
Member Author

Given no code changes between croaring/croaring-rs, I'm thinking it's the MSVC version bump from 17.7 to 17.8 which occurred between the last working test vs the first failing one.

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 25, 2024

This feels awfully relevant: https://developercommunity.visualstudio.com/t/Invalid-AVX512-instructions-generated-wh/10521872

After upgrading to MSVC 17.8, one some machines the AVX2 SIMD would start failing with illegal instruction faults. After some debugging, it seems MSVC embeds AVX512 instructions in the AVX2 compiled code.

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 25, 2024

Also relevant, looks like if MSVC figures out we're going to avx512 in the function, it can use it anywhere. I don't see anywhere we could use any avx512 intrinsics in the avx2 code path though, from a somewhat close look:
https://developercommunity.visualstudio.com/t/Invalid-code-gen-when-using-AVX2-and-SSE/10527298

We recently made a compiler change that leverages AVX512 intrinsics that are guaranteed to run on a code path and use this information to leverage AVX512 on that code path.

We have reviewed the provided repro and it uses AVX512 intrinsics (hidden behind the macros) and these intrinsics run on a code path that always executes which triggers the optimization above. The only reason that the AVX512 intrinsics worked previously was that we were downcasting them to AVX instructions to save binary space.

EDIT:
I'm almost certain we aren't, I manually removed the AVX512 sections of the file, and ran grep -o '\b_m\w*' < include/roaring/bitset_util.h | sort -u, and checked every found intrinsic against The intel intrinsics guide, and verified that all were NOT avx512.

@lemire
Copy link
Member

lemire commented Jan 25, 2024

looks like if MSVC figures out we're going to avx512 in the function, it can use it anywhere.

I think we can disable AVX-512 code paths entirely (under Windows).

@Dr-Emann
Copy link
Member Author

Looks like disabling AVX512 isn't enough, we have to disable AVX entirely 🫤

@lemire
Copy link
Member

lemire commented Jan 27, 2024

Looks like disabling AVX512 isn't enough, we have to disable AVX entirely

It is certainly possible to have hardware lacking AVX instruction support. But we should handle this scenario.

You may have seen that CRoaring now effectively runs your small tests from the amalgamated files in CI, and it appears to work.

@lemire
Copy link
Member

lemire commented Jan 27, 2024

@Dr-Emann Can you review CI tests
https://github.com/RoaringBitmap/CRoaring/actions/runs/7670008347/job/20905194245

You can see that it is running the amalgamation tests which involve precisely the function you suggested. It is building it using the default Visual Studio 2022 provided by GitHub.

What are we missing?

@Dr-Emann Dr-Emann force-pushed the debug_win_ci branch 5 times, most recently from f8a774f to 59745f6 Compare January 29, 2024 02:50
@lemire
Copy link
Member

lemire commented Jan 29, 2024

But it is not going to be super useful because my laptop does support AVX instructions. :-/

@lemire
Copy link
Member

lemire commented Jan 29, 2024

@Dr-Emann

vpxord is AVX512F + AVX512VL.

@lemire
Copy link
Member

lemire commented Jan 29, 2024

My best guess is that this code confuses Visual Studio into using AVX-512 throughout:

int bitset_container_compute_cardinality(const bitset_container_t *bitset) {
    int support = croaring_hardware_support();
#if CROARING_COMPILER_SUPPORTS_AVX512
    if( support & ROARING_SUPPORTS_AVX512 ) {
      return (int) avx512_vpopcount(
        (const __m512i *)bitset->words,
        BITSET_CONTAINER_SIZE_IN_WORDS / (WORDS_IN_AVX512_REG));
    } else
#endif // CROARING_COMPILER_SUPPORTS_AVX512
    if( support & ROARING_SUPPORTS_AVX2 ) {
      return (int) avx2_harley_seal_popcount256(
        (const __m256i *)bitset->words,
        BITSET_CONTAINER_SIZE_IN_WORDS / (WORDS_IN_AVX2_REG));
    } else {
      return _scalar_bitset_container_compute_cardinality(bitset);

    }
}

What confuses me is that you are say that disabling AVX-512 did not help.

@lemire
Copy link
Member

lemire commented Jan 29, 2024

@Dr-Emann Please review... RoaringBitmap/CRoaring#579

If it makes sense to you, we shall try this approach. If it solves the issue, then we are lucky.

@Dr-Emann
Copy link
Member Author

I'm fairly sure that when I tried with CROARING_COMPILER_SUPPORTS_AVX512=0, it didn't work, no.

I think there were two "bugs" I posted.

The first was acknowledged as a bug by microsoft, and it seemed to be that the compiler would insert avx512 instructions when only AVX2 was enabled (or maybe that the AVX2 intrinsics were producing avx512 instructions?), and was marked as Fixed - Pending Release almost a month ago.

The second was closed as not a bug, where the compiler figured out that the codepath was going to use avx512 intrinsics in the same codepath, so optimized assuming it could use avx512.

I think we're hitting the first issue, not the second.

@lemire
Copy link
Member

lemire commented Jan 30, 2024

@Dr-Emann If you take my PR, the one where I move the AVX-512, and you use that in your rust binding, does it help?

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 31, 2024

Nope, it still ends up with avx512 instructions with that branch (even with -DCROARING_COMPILER_SUPPORTS_AVX512=0) when compiling with MSVC 14.38 (and I've taken rust totally out of the picture, it's just running c code on this PR branch.

@lemire
Copy link
Member

lemire commented Jan 31, 2024

@Dr-Emann Hmmmm.... I don't know what else to do.

@Dr-Emann
Copy link
Member Author

I really think it's just an MSVC bug that'll hopefully be fixed in 14.39. I think our only real options are:

  • ignore it for now, hopefully it'll be fixed soon
  • disable the avx2 path entirely if defined( _MSC_VER ) && _MSC_VER >= 1938

@lemire
Copy link
Member

lemire commented Jan 31, 2024

Disabling AVX seems extreme. I would happily disable AVX-512 under Windows if it were needed, but I feel uneasy about disabling AVX since almost all Windows machines that we care about have AVX2.

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Jan 31, 2024

I really don't see any way around it for the versions of MSVC that has this bug, the alternative is miscompilation which will cause a crash.

Here's a godbolt link including only CSA, popcount256, and avx2_harley_seal_popcount256, and you can see the output includes a vpxord instruction, even though there's no mention of avx512 ANYWHERE, never mind in the same function: https://c.godbolt.org/z/d764KGWK1 (line 77 of the assembly output)

And if you change back to MSVC 19.37 (still don't understand the versioning, why is it 14 some places and 19 elsewhere? EDIT: see https://devblogs.microsoft.com/oldnewthing/20221219-00/?p=107601, 14.38 is the toolchain version 19.38 is the compiler version (and is related to the _MSC_VER)), there's no vpxord instruction.

@yeastplume
Copy link

Seeing same issue, glad to see there's a thread here about this because I've been tearing my hair out over this for a couple of weeks now:

mimblewimble/grin-gui#73 (comment)

@lemire
Copy link
Member

lemire commented Feb 1, 2024

@yeastplume

I think that @Dr-Emann has demonstrated that it is a compiler bug.

@lemire
Copy link
Member

lemire commented Feb 1, 2024

@yeastplume For now, I recommend you just avoid the offending compiler version. If I were Microsoft, I would just patch it and make sure that the bug goes away.

I don't think we will do anything.

@yeastplume
Copy link

@yeastplume For now, I recommend you just avoid the offending compiler version. If I were Microsoft, I would just patch it and make sure that the bug goes away.

I don't think we will do anything.

Yes, makes sense and thanks for your work tracking this down.

Slightly tangential, I guess, but does anyone have a quick pointer as to how to force github actions to use a particular compiler version?

@lemire
Copy link
Member

lemire commented Feb 1, 2024

Slightly tangential, I guess, but does anyone have a quick pointer as to how to force github actions to use a particular compiler version?

The current windows-latest runner should only have two versions installed, a very old one, and the very latest one, see the specs...

https://github.com/actions/runner-images/blob/main/images/windows/Windows2022-Readme.md

I don't think it is possible to install another version of Visual Studio on the GitHub hosted runners.

So we have to wait for Microsoft to fix this.

@lemire
Copy link
Member

lemire commented Feb 1, 2024

@lemire
Copy link
Member

lemire commented Mar 1, 2024

@Dr-Emann Waiting for a fix from Microsoft did not help. They released 17.9 and it is the version we have in the GitHub runners, but the issue remains for us.

We need to change our code to try and fix this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants