SIGSEGV when tested with gcc-12 on macos #3740

Closed
devcrocod opened this issue Aug 25, 2022 · 27 comments · Fixed by #3745
Comments

@devcrocod

os: macOS 12.5.1
arch: x86_64
C compiler: gcc-12 (Homebrew GCC 12.1.0) 12.1.0
Fortran compiler: GNU Fortran (Homebrew GCC 12.1.0) 12.1.0
OpenBLAS version: 0.3.20/0.3.21/develop

make command:

make CC=gcc-12 FC=gfortran-12 HOSTCC=gcc BINARY=64 F_COMPILER=GFORTRAN FEXTRALIB=-lgfortran USE_OPENMP=0 NO_AVX512=1 DYNAMIC_ARCH=1 NUM_THREADS=64

When testing Complex BLAS, I get a segmentation fault:

 Complex BLAS Test Program Results


 Test of subprogram number  1            ZDOTC 

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x10ef5b90e
#1  0x10ef5aaed
#2  0x7ff80b637dfc
#3  0x10e7980c0
#4  0x10e798361
#5  0x10be5ee06
#6  0x10be5f0d4
#7  0x10e924e8e
make[5]: *** [level1] Segmentation fault: 11
make[5]: *** Waiting for unfinished jobs....

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x1060d990e
#1  0x1060d8aed
#2  0x7ff80b637dfc
#3  0x1059025a0
#4  0x105902841
#5  0x102f9ac86
#6  0x102f991f6
#7  0x102f92cd1
#8  0x102f979b3
#9  0x105a8f13e
/bin/sh: line 1: 72622 Segmentation fault: 11  OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 ./zblat2 < ./zblat2.dat
@martin-frbg
Collaborator

martin-frbg commented Aug 26, 2022

Hmm, this used to work with earlier gcc versions, so it should not be anything fundamental (like the internal representation of complex numbers). I wonder if this could be a regression in gcc-12 itself. (I have not updated gcc in our CI yet and am currently travelling.)

@martin-frbg
Collaborator

Fails at -O2, passes with just -g (i.e. -O0); looks like overenthusiastic optimization by version 12.1 to me.

@devcrocod
Author

Unfortunately, I can no longer use gcc-11. I also tried gcc 12.2.0 and got the same error.

@martin-frbg
Copy link
Collaborator

make ... COMMON_OPT=-O1 also works with gcc-12; it only goes wrong at -O2.

@martin-frbg
Collaborator

make COMMON_OPT="-O2 -fno-tree-vectorize" is probably the best solution for now. (Default activation of the autovectorizer at -O2 instead of -O3 was one of the major changes in gcc-12.)

@martin-frbg
Collaborator

This "gcc-12" problem is not seen on x86_64 running Linux, nor on an Apple M1 running Linux, so it may be peculiar to OSX builds.

@mmuetzel
Contributor

mmuetzel commented Aug 28, 2022

It looks like this also happens with the gcc-12 in MSYS2 on Windows: msys2/MINGW-packages#12857
Currently checking whether rebuilding with -O2 -fno-tree-vectorize helps.

Edit: Building with -O2 -fno-tree-vectorize seems to fix the segfault on Windows, too.

@mmuetzel
Contributor

Is there a recommended optimization level for building OpenBLAS?

@martin-frbg
Collaborator

martin-frbg commented Aug 30, 2022

Traditionally just -O2 (see Makefile.rule), but cmake defaults to -O3 for Release builds. I had already deactivated the tree vectorizer for building the tests with gcc-11/cmake earlier. Maybe more changed in gcc-12's vectorizer than just its default activation, but it will take time to identify the miscompiled code.

@mmuetzel
Contributor

mmuetzel commented Aug 30, 2022

Is it these lines?
https://github.com/xianyi/OpenBLAS/blob/00534523ad999d89945d23b7df0eafc69c31f1b3/Makefile.system#L1552
https://github.com/xianyi/OpenBLAS/blob/00534523ad999d89945d23b7df0eafc69c31f1b3/Makefile.system#L1556

Only tangentially related to this issue: Would it make sense for the cmake build rules to set CMAKE_C_FLAGS_RELEASE="-O2" (and similarly for Fortran and Assembler)? IIUC, that would make the resulting binaries more similar when using cmake or make.
Edit: And it would probably come with the benefit of avoiding the issue here with default settings.

@martin-frbg
Collaborator

Narrowed down to the level 1 BLAS kernels (Makefile.L1)

@mmuetzel
Contributor

mmuetzel commented Sep 1, 2022

In case this is the same as the segfault with MSYS2, the backtrace there was the following:
msys2/MINGW-packages#12857 (comment)

* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x7ffab19ab690: Access violation reading location 0x201e19b7010
    frame #0: 0x00007ffab19ab690 libopenblas.dll`zgemv_n_ZEN + 4992
libopenblas.dll`zgemv_n_ZEN:
->  0x7ffab19ab690 <+4992>: vinsertf128 $0x1, (%r9), %ymm0, %ymm0
    0x7ffab19ab696 <+4998>: movq   0x88(%rsp), %rdi
    0x7ffab19ab69e <+5006>: vpermpd $0x11, %ymm0, %ymm2       ; ymm2 = ymm0[1,0,1,0]
    0x7ffab19ab6a4 <+5012>: vpermpd $0x44, %ymm0, %ymm0       ; ymm0 = ymm0[0,1,0,1]
(lldb) bt
* thread #1, stop reason = Exception 0xc0000005 encountered at address 0x7ffab19ab690: Access violation reading location 0x201e19b7010
  * frame #0: 0x00007ffab19ab690 libopenblas.dll`zgemv_n_ZEN + 4992
    frame #1: 0x00007ffab037bbff libopenblas.dll`zgemv_ + 1039
    frame #2: 0x00007ffab27c65a7 libopenblas.dll`zlatrd_ + 2455
    frame #3: 0x00007ffab272f6fd libopenblas.dll`zhetrd_ + 2093
    frame #4: 0x00007ffab272461b libopenblas.dll`zheev_ + 859
    frame #5: 0x00007ff6ae741690 test_openblas.exe`main + 544
    frame #6: 0x00007ff6ae7413d7 test_openblas.exe`__tmainCRTStartup at crtexe.c:329:15
    frame #7: 0x00007ff6ae741436 test_openblas.exe`mainCRTStartup at crtexe.c:206:9
    frame #8: 0x00007ffbbd1754e0 kernel32.dll`BaseThreadInitThunk + 16
    frame #9: 0x00007ffbbde2485b ntdll.dll`RtlUserThreadStart + 43

@martin-frbg
Collaborator

Thank you, that is very useful, as it does not appear to be easy to get meaningful backtraces on OSX.

@mmuetzel
Contributor

mmuetzel commented Sep 1, 2022

Should the MINGW builds in the CI be configured with -DCMAKE_BUILD_TYPE=Release?
This might have been caught earlier if they were...

@martin-frbg
Collaborator

> Should the MINGW builds in the CI be configured with -DCMAKE_BUILD_TYPE=Release? This might have been caught earlier if they were...

In hindsight, yes, probably. Or one with the build type set and one without?

@mmuetzel
Contributor

mmuetzel commented Sep 1, 2022

The segfault here and the one with MSYS2 might be different ones. At least, the self-tests didn't trigger a segfault in #3750.

@martin-frbg
Collaborator

Closing as fixed by #3745.

@littlewu2508

littlewu2508 commented Feb 11, 2023

This issue is also observed on Linux, using gcc (Gentoo 12.2.1_p20230121-r1 p10) 12.2.1 20230121.

zdot fails with a SIGSEGV.

@martin-frbg
Collaborator

@littlewu2508 I cannot reproduce this with the current develop branch. What is your hardware, please?

@littlewu2508

littlewu2508 commented Feb 11, 2023 via email

@martin-frbg
Collaborator

Zen3 here as well; maybe the Gentoo gcc carries extra patches. Guess I'll need to build a gcc13 snapshot next.

@martin-frbg
Collaborator

Btw, I'm confused by the tensorflow ticket you linked to; I'm not sure I see the connection, except that it's a segfault somewhere. Also, the fix discussed here was made after the latest (0.3.21) release, so you'd need to build a snapshot of develop to get it.

@littlewu2508

> Btw, I'm confused by the tensorflow ticket you linked to; I'm not sure I see the connection, except that it's a segfault somewhere. Also, the fix discussed here was made after the latest (0.3.21) release, so you'd need to build a snapshot of develop to get it.

Oh, it's because in that ticket we are discussing TensorFlow ROCm support, and I provided a rocBLAS nightly build that uses OpenBLAS as its test reference. Segfaults appeared during the tests, and after some debugging I found that the -O2-optimized OpenBLAS caused them.

@littlewu2508

> Zen3 here as well; maybe the Gentoo gcc carries extra patches. Guess I'll need to build a gcc13 snapshot next.

Probably. I cherry-picked 739c3c4 and removed the platform check "(defined(OS_DARWIN) || defined(OS_WINDOWS)) &&", which mitigates the issue I encountered.

@martin-frbg
Collaborator

Still not reproduced with today's gcc from git, at either -O2 or -O3.

@martin-frbg
Collaborator

Also not reproduced with current develop in a VM installation of Gentoo on Zen3 (with the Gentoo gcc 12.2.1_p20230121-r1 p10).

@littlewu2508

> Also not reproduced with current develop in a VM installation of Gentoo on Zen3 (with the Gentoo gcc 12.2.1_p20230121-r1 p10).

Thanks for your efforts. I'll try reproducing my issue again and make an MWE.
