Disable AVX-512 on Windows #1708

Closed
ararslan opened this issue Aug 2, 2018 · 16 comments · Fixed by #1717

@ararslan
Contributor

ararslan commented Aug 2, 2018

There is apparently a GCC bug that prevents building AVX-512 code on Windows. We observed this when building OpenBLAS on the 64-bit Windows Julia buildbot (which I'm told has a Skylake-X processor). Is there a way to disable OpenBLAS building AVX-512 on Windows? Or perhaps a patch that disables it automatically?

@brada4
Contributor

brada4 commented Aug 3, 2018

But you are not using -O3, so that bug is not relevant here, nor does it pertain to Windows in general.

Most likely your 'as' is much older than your gcc, so it does not understand the assembly emitted by gcc. It is not choking on OpenBLAS code as such.

@martin-frbg
Collaborator

martin-frbg commented Aug 3, 2018

Ideally the compile test in c_check (and system_check.cmake) should catch this (and define NO_AVX512 automatically), but from the GCC bug it appears to depend on some particular use of registers that is not reflected in the current test code. (Not entirely sure if we can just borrow their test from the end of https://gcc.gnu.org/viewcvs/gcc/trunk/libgfortran/acinclude.m4?view=markup&pathrev=244636 (GCC PR79127) as that would move it from GPL to three-clause BSD, but it is probably easy to come up with an equivalent test now that the underlying problem is understood.)

@martin-frbg martin-frbg added this to the 0.3.3 milestone Aug 3, 2018
@brada4
Contributor

brada4 commented Aug 3, 2018

Red Hat 6 could use an AVX2 test of the same kind. And no, the GPL cannot be removed that easily, unless you convince the author to give you the code without the GPL.

@martin-frbg
Collaborator

martin-frbg commented Aug 3, 2018

Perhaps you could check whether simply changing the zmm2 register now used in the test at
https://github.com/xianyi/OpenBLAS/blob/9e654305049caf80bf53369304d1fc4e3662ba7e/cmake/system_check.cmake#L70
or
https://github.com/xianyi/OpenBLAS/blob/9e654305049caf80bf53369304d1fc4e3662ba7e/c_check#L207
to zmm16 is sufficient to catch this? (If not, adding a clobber line similar to the one used in the gfortran PR would hopefully do the trick without causing any licensing complications.)

@brada4
Contributor

brada4 commented Aug 3, 2018

It is also worth experimenting with giving the compilers an AVX-512-capable binutils.
There are indications that one needs at least 2.24: https://gitlab.com/ultr/glibc/commit/f43cb35c9b3c35addc6dc0f1427caf51786ca1d2
@ararslan - gcc uses libexec/gcc-as in the absence of an executable as; that probably does not hold true for a cross-build.
For inspiration : https://github.com/xianyi/OpenBLAS/wiki/faq#binutils

Can you show the output of as --version and x86_64-w64-mingw32-as --version?
Your gcc should have 2.25.1, with AVX512 support, but maybe the "system" as lags behind and gets in the way?
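
The version check requested here can also be scripted. The following is a minimal sketch and not part of OpenBLAS: `binutils_version`, `version_at_least` and `check_assembler` are hypothetical helper names, and the 2.24 threshold is simply the number quoted in this comment, not something verified independently.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenBLAS.

# Read `as --version` output on stdin and print the trailing version
# string of its first line, e.g. 2.29.1.20171006.
binutils_version() {
    sed -n '1s/.* \([0-9][0-9.]*\)$/\1/p'
}

# Succeed if version $1 is at least version $2, comparing only the
# major and minor components numerically.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        split(have, h, "."); split(want, w, ".")
        ok = (h[1] + 0 > w[1] + 0) || (h[1] + 0 == w[1] + 0 && h[2] + 0 >= w[2] + 0)
        exit ok ? 0 : 1
    }'
}

# $1 = assembler command, e.g. as or x86_64-w64-mingw32-as.
# The 2.24 threshold is an assumption taken from the comment above.
check_assembler() {
    v=$("$1" --version | binutils_version)
    if version_at_least "$v" 2.24; then
        echo "$1: $v (at least 2.24)"
    else
        echo "$1: $v (older than 2.24, or version not recognised)"
    fi
}
```

Running `check_assembler as` and `check_assembler x86_64-w64-mingw32-as` would then report both assemblers in one go.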

@ararslan
Contributor Author

ararslan commented Aug 3, 2018

My colleague @staticfloat knows more about the toolchain we're using, perhaps he can chime in here.

@staticfloat
Contributor

$ as --version
GNU assembler (GNU Binutils) 2.29.1.20171006
Copyright (C) 2017 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-pc-cygwin'.

$ x86_64-w64-mingw32-as --version
GNU assembler (GNU Binutils) 2.29.1.20171006
Copyright (C) 2017 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `x86_64-w64-mingw32'.

We are using a Cygwin-based environment, within which we install a mingw32-based GCC toolchain. Looks like we have pretty recent assemblers, so that's most likely not the issue here. Just to make sure, I ran the failing compiler command again, but with -v to see the toolchain invocations:

$ x86_64-w64-mingw32-gcc -march=x86-64 -m64 -c -O2 -O2 -DMS_ABI -DMAX_STACK_ALLOC=2048 -Wall -m64 -DF_INTERFACE_GFORT  -DDYNAMIC_ARCH -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=16 -DMAX_PARALLEL_NUMBER=1 -march=skylake-avx512 -DASMNAME=ssymv_L_SKYLAKEX -DASMFNAME=ssymv_L_SKYLAKEX_ -DNAME=ssymv_L_SKYLAKEX_ -DCNAME=ssymv_L_SKYLAKEX -DCHAR_NAME=\"ssymv_L_SKYLAKEX_\" -DCHAR_CNAME=\"ssymv_L_SKYLAKEX\" -DNO_AFFINITY -DTS=_SKYLAKEX -I.. -DBUILD_KERNEL -DTABLE_NAME=gotoblas_SKYLAKEX -UDOUBLE  -UCOMPLEX -UCOMPLEX -UDOUBLE -DLOWER ../kernel/x86_64/ssymv_L.c -o ssymv_L_SKYLAKEX.obj -v
Using built-in specs.
COLLECT_GCC=x86_64-w64-mingw32-gcc
Target: x86_64-w64-mingw32
Configured with: /cygdrive/i/szsz/tmpp/cygwin64/mingw64-x86_64/mingw64-x86_64-gcc-6.4.0-2.x86_64/src/gcc-6.4.0/configure --srcdir=/cygdrive/i/szsz/tmpp/cygwin64/mingw64-x86_64/mingw64-x86_64-gcc-6.4.0-2.x86_64/src/gcc-6.4.0 --prefix=/usr --exec-prefix=/usr --localstatedir=/var --sysconfdir=/etc --docdir=/usr/share/doc/mingw64-x86_64-gcc --htmldir=/usr/share/doc/mingw64-x86_64-gcc/html -C --build=x86_64-pc-cygwin --host=x86_64-pc-cygwin --target=x86_64-w64-mingw32 --without-libiconv-prefix --without-libintl-prefix --with-sysroot=/usr/x86_64-w64-mingw32/sys-root --with-build-sysroot=/usr/x86_64-w64-mingw32/sys-root --disable-multilib --disable-win32-registry --enable-languages=c,c++,fortran,lto,objc,obj-c++ --enable-fully-dynamic-string --enable-graphite --enable-libgomp --enable-libquadmath --enable-libquadmath-support --enable-libssp --enable-version-specific-runtime-libs --enable-libgomp --enable-libada --with-dwarf2 --with-gnu-ld --with-gnu-as --with-tune=generic --with-cloog-include=/usr/include/cloog-isl --with-system-zlib --enable-threads=posix --libexecdir=/usr/lib
Thread model: posix
gcc version 6.4.0 (GCC)
COLLECT_GCC_OPTIONS='-march=x86-64' '-c' '-O2' '-O2' '-D' 'MS_ABI' '-D' 'MAX_STACK_ALLOC=2048' '-Wall' '-m64' '-D' 'F_INTERFACE_GFORT' '-D' 'DYNAMIC_ARCH' '-D' 'SMP_SERVER' '-D' 'NO_WARMUP' '-D' 'MAX_CPU_NUMBER=16' '-D' 'MAX_PARALLEL_NUMBER=1' '-march=skylake-avx512' '-D' 'ASMNAME=ssymv_L_SKYLAKEX' '-D' 'ASMFNAME=ssymv_L_SKYLAKEX_' '-D' 'NAME=ssymv_L_SKYLAKEX_' '-D' 'CNAME=ssymv_L_SKYLAKEX' '-D' 'CHAR_NAME="ssymv_L_SKYLAKEX_"' '-D' 'CHAR_CNAME="ssymv_L_SKYLAKEX"' '-D' 'NO_AFFINITY' '-D' 'TS=_SKYLAKEX' '-I' '..' '-D' 'BUILD_KERNEL' '-D' 'TABLE_NAME=gotoblas_SKYLAKEX' '-U' 'DOUBLE' '-U' 'COMPLEX' '-U' 'COMPLEX' '-U' 'DOUBLE' '-D' 'LOWER' '-o' 'ssymv_L_SKYLAKEX.obj' '-v'
 /usr/lib/gcc/x86_64-w64-mingw32/6.4.0/cc1.exe -quiet -v -I .. -D_REENTRANT -D MS_ABI -D MAX_STACK_ALLOC=2048 -D F_INTERFACE_GFORT -D DYNAMIC_ARCH -D SMP_SERVER -D NO_WARMUP -D MAX_CPU_NUMBER=16 -D MAX_PARALLEL_NUMBER=1 -D ASMNAME=ssymv_L_SKYLAKEX -D ASMFNAME=ssymv_L_SKYLAKEX_ -D NAME=ssymv_L_SKYLAKEX_ -D CNAME=ssymv_L_SKYLAKEX -D CHAR_NAME="ssymv_L_SKYLAKEX_" -D CHAR_CNAME="ssymv_L_SKYLAKEX" -D NO_AFFINITY -D TS=_SKYLAKEX -D BUILD_KERNEL -D TABLE_NAME=gotoblas_SKYLAKEX -U DOUBLE -U COMPLEX -U COMPLEX -U DOUBLE -D LOWER ../kernel/x86_64/ssymv_L.c -quiet -dumpbase ssymv_L.c -march=x86-64 -m64 -march=skylake-avx512 -auxbase-strip ssymv_L_SKYLAKEX.obj -O2 -O2 -Wall -version -o /tmp/ccL3UFlg.s
GNU C11 (GCC) version 6.4.0 (x86_64-w64-mingw32)
	compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13
warning: MPFR header version 3.1.5-p10 differs from library version 3.1.6-p1.
warning: MPC header version 1.0.3 differs from library version 1.1.0.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring nonexistent directory "/usr/x86_64-w64-mingw32/sys-root/usr/local/include"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-w64-mingw32/6.4.0/../../../../x86_64-w64-mingw32/include"
#include "..." search starts here:
#include <...> search starts here:
 ..
 /usr/lib/gcc/x86_64-w64-mingw32/6.4.0/include
 /usr/lib/gcc/x86_64-w64-mingw32/6.4.0/include-fixed
 /usr/x86_64-w64-mingw32/sys-root/mingw/include
End of search list.
GNU C11 (GCC) version 6.4.0 (x86_64-w64-mingw32)
	compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13
warning: MPFR header version 3.1.5-p10 differs from library version 3.1.6-p1.
warning: MPC header version 1.0.3 differs from library version 1.1.0.
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 5406d774635579063101db625e337f18
COLLECT_GCC_OPTIONS='-march=x86-64' '-c' '-O2' '-O2' '-D' 'MS_ABI' '-D' 'MAX_STACK_ALLOC=2048' '-Wall' '-m64' '-D' 'F_INTERFACE_GFORT' '-D' 'DYNAMIC_ARCH' '-D' 'SMP_SERVER' '-D' 'NO_WARMUP' '-D' 'MAX_CPU_NUMBER=16' '-D' 'MAX_PARALLEL_NUMBER=1' '-march=skylake-avx512' '-D' 'ASMNAME=ssymv_L_SKYLAKEX' '-D' 'ASMFNAME=ssymv_L_SKYLAKEX_' '-D' 'NAME=ssymv_L_SKYLAKEX_' '-D' 'CNAME=ssymv_L_SKYLAKEX' '-D' 'CHAR_NAME="ssymv_L_SKYLAKEX_"' '-D' 'CHAR_CNAME="ssymv_L_SKYLAKEX"' '-D' 'NO_AFFINITY' '-D' 'TS=_SKYLAKEX' '-I' '..' '-D' 'BUILD_KERNEL' '-D' 'TABLE_NAME=gotoblas_SKYLAKEX' '-U' 'DOUBLE' '-U' 'COMPLEX' '-U' 'COMPLEX' '-U' 'DOUBLE' '-D' 'LOWER' '-o' 'ssymv_L_SKYLAKEX.obj' '-v'
 /usr/lib/gcc/x86_64-w64-mingw32/6.4.0/../../../../x86_64-w64-mingw32/bin/as.exe -v -I .. --64 -o ssymv_L_SKYLAKEX.obj /tmp/ccL3UFlg.s
GNU assembler version 2.29.1 (x86_64-w64-mingw32) using BFD version (GNU Binutils) 2.29.1.20171006
/tmp/ccL3UFlg.s: Assembler messages:
/tmp/ccL3UFlg.s:146: Error: invalid register for .seh_savexmm
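
One way to see what the assembler is rejecting is to keep the generated assembly (for example by compiling with -S instead of -c) and look for the `.seh_savexmm` directives in it. The sketch below assumes, going only by the error text and the GCC bug mentioned at the top of the thread, that the offending directives name registers above xmm15; that is an assumption, not something the log confirms. `find_high_seh_saves` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenBLAS or GCC.
# $1 = path to an assembly file produced by the compiler.
# Prints (with line numbers) every .seh_savexmm directive whose register
# is xmm16..xmm31. Assumption: these are the registers the assembler
# refuses; the log above does not show the offending line itself.
find_high_seh_saves() {
    grep -nE '\.seh_savexmm[[:space:]]+%?xmm(1[6-9]|2[0-9]|3[01])([^0-9]|$)' "$1"
}
```

If this prints nothing for a file that still fails to assemble, the assumption about the register range is wrong and the reported line number (146 in the log above) is the better place to look.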

@martin-frbg
Collaborator

So could you please test whether modifying the c_check code is sufficient to work around the lack of AVX-512 support in mingw?

@carlkl

carlkl commented Aug 4, 2018

Maybe compiling with -fno-asynchronous-unwind-tables could help. See https://stackoverflow.com/questions/43152633/invalid-register-for-seh-savexmm-in-cygwin

@staticfloat
Contributor

Modifying the c_check code to say zmm16 instead of zmm2 is not sufficient; I get the same problem.

Adding -fno-asynchronous-unwind-tables worked! We'll do that for now, perhaps OpenBLAS should do that by default if running on cygwin? (You can check via $($CC -dumpmachine) == *-cygwin)
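
The `-dumpmachine` check suggested here could look roughly like the following. This is a minimal sketch: `extra_cflags_for_triplet` is a hypothetical helper, not an existing OpenBLAS function. Note that the `-v` log earlier in the thread reports `Target: x86_64-w64-mingw32` for the cross compiler, so a `*-cygwin` pattern would presumably have to be applied to the host compiler or widened; the sketch keeps the pattern exactly as proposed.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenBLAS.
# $1 = target triplet as printed by `$CC -dumpmachine`.
# Prints the workaround flag for Cygwin triplets and an empty line otherwise.
extra_cflags_for_triplet() {
    case "$1" in
        *-cygwin) printf '%s\n' "-fno-asynchronous-unwind-tables" ;;
        *)        printf '\n' ;;
    esac
}

# Typical use, where CC is whatever compiler the build is configured with:
#   triplet=$("$CC" -dumpmachine)
#   CFLAGS="$CFLAGS $(extra_cflags_for_triplet "$triplet")"
```

Taking the triplet as an argument, instead of calling the compiler inside the function, keeps the decision testable on machines that do not have the toolchain installed.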

@martin-frbg
Collaborator

Thanks. From your description I assume your environment would already have been detected as OSNAME=CYGWIN_NT (in the first line of Makefile.conf), so Makefile.x86_64 could append the option (where it adds -march=skylake-avx512) based on this?
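
The logic proposed here can be mirrored in a few lines of shell. The real change would live in Makefile.x86_64 in make syntax; this sketch only illustrates the decision. `osname_from_conf` and `unwind_flag_for_osname` are hypothetical names, and the `OSNAME=VALUE` line format is assumed from the description in this comment.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the OpenBLAS build system.

# $1 = path to Makefile.conf. Prints the value of the first OSNAME= line.
# Assumes the file contains a plain line of the form OSNAME=CYGWIN_NT.
osname_from_conf() {
    sed -n 's/^OSNAME=//p' "$1" | head -n 1
}

# $1 = OSNAME value. Prints the workaround flag for Cygwin, nothing otherwise.
unwind_flag_for_osname() {
    case "$1" in
        CYGWIN_NT) echo "-fno-asynchronous-unwind-tables" ;;
        *)         echo "" ;;
    esac
}

# Typical use:
#   CFLAGS="$CFLAGS $(unwind_flag_for_osname "$(osname_from_conf Makefile.conf)")"
```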

ararslan added a commit to JuliaLang/julia that referenced this issue Aug 5, 2018
Adding `-fno-asynchronous-unwind-tables` to the C compiler flags passed
to OpenBLAS works around errors claiming an invalid register for
`.seh_savexmm`. We've been running into this on the 64-bit Windows
buildbots.

See discussion in OpenMathLib/OpenBLAS#1708.
@staticfloat
Contributor

Yes, that seems correct.

ararslan added a commit to JuliaLang/julia that referenced this issue Aug 6, 2018
Adding `-fno-asynchronous-unwind-tables` to the C compiler flags passed
to OpenBLAS works around errors claiming an invalid register for
`.seh_savexmm`. We've been running into this on the 64-bit Windows
buildbots.

See discussion in OpenMathLib/OpenBLAS#1708.
ararslan added a commit to JuliaLang/julia that referenced this issue Aug 6, 2018
Adding `-fno-asynchronous-unwind-tables` to the C compiler flags passed
to OpenBLAS works around errors claiming an invalid register for
`.seh_savexmm`. We've been running into this on the 64-bit Windows
buildbots.

See discussion in OpenMathLib/OpenBLAS#1708.

(cherry picked from commit 651a727)
@teepean

teepean commented Oct 24, 2018

This is a closed issue, but on MSYS2, clang 7.0.0 should be compatible with AVX-512.

@martin-frbg
Collaborator

Thanks. Despite the title of this issue, AVX-512 was never actually disabled on Windows. On Cygwin hosts and on Windows with gcc, a gcc option is added to the CFLAGS to work around a mingw-gcc bug. Clang builds should not be affected as long as they manage to compile the small test code correctly.

@teepean

teepean commented Oct 24, 2018

The code compiles, but it won't run as I do not have an AVX-512 capable CPU.

@brada4
Contributor

brada4 commented Oct 24, 2018

It will run just fine once you have such a CPU.
The closest approximation of AVX-512 support is the set of patches mentioned in the VS2017 v15.3 release notes.

KristofferC pushed a commit to JuliaLang/julia that referenced this issue Feb 11, 2019
Adding `-fno-asynchronous-unwind-tables` to the C compiler flags passed
to OpenBLAS works around errors claiming an invalid register for
`.seh_savexmm`. We've been running into this on the 64-bit Windows
buildbots.

See discussion in OpenMathLib/OpenBLAS#1708.