-
-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-deterministic mul!
of ComplexF32
arrays under Apple Silicon
#1051
Comments
Can you try it with AppleAccelerate.jl and also 1.11-dev (which uses a new openblas)? |
OpenBLAS on 1.11-dev, AppleAccelerate on 1.9.4 and AppleAccelerate on 1.10.0 all work fine. So no need to bisect then, the fix should be JuliaLang/julia#52762, right? Could that be backported to 1.10? EDIT: not so sure... I just tried and it still fails on JuliaLang/julia#52762. Trying JuliaLang/julia#52928 now. |
Does it work if you start julia with
? |
@giordano, on what version? On 1.10 the On JuliaLang/julia#52928 there is no need to set |
On 1.10.0, because it ships OpenBLAS v0.3.23 which still falls back on the generic armv8 kernels on Apple Silicon, but in v0.3.26 (what's used now in
I'm a bit confused, the build in JuliaLang/julia#52928 shouldn't theoretically affect Apple Silicon, it doesn't have SVE, unless the problem was in the specific version of the compiler used. |
Ok, so your hunch is that setting the
Me too. So I've tried JuliaLang/julia#52762 again, which failed for me before, but now it works (!). I might have forgotten to do a So at this point my understanding is that something in OpenBLAS v0.3.26 fixed the issue, but that it was still present with OpenBLAS v0.3.23, even with the |
Not really, I was wondering if you were hitting a bug in the generic armv8 kernels, and switching to the neoversen1 ones would work around it, but apparently it isn't the case. Using
That's good, switching to neoversen1 kernels wasn't the only change between v0.3.23 and v0.3.26, the diff is pretty large, mine was a low-effort attempt to guess a possible solution, but perhaps there was some other bug somewhere else. |
@giordano will the openblas bump that is making its way through Yggdrasil fix the perf issue you mention here - even on 1.10? |
So I managed to bisect this (took a while!) and it turns out the fix was JuliaLang/julia#51660. EDIT: FWIW the corresponding Julia version for the fix is 1.11.0-DEV.632 |
Perhaps it is this? OpenMathLib/OpenBLAS#4003 |
That sounds very suspicious |
@pablosanjose if you have a fortran compiler on your M1 (I don't) would you be able to bisect openblas between 0.3.23 and 0.3.24? You don't need to touch Julia, can use 1.10, you only need to use BLAS.lbt_forward to run blas functions using the just built library, building openblas takes relatively little. |
@pablosanjose A script like this (untested!) should help for bisecting: #!/bin/bash
make clean || true
make DYNAMIC_ARCH=1 LIBPREFIX=libopenblas64_ INTERFACE64=1 SYMBOLSUFFIX=64_ -j || exit 125
julia --startup-file=no -e '
using LinearAlgebra
BLAS.lbt_forward(realpath("./libopenblas64_.dylib"); clear=true)
function test()
C = ComplexF32
mat = zeros(C, 1, 1)
for _ in 1:100
v = [C(1-0.2im) C(2+0.3im)]
mul!(mat, v, adjoint(v), C(1+im), 1)
end
return mat
end
exit(Int(!allequal(test() for _ in 1:100_000)))
' You can run it with
Side note, besides not having a Fortran compiler at hand, I can't even reproduce the issue on my MacBook Pro (13-inch, M1, 2020), macOS 12.6: julia> using LinearAlgebra
julia> function test()
C = ComplexF32
mat = zeros(C, 1, 1)
for _ in 1:100
v = [C(1-0.2im) C(2+0.3im)]
mul!(mat, v, v', C(1+im), 1)
end
return mat
end
test (generic function with 1 method)
julia> allequal(test() for _ in 1:1_000_000)
true
julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 on 4 virtual cores Same in Julia v1.9.4. |
FWIW, OpenBLAS 0.3.24 release notes do say there was a bug in CGEMM for Apple aarch64. Perhaps OpenMathLib/OpenBLAS#4003 can be backported to 0.3.23 and we can test if this issue is still present. The issue is OpenMathLib/OpenBLAS#3995 and seems like is only on M2 and higher. |
@giordano, thanks for the pointers, I'll try. |
I'm having trouble compiling. It must be some stupid path or flag issue, but I'm stuck, so I'd appreciate some advice from anyone with some
fails for me with these last lines
|
Gah, it was staring me in the face, |
That does look like a good candidate, but I believe having a successful bisect to directly identify the fix would be better than guessing, as well educated the guess is 🙂 |
To save some time I focused directly on the two subsequent commits 51dd1339e and efcf71255 (the actual OpenMathLib/OpenBLAS#4003), and I can confirm that in v1.10, the first one fails and the second one doesn't. |
Awesome, thanks for confirming that's it! Now we can backport that single patch. |
@pablosanjose Will you be able to make a PR in Yggdrasil for OpenBLAS and OpenBLAS32 0.3.23? |
I'd love to try! Great learning experience for me, never dipped my toes into the Tree of Life itself :-D. I'll do my best, and I'll ask for help if required. |
In this case all you have to do is to place https://github.com/OpenMathLib/OpenBLAS/commit/efcf71255aaf0c636fc76e5235dde8bb9cf76469.patch in the |
That simple? Ok, done! |
Closes #53054 CC: @giordano @ViralBShah
Tests issue #53054. The same test is included in backport #53074.
Fixed by JuliaLang/julia#53074. |
Closes #53054 CC: @giordano @ViralBShah
Closes #53054 CC: @giordano @ViralBShah
mul!
ofComplexF32
arrays is subtly broken under Apple Silicon in Julia 1.9 and 1.10Here is a MWE
This should give always
true
, but it randomly givesfalse
in the version/OSs above. The deviations of "bad" runs oftest()
(a small percentage of the 10000) is small, but well within the precision ofComplexF32
.This was tested on a Macbook Pro M1 (fails) and a Xeon server under Linux (does not fail). Can someone reproduce?
Notably, current master does not have this problem on macos. I'll try to bisect to see when it was fixed, in case we can backport the patch.
The text was updated successfully, but these errors were encountered: