Incompatible with MKL.jl #1683
It works with MKL.jl version 0.5.0 (current is 0.7.0), so this is probably the culprit: JuliaLinearAlgebra/MKL.jl#104 |
Enzyme also works with JuliaLinearAlgebra/MKL.jl#164, which was reverted and never part of a release. Paging @ViralBShah and @amontoison: this LP64/ILP64 stuff seems a bit inscrutable |
Basically, with With |
Out of curiosity, can you try some other linear algebra function, like a matvec? |
@staticfloat When we get this error with a missing forward, is it possible to print which symbol is problematic? |
Plain old
using Enzyme, LinearAlgebra
N = 3
const A = rand(1, N)
@show A
f(x) = only(A * x)
x = ones(N)
@show f(x)
@show gradient(Reverse, f, x)
using MKL
@show f(x)
@show gradient(Reverse, f, x)
|
It is using the fallback BLAS replacement - which suggests that MKL is not being used. What do you get with |
No, here it most likely is not. There's probably a conditional check on whether the matrix is symmetric that picks either symv or gemv, so the symv fallback is used only in that case and gemv is used here (decided by a runtime value) |
Something while loading Enzyme must be resetting the forwarding tables within LBT. @wsmoses Does Enzyme do anything special with BLAS in Julia? Does it use |
Note that regular BLAS calls keep working regardless of package combination. Enzyme doesn't break anything outside itself. It's just Enzyme's own gradient that stops working when MKL is loaded, for certain BLAS functions. Could it be that Enzyme's BLAS rules use the symbols exposed by OpenBLAS instead of those from LBT? |
The symbol names in openblas and MKL are the same (took a long time to get those changes into MKL and Apple). The error you are seeing is actually coming from LBT - which happens when some LBT function does not point to an appropriate BLAS function. |
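One way to see which LBT entries actually point at a backing library is to walk the forwarding table from Julia. The following is only a sketch: the helper name `forwarding_report` and the `"ddot"` filter are made up for illustration, and it assumes a Julia version whose `LinearAlgebra.BLAS` provides `lbt_get_config` and `lbt_find_backing_library` (as 1.10 does).

```julia
using LinearAlgebra

# Hypothetical helper: for each symbol exported by LBT whose name contains
# `pattern`, record which loaded library (if any) backs it for `interface`.
function forwarding_report(pattern::AbstractString; interface::Symbol = :ilp64)
    config = BLAS.lbt_get_config()
    report = Dict{String,Union{String,Nothing}}()
    for name in config.exported_symbols
        occursin(pattern, name) || continue
        lib = BLAS.lbt_find_backing_library(name, interface; config)
        report[name] = lib === nothing ? nothing : basename(lib.libname)
    end
    return report
end

# Print the report sorted by symbol name.
for (name, lib) in sort!(collect(forwarding_report("ddot")); by = first)
    println(rpad(name, 20), " => ", something(lib, "NOT FORWARDED"))
end
```

Running this before and after `using MKL` should show whether any entry loses its backing library, which is the situation in which LBT falls back to its default "no BLAS/LAPACK library loaded" function.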
Confirming that Enzyme works if JuliaLinearAlgebra/MKL.jl#164 is reinstated but everything else from the current release of MKL.jl is kept. In particular, if I |
I used the hack from MKL.jl's test suite to dump a stack trace instead of the
function debug_missing_function()
    println("Missing BLAS/LAPACK function!")
    display(stacktrace())
end
BLAS.lbt_set_default_func(@cfunction(debug_missing_function, Cvoid, ()))
Here's the stack trace I get from the MWE:
For convenience, here's the implicated line in the revision I'm running: https://github.com/JuliaLang/julia/blob/v1.10.4/stdlib/LinearAlgebra/src/blas.jl#L345 As far as I can tell, the primal call goes through the same line, so it seems mysterious that it only errors when called from inside |
OK, I'm beginning to see what's happening here.
The bug only affects It's not clear to me what the appropriate fix is. You can't really blame Enzyme for being consistent, although I guess a Julia-specific custom rule could be added to Enzyme.jl. Another solution would be for LBT to expand its collection of adapters to at least cover |
@danielwe axpy only possibly calls axpy and dot, so issues re recursive explosion aren't as concerning: https://github.com/EnzymeAD/Enzyme/blob/3d97f1742790e0d03977dab1f15108b4b3fd12da/enzyme/Enzyme/BlasDerivatives.td#L219 |
copy uses scal and dot. So worst case set is just copy/axpy/scal (in addition to dot) https://github.com/EnzymeAD/Enzyme/blob/3d97f1742790e0d03977dab1f15108b4b3fd12da/enzyme/Enzyme/BlasDerivatives.td#L253 |
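To make the axpy case concrete, here is a hand-written reverse pass for `y_out = a*x + y`. It is an illustration of the textbook adjoint, not Enzyme's generated code; the scalar loss `dot(w, y_out)` and all variable names are assumptions chosen for the example. The adjoint of `a` needs one dot call and the adjoint of `x` needs one axpy call, which matches the call set described above.

```julia
using LinearAlgebra

# Primal: y_out = a*x + y, followed by a scalar loss L = dot(w, y_out).
a = 2.0
x = [1.0, 2.0, 3.0]
y = [0.5, 0.5, 0.5]
w = [1.0, -1.0, 2.0]

loss(a, x, y) = dot(w, BLAS.axpy!(a, x, copy(y)))

# Reverse pass, written by hand. ybar is the adjoint of y_out, i.e. dL/dy_out = w.
ybar = copy(w)
abar = dot(x, ybar)                    # adjoint of a: a single dot
xbar = BLAS.axpy!(a, ybar, zeros(3))   # adjoint of x: a single axpy

# Central finite-difference check of abar.
h = 1e-6
abar_fd = (loss(a + h, x, y) - loss(a - h, x, y)) / (2h)

@show abar xbar abar_fd
```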
That's good news! What I find confusing is that both copy and axpy take an integer argument for the number of elements to act on, so MKL should be exporting
|
I just want you guys to know that I've spent the day looking into this (and building better tooling for LBT to make this easier to discover). The issue is that the name mangling rules they chose for the cblas symbols are different enough from their fortran symbols that LBT doesn't find them correctly. I'll figure out the right way to fix this soon. |
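A quick way to check a naming mismatch like this is to probe each loaded backing library directly for a few candidate spellings. This is a rough sketch: `probe_symbols` is a hypothetical helper, and the three candidate names are guesses at possible ILP64 suffix conventions, not names taken from this thread.

```julia
using Libdl, LinearAlgebra

# Hypothetical helper: report which of `names` can be resolved in the
# shared library at `libpath`. The handle is left open; the library is
# already loaded by LBT, so this only bumps its reference count.
function probe_symbols(libpath::AbstractString, names)
    handle = dlopen(libpath)
    return [n => (dlsym(handle, n; throw_error = false) !== nothing) for n in names]
end

# Assumed candidate spellings: unsuffixed, and two possible ILP64 suffix styles.
candidates = ["cblas_ddot", "cblas_ddot64_", "cblas_ddot_64"]

for lib in BLAS.get_config().loaded_libs
    println(basename(lib.libname), " (", lib.interface, ")")
    for (name, found) in probe_symbols(lib.libname, candidates)
        println("  ", rpad(name, 16), found ? "present" : "absent")
    end
end
```

If a library exposes only a spelling that LBT does not search for, the corresponding trampoline entry stays unforwarded even though the routine exists in the library.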
As @danielwe noted, @staticfloat Will Apple Accelerate have similar issues too? |
Alright, I think I fixed the issue. If it is convenient, please try out JuliaLinearAlgebra/libblastrampoline#137 by building it and replacing the
I already solved the analogous issues with Accelerate. |
Confirming that the patched |
Fantastic. I'll try to push a new LBT out soon. |
@wsmoses Looks like nothing needs to be done on the Enzyme end, so I'm closing this |
This includes support to properly forward MKL v2024's ILP64 CBLAS symbols, which fixes this Enzyme issue [0]. [0]: EnzymeAD/Enzyme.jl#1683
Julia PR: JuliaLang/julia#55330 |
This includes support to properly forward MKL v2024's ILP64 CBLAS symbols, which fixes this [Enzyme issue](EnzymeAD/Enzyme.jl#1683) (cherry picked from commit 602b582)
After loading MKL, I get
ERROR: Error: no BLAS/LAPACK library loaded!
and no gradient for BLAS-invoking functions. Before loading MKL it works as expected. MWE:
This is in a clean environment with current Enzyme, 0.12.25.
julia> versioninfo()
Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)