SnoopPrecompile with pkgimages chokes on non-native code #338
Comments
Timings should probably exclude or special-case custom abstract interpreters. cc: @aviatesk
I believe this is basically JuliaLang/julia#48453. If so, GPUCompiler can implement a workaround on its side. xref: https://github.com/aviatesk/JET.jl/blob/c0adbac27844dd07ee1df7e92689fdc571acba85/src/analyzers/jetanalyzer.jl#L44-L56
We will likely run into that next, but this issue is more that a custom AbstractInterpreter is reporting its timings and SnoopCompile is taking those as truth. We should record which AbstractInterpreter caused a timings entry to be created.
This should be addressed now in Precompiler, since that's the future. (I don't think I can transfer to an organization.) Would a viable solution be to specify a module whose code would not be precompiled? Note Precompiler/SnoopPrecompile is best for intervening at the "root" of inference, i.e., if the given method is dispatched at runtime. If it's deeper into an inferrable callgraph, we'd have to write code to look for it. In either case, I think the only option would be to nix the entire tree; caching part of an inferrable callgraph but omitting other parts introduces all sorts of trouble.
GPU and CPU code is often part of the same module; disabling based on the AbstractInterpreter involved seems like the best way.
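A rough sketch of the exclusion discussed above, in Julia: each timing entry records which interpreter produced it, and a root is kept for precompilation only if no node in its inference tree came from a non-native interpreter. `ToyTiming`, `is_native_tree`, and `precompilable_roots` are hypothetical names made up for illustration; they are not the actual `Core.Compiler.Timings` or SnoopPrecompile data structures.

```julia
# Hypothetical stand-in for an inference timing entry; the real
# Core.Compiler.Timings structures look different.
struct ToyTiming
    name::String                 # method being inferred
    interp::Symbol               # :native, or a tag for a custom AbstractInterpreter
    children::Vector{ToyTiming}  # callees inferred under this entry
end

# Convenience constructor for leaf entries.
ToyTiming(name, interp) = ToyTiming(name, interp, ToyTiming[])

# A tree qualifies only if every node came from the native interpreter.
is_native_tree(t::ToyTiming) =
    t.interp === :native && all(is_native_tree, t.children)

# Keep or drop whole trees; never keep part of an inferrable callgraph.
precompilable_roots(roots) = filter(is_native_tree, roots)

cpu   = ToyTiming("sum_cpu", :native, [ToyTiming("add", :native)])
gpu   = ToyTiming("kernel", :gpu, [ToyTiming("gpu_helper", :gpu)])
mixed = ToyTiming("launch", :native, [gpu])

kept = precompilable_roots([cpu, gpu, mixed])
println([t.name for t in kept])
```

Dropping `mixed` along with `gpu` reflects the point made earlier in the thread that the whole tree has to go once any part of it is excluded.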
Fixed in JuliaLang/julia#49391 which is also part of
Thanks!
With exclusion, we may need to do some work to ensure whole "inference trees" are really ready to go, cf. JuliaLang/julia#35972. But that's clearly not 1.9 work.
…reters (#54069). Partially reverts #49391. PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338). After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline. Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools.
…reters (#54069). Partially reverts #49391. PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338). After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline. Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools. (cherry picked from commit c0611e8)
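The cache-owner filtering mentioned in the commit message can be pictured roughly as follows. This is a toy model in Julia, assuming that native results carry an owner of `nothing` and that a foreign interpreter tags its results with its own token. `ToyCodeInstance`, `GPUToken`, and `native_only` are hypothetical names and do not reflect the real `Core.CodeInstance` layout or any Julia compiler API.

```julia
# Toy model of a cached inference result tagged with its cache owner.
struct ToyCodeInstance
    name::String
    owner::Any   # assumption: `nothing` marks a native result
end

# Hypothetical token a GPU interpreter might use as its cache owner.
struct GPUToken end

# Only results owned by the native pipeline are eligible for native caching.
native_only(cis) = [ci for ci in cis if ci.owner === nothing]

results = [
    ToyCodeInstance("host_fn", nothing),
    ToyCodeInstance("kernel", GPUToken()),
]

println([ci.name for ci in native_only(results)])
```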
I was trying out SnoopPrecompile.jl with CUDA.jl, on Julia 1.9, doing some minimal kernel compilation during precompilation:
This results in an LLVM-related abort when Julia writes out the package image:
It looks like Julia is trying to generate host-native code for GPU-only functionality here. After discussing this with @vchuravy, we think this happens because SnoopPrecompile.jl tracks code that's inferred, which includes GPU code, and queues that up for precompilation. Setting `verbose[] = true` does indeed show that it compiles GPU-only functionality:
That function is implemented here, https://github.com/JuliaGPU/CUDA.jl/blob/3d1670c9fe0bd12fb5d44e8427ab50d5f85a3d6a/src/device/runtime.jl#L35-L47, calling `threadfence_system()`, which in turn is implemented using the `llvm.nvvm.membar.sys` intrinsic.
I guess that we should somehow keep this code from getting into the pkgimage, for now. Normally we avoid polluting host caches with GPU code by using a custom AbstractInterpreter and registering that with codegen using the `lookup` codegen parameter. Maybe some property derived from this needs to be added to the data in `Core.Compiler.Timings._timings` so that SnoopPrecompile can decide to skip this code?