SnoopPrecompile with pkgimages chokes on non-native code #338

Closed
maleadt opened this issue Jan 16, 2023 · 8 comments

@maleadt

maleadt commented Jan 16, 2023

I was trying out SnoopPrecompile.jl with CUDA.jl, on Julia 1.9, doing some minimal kernel compilation during precompilation:

@precompile_setup let
    @precompile_all_calls begin
        target = PTXCompilerTarget(; cap=v"7.5.0")
        params = CUDACompilerParams()
        job = CompilerJob(target, FunctionSpec(identity, Tuple{Nothing}, true), params)
        GPUCompiler.code_native(devnull, job)
    end
end

This results in an LLVM-related abort when Julia writes out the package image:

LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.membar.sys

[53782] signal (6.-6): Aborted
in expression starting at none:0
unknown function (ip: 0x7fd1bcc9564c)
gsignal at /usr/lib/libc.so.6 (unknown line)
abort at /usr/lib/libc.so.6 (unknown line)
_ZN4llvm18report_fatal_errorERKNS_5TwineEb at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel15CannotYetSelectEPNS_6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel16SelectCodeCommonEPNS_6SDNodeEPKhj at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel6SelectEPN4llvm6SDNodeE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel22DoInstructionSelectionEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.975 at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/tim/Julia/src/julia/build/dev/usr/bin/../lib/libLLVM-14jl.so (unknown line)
operator() at /home/tim/Julia/src/julia/src/aotcompile.cpp:698
jl_dump_native_impl at /home/tim/Julia/src/julia/src/aotcompile.cpp:710
ijl_write_compiler_output at /home/tim/Julia/src/julia/src/precompile.c:126
ijl_atexit_hook at /home/tim/Julia/src/julia/src/init.c:258
jl_repl_entrypoint at /home/tim/Julia/src/julia/src/jlapi.c:718
main at /home/tim/Julia/src/julia/cli/loader_exe.c:59
unknown function (ip: 0x7fd1bcc3028f)
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at /build/glibc/src/glibc/csu/../sysdeps/x86_64/start.S:115
Allocations: 66463518 (Pool: 66446875; Big: 16643); GC: 87

It looks like Julia is trying to generate host-native code for GPU-only functionality here. After discussing this with @vchuravy, we think this happens because SnoopPrecompile.jl tracks all code that gets inferred, including GPU code, and queues it up for precompilation. Setting verbose[] = true does indeed show that it compiles GPU-only functionality:

MethodInstance for CUDA.signal_exception()

That function is implemented here: https://github.com/JuliaGPU/CUDA.jl/blob/3d1670c9fe0bd12fb5d44e8427ab50d5f85a3d6a/src/device/runtime.jl#L35-L47. It calls threadfence_system(), which in turn is implemented using the llvm.nvvm.membar.sys intrinsic.

I guess that we should somehow keep this code from getting into the pkgimage, for now. Normally we avoid polluting host caches with GPU code by using a custom AbstractInterpreter and registering it with codegen through the lookup codegen parameter. Maybe some property derived from this needs to be added to the data in Core.Compiler.Timings._timings so that SnoopPrecompile can decide to skip this code?
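
To make the proposal concrete, here is a minimal toy sketch of the filtering idea, not the actual Core.Compiler.Timings data layout. Every type and function name below (ToyInterpreter, NativeInterp, GPUInterp, TimingEntry, precompilable) is hypothetical; the sketch only assumes that each recorded inference event could carry a tag identifying the interpreter that produced it.

```julia
# Toy model only: these types are hypothetical and do not mirror
# Core.Compiler.Timings internals.
abstract type ToyInterpreter end
struct NativeInterp <: ToyInterpreter end   # stands in for the host interpreter
struct GPUInterp <: ToyInterpreter end      # stands in for a custom (GPU) interpreter

# One recorded inference event: what was inferred, and by which interpreter.
struct TimingEntry
    signature::String
    interp::ToyInterpreter
end

# Keep only entries produced by the native interpreter; anything inferred by a
# custom interpreter is assumed to be unsuitable for host code generation.
precompilable(entries::Vector{TimingEntry}) =
    [e.signature for e in entries if e.interp isa NativeInterp]

entries = [
    TimingEntry("GPUCompiler.code_native(::IO, ::CompilerJob)", NativeInterp()),
    TimingEntry("CUDA.signal_exception()", GPUInterp()),
]

# Only the host-side entry is queued for precompilation.
queued = precompilable(entries)
```

In this model the GPU-only entry never reaches the precompile queue, so nothing would ask the host backend to select llvm.nvvm.membar.sys.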

@vchuravy

Timings should probably exclude or specially handle custom abstract interpreters.

cc: @aviatesk

@aviatesk
Collaborator

I believe this is basically JuliaLang/julia#48453. If so, GPUCompiler can implement a workaround on its side. xref: https://github.com/aviatesk/JET.jl/blob/c0adbac27844dd07ee1df7e92689fdc571acba85/src/analyzers/jetanalyzer.jl#L44-L56

@vchuravy

We will likely run into that next, but this issue is more that a custom absint is reporting its timings and those are taken as truth by SnoopCompile. We should note which AbsInt caused a timings entry to be created.

@timholy
Owner

timholy commented Apr 17, 2023

This should be addressed now in Precompiler, since that's the future. (I don't think I can transfer to an organization.) Would a viable solution be to specify a module whose code would not be precompiled?

Note Precompiler/SnoopPrecompile is best for intervening at the "root" of inference, i.e., if the given method is dispatched at runtime. If it's deeper into an inferrable callgraph, we'd have to write code to look for it. In either case, I think the only option would be to nix the entire tree; caching part of an inferrable callgraph but omitting other parts introduces all sorts of trouble.
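
The all-or-nothing rule described above can be illustrated with a toy model. This is a sketch under stated assumptions, not a SnoopCompile or Precompiler data structure: InferenceNode, istainted, and keep_roots are hypothetical names, and the foreign flag simply marks a node that was inferred for a non-native target.

```julia
# Toy model only: a hypothetical inference tree, not a SnoopCompile type.
struct InferenceNode
    name::String
    foreign::Bool                      # true if inferred for a non-native target
    children::Vector{InferenceNode}
end
# Convenience constructor for leaf nodes.
InferenceNode(name, foreign) = InferenceNode(name, foreign, InferenceNode[])

# A tree is tainted if the node itself or any descendant is foreign.
istainted(node::InferenceNode) = node.foreign || any(istainted, node.children)

# Keep or drop each root as a whole; a partial callgraph is never kept.
keep_roots(roots) = [r.name for r in roots if !istainted(r)]

host_root = InferenceNode("host_entry", false, [InferenceNode("helper", false)])
gpu_root  = InferenceNode("compile_kernel", false,
                          [InferenceNode("signal_exception", true)])

# The second root is dropped entirely because one of its callees is foreign.
kept = keep_roots([host_root, gpu_root])
```

Note that compile_kernel itself is native in this example; it is still dropped because a callee deeper in its tree is foreign, which is the "look deeper into the callgraph" cost mentioned above.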

@maleadt
Author

maleadt commented Apr 17, 2023

Would a viable solution be to specify a module whose code would not be precompiled?

GPU and CPU code are often part of the same module; disabling based on the AbstractInterpreter involved seems like the best way.

@vchuravy

Fixed in JuliaLang/julia#49391, which is also part of 1.9.0-rc3.

@maleadt maleadt closed this as completed Apr 26, 2023
@timholy
Owner

timholy commented Apr 26, 2023

Thanks!

@timholy
Owner

timholy commented Apr 26, 2023

With exclusion, we may need to do some work to ensure whole "inference trees" are really ready to go, cf. JuliaLang/julia#35972. But that's clearly not 1.9 work.

vchuravy added a commit to JuliaLang/julia that referenced this issue Apr 15, 2024
…reters (#54069)

Partially reverts #49391

PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338).

After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline.

Motivated by JuliaGPU/GPUCompiler.jl#567, which demonstrated that a foreign code instance would not be cached without PrecompileTools.
KristofferC pushed a commit to JuliaLang/julia that referenced this issue Apr 17, 2024
…reters (#54069)
(cherry picked from commit c0611e8)