Cannot precompile GPU code with PrecompileTools #2006
I'm using PrecompileTools to precompile some functions that use CUDA in a repo I'm working on (KomaMRICore). In particular, I precompile the `simulate()` function in my development environment with the GPU enabled like so:
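The exact snippet isn't reproduced here; the following is only a minimal hypothetical sketch of the pattern (the module name and workload are illustrative assumptions, not the actual KomaMRICore code — `hypot` is used because broadcasting it lowers to the libdevice `__nv_hypotf` intrinsic that appears in the error message below):

```julia
# Hypothetical sketch only: module name and workload are illustrative
# assumptions, not the actual KomaMRICore precompile workload.
module KomaStartup

using CUDA
import PrecompileTools: @setup_workload, @compile_workload

@setup_workload begin
    @compile_workload begin
        x = CUDA.rand(Float32, 256)
        y = CUDA.rand(Float32, 256)
        # Broadcasting `hypot` over CuArrays compiles a kernel that calls
        # the libdevice intrinsic `__nv_hypotf`, the symbol reported
        # missing in the JIT error below.
        sum(hypot.(x, y))
    end
end

end # module KomaStartup
```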
Then, when performing the same workload in the Julia REPL of my development environment, I get the error:

```
JIT session error: Symbols not found: [ __nv_hypotf ]
```

Note that this problem doesn't show up when the CPU is used instead of the GPU (by setting `simParams["gpu"] = false`).

This problem seems to be related to CUDA issue #1870, which was solved by making some changes directly in the Julia repo (apparently already part of 1.9.0-rc3, see SnoopCompile issue #338, so it should work on Julia 1.9.2 as well, though I'm not completely sure).

Any suggestions for solving this, or some pointers on how to continue debugging this issue?

cc @vchuravy
beorostica changed the title from "Cannot precompile GPU code with SnoopPrecompile" to "Cannot precompile GPU code with PrecompileTools" on Jul 24, 2023.
Same here for a module `FastAIStartup`:

```julia
module FastAIStartup

using FastAI, FastVision, Metalhead
import FastVision: RGB, N0f8
import PrecompileTools: @setup_workload, @compile_workload

@setup_workload begin
    labels = ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
    @compile_workload begin
        data = ([rand(RGB{N0f8}, 32, 32) for _ in 1:100],
                [rand(labels) for _ in 1:100])
        blocks = (Image{2}(), FastAI.Label{String}(labels))
        task = ImageClassificationSingle(blocks)
        learner = tasklearner(task, data, backbone=ResNet(18).layers[1])
        fitonecycle!(learner, 2)
    end
end

end # module FastAIStartup
```

See the stacktrace:

```
(FastAIStartup) pkg> precompile
Precompiling project...
✗ FastAIStartup
0 dependencies successfully precompiled in 32 seconds. 297 already precompiled.
ERROR: The following 1 direct dependency failed to precompile:
FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0]
Failed to precompile FastAIStartup [bf55ac65-409a-4d86-bfc7-3fe70994b7f0] to "/home/romeo/.julia/compiled/v1.10/FastAIStartup/jl_uOXydd".
ERROR: LoadError: LLVM error: Symbol name with unsupported characters
Stacktrace:
[1] handle_error(reason::Cstring)
@ LLVM ~/.julia/packages/LLVM/Od0DH/src/core/context.jl:134
[2] LLVMTargetMachineEmitToMemoryBuffer(T::LLVM.TargetMachine, M::LLVM.Module, codegen::LLVM.API.LLVMCodeGenFileType, ErrorMessage::Base.RefValue{…}, OutMemBuf::Base.RefValue{…})
@ LLVM.API ~/.julia/packages/LLVM/Od0DH/lib/15/libLLVM_h.jl:4326
[3] emit(tm::LLVM.TargetMachine, mod::LLVM.Module, filetype::LLVM.API.LLVMCodeGenFileType)
@ LLVM ~/.julia/packages/LLVM/Od0DH/src/targetmachine.jl:45
[4] mcgen(job::GPUCompiler.CompilerJob, mod::LLVM.Module, format::LLVM.API.LLVMCodeGenFileType)
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/mcgen.jl:72
[5] macro expansion
@ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
[6] macro expansion
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:432 [inlined]
[7] macro expansion
@ GPUCompiler ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
[8] macro expansion
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:429 [inlined]
[9]
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:89
[10] emit_asm
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/utils.jl:83 [inlined]
[11]
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:149
[12] codegen
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:110 [inlined]
[13]
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:106
[14] compile
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:98 [inlined]
[15] #1037
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:104 [inlined]
[16] JuliaContext(f::CUDA.var"#1037#1040"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/driver.jl:47
[17] compile(job::GPUCompiler.CompilerJob)
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/compilation.jl:103
[18] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:125
[19] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
@ GPUCompiler ~/.julia/packages/GPUCompiler/YO8Uj/src/execution.jl:103
[20] macro expansion
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:318 [inlined]
[21] macro expansion
@ CUDA ./lock.jl:267 [inlined]
[22] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{…}}; kwargs::@Kwargs{})
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:313
[23] cufunction
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:310 [inlined]
[24] macro expansion
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/compiler/execution.jl:104 [inlined]
[25] #launch_heuristic#1080
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:17 [inlined]
[26] launch_heuristic
@ CUDA ~/.julia/packages/CUDA/tVtYo/src/gpuarrays.jl:15 [inlined]
[27] _copyto!
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:65 [inlined]
[28] copyto!
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:46 [inlined]
[29] copy
@ GPUArrays ~/.julia/packages/GPUArrays/5XhED/src/host/broadcast.jl:37 [inlined]
[30] materialize
@ Base.Broadcast ./broadcast.jl:903 [inlined]
[31] broadcast_preserving_zero_d
@ Base.Broadcast ./broadcast.jl:892 [inlined]
[32] +(A::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, B::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
@ Base ./arraymath.jl:8
[33] add_sum
@ Base ./reduce.jl:24 [inlined]
[34] BottomRF
@ Base ./reduce.jl:86 [inlined]
[35] afoldl
@ Base ./operators.jl:543 [inlined]
[36] _foldl_impl
@ Base ./reduce.jl:68 [inlined]
[37] foldl_impl
@ Base ./reduce.jl:48 [inlined]
[38] mapfoldl_impl
@ Base ./reduce.jl:44 [inlined]
[39] mapfoldl
@ Base ./reduce.jl:175 [inlined]
[40] mapreduce
@ Base ./reduce.jl:307 [inlined]
[41] sum
@ Base ./reduce.jl:535 [inlined]
[42] sum
@ Base ./reduce.jl:564 [inlined]
[43] rrule
@ ChainRules ~/.julia/packages/ChainRules/9sNmB/src/rulesets/Base/mapreduce.jl:25 [inlined]
[44] rrule
@ ChainRulesCore ~/.julia/packages/ChainRulesCore/0t04l/src/rules.jl:134 [inlined]
[45] chain_rrule
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/chainrules.jl:223 [inlined]
[46] macro expansion
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0 [inlined]
[47] _pullback
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:81 [inlined]
[48] addact
@ Metalhead ~/.julia/packages/Metalhead/qOYEz/src/utilities.jl:19 [inlined]
[49] _apply
@ Core ./boot.jl:836 [inlined]
[50] adjoint
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
[51] _pullback
@ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
[52] #_#1
@ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
[53] _pullback(::Zygote.Context{…}, ::PartialFunctions.var"##_#1", ::@Kwargs{}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[54] _apply(::Function, ::Vararg{Any})
@ Core ./boot.jl:836
[55] adjoint
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
[56] _pullback
@ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
[57] PartialFunction
@ Zygote ~/.julia/packages/PartialFunctions/LzDRN/src/PartialFunctions.jl:24 [inlined]
[58] _pullback(::Zygote.Context{…}, ::PartialFunctions.PartialFunction{…}, ::CUDA.CuArray{…}, ::CUDA.CuArray{…})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[59] _apply(::Function, ::Vararg{Any})
@ Core ./boot.jl:836
[60] adjoint
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/lib/lib.jl:203 [inlined]
[61] _pullback
@ Zygote ~/.julia/packages/ZygoteRules/OgCVT/src/adjoint.jl:66 [inlined]
[62] Parallel
@ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:527 [inlined]
[63] _pullback(ctx::Zygote.Context{…}, f::Flux.Parallel{…}, args::CUDA.CuArray{…})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[64] macro expansion
@ Flux ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
[65] _applychain
@ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:53 [inlined]
[66] _pullback(::Zygote.Context{…}, ::typeof(Flux._applychain), ::Tuple{…}, ::CUDA.CuArray{…})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[67] Chain
@ Zygote ~/.julia/packages/Flux/n3cOc/src/layers/basic.jl:51 [inlined]
--- the last 5 lines are repeated 2 more times ---
[78] _pullback(ctx::Zygote.Context{true}, f::Flux.Chain{Tuple{…}}, args::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[79] #74
@ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:54 [inlined]
[80] _pullback(ctx::Zygote.Context{…}, f::FluxTraining.var"#74#76"{…}, args::Flux.Chain{…})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[81] #77
@ Zygote ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70 [inlined]
[82] _pullback(::Zygote.Context{true}, ::FluxTraining.var"#77#78"{FluxTraining.var"#74#76"{…}, Flux.Chain{…}})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface2.jl:0
[83] pullback(f::Function, ps::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:384
[84] gradient(f::Function, args::Zygote.Params{Zygote.Buffer{Any, Vector{Any}}})
@ Zygote ~/.julia/packages/Zygote/JeHtr/src/compiler/interface.jl:96
[85] _gradient(f::FluxTraining.var"#74#76"{…}, ::Flux.Optimise.Adam, m::Flux.Chain{…}, ps::Zygote.Params{…})
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:70
[86] (::FluxTraining.var"#73#75"{…})(handle::FluxTraining.var"#handlefn#82"{…}, state::FluxTraining.PropDict{…})
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:53
[87] runstep(stepfn::FluxTraining.var"#73#75"{…}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, initialstate::@NamedTuple{…})
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:133
[88] step!(learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase, batch::Tuple{…})
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:51
[89] (::FluxTraining.var"#71#72"{FluxTraining.Learner, FluxTraining.Phases.TrainingPhase, MLUtils.DataLoader{…}})(::Function)
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:24
[90] runepoch(epochfn::FluxTraining.var"#71#72"{…}, learner::FluxTraining.Learner, phase::FluxTraining.Phases.TrainingPhase)
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:105
[91] epoch!
@ FluxTraining ~/.julia/packages/FluxTraining/vSpWs/src/training.jl:22 [inlined]
[92] (::FastAI.var"#157#159"{Tuple{Pair{…}, Pair{…}}, FluxTraining.Learner, Int64})()
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:31
[93] withcallbacks(f::FastAI.var"#157#159"{…}, learner::FluxTraining.Learner, callbacks::FluxTraining.Scheduler)
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:77
[94] #156
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:28 [inlined]
[95] withfields(f::FastAI.var"#156#158"{…}, x::FluxTraining.Learner; kwargs::@Kwargs{…})
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/utils.jl:52
[96] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64; phases::Tuple{…}, wd::Float64, kwargs::@Kwargs{})
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:27
[97] fitonecycle!(learner::FluxTraining.Learner, nepochs::Int64, maxlr::Float64)
@ FastAI ~/.julia/packages/FastAI/f27xT/src/training/onecycle.jl:16
[98] macro expansion
@ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:14 [inlined]
[99] macro expansion
@ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:74 [inlined]
[100] macro expansion
@ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:8 [inlined]
[101] macro expansion
@ ~/.julia/packages/PrecompileTools/0yi7r/src/workloads.jl:136 [inlined]
[102] top-level scope
@ ~/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:6
[103] include
@ Base ./Base.jl:489 [inlined]
[104] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{…}, dl_load_path::Vector{…}, load_path::Vector{…}, concrete_deps::Vector{…}, source::Nothing)
@ Base ./loading.jl:2216
in expression starting at /home/romeo/Documents/julia_playground/FastAIStartup.jl/src/FastAIStartup.jl:1
in expression starting at stdin:3
```
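For reference, a minimal sketch of a guard that avoids launching GPU work during precompilation when no usable device is present (assuming the workload can simply be skipped in that case; `CUDA.functional()` is CUDA.jl's runtime availability check):

```julia
using CUDA
import PrecompileTools: @setup_workload, @compile_workload

@setup_workload begin
    @compile_workload begin
        # Only exercise the GPU path when a usable device is available,
        # so `pkg> precompile` doesn't attempt device codegen on
        # machines without a working CUDA setup.
        if CUDA.functional()
            x = CUDA.rand(Float32, 8)
            sum(x .+ x)
        end
    end
end
```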
FYI this seems to be fixed now, although I haven't run extensive tests. But the code snippet I posted above runs.

```
julia> versioninfo()
Julia Version 1.10.0-beta2
Commit a468aa198d0 (2023-08-17 06:27 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, tigerlake)
Threads: 1 on 16 virtual cores
(FastAIStartup) pkg> st
Project FastAIStartup v0.1.0
Status `~/Documents/julia_playground/FastAIStartup.jl/Project.toml`
⌃ [5d0beca9] FastAI v0.5.1
[7bf02486] FastVision v0.1.1
⌅ [587475ba] Flux v0.13.17
⌃ [dbeba491] Metalhead v0.8.2
⌃ [aea7be01] PrecompileTools v1.1.2
[02a925ec] cuDNN v1.1.0 `https://github.com/JuliaGPU/CUDA.jl.git:lib/cudnn#master`
```
Great, thanks for reporting back!

Thank you @RomeoV!