A precompilation error on the latest assertion build #45302

Closed
aviatesk opened this issue May 13, 2022 · 20 comments

Comments

@aviatesk
Sponsor Member

aviatesk commented May 13, 2022

In particular, I found that precompiling LoopVectorization fails on the latest assertion build with the following error:

❯ echo """
FORCE_ASSERTIONS=1
LLVM_ASSERTIONS=1
""" >! Make.user && make cleanall && make
[...]

❯ ./usr/bin/julia -e 'using Pkg; Pkg.add("LoopVectorization")'
[...]
Precompiling project...
  ✗ LoopVectorization
  ✗ TriangularSolve
  ✗ RecursiveFactorization
  118 dependencies successfully precompiled in 100 seconds

ERROR: The following 2 direct dependencies failed to precompile:

RecursiveFactorization [f2c3362d-daeb-58d1-803e-2bc74f2840b4]

Failed to precompile RecursiveFactorization [f2c3362d-daeb-58d1-803e-2bc74f2840b4] to ~/.julia/compiled/v1.9/RecursiveFactorization/jl_aoQ7V0.
Assertion failed: (New->getType() == getType() && "replaceAllUses of value with new value of different type!"), function doRAUW, file /workspace/srcdir/llvm-project/llvm/lib/IR/Value.cpp, line 494.

signal (6): Abort trap: 6
in expression starting at ~/.julia/packages/LoopVectorization/rmlXk/src/LoopVectorization.jl:239
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
pthread_kill at /usr/lib/system/libsystem_pthread.dylib (unknown line)
abort at /usr/lib/system/libsystem_c.dylib (unknown line)
__assert_rtn at /usr/lib/system/libsystem_c.dylib (unknown line)
_ZN4llvm5Value6doRAUWEPS0_NS0_19ReplaceMetadataUsesE at ~/julia/julia4/usr/lib/libLLVM.dylib (unknown line)
operator() at ~/julia/julia4/src/jitlayers.cpp:1353 [inlined]
withModuleDo<(lambda at ~/julia/julia4/src/jitlayers.cpp:1288:29)> at ~/julia/julia4/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:136 [inlined]
operator() at ~/julia/julia4/src/jitlayers.cpp:1288 [inlined]
withModuleDo<(lambda at ~/julia/julia4/src/jitlayers.cpp:1287:26)> at ~/julia/julia4/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:136 [inlined]
jl_merge_module at ~/julia/julia4/src/jitlayers.cpp:1287
emit_function at ~/julia/julia4/src/codegen.cpp:8008
jl_emit_code at ~/julia/julia4/src/codegen.cpp:8049
jl_emit_codeinst at ~/julia/julia4/src/codegen.cpp:8097
_jl_compile_codeinst at ~/julia/julia4/src/jitlayers.cpp:127
jl_generate_fptr_impl at ~/julia/julia4/src/jitlayers.cpp:357
jl_compile_method_internal at ~/julia/julia4/src/gf.c:2106
_jl_invoke at ~/julia/julia4/src/gf.c:2384
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
do_call at ~/julia/julia4/src/interpreter.c:126
eval_body at ~/julia/julia4/src/interpreter.c:0
jl_interpret_toplevel_thunk at ~/julia/julia4/src/interpreter.c:750
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:912
jl_eval_module_expr at ~/julia/julia4/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:715
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:856
ijl_toplevel_eval at ~/julia/julia4/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at ~/julia/julia4/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1317
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
_include at ./loading.jl:1377
include at ./Base.jl:427 [inlined]
include_package_for_output at ./loading.jl:1443
jfptr_include_package_for_output_34216 at ~/julia/julia4/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
do_call at ~/julia/julia4/src/interpreter.c:126
eval_body at ~/julia/julia4/src/interpreter.c:0
jl_interpret_toplevel_thunk at ~/julia/julia4/src/interpreter.c:750
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:912
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:856
ijl_toplevel_eval at ~/julia/julia4/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at ~/julia/julia4/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1317
include_string at ./loading.jl:1327
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
exec_options at ./client.jl:301
_start at ./client.jl:518
jfptr__start_35441 at ~/julia/julia4/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
true_main at ~/julia/julia4/src/jlapi.c:566
jl_repl_entrypoint at ~/julia/julia4/src/jlapi.c:710
Allocations: 30100446 (Pool: 30085864; Big: 14582); GC: 41
ERROR: LoadError: Failed to precompile LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890] to ~/.julia/compiled/v1.9/LoopVectorization/jl_11E50M.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::IO, internal_stdout::IO, ignore_loaded_modules::Bool)
    @ Base ./loading.jl:1594
  [3] compilecache
    @ ./loading.jl:1538 [inlined]
  [4] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:1239
  [5] _require_prelocked(uuidkey::Base.PkgId)
    @ Base ./loading.jl:1112
  [6] macro expansion
    @ ./loading.jl:1092 [inlined]
  [7] macro expansion
    @ ./lock.jl:267 [inlined]
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1056
  [9] include(mod::Module, _path::String)
    @ Base ./Base.jl:427
 [10] include(x::String)
    @ RecursiveFactorization ~/.julia/packages/RecursiveFactorization/ZLxNf/src/RecursiveFactorization.jl:1
 [11] top-level scope
    @ ~/.julia/packages/RecursiveFactorization/ZLxNf/src/RecursiveFactorization.jl:3
 [12] include
    @ ./Base.jl:427 [inlined]
 [13] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1443
 [14] top-level scope
    @ stdin:1
in expression starting at ~/.julia/packages/RecursiveFactorization/ZLxNf/src/lu.jl:1
in expression starting at ~/.julia/packages/RecursiveFactorization/ZLxNf/src/RecursiveFactorization.jl:1
in expression starting at stdin:1

LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890]

Failed to precompile LoopVectorization [bdcacae8-1622-11e9-2a5c-532679323890] to ~/.julia/compiled/v1.9/LoopVectorization/jl_uDbP3F.
Assertion failed: (New->getType() == getType() && "replaceAllUses of value with new value of different type!"), function doRAUW, file /workspace/srcdir/llvm-project/llvm/lib/IR/Value.cpp, line 494.

signal (6): Abort trap: 6
in expression starting at ~/.julia/packages/LoopVectorization/rmlXk/src/LoopVectorization.jl:239
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
pthread_kill at /usr/lib/system/libsystem_pthread.dylib (unknown line)
abort at /usr/lib/system/libsystem_c.dylib (unknown line)
__assert_rtn at /usr/lib/system/libsystem_c.dylib (unknown line)
_ZN4llvm5Value6doRAUWEPS0_NS0_19ReplaceMetadataUsesE at ~/julia/julia4/usr/lib/libLLVM.dylib (unknown line)
operator() at ~/julia/julia4/src/jitlayers.cpp:1353 [inlined]
withModuleDo<(lambda at ~/julia/julia4/src/jitlayers.cpp:1288:29)> at ~/julia/julia4/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:136 [inlined]
operator() at ~/julia/julia4/src/jitlayers.cpp:1288 [inlined]
withModuleDo<(lambda at ~/julia/julia4/src/jitlayers.cpp:1287:26)> at ~/julia/julia4/usr/include/llvm/ExecutionEngine/Orc/ThreadSafeModule.h:136 [inlined]
jl_merge_module at ~/julia/julia4/src/jitlayers.cpp:1287
emit_function at ~/julia/julia4/src/codegen.cpp:8008
jl_emit_code at ~/julia/julia4/src/codegen.cpp:8049
jl_emit_codeinst at ~/julia/julia4/src/codegen.cpp:8097
_jl_compile_codeinst at ~/julia/julia4/src/jitlayers.cpp:127
jl_generate_fptr_impl at ~/julia/julia4/src/jitlayers.cpp:357
jl_compile_method_internal at ~/julia/julia4/src/gf.c:2106
_jl_invoke at ~/julia/julia4/src/gf.c:2384
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
do_call at ~/julia/julia4/src/interpreter.c:126
eval_body at ~/julia/julia4/src/interpreter.c:0
jl_interpret_toplevel_thunk at ~/julia/julia4/src/interpreter.c:750
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:912
jl_eval_module_expr at ~/julia/julia4/src/toplevel.c:203 [inlined]
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:715
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:856
ijl_toplevel_eval at ~/julia/julia4/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at ~/julia/julia4/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1317
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
_include at ./loading.jl:1377
include at ./Base.jl:427 [inlined]
include_package_for_output at ./loading.jl:1443
jfptr_include_package_for_output_34194 at ~/julia/julia4/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
do_call at ~/julia/julia4/src/interpreter.c:126
eval_body at ~/julia/julia4/src/interpreter.c:0
jl_interpret_toplevel_thunk at ~/julia/julia4/src/interpreter.c:750
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:912
jl_toplevel_eval_flex at ~/julia/julia4/src/toplevel.c:856
ijl_toplevel_eval at ~/julia/julia4/src/toplevel.c:921 [inlined]
ijl_toplevel_eval_in at ~/julia/julia4/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1317
include_string at ./loading.jl:1327
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
exec_options at ./client.jl:301
_start at ./client.jl:518
jfptr__start_35441 at ~/julia/julia4/usr/lib/julia/sys.dylib (unknown line)
_jl_invoke at ~/julia/julia4/src/gf.c:2373
ijl_apply_generic at ~/julia/julia4/src/gf.c:2574
jl_apply at ~/julia/julia4/src/./julia.h:1840 [inlined]
true_main at ~/julia/julia4/src/jlapi.c:566
jl_repl_entrypoint at ~/julia/julia4/src/jlapi.c:710
Allocations: 30100384 (Pool: 30085806; Big: 14578); GC: 42

Stacktrace:
 [1] pkgerror(msg::String)
   @ Pkg.Types ~/julia/julia4/usr/share/julia/stdlib/v1.9/Pkg/src/Types.jl:68
 [2] precompile(ctx::Pkg.Types.Context, pkgs::Vector{String}; internal_call::Bool, strict::Bool, warn_loaded::Bool, already_instantiated::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Pkg.API ~/julia/julia4/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:1427
 [3] precompile
   @ ~/julia/julia4/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:1058 [inlined]
 [4] #precompile#225
   @ ~/julia/julia4/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:1057 [inlined]
 [5] precompile (repeats 2 times)
   @ ~/julia/julia4/usr/share/julia/stdlib/v1.9/Pkg/src/API.jl:1057 [inlined]
 [6] top-level scope
   @ none:1

The error disappears if we build without the assertion flags.

@aviatesk
Sponsor Member Author

It seems like this doesn't happen on Linux? I couldn't reproduce it on:

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7502 32-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, znver2)
  Threads: 1 on 128 virtual cores

@Keno
Member

Keno commented May 13, 2022

@gbaraldi I suspect this may be related to the allocation hoisting change. Could you take a look?

@vtjnash
Sponsor Member

vtjnash commented May 13, 2022

Having written something similar to recursively_adjust_ptr_type a few times lately, I suspect that function might be incorrect. I don't know why it would show up here, though, rather than being caught by the verifier.

@gbaraldi
Member

@aviatesk what OS did you use to get the bug? On my M1 Mac it just works 🤔

@aviatesk
Sponsor Member Author

I don't have access to my Mac at the moment, but it's not using an M1 chip at least.

@gbaraldi
Member

It doesn't look like it's the allocation hoisting PR. Will bisect.

@gbaraldi
Member

I couldn't reproduce the precompile errors, but I can get the error when running the LoopVectorization tests. Bisecting with those, I landed on 4422a1d. @aviatesk, can you confirm that commit for the precompile error? I don't have an Intel Mac to test on.

@aviatesk
Sponsor Member Author

Thanks so much for the bisect, @gbaraldi! I just confirmed that 4422a1d still yields the precompilation error, while 48ae154 (the commit just before it) does not.

@pchintalapudi
Member

The error is probably in VectorizationBase.jl here, where the argument list contains three typ entries instead of two: https://github.com/JuliaSIMD/VectorizationBase.jl/blob/9ae47c5c12c7bec9b57c132c24f67c4f14090c9e/src/llvm_intrin/masks.jl#L336
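To illustrate the kind of mismatch described above: when the declare line for an intrinsic is assembled as a string, one extra type in the argument list silently changes the declared function type. Below is a minimal sketch of an arity guard. `intrinsic_decl` and `checked_decl` are hypothetical helpers, not VectorizationBase.jl code, and the `.64` suffix on the intrinsic name is an assumption.

```julia
# Hypothetical helper (not VectorizationBase's actual code): build the textual
# `declare` line for an intrinsic from its return type and argument types.
function intrinsic_decl(name::String, ret::String, args::Vector{String})
    return string("declare ", ret, " @", name, "(", join(args, ", "), ")")
end

# Hypothetical guard: refuse to build a declaration whose arity differs from
# the number of operands the call site is going to pass.
function checked_decl(name::String, ret::String, args::Vector{String}, ncallargs::Int)
    if length(args) != ncallargs
        throw(ArgumentError(string(name, " declared with ", length(args),
                                   " arguments but called with ", ncallargs)))
    end
    return intrinsic_decl(name, ret, args)
end

typ = "i64"
# Two operand types for a two-operand call: accepted.
good = checked_decl("llvm.x86.bmi.bzhi.64", typ, [typ, typ], 2)
# Passing [typ, typ, typ] with ncallargs = 2 would throw an ArgumentError.
```

A check like this only compares the declaration with the call site. It cannot tell whether either one matches LLVM's own definition of the intrinsic.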

@Keno
Member

Keno commented May 14, 2022

Why doesn't the IR verifier catch that?

@pchintalapudi
Member

IR verification currently runs in only a few optimization passes, and it cannot catch every possible IR error anyway (I've seen mismatched-context errors on attributes escape detection for a while). I haven't added it to codegen because I'm not sure codegen produces 100% valid IR until all modules have been merged, because of globals and such (more likely, I just haven't figured out the point at which the IR becomes valid). Since this error happens during module merging, I imagine it would be difficult to detect unless we added a verification step before the module merge, which would probably slow down debug builds considerably given the number of modules.

@Keno
Member

Keno commented May 15, 2022

Fair enough. I thought this was in codegen, but if it's during module merging that makes sense.

@gbaraldi
Member

I wonder if we should run pkgeval with assertions on. This commit was out for quite a while before this was found. I'm not sure how much it would add to the runtime of the tests, but it would have caught this a lot earlier.

@maleadt
Member

maleadt commented May 16, 2022

I wonder if we should run pkgeval with assertions on.

FYI, that's already possible by specifying buildflags to the @nanosoldier invocation. See https://github.com/JuliaCI/Nanosoldier.jl#trigger-syntax-1=

@aviatesk
Sponsor Member Author

@chriselrod can you fix this problem on the LoopVectorization.jl side?
Cf. #45302 (comment)

@chriselrod
Contributor

JuliaSIMD/VectorizationBase.jl@96c5a7d

@chriselrod
Contributor

chriselrod commented May 19, 2022

Note that @llvm.x86.bmi.bzhi is x86 only (so that code path shouldn't have been hit on M1 Macs). Furthermore, LV/VB should only be hitting that code path on AVX512 machines.
Without AVX512, it should prefer to generate masks differently (via comparison instructions).

This likely explains difficulties reproducing the problem, e.g. on znver2.
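For readers unfamiliar with the instruction, here is a pure-Julia sketch of the two mask styles mentioned above. It is a model only, not LV/VB code: `bzhi_model`, `lane_mask_bits`, and `lane_mask_cmp` are hypothetical names, and the sketch assumes `0 <= n`.

```julia
# Pure-Julia model of what BZHI computes: keep the low `n` bits of `x` and
# zero the rest. For `n >= 64` the input is returned unchanged.
bzhi_model(x::UInt64, n::Integer) =
    n >= 64 ? x : x & ((one(UInt64) << n) - one(UInt64))

# Bit-mask style: an integer whose low `n` bits are set, one bit per lane.
lane_mask_bits(n::Integer) = bzhi_model(typemax(UInt64), n)

# Comparison style: one Bool per lane, true for lanes 1..n.
lane_mask_cmp(n::Integer, width::Integer) = ntuple(i -> i <= n, width)

bits = lane_mask_bits(3)    # low three bits set
cmp  = lane_mask_cmp(3, 8)  # first three of eight lanes true
```

Both forms select the same lanes. The difference is only in how the mask is produced, which fits the explanation that machines without AVX512 never reach the bzhi path.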

@aviatesk
Sponsor Member Author

Given how difficult it is for the verifier to catch this error, should we close this?

Or do we want to discuss whether we should enable the assertions by default on nanosoldier?

@Keno
Member

Keno commented May 19, 2022

For this particular case, couldn't we have called the intrinsic with ccall and the llvmcall calling convention? That might have allowed us to do better validation that the number of arguments is correct.
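A minimal sketch of the `ccall` plus `llvmcall` form being suggested. It uses `llvm.ctpop.i64` (population count) as a portable stand-in, not the bzhi intrinsic from this issue, and `popcount64` is a hypothetical name. The argument types are written once as a Julia tuple, so there is no hand-written declare string that can drift from the call.

```julia
# Call an LLVM intrinsic by name through `ccall` with the `llvmcall` calling
# convention. The tuple `(UInt64,)` gives the argument types, and the values
# passed after it have to match that tuple in number.
popcount64(x::UInt64) = ccall("llvm.ctpop.i64", llvmcall, UInt64, (UInt64,), x)

n = popcount64(UInt64(0xff))  # counts the set bits of 0xff
```

With this form, a mismatch between the number of declared argument types and the number of arguments passed is rejected by Julia itself, which is presumably the extra validation meant here.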

@aviatesk
Sponsor Member Author

Closing this since the original issue has been resolved.


7 participants