Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in subtyping/method lookup that causes unreacheable reached #53366

Closed
kpamnany opened this issue Feb 16, 2024 · 13 comments
Closed

Bug in subtyping/method lookup that causes unreacheable reached #53366

kpamnany opened this issue Feb 16, 2024 · 13 comments
Labels
bug Indicates an unexpected problem or unintended behavior types and dispatch Types, subtyping and method dispatch

Comments

@kpamnany
Copy link
Contributor

Testing with Julia 1.10.1 + some patches, we see:

...
Unreachable reached at 0x7fff395e787b
make: *** [Makefile:160: hydra/benchmark.micro.sorting] Illegal instruction (core dumped)
...

@Drvi then ran this test with a debug build of Julia and got:

φ node 207 is not at the beginning of the basic block 53
Internal error: encountered unexpected error in runtime:
MethodError(f=Base.Multimedia.display, args=(Core.Compiler.IRCode(stmts=Core.Compiler.InstructionStream(inst=Array{Any, (272,)}[
  Core.UpsilonNode(val=Core.Argument(n=2)),
  Expr(:enter, 76),
  Expr(:call, Base.getfield, Core.Argument(n=2), :(:cond)),
...

We will share the entire error separately.

@kpamnany
Copy link
Contributor Author

Attn: @vchuravy and @Keno

@vchuravy vchuravy added compiler:codegen Generation of LLVM IR and native code compiler:inference Type inference bug Indicates an unexpected problem or unintended behavior labels Feb 16, 2024
@topolarity
Copy link
Member

Do you have an MWE?

φ node 207 is not at the beginning of the basic block 53

This indicates that Julia produced invalid IR at some point.

@vchuravy vchuravy added the needs more info Clarification or a reproducible example is required label Feb 23, 2024
@kpamnany
Copy link
Contributor Author

I spent some time on this with @gbaraldi and we were able to find the broken IR and also narrow the problem down further. He should also be able to reproduce the error.

Over the last few days, we found that the PackageCompiler step for our system occasionally segfaults with no stack trace. Trying this with a debug build of Julia 1.10.1 with LLVM assertions on reveals many such invalid IR errors:

...
φ node 163 is not at the beginning of the basic block 39
...
φ node 207 is not at the beginning of the basic block 53
...
φ node 320 is not at the beginning of the basic block 105
...
φ node 371 is not at the beginning of the basic block 120
...
φ node 436 is not at the beginning of the basic block 135
...

This has blocked our plan to upgrade to Julia 1.10.1 this week. :(

@kpamnany
Copy link
Contributor Author

Cc: @KristofferC whom @Drvi had pinged about this, and @Keno, who added this error report.

@topolarity
Copy link
Member

I spent some time on this with @gbaraldi and we were able to find the broken IR and also narrow the problem down further. He should also be able to reproduce the error.

Do you have some IR or an MWE? There's still no code attached to this issue...

@kpamnany
Copy link
Contributor Author

kpamnany commented Feb 26, 2024

Shared privately.

Edited to add: sorry for the brevity here; there's more discussion on Slack. To summarize, we have not yet been able to reduce the code that triggers this error enough to share it here -- the "MWE" still contains a bunch of proprietary code.

However, @gbaraldi is able to reproduce, and hopefully the problem will be tracked down quickly!

@gbaraldi
Copy link
Member

So me and @topolarity reduced this to

function flush!(sc::Threads.Condition)
    @lock sc begin
        try
            if Core.Compiler.inferencebarrier(true)
                return nothing
            end

            return nothing
        finally
        end
    end
end
code_typed(flush!, (Threads.Condition,))

Note that this does not fail if you do a manual macro expansion. It also does not fail on master, so somewhere along the line we fixed this.

@topolarity topolarity added compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) and removed needs more info Clarification or a reproducible example is required compiler:codegen Generation of LLVM IR and native code compiler:inference Type inference labels Feb 27, 2024
@NHDaly
Copy link
Member

NHDaly commented Feb 28, 2024

Awesome! Thanks for that! Is the issue present on 1.9 as well? Can you see from the code that's being generated, even though the assertion isn't present in 1.9?

@topolarity
Copy link
Member

The problem with the bad ϕ-nodes has been resolved, but this issue persists unfortunately.

@gbaraldi
Copy link
Member

gbaraldi commented Mar 1, 2024

So after @topolarity and me stared at this for way too long, this got minimized to

struct Foo{T} end
struct Bar{T} end
struct Baz{T} end
struct Lala{T} end

function swap!(pg::Foo{T}, page::Bar{Baz{T}}) where T
        return nothing
end

TT1 = Tuple{
    typeof(swap!),
    Foo{Lala{T}} where T,
    Bar{T} where T<:Baz
}

A = Foo{Lala{T}} where T
T = TypeVar(:T, UnionAll(A.var, Baz{A.var}))
B = UnionAll(T, Bar{T})
TT2 = Tuple{typeof(swap!), A, B}

@assert TT1 == TT2

@assert TT1 === TT2
matches1 = Base._methods_by_ftype(TT1, nothing, -1, ccall(:jl_get_world_counter, UInt, ())) # 1 match
matches2 = Base._methods_by_ftype(TT2, nothing, -1, ccall(:jl_get_world_counter, UInt, ())) # 0 matches

@show matches1 matches2

This is probably a bug in subtyping somewhere. Note that TT1 is considered egal and isequal to TT2 but they behave differently

@gbaraldi gbaraldi changed the title Unreachable reached at 0x... Bug in subtyping/method lookup that causes unreacheable reached Mar 1, 2024
@gbaraldi gbaraldi added the types and dispatch Types, subtyping and method dispatch label Mar 1, 2024
@topolarity
Copy link
Member

This is probably a bug in subtyping somewhere

Either that or it's illegal to construct a type with a shared TypeVar like this, in which case the bug is that we ended up doing that somewhere in inference

@N5N3
Copy link
Member

N5N3 commented Mar 2, 2024

Either that or it's illegal to construct a type with a shared TypeVar like this

jl_rename_unionall should take care of this though.
I haven't verified it locally, but this bug seems to be caused by this TODO

julia/src/subtype.c

Lines 3059 to 3075 in 2b95956

// if the var for this unionall (based on identity) already appears somewhere
// in the environment, rename to get a fresh var.
// TODO: might need to look inside types in btemp->lb and btemp->ub
int envsize = 0;
while (btemp != NULL) {
envsize++;
if (envsize > 120) {
vb->limited = 1;
return t;
}
if (btemp->var == u->var || btemp->lb == (jl_value_t*)u->var ||
btemp->ub == (jl_value_t*)u->var) {
u = jl_rename_unionall(u);
break;
}
btemp = btemp->prev;
}

An easy fix is fusing unalias_unionall here.

@topolarity topolarity removed the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Mar 4, 2024
topolarity added a commit that referenced this issue Mar 7, 2024
This is a partial back-port of #50924, where we discovered that the
optimizer would ignore:
  1. must-throw `%XX = SlotNumber(_)` statements
  2. must-throw `goto #bb if not %x` statements


This is mostly harmless, except that in the case of (1) we can
accidentally fall through the statically deleted (`Const()`-wrapped)
code from inference and end up observing a control-flow edge that never
existed.

If the spurious edge is to a catch block, then the edge is invalid
semantically and breaks our SSA conversion.

This one-line change fixes (1) but not (2), which is enough for IR
validity.

Resolves part of #53366.
@topolarity
Copy link
Member

Between #53512 and #53553, I believe this is fully resolved now!

Drvi pushed a commit to RelationalAI/julia that referenced this issue Mar 8, 2024
…53512)

This is a partial back-port of JuliaLang#50924, where we discovered that the
optimizer would ignore:
  1. must-throw `%XX = SlotNumber(_)` statements
  2. must-throw `goto #bb if not %x` statements

This is mostly harmless, except that in the case of (1) we can
accidentally fall through the statically deleted (`Const()`-wrapped)
code from inference and end up observing a control-flow edge that never
existed.

If the spurious edge is to a catch block, then the edge is invalid
semantically and breaks our SSA conversion.

This one-line change fixes (1) but not (2), which is enough for IR
validity.

Resolves part of JuliaLang#53366.

(cherry picked from commit 035d17a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

6 participants