
set default max_methods to 3 #36208

Merged: 3 commits merged into master on Jun 19, 2020

Conversation

JeffBezanson (Member)

Until we settle on a generally-better algorithm (which may not exist), this is a good way to get some free performance. I'm curious to find out what PkgEval says.
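
For context, a minimal, hedged sketch of what the max_methods cutoff governs (all names below are hypothetical, not from this PR or the sysimage):

abstract type Shape end
struct Circle <: Shape end
struct Square <: Shape end
struct Triangle <: Shape end
struct Hexagon <: Shape end

area(::Circle) = 1.0
area(::Square) = 2.0
area(::Triangle) = 3.0
area(::Hexagon) = 4.0

# `area(s)` has four applicable methods for an abstract `Shape`, so with
# max_methods = 3 inference gives up on that call and widens to Any, while the
# previous default of 4 would still have analyzed every method.
total(s::Shape) = area(s) + 1.0

# Inspect the inferred return type:
# julia> Base.return_types(total, (Shape,))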

Here are some numbers. Baseline from master (note my environment also includes JuliaPlots/PlotUtils.jl#96):

Base  ─────────── 38.462095 seconds
Stdlibs: ────  54.332004 seconds 58.551%
-rw-r--r-- 1 jeff jeff  12831441 Jun  8 19:59 corecompiler.ji
-rwxr-xr-x 1 jeff jeff 151354656 Jun  8 20:07 sys.so
julia> @time using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
144.572065 seconds (9.39 M allocations: 534.732 MiB, 0.14% gc time)

julia> @time display(plot(rand(10)))
 11.177675 seconds (11.11 M allocations: 574.409 MiB, 1.57% gc time)

julia> @time using SIMD
[ Info: Precompiling SIMD [fdea26ae-647d-5447-a871-4b548cad5224]
  2.551328 seconds (2.15 M allocations: 110.859 MiB, 0.87% gc time)

julia> @time display(plot(rand(10)))
  9.076342 seconds (20.95 M allocations: 1.047 GiB, 6.05% gc time)

This PR:

Base  ─────────── 36.959470 seconds
Stdlibs: ────  51.727160 seconds 58.3257%
-rw-r--r-- 1 jeff jeff  12750431 Jun  8 21:34 corecompiler.ji
-rwxr-xr-x 1 jeff jeff 145160120 Jun  8 21:41 sys.so
julia> @time using Plots
[ Info: Precompiling Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
125.626309 seconds (7.76 M allocations: 445.493 MiB, 0.10% gc time)

julia> @time display(plot(rand(10)))
 10.892562 seconds (10.28 M allocations: 530.079 MiB, 2.52% gc time)

julia> @time using SIMD
[ Info: Precompiling SIMD [fdea26ae-647d-5447-a871-4b548cad5224]
  1.415139 seconds (285.21 k allocations: 18.519 MiB)

julia> @time display(plot(rand(10)))
  0.409602 seconds (982.29 k allocations: 49.991 MiB, 2.96% gc time)

The last item is clearly volatile; there are probably just a few critical backedges that this plus #36200 manage to cut.

JeffBezanson added the compiler:latency (Compiler latency) label Jun 9, 2020
JeffBezanson (Member, Author)

@nanosoldier runtests(ALL, vs = ":master")

nanosoldier (Collaborator)

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

PallHaraldsson (Contributor) commented Jun 10, 2020

Assuming this gets merged, could it be backported to 1.5? Going from 9 to 0.4 sec. would be really nice, as would similar latency PRs.

In the nanosoldier report (which I'm new to), "GC corruption detected" is a little worrying. Could it, or all of the failures, be false alarms? And if not, is this common/expected for such a small change?

From the first failing package:

Preparing to run in parallel on 2 processors
	 Check ABC rejection algorithm correctly infers parameters
[..]
   Evaluated: isapprox(0.4225548433380043, 0.4; rtol = 0.05)

JeffBezanson (Member, Author)

No, I don't believe this is backportable.

So far I see two packages failing @test_inferred due to this, which I think is a pretty acceptable level (though not for backporting). I'll look into those, and also try to see whether any of the 1000 packages that also fail on master are affected.

GC corruption is probably intermittent; my guess is it's just luck that it happened here, but I'll look into it of course in case it's a different bug in julia or that package.

The isapprox failures seem to be numerical issues that almost certainly couldn't be caused by this, but those package authors would have to review.

JeffBezanson (Member, Author)

Ok, searching the full logs, there are new inference tests failing in:
FieldMetadata, PencilFFTs, POMDPModelTools, TypedTables

There are inference tests failing that also fail without this PR in:
TensorDecompositions, TypedPolynomials

JeffBezanson (Member, Author)

Added some mitigations. Here's the analysis:

  • POMDPModelTools: makes some calls to rand(::AbstractRNG, ...); can be fixed with one type declaration in Random.
  • FieldMetadata: adds one method per annotated field, and the test case has 4 fields. Inference depends on the number of fields, so this is not a meaningful regression.
  • PencilFFTs: not clear; similar to the problem with the unreliable approximation of Core.Compiler.return_type (#35800): in the caller the type is Any, but when you drill down the type is Int.
  • TypedTables: fixed by making tmerge sharper: when merging concrete types from the same family but with different parameters, keep the parameters they have in common (e.g. merging various kinds of Vector gives Vector instead of Array; see the sketch below).
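
As a rough, hedged illustration of the TypedTables item above (the function is hypothetical; this is not the compiler's tmerge implementation):

# In the loop below `acc` takes on several concrete Vector types, so inference
# has to widen their union at some point. With the sharper tmerge the widened
# type can keep the shared parameters and stay Vector (Array{T, 1} where T)
# rather than falling all the way back to Array.
function mixed_vectors(n)
    acc = Int[]
    for i in 1:n
        acc = i % 4 == 0 ? Float64[] :
              i % 4 == 1 ? String[]  :
              i % 4 == 2 ? Bool[]    : Int[]
    end
    return acc
end

# Inspect the inferred return type:
# julia> Base.return_types(mixed_vectors, (Int,))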

JeffBezanson (Member, Author)

@rfourquet Is the type declaration in Random valid?

rfourquet (Member)

@rfourquet Is the type declaration in Random valid?

Yes, fortunately rand(X::Type) is required to return an X object by the current documented API.
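
For illustration, a hedged sketch of why such a declaration is safe (the wrapper below is hypothetical; it is not the actual change made to Random):

using Random

# Because rand(rng, T::Type) is documented to return a value of type T, a
# return-type assertion like this is valid, and it gives callers a concrete
# inferred type even when inference bails out on the underlying rand call.
rand_typed(rng::AbstractRNG, ::Type{T}) where {T} = rand(rng, T)::T

# julia> rand_typed(MersenneTwister(0), Float64)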

JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

PallHaraldsson (Contributor)

This shows improvements, e.g. 0.43x, but also slowdowns of up to 41.24 times (are some of these false positives?), e.g.:
["union", "array", "("map", *, Float32, (false, false))"]

JeffBezanson (Member, Author)

Unsurprisingly, the big regressions are in union-splitting cases that split into 4 signatures (Union{T,Nothing} x Union{T,Nothing}). Inference will actually look at more than max_methods methods if max_union_split is greater, but inlining won't. #35891 should fix that, so we should merge that first and retry.
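
For reference, a hedged illustration of the kind of call site involved (hypothetical code, not one of the regressed benchmarks):

# With two Union{Int, Nothing} arguments, the call to `combine` union-splits
# into 2 x 2 = 4 concrete signatures, which is more than a max_methods limit
# of 3; inference still handles it via the union-split limit, but inlining
# did not before #35891.
combine(a::Union{Int, Nothing}, b::Union{Int, Nothing}) =
    (a === nothing ? 0 : a) + (b === nothing ? 0 : b)

xs = Union{Int, Nothing}[1, nothing, 3, nothing]
pairsum = sum(combine(xs[i], xs[i + 1]) for i in 1:length(xs) - 1)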

JeffBezanson (Member, Author)

@nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier (Collaborator)

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

JeffBezanson (Member, Author)

I think this is ready to merge after CI. @Keno @vtjnash ok?

vtjnash (Member) left a comment

Yeah, those results seem good. I think it should be split into 3ish commits though.

Comment on lines 395 to 397
uw = unwrap_unionall(wr)
ui = unwrap_unionall(ti)
uj = unwrap_unionall(tj)
vtjnash (Member):

Suggested change:
-uw = unwrap_unionall(wr)
-ui = unwrap_unionall(ti)
-uj = unwrap_unionall(tj)
+uw = unwrap_unionall(wr)::DataType
+ui = unwrap_unionall(ti)::DataType
+uj = unwrap_unionall(tj)::DataType

merged = merged{uw.parameters[k]}
end
end
widen = rewrap_unionall(merged, wr)
vtjnash (Member):

I wonder if this is entirely sound, mainly because of losing `where` bounds, but I'm also wondering if it would mismatch TypeVars coming from different places. What if we had something like:

struct MyType{A, B, C, D} end
T = TypeVar(:T)
S = TypeVar(:S)
A = UnionAll(T, MyType{T, S, T, Pair{T, S}})
B = UnionAll(T, MyType{S, T, T, Pair{T, S}})

The merge result would appear to be:

MyType{A, B, T, Pair{T, S}} where B where A

I think the only issue it has is that it's missing the `where T`, but just checking.

JeffBezanson (Member, Author):

You're right. I'll exclude anything with free typevars.
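
For concreteness, a hedged sketch of why free typevars are a problem here (illustrative code, not the PR's actual guard):

# Unwrapping two UnionAlls exposes TypeVars from different scopes; a type
# assembled from them still carries those typevars unbound, so it is not a
# valid widened result unless every `where` bound is re-established.
T = TypeVar(:T)
S = TypeVar(:S)
leaky = Pair{T, S}                 # contains free typevars; must be excluded
safe  = Pair{A, B} where {A, B}    # fully closed over its typevars

# julia> Core.Compiler.has_free_typevars(leaky)   # internal helper; true here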

JeffBezanson merged commit 4db59da into master Jun 19, 2020
JeffBezanson deleted the jb/mm3 branch June 19, 2020 20:58
jipolanco added a commit to jipolanco/PencilFFTs.jl that referenced this pull request Jun 20, 2020
See JuliaLang/julia#36208

For some reason, the type returned by ManyPencilArray is correctly
inferred when `length.(stuff)` is replaced by `map(length, stuff)` in
`size_local()`.

Using `broadcast` instead of `map` also works, which is surprising given
that the dot syntax should be equivalent to broadcast.
jipolanco (Contributor) commented Jun 20, 2020

For what it's worth, the PencilFFTs inference issue is fixed by replacing a dot-syntax broadcast over a tuple with an explicit call to map or broadcast (see the commit message above).

Maybe this can help the mitigation effort. I can also try to provide a simpler example that reproduces the issue.
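
For concreteness, a hedged sketch of the shape of that change (the names are illustrative; this is not the actual PencilFFTs source):

# Both definitions compute the same tuple of lengths; in the reported case the
# dot-broadcast form was no longer inferred under this PR, while the map form
# (and an explicit broadcast call) still was.
size_local_dot(arrays::Tuple) = length.(arrays)
size_local_map(arrays::Tuple) = map(length, arrays)

# julia> Base.return_types(size_local_map, (Tuple{Vector{Int}, Matrix{Float64}},))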

timholy mentioned this pull request Jun 28, 2020
nalimilan (Member)

Bisection revealed that this PR is responsible for the return type of Base.Broadcast.copyto_nonleaf!([1], Base.Broadcast.Broadcasted(+, ([1,2], [1,1])), [1, 2], 1, 1) no longer being inferred as Array, which was the case in Julia 1.5. Unfortunately this means that even if, after #30485 (comment), the return element type of broadcast is inferred for small Union inputs, the container type isn't inferred, which defeats the goal of inferring a concrete type.

Do you think something can be done about this? FWIW, I've noticed that in copyto_nonleaf! the compiler doesn't seem to realize that val cannot be of type T in the second branch of the if (since that case is handled in the first branch). This adds a type instability which isn't real and which may be the reason why inference bails out.
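
To make that concrete, here is a heavily simplified, hedged sketch of the branching pattern being described (it is not the actual Base.Broadcast.copyto_nonleaf! implementation):

# The `else` branch can only ever see a `val` that is not of eltype(dest),
# since that case is handled by the first branch, but inference does not
# narrow typeof(val) there, which adds an apparent type instability.
function copy_widening(src)
    dest = [first(src)]
    for val in Iterators.drop(src, 1)
        if val isa eltype(dest)
            push!(dest, val)             # fast path: val already fits
        else
            dest = vcat(dest, [val])     # widen the element type and continue
        end
    end
    return dest
end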

Labels: compiler:latency (Compiler latency)
7 participants