-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Julep: tfunc
s for all
#36463
Comments
If I understand this correctly, the reference to tfuncs might be misleading. A tfunc tells the compiler the return type of something which it will blindly trust. Here, on the other hand, it seems you'd like to see type assertions to be inserted automatically. Of course, the compiler then can assume the value has the desired type after verifying the assertion, but it won't trust it blindly. (it may remove the assertion if it can be proven to always hold, of course.) That said, being able to impose restrictions on the return type for all methods matching a specific signature does seem useful. It can make APIs more explicit and help inference. |
Good point, I definitely did not mean that the output type should be trusted blindly. I tried to clarify my description. |
For those not steeped in compiler details, it may also help to give a bit more explanation: on julia> cts = code_typed(eof, (IO,));
julia> collect(zip(methods(eof), last.(cts)))
9-element Array{Tuple{Method,DataType},1}:
(eof(pipe::Base64.Base64DecodePipe) in Base64 at /home/tim/src/julia-master/usr/share/julia/stdlib/v1.6/Base64/src/decode.jl:126, Bool)
(eof(io::Base.SecretBuffer) in Base at secretbuffer.jl:155, Bool)
(eof(f::Base.Filesystem.File) in Base.Filesystem at filesystem.jl:200, Bool)
(eof(s::Base.BufferStream) in Base at stream.jl:1354, Bool)
(eof(s::IOStream) in Base at iostream.jl:232, Bool)
(eof(::Base.DevNull) in Base at coreio.jl:22, Bool)
(eof(io::Base.AbstractPipe) in Base at io.jl:414, Bool)
(eof(io::Base.GenericIOBuffer) in Base at iobuffer.jl:335, Bool)
(eof(s::Base.LibuvStream) in Base at stream.jl:104, Bool) (This is not true of earlier Julia versions). However, julia> f(io::IO) = eof(io)
f (generic function with 1 method)
julia> code_typed(f, (IO,))
1-element Array{Any,1}:
CodeInfo(
1 ─ %1 = Main.eof(io)::Any
└── return %1
) => Any because 9 methods of |
There is a long-standing idea (IIRC, @StefanKarpinski was a proponent, and I could swear there was an issue for it but I can't find it (not unusual)) to allow declaring the return type for an entire generic function, e.g. I think the ideal way to handle it is to insert a typeassert in all matching methods when they are defined, and then the type can simply be assumed at call sites. We already have return type declarations on some methods, and we could potentially exploit those in inference. But, I suspect the value of that is surprisingly low. There are several reasons:
|
Another possible option is to allow "soft annotations" which basically would tell the compiler to "union split" on the asserted type, i.e. when it sees x = eof(io)
if x isa Bool
# fast path
else
# slow path
end in other words, this would be a mechanism to tell the compiler what type to expect, although it would still have to generate code that works even if that's wrong, given that branch prediction is generally fast when it's predictable, the performance could nearly as good as actually knowing the type. |
Yep, I proposed using return-type annotations in #35904 (comment). One problem though is that currently return-type annotations call @returntype eof(::IO)::Bool that could insert something in a Dict, if someone else could convert that into actual syntax. |
Would it be possible and/or desirable to change this? Instead of calling Edit: I think Jeff may have already addressed this above when he wrote "But, I suspect the value of that is surprisingly low." |
Current per-method return type declarations call
We actually do both; call |
One such issue may be #1090 (comment)
|
Excellent GitHub archaeology! |
I'm guessing I don't fully understand that comment---naively, I imagine that these can be straighforwardly handled as "AND" requirements. To be concrete, let's take a call,
This seems to set up the conflict that I think you're describing. But wouldn't we just All this is to explain why I'm failing to see the conflict that concerns you, not that I think it's not there. |
Or, as I suggested, make them non-fatal and just generate two branches: promise satisfied / promise violated. |
Jeff knows this, but to make sure it's clear to everyone: the reason that
isn't what we want for this circumstance is that calling
True, but with a cost: backedges to the method you wish you weren't needing. Not saying we can't do it, but it's not without cost. (Invalidations are on my mind, but I'm not religious about them. It's just that heretofore I've never tried to engineer Julia code to avoid them so this is new territory for me and one worth thinking about carefully even if we eventually decide to live with them for certain methods.) EDIT: to be clearer, if !eof(io) & (length(A) > 3) can be rewritten as if &(!(eof(io)), length(A) > 3) Let's say you don't know that if &(!(convert(Bool, eof(io))::Bool), length(A) > 3) then we know we're calling And if you have to have a branch that still might call |
I wonder if it's possible to use dynamic method calls in the slow path to avoid backedges. I.e. generate slow, dynamic code that won't be invalidated. |
That's a great idea. It would pretty much get us everything we want: better performance and lower risk to invalidation while ensuring we aren't introducing breaking changes because we've not enforced such constraints previously. The trickiest bit would be handling the potential combinatorial explosion, sort of like now with Union-splitting though I fear it would be much worse. |
How about two paths:
Basically, adding expected type signatures for functions would be a non-breaking change but it would be highly beneficial to follow the signature. We can start opening issues on any packages that violate expected signatures. There could be a mode where a violation is fatal so that we can catch them when debugging. |
The end-game would be that these kinds of type signatures would be strict in 2.0 and by then we've already gotten the entire package ecosystem into compliance so it's a smooth transition, so I think this would be something to add as a first-class feature, not just some t-function table interface, although could be a first cut. |
You'd have to be able to switch to dynamic at any point in the call chain. Though I guess you could do it with y = f(x)
isa(y, Y) || @goto labelY
z = g(y)
isa(z, Z) || @goto labelZ
...
@label labelY
z = @rtinvoke g(y)
@label labelZ
... |
I wonder if this can be piggy backed with a more comprehensive approach to defining "overload API." I'm thinking something like @overloadable function push!(xs::T, x) where {T}
@impl
return xs::T
end
# lowered to (say)
function push!(xs::T, x) where {T}
__push!__(xs, t)
return xs::T
end to declare overload API which can be used as @impl push!(xs::MyArray, x) = # implementation
# lowered to
Base.__push!__(xs::MyArray, x) = implementation This would avoid issues like #34202 where you have implementations that is not compatible with the expected API. It'd also help people overloading a function that requires complex pre- and post-processing (e.g., It'd be also nice if this can be used to detect dispatching on unsupported argument, to avoid introducing method ambiguity: julia> @impl push!(xs, x::MyType) = ...
ERROR: dispatching on the second argument is not allowed Of course, we need to be able to seal function/method #31222 for this to work reliably.
BTW, I think another interpretation of the syntax abstract type Monoid <: Function end
function *::Monoid end
# lowered to:
struct Mul <: Function end
const * = Mul() |
We already do that when we split out some cases with branches, if the slow case was inferred as
So that code only depends on the methods for Bool and Missing. |
Ah, so is the point of |
|
The way I'm thinking about it, the point of the type hint is to avoid having to do inference: the hint tells the compiler what type to expect and it can "trust but verify". Doesn't help not knowing which precise method to call, but if we knew that then the number of methods present wouldn't matter anyway (since we know which one). |
This is an interesting idea to think about. But I'm starting to think it is too fundamental a change to our type paradigm. Currently, any type we have can be assumed to be correct. This would introduce the idea of a "probably correct" type, and they propagate. If Taken to its logical conclusion, this approach leads to basically assuming a fixed world of type signatures, and compiling two versions of everything: one that uses those assumptions, and another that dynamically dispatches everything, with side exits from the first to the second. The first version will sometimes be faster/better-typed than what we have now, but the second version will be slower. When a rogue method definition comes along, instead of recompiling things your program just gets (potentially a lot) slower. So I'm skeptical of whether that is worthwhile. It seems better to me to try to sharpen invalidation using the inferred type of new methods. I don't yet know exactly how to do that, but it seems likely the payoff would be large. It could also potentially be combined with a type-hinting mechanism like that proposed here --- basically just use the current invalidation system (instead of side exits) if the declaration is violated. The simplest version of that doesn't even need declarations: we could just infer every method on its declared signature, and if all methods of a function have the same return type save it for quick lookup during inference. The only problem with that is that experiments show it's way too expensive, so one possibility is just to have a hint telling us it's worthwhile to do that for a particular function. |
Can we add For the record, the script here really doesn't take very long to run, and it covers all of Base and the stdlibs. |
why was this closed? |
See Jeff's explanation above #36463 (comment) |
TL;DR: one of the harder compiler problems is handling inference on abstract types. This proposes to cheat by adding lookup tables for a few functions. This also analyzes how many poorly-inferred calls we actually have.
There's been a lot of thinking lately about how to handle abstract inference, e.g., #34742, "typocalypse" #36208, and many of the invalidation PRs are nibbling at individual pieces of this problem. Some of the discussion in #35904 centered on whether it would be possible for inference be more assertive about output types, but the cost of running inference on everything is large and recent progress has focused on doing less inference, not more.
I have a growing sense that the majority of inference problems affect a relatively small fraction of functions. One potential solution (which I'm sure must have been considered before) is to do something akin to the
tfunc
mechanism† for generic functions, automatically adding type-assertions as needed for particular functions (EDIT: previous was clarified in response to @martinholters). I'm not exactly sure how it would work (and therein lies the rub), but something like "if you're calling a method inModuleA
with abstract types as wide as the method signature itself, then its permissible to impose custom tfuncs on its callees." This is essentially a strong form of piracy-prevention.To see how this might work, let's consider the example of
eof
. We document that it returns aBool
. In a sense, if you're calling one of "our"eof
implementations, and your object is a subtype ofIO
, then this would seem to be a restriction you should have to live with (meaning, we'd enforce it by automatically adding a type-assert). So if you have a methodmyreader(io::IO, args...)
and you calleof(io)
, then it seems fair to require that thiseof
call return aBool
. Conversely, if you duck-type the method,myreader(io, args...)
, and if inference ends up producing a purely-abstract version without knowing the type ofio
, then the abstract call is oneof(::Any)
, and in that case we would not make any guarantees about the return type. (Even though Base "owns"Any
, we make an exception and pretend it doesn't.) If this is still too restrictive, it could conceivably be loosened by exempting known concrete types from the pseudo-tfunc
constraints. I suspect this isn't a good idea, though, as program behavior would change based on the success or failure of inference, and moreover it just seems weird to allow specific concrete subtypes to escape constraints that we otherwise apply to the abstract type itself.Somewhat in the spirit of #36393, I thought that perhaps the best way to think about this more clearly would be to get a sense for the scale of the problem. So I did an analysis on our current codebase. The script is in https://gist.github.com/timholy/9d2aabaeabb22239b5e7a4e95e35d298, but in rough terms what it does is go through all methods, run inference on them with their declared signatures (i.e., the widest type signature permissible), and then look for poorly inferred calls to a specific subset of functions. It exempts a "hit" if it is immediately followed by a typeassert. For example, one of the functions on the list is
==
, which should returnUnion{Bool,Missing}
. The methodwould be deemed OK because of the type-assert that follows the
==
call (which has an inferred return type ofAny
). But without that type-assert, this would be deemed a violation of the "contract" that==
must satisfy. This Julep is essentially exploring the possibility of adding the missing type-assert systematically to everything that doesn't infer to the expected type and doesn't already have its own equally- or more-restrictive type-assert.While you'll want to see the code for full details, roughly speaking this will count violations of things that I think most people would agree on, e..g., we expect
convert(T, x)::T
,isempty(x)::Bool
, etc.The results of the analysis: out of ~16000 methods, about 1500 of them have one or more "violating" calls to one of the functions I put on the
tfunc
list. That's more than I hoped but fewer than I feared. Here are the functions on the list and a count of the number of calling Methods they have (somewhat like "unrealized backedges" since these depend on the inferred code but not MethodInstances):In terms of paths forward (aside from just closing this issue...), we could:
tfunc
-like mechanism in inference to impose these limits "behind the scenes"The third is kind of the default, but I was curious to know how many annotations would be required to shut most of these down. It turns out, quite a few! In contrast with my impressions from #36393, where I felt that we seem to closing in on eliminating many of the biggest sources of invalidation, from this perspective I think we've barely started: the lists look eerily similar for 1.4 and master.
†If you're unfamiliar with
tfunc
, it's an inference lookup table for Julia's intrinsic & builtin functions. See https://github.com/JuliaLang/julia/blob/master/base/compiler/tfuncs.jlThe text was updated successfully, but these errors were encountered: