Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

split SourceInfo out of LambdaInfo #18413

Merged
merged 9 commits into from
Sep 14, 2016
Merged

split SourceInfo out of LambdaInfo #18413

merged 9 commits into from
Sep 14, 2016

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Sep 8, 2016

A number of recent changes to the system subtly broke the definition of LambdaInfo (most recently culminating in #18191), this aims to rectify that shift in identity by revising the way this data is passed around the system. And I expect this to be useful for fix #265, since the current issue I'm running into there is the possibility the system will try to duplicate an existing LambdaInfo (for example to re-infer it:

julia/base/inference.jl

Lines 1601 to 1621 in 424b3c5

if inferred && code.inferred && linfo !== code
# This case occurs when the IR for a function has been deleted.
# `code` will be a newly-created LambdaInfo, and we need to copy its
# contents to the existing one to copy the info to the method cache.
linfo.inInference = true
linfo.code = code.code
linfo.slotnames = code.slotnames
linfo.slottypes = code.slottypes
linfo.slotflags = code.slotflags
linfo.ssavaluetypes = code.ssavaluetypes
linfo.pure = code.pure
linfo.inlineable = code.inlineable
linfo.propagate_inbounds = code.propagate_inbounds
ccall(:jl_set_lambda_rettype, Void, (Any, Any), linfo, code.rettype)
if code.jlcall_api == 2
linfo.constval = code.constval
linfo.jlcall_api = 2
end
linfo.inferred = true
linfo.inInference = false
end
) which made it difficult to track them and somewhat impossible to accurately invalidate them.

Major changes:

  • several functions no longer have "update their cache" as a goal. instead they return a value (they do still transparently maintain a cache, but that's now an implementation detail). most significantly, these include: jl_type_infer / typeinf_ext, jl_compile_for_dispatch / jl_compile_linfo, and jl_generate_fptr
  • type-inference now returns a SourceInfo tree (which may or may not be cached in the LambdaInfo that was passed in), allowing better (earlier) code deletion in some cases and easier thread-safety (less race-y to look at the code field)
  • the code field of LambdaInfo is gone, replaced by inferred and which is usually null. This fixes two of the lock priority inversion bugs, as now no Julia code needs to get run to create a LambdaInfo. And it should make it easier to delete the code (as that's a return to the default state rather than the presence of a third rare state), so that's done more aggressively now.
  • jlcall_api has been incremented by 1 to make it thread-safe to read without needing a lock or atomics ops in the fast-path of jl_call_method_internal (except jlcall_api = 2, which remains unchanged)
  • printing of LambdaInfo (e.g. from expand, code_lowered, code_typed, etc.) is corrected / improved / fixed, since it is no longer ambiguous whether to print the source or the signature

@StefanKarpinski StefanKarpinski added the compiler:codegen Generation of LLVM IR and native code label Sep 8, 2016
@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 8, 2016
jl_methtable_t *mt;
jl_sym_t *name;
jl_method_t *m = NULL;
JL_GC_PUSH2(&f, &m);
jl_tupletype_t *argtype = jl_apply_tuple_type(atypes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the change to make the tuple type here instead of in the front end? The method signature needs to be a type in any case, and in the future could be other kinds of types --- soon to be UnionAll types for methods with static parameters, and possibly later any type that's a subtype of Tuple.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument names and isva are directly mapped from the atypes svec. The translation to a tuple type in some cases is not information preserving (in particular with Vararg), so it is not an equivalent (or at least reversible) representation of the source information. In those cases this function becomes unable to form the map from the input argument list to the function argument number that stores contains value. I added a test for this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug acknowledged (and I'm very glad to get rid of the awful jl_is_rest_arg), but I don't think this is necessary to fix it. We could instead call jl_is_va_tuple on the tuple type. The natural interface for adding a method is to specify a type and a function body. In fact, any code that tries to manually take apart types will increasingly be wrong or buggy. For example Union{Tuple{A,B}, Tuple{C,D}} could be the type of a method, and you'll need to ask the type system to do something non-trivial to determine whether it is varargs, what the type of the nth argument is, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isvararg != isvatuple. one such counter example is NTuple

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that's true, but long term, this needs to be a type.

Also: do we want f(x::T) to be varargs based on whether T evaluates to Vararg{...}? That seems like a misfeature to me. And since as you point out Tuple{Int,Int} and Tuple{Vararg{Int,2}} are equal types, the presence of Vararg is not a good basis for identifying varargs functions. It seems like this needs to be syntax-based and passed as a separate flag, since it's not really a property of the signature but of how argument names are bound to argument values. I suppose we can deal with that later though.

@JeffBezanson
Copy link
Member

We should think about the names of the types a bit. LambdaInfo was always a bit jargony, but kind of made sense since it basically corresponded to a single function (as in non-multimethod languages). Now it is less clear how LambdaInfo is a "lambda" but SourceInfo is not.

SourceInfo also doesn't contain source code, but IR. Maybe it should be called CodeInfo? Or something containing IR?

It's also a bit hard to understand what a LambdaInfo is now. I would say its primary job is to associate a tuple type with a callable function pointer. Maybe something like CompiledFunction? In any case this would be a good opportunity to eliminate the use of "lambda" here.

@vtjnash
Copy link
Member Author

vtjnash commented Sep 9, 2016

CodeInfo was my other top name, so I can change to that. The only reason I went with SourceInfo was that source and linfo are the same number of characters, so the replacement was easier :P. This could even be the LambdaInfo type, although I agree "lambda" is a bit jargon-y and doesn't necessarily provide any additional clarity over using common words like CodeInfo.

Perhaps MethodSpecialization? It's a bit long, but I don't think this name will appear too frequently in user code. It also could be considered a leaf type realization of an Arrow, but I haven't been able to form that into a useable name.

@JeffBezanson
Copy link
Member

Specialization or MethodSpecialization is a pretty good description, but the code isn't necessarily specialized. I almost want to call it Method and rename the existing Method to MethodDef. Callable might make sense, but that sounds like an abstract type. CallTarget? FunctionPointer?

@vtjnash
Copy link
Member Author

vtjnash commented Sep 9, 2016

Semantically, to some extent, "unspecialized" is a specialization (just not a leaf one). But point taken. I'm reluctant use use "Function" in the name, since I think that nomenclature is already sufficiently taken (and brings back memories of the old generic function / anonymous function confusion for new users). I agree calling this a Method isn't bad, although name shifts are a bit annoying to handle for deprecations. Another similar option might be Thunk, although again, that's perhaps too jargon-y. Another word to toss around might be "Signature".

@JeffBezanson
Copy link
Member

How about CompiledMethod? It's not necessarily compiled, but conveys that it's a method that's had some processing done to it to put it into a callable form.

@vtjnash
Copy link
Member Author

vtjnash commented Sep 9, 2016

I'm not sure that's really its key attribute. I think really the only key property is that it contains a specTypes key, since it basically exists just to merge a bunch of independent caches that could instead have been in independent TypeMaps (or computed directly from sig). How about MethodSignature?

@JeffBezanson
Copy link
Member

I do think it's significant that it contains actual function pointers. It's very nearly a traditional symbol table entry, mapping a name (or type in our case) to an address.

@JeffBezanson
Copy link
Member

MethodInstance? Idea being that it's an instantiation of a method.

@vtjnash
Copy link
Member Author

vtjnash commented Sep 9, 2016

That sounds good. I think that also helps indicate that this is a singleton, which is also good.

@vtjnash vtjnash force-pushed the jn/sourceinfo branch 4 times, most recently from 9d0034a to 7fec8c4 Compare September 10, 2016 03:23
vtjnash added a commit that referenced this pull request Sep 12, 2016
…lizer

fix #18449
(the real, non-buggy fix is in #18413 for v0.6-dev master)
vtjnash added a commit that referenced this pull request Sep 12, 2016
…lizer

fix #18449
(the real, non-buggy fix is in #18413 for v0.6-dev master)
@vtjnash
Copy link
Member Author

vtjnash commented Sep 12, 2016

Will merge once the CI build finishes (so I can resume work on #265 PR).

@tkelman
Copy link
Contributor

tkelman commented Sep 12, 2016

does this need more deprecations? was it properly reviewed considering how much code it changes?

s = ccall(:jl_copy_code_info, Ref{CodeInfo}, (Any,), s)
s.code = ccall(:jl_uncompress_ast, Array{Any,1}, (Any, Any), m, s.code)
end
return s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the maybe-aliasing here potentially an issue?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the callee doesn't own this return value. we would need to make a deepcopy if that's your concern, but this is not new.

@tkelman
Copy link
Contributor

tkelman commented Sep 13, 2016

and if we're serious about performance tracking, any significant changes like this should run through @nanosoldier runbenchmarks(ALL, vs = ":master") first

jl_finalize_function(F, collector ? collector : m.get());
static void jl_merge_recursive(Module *m, Module *collector)
{
// probably not many unresolved declarations, but be sure iterate over their Names,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be sure to iterate

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash
Copy link
Member Author

vtjnash commented Sep 13, 2016

cool, @nanosoldier runbenchmarks(ALL, vs = ":master") found an (existing) bug in a previously unused value (probably a rebase error in 1dae0e0?) that ended up making the JSON test a bit slower.

due to other recent changes, LambdaInfo is now
much more expensive to create and difficult to destroy

it is also hard to keep track of where modification is allowed

the SourceInfo type represents an atomic chunk of information
that needs to be immutable after construction

this sets up some of the work needed for #265,
and decreases the size of the sysimg :)
this lets us avoid needing atomic operations to order the reads of fptr and jlcall_api
since both will be null-initialized, set-once while the codegen lock is held
allows removing inCompile, which wasn't necessarily entirely accurate anyways
for complex recursive calls from type-inference, it theoretically could have
compiled part of a cycle, but not the whole cycle, and failed when it tried to
finalize it, without realizing part of the cycle was still being compiled
…od idea

this is required by the threadcall function
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@vtjnash vtjnash merged commit 0baa867 into master Sep 14, 2016
@vtjnash vtjnash deleted the jn/sourceinfo branch September 14, 2016 14:07

``ftpr`` - The generic jlcall entry point

``jlcall_api`` - The ABI to use when calling ``fptr``. Some significant ones include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api vs abi still confusing terminology

maleadt added a commit that referenced this pull request Oct 13, 2016
Probably introduced after #18413. Re-uses the unused 'thunk' root.
maleadt added a commit that referenced this pull request Oct 13, 2016
Probably introduced after #18413. Re-uses the unused 'thunk' root.
@maleadt maleadt mentioned this pull request Oct 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

automatic recompilation of dependent functions
7 participants