
TupleMap type #15779

Merged (23 commits, Apr 16, 2016)

Conversation

@vtjnash (Member) commented Apr 6, 2016

The purpose of this change set is to separate the method caching logic used by generic functions into an independent utility. The motivation for these changes was #265 and (more recently, but related) #15555. Since both require the compiler to be able to invalidate various method caches, it seemed most practical to me to make this functionality available independently of generic function dispatch (so that dispatch is a consumer of this logic and information instead of the provider).

For review, I recommend starting from the definition of the new TypeMap type (that is, a map from a type to a value, allowing lookup by either == or <:) in julia.h (https://github.com/JuliaLang/julia/compare/jn/methcache?expand=1#diff-67021262df1246d0b5756f73b92eb69fR167).
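(For orientation, here is a minimal Julia-level sketch of the two lookup modes such a map has to support; the toy list-of-pairs representation and the names are illustrative only, not the C API added in this PR.)

# A toy "typemap": a list of signature => value pairs.
entries = [Tuple{Int,Float64}  => :exact_entry,
           Tuple{Integer,Real} => :abstract_entry]

query = Tuple{Int,Float64}

# cache-style lookup: exact equality (==) on the signature
findfirst(p -> first(p) == query, entries)    # => 1

# dispatch-style lookup: first signature the query is a subtype of (<:)
findfirst(p -> query <: first(p), entries)    # => 1 (entry 2 would also match)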

This may have unintentionally provided an easy fix for #14919 (by deleting a few lines of code, of course :), but that should be a future PR.

@vtjnash force-pushed the jn/methcache branch 6 times, most recently from 79b0e2f to ee1a1ea on April 11, 2016, 22:43
# toplevel lambda - infer directly
frame = InferenceState(linfo, linfo.specTypes, svec(), true)
typeinf_loop(frame)
@assert frame.inferred # TODO: deal with the better
Contributor:

deal with the what?

Member Author:

"This", as in the assert that used to be elsewhere, but is now in a place where it can be dealt with.

@vtjnash (Member Author) commented Apr 12, 2016

Out of curiosity, I tested the change for #14919 (it's only a couple of lines to test, although it would take more to make the full change), but it's a bit slower and it runs into #15838 before it can finish compiling. It uncovered a couple of bugs on master, though, so I guess that's something?

@Keno (Member) commented Apr 12, 2016

There are a number of unrelated commits in here. Can we please get those as a separate pull request? This is enough of a monster to review as is.

@vtjnash (Member Author) commented Apr 12, 2016

Only the first two codegen changes are unrelated (they are passing CI, so maybe I should just push those to master now?).

The last two are pre-existing, but conflict significantly with the other changes, so I can't pull them out (although I could commit the equivalent fix to master and then revert them in this PR).

@JeffBezanson (Member):

Yes, pulling out even 2 commits to master is helpful.

@Keno (Member) commented Apr 12, 2016

e6adb2b also looks separate

@vtjnash (Member Author) commented Apr 12, 2016

It could be squashed into ac48b6b, but it's a PR fix, not a bugfix.

@vtjnash (Member Author) commented Apr 12, 2016

OK, updated to remove those three commits.

// the most specific for the argument types.
union jl_typemap_t invokes;

int32_t called; // bit flags: whether each of the first 8 arguments is called
Member:

Is it still 8 arguments or is it 32 now?

Member Author:

Still 8, but now stored in an Int32 (because that's what's available to jltypes.c during initialization).
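(As an aside, a rough Julia sketch of how such a bit-flags field can be queried, under the assumption that bit i-1 corresponds to argument i; the helper name is made up for illustration.)

# `called` packs one "this argument is called" flag per bit; only the
# low 8 bits are meaningful even though the field is 32 bits wide.
is_arg_called(called::Int32, i::Int) = 1 <= i <= 8 && (called >> (i - 1)) & 1 != 0

is_arg_called(Int32(0b101), 1)   # true:  bit 0 is set
is_arg_called(Int32(0b101), 2)   # false: bit 1 is clear
is_arg_called(Int32(0b101), 9)   # false: beyond the 8 tracked arguments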

@vtjnash force-pushed the jn/methcache branch 3 times, most recently from 0b2ddef to c1fa1eb on April 15, 2016, 06:50
This reverts commit b991db1, reversing
changes made to 9492c53.
this abstracts out the method lookup behavior
of the MethodTable cache
with the goal of making it more re-usable for the tfunc lookup (speed)
and helping address #265 (correctness)
jl_array_t *arg1; // Array{union jl_typemap_t}
jl_array_t *targ; // Array{union jl_typemap_t}
jl_typemap_entry_t *linear; // union jl_typemap_t (but no more levels)
jl_value_t *key; // [nullable]
Member:

What does nullable mean here? C NULL?

Member Author:

Yes (taking inspiration from GLib, which uses this notation in its documentation).

@JeffBezanson (Member):

I think it would help a lot to move the code for typemap to a separate file, since a major point of this change is to factor out the data structure from its clients.

@JeffBezanson (Member):

I get the sense that using the same structure for method definitions and method/tfunc caches is a big source of complexity here. Passing boolean flags to the table to tell it how to operate is kind of a warning sign to me.

An API cleanup might be in order. For example, jl_typemap_insert taking 10 arguments is quite scary. Maybe this is an internal function, but then that should be clarified. Overall, I'm not sure what the data structure's API is.

There are some un-renamed instances of TupleMap remaining. There is also some screwy indentation.

jl_lambda_info_t *method, jl_svec_t *tvars,
int8_t isstaged);
int jl_is_type(jl_value_t *v);
void jl_method_table_insert(jl_methtable_t *mt, jl_tupletype_t *type, jl_tupletype_t *simpletype,
Member:

What does the simpletype argument to this mean? It also seems to be more of an implementation detail that doesn't really belong in this entry point.

Member Author:

It fixes the intersection of two bugs on master (#11840, which requires a simple guard entry, and #11355, which prevents the creation of guard entries).

Member:

Both of those bugs were closed without changing this API. Intuitively, to add a method you need a method table, a method, and possibly a separate type argument. Therefore that's what the arguments should be.

Member Author:

True, they were closed but not fixed. I'll add my test for the remaining case.

@nalimilan (Member):

Is this PR the cause of #15893?

@JeffBezanson (Member):

Yes, it certainly seems to be.

local tfunc_idx = -1
local frame
local frame = nothing
offs = 0
Member:

What is this for?

Member Author:

Compatibility with the existing API inherited from the previous implementation of MethodTable->cache_targs / cache_args1. I would like to drop offs entirely (I have a commit to do it, because it enables another optimization, but it only seemed to save ~0.5% on Yichao's dispatch perf test), but in the existing codebase it is used to decide whether the cache should split on the first or the second argument type.
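(A hypothetical Julia-level reading of what offs selects, under the assumption that it is a 0-based offset into the signature tuple; the helper name is invented for illustration.)

# Pick the type the cache level splits on, given a signature tuple
# type and a 0-based offset into it.
split_key(sig::Type{<:Tuple}, offs::Int) = fieldtype(sig, offs + 1)

split_key(Tuple{typeof(sin), Float64}, 0)   # typeof(sin): split on element 1
split_key(Tuple{typeof(sin), Float64}, 1)   # Float64: split on element 2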

Member:

OK, but since inference only ever passes 0, it doesn't seem to need to be part of jl_tfunc_cache_insert and jl_tfunc_cache_lookup.

@JeffBezanson (Member):

I do see some nice speedups on some of the tests, such as subarray and linalg/triangular. That's really good, but now I'm getting greedy and want to understand better where the speedup comes from. There are a few candidates:

  • Faster tfunc cache lookup
  • Faster method dispatch
  • Faster jl_matching_methods
  • Removing stagedcache

If we can separate these effects I see potential either for further speedups or simplifying the code or both.

@tkelman (Contributor) commented Apr 16, 2016

CI instability is going to waste more time than faster tests save, so if that can't be fixed soon, then I think it would be appropriate to revert this for now, until it can be done in a stable and properly reviewed way.

And while we're renaming things, what about gf.c, which is largely getting rewritten here? I've always thought that was a cryptic name; could it be expanded a bit?

@vtjnash (Member Author) commented Apr 17, 2016

> I get the sense that using the same structure for method definitions and method/tfunc caches is a big source of complexity here. Passing boolean flags to the table to tell it how to operate is kind of a warning sign to me.

The flags in jl_typemap_info->unsorted are there to allow configuration of the cost model, but they should be understood as an optimization, not a necessity (most of the complexity in this code comes from manually unrolling lots of special cases; recreating the algorithm here without the optimizations would be much shorter, and slower).

For the other flags, it's tricky, since dispatch is structured as a single data structure, but there are a number of related questions we are interested in asking of it (intersection, more-specific, typeseq, subtype, and iteration). The bool flags attempt to strike a balance between providing overly generic entry points (like jl_typemap_visitor) that can't do tree-pruning optimizations, and providing separate entry points for each query type (even when the intersection pruning would be the same).
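(To make the trade-off concrete, a rough Julia sketch with entirely hypothetical names: a fully generic visitor has to touch every entry, while a query-specific entry point can prune entries whose signatures cannot intersect the query.)

# Generic visitor: must visit every entry (jl_typemap_visitor-style).
visit_all(f, entries) = foreach(p -> f(last(p)), entries)

# Query-specific traversal: skip entries that cannot possibly match.
function visit_intersecting(f, entries, query::Type)
    for (sig, node) in entries
        typeintersect(sig, query) === Union{} && continue   # prune: no possible match
        f(node)
    end
end

entries = [Tuple{Int} => :a, Tuple{String} => :b]
visit_intersecting(println, entries, Tuple{Integer})   # visits only the Tuple{Int} entry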

@vtjnash (Member Author) commented Apr 17, 2016

> jl_typemap_insert

This is the entry point for creating a typemap entry and inserting it into the map, so it takes all of the arguments for both. We could separate out the creation of the typemap entry (cutting the argument list in half), although I don't see much use right now for an unconnected entry.
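(A hypothetical sketch of that split, in Julia rather than the actual C API, just to show the shape: construct the entry from its own parameters, then insert it with only the map-level parameters.)

struct TypeMapEntry
    sig::Type       # signature to match
    value::Any      # payload, e.g. a method or a cache entry
    isguard::Bool   # guard entries carry no payload of their own
end

# step 1: build the entry
make_entry(sig, value; isguard=false) = TypeMapEntry(sig, value, isguard)

# step 2: insert it into a (toy) map
insert_entry!(map::Vector{TypeMapEntry}, e::TypeMapEntry) = (push!(map, e); e)

tm = TypeMapEntry[]
insert_entry!(tm, make_entry(Tuple{Int}, :some_method))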

@vtjnash (Member Author) commented Apr 17, 2016

> I think it would help a lot to move the code for typemap to a separate file, since a major point of this change is to factor out the data structure from its clients.

Agreed. This would involve splitting the file at around line 1150 and making the relevant header entries exist. The only part that's made me hesitate is wondering whether gcc is doing any sort of IPO that would be inhibited by the split.

@JeffBezanson (Member):

> whether gcc is doing any sort of IPO that would be inhibited by the split

That's not the kind of performance I want :P

@vtjnash (Member Author) commented Apr 17, 2016

> If we can separate these effects I see potential either for further speedups or simplifying the code or both.

That depends on which benchmark you want to optimize :P

Faster ml_matches seemed to be highly effective at driving compile-time tests (and I only did the most basic optimization on it).

Faster jl_method_table_insert (especially for batch operations or entire subtrees) should help cut down the time for reloading precompiled modules.

Faster dispatch? I think we could play more games here to unroll common cases and copy more data inline to avoid a pointer indirection.

More compact cache levels? There's a limiting case where the user generates something foolish like f(::NTuple{100, Int}, ::Val{i}) for i = 1:32, which will create 100 single-entry cache levels in order to build the optimized cache-entry table at the last level. It would be more efficient to handle this case by generating one cache level that went from offset 1 to 100, and then a second cache level that contained all of the Val entries.
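(Written out, the pathological shape described above looks roughly like this; illustrative only.)

# 32 methods whose signatures differ only at the very last slot,
# after 100 identical Int slots:
function f end
for i in 1:32
    @eval f(::NTuple{100, Int}, ::Val{$i}) = $i
end
length(methods(f))   # 32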

@andreasnoack (Member):

f4f1ee5 is causing weird behavior for method definitions in Calculus.jl (v0.1.14). Before this commit, differentiate has 85 methods, but after it the count drops to 14-17 methods, depending on which of the commits before a2e4ac4 I'm on.

@tkelman (Contributor) commented Apr 17, 2016

Ah yeah, PkgEval would also have been a good idea on a change this big that's supposed to not be user-visible.

@vtjnash (Member Author) commented Apr 17, 2016

@andreasnoack Fixed by #15904.

@tkelman (Contributor) commented Apr 17, 2016

@vtjnash still need to fix gf.c not being valid C++

c:/projects/julia/src/gf.c(359) : error C2440: 'type cast' : cannot convert from 'jl_value_t *' to 'jl_typemap_t' 
        No constructor could take the source type, or constructor overload resolution was ambiguous 
c:/projects/julia/src/gf.c(413) : error C2440: 'type cast' : cannot convert from 'jl_value_t *' to 'jl_typemap_t' 
        No constructor could take the source type, or constructor overload resolution was ambiguous

@Tetralux (Contributor):

I suspect this PR (specifically commit 35e64c2, according to git blame) has caused #16042, at REPLCompletions:324.
