JIT: have inlining heuristics look for cases where inlining might enable devirtualization #10303
Comments
Took a look at this and was reminded that the JIT's early modeling of prospective inlinees is quite crude. In a case like this we have A -> B -> C, and we're trying to decide if inlining B into A will allow us to devirtualize the call from B to C. So we need to know:
To reduce the token resolution cost we could instead first check whether any of the tracked stack locations is an exactly known ref type; if not, we could skip the lookup. If so, we could do the lookup and verify that we're making a virtual/interface call AND that the arity is within our tracked range AND that the object pointer in the call is one of the exactly known ref types (AND perhaps that the type has interesting devirtualizations, e.g. we may want to punt on string).
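The ordering of checks described above can be sketched roughly as follows. This is only an illustration, not the JIT's actual code: `StackSlot`, `CallInfo`, `Resolver`, and `mightDevirtualize` are hypothetical names, and the tracked stack is modeled as a small vector whose index 0 is the top of the IL stack.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of what the early inline scan knows about one IL stack slot.
struct StackSlot {
    bool isExactRefType;  // slot holds a reference whose exact class is known
    bool isInteresting;   // assumption: false for types we punt on (e.g. string)
};

// Hypothetical summary produced by resolving a call token.
struct CallInfo {
    bool isVirtualOrInterface;
    std::size_t argCount;  // number of arguments, including the object pointer
};

// Stand-in for the expensive token resolution; counts how often it is invoked.
struct Resolver {
    CallInfo info;
    int lookups;
    CallInfo resolve() { ++lookups; return info; }
};

// tracked[0] is the top of the IL stack; deeper slots follow.
bool mightDevirtualize(const std::vector<StackSlot>& tracked, Resolver& resolver) {
    // Cheap filter first: without an exactly known ref type on the tracked
    // stack there is nothing to gain, so skip the token lookup entirely.
    bool anyExact = false;
    for (const StackSlot& slot : tracked) {
        if (slot.isExactRefType) { anyExact = true; break; }
    }
    if (!anyExact) return false;

    // Only now pay for token resolution.
    CallInfo call = resolver.resolve();
    if (!call.isVirtualOrInterface) return false;

    // The arity must fall within the range of slots we track.
    if (call.argCount == 0 || call.argCount > tracked.size()) return false;

    // The object pointer is pushed first, so it sits deepest among the args.
    const StackSlot& thisSlot = tracked[call.argCount - 1];
    return thisSlot.isExactRefType && thisSlot.isInteresting;
}
```

In this sketch a call site whose tracked slots contain no exact type never reaches `resolve()`, which is the cost saving the comment is after.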
@AndyAyersMS I have been thinking about that problem too (not at the JIT level). There are specific cases where this can be handled in a different way (probably deserves its own issue), but let me elaborate in case there is some fundamental issue involved that I am not aware of. In cases of very hot paths, a call may be made already knowing the type, especially when the callee is a candidate for inlining but fails because inlining is actually not profitable. That doesn't prevent us from devirtualizing it. We could explicitly mark the method to be opt-in "specializable" somehow (attribute, The interesting outcome is that the feature would bring some of the 'struct' specialization optimizations into reference land, which today is a no-no. I have the feeling that there is an 'underlying' limitation at JIT time, but probably it can be done.
Pointers to specific examples might be helpful, as I can imagine a number of things that might map onto your suggestions. There is an issue #9682 open to consider generating specialized versions of generics for particular reference types. I haven't looked at it in depth to determine what all would be involved. Most of the work for it would be outside the jit. And as you note any solution must somehow come to grips with the potential for code bloat. From past experience I never found compiler-driven code specialization by cloning methods to be particularly useful -- the costs were high, the benefits modest, and inlining seemed like the superior way to benefit from specialization. But that was before the days of widespread metaprogramming. So perhaps I ought to reconsider.
@AndyAyersMS That is one case, but I have plenty of those at a higher level of abstraction, where evicting the interface at the class level would have a deep architectural impact. My latest Generalized Allocators work has to do pretty nasty stuff to be able to do everything struct based; because of the usage pattern, a class-level #9682 would simplify it a lot. I don't care about code bloat in those cases because I am making an absurd effort to ensure that codepaths are created for each type of allocator in a custom way.
When a caller argument is an exact type and feeds a virtual or interface call in the callee, we might want to inline more aggressively.
A toy example of this can be found in this BenchmarkDotNet sample. Here `Run`, if inlined, would allow the interface calls to devirtualize.

`Run` is currently pretty far from being a viable inline candidate:

Caller knows that the argument to `Run` is exact:

Likely we would not give a ~3.65x boost to inlining benefit based on one argument reaching one call site. But if we also realized the call site was in a loop perhaps the net effect would be enough to justify an inline.
Currently we don't know when observing arg uses whether that use is in a loop or not. But if we were to associate uses with callee IL offsets we could circle back after finding all the branch targets and develop a crude estimator for loop depth, then sum up the weighted observations.
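A minimal sketch of such an estimator, under stated assumptions: uses and branches are recorded as IL offsets during the scan, a backward branch is treated as a loop spanning the range from its target to its source, and each level of nesting multiplies an observation's weight by a fixed factor. `Branch`, `ArgUse`, `estimateLoopDepth`, `weightedBenefit`, and the multiplier are all hypothetical, chosen only for illustration.

```cpp
#include <cstdint>
#include <vector>

// A branch observed while scanning the callee's IL (offsets in bytes).
struct Branch {
    std::uint32_t source;  // IL offset of the branch instruction
    std::uint32_t target;  // IL offset it jumps to
};

// An observed use of an interesting argument, tagged with where it occurred.
struct ArgUse {
    std::uint32_t ilOffset;
    double baseWeight;  // benefit credited to the use outside any loop
};

// Crude loop depth: count the backward branches whose [target, source]
// range contains the offset. Forward branches are ignored.
unsigned estimateLoopDepth(std::uint32_t offset, const std::vector<Branch>& branches) {
    unsigned depth = 0;
    for (const Branch& b : branches) {
        bool backward = b.target <= b.source;
        if (backward && b.target <= offset && offset <= b.source) {
            ++depth;
        }
    }
    return depth;
}

// Sum the observations, scaling each one by perLoopMultiplier once per
// estimated loop level (the multiplier is an assumed tuning knob).
double weightedBenefit(const std::vector<ArgUse>& uses,
                       const std::vector<Branch>& branches,
                       double perLoopMultiplier) {
    double total = 0.0;
    for (const ArgUse& use : uses) {
        double weight = use.baseWeight;
        unsigned depth = estimateLoopDepth(use.ilOffset, branches);
        for (unsigned i = 0; i < depth; ++i) {
            weight *= perLoopMultiplier;
        }
        total += weight;
    }
    return total;
}
```

Because the depth is computed only after all branch targets are known, the uses have to be buffered with their offsets during the scan and weighted in a second pass, which matches the "circle back" step described above.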
It would also be nice to tabulate a few more opportunities of this kind. The basic observational part of the change is simple enough to prototype that perhaps just building it is one way to make forward progress.
category:cq
theme:inlining
skill-level:intermediate
cost:medium