Qute Improvements #42909
* This method exists for the purpose of saving the cost of creating a substring of the key and
* computing its hashCode.
*/
private CompletionStage<Object> getCompletedStage(String key, int keyStartIndex, int count) {
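For readers following along, here is a minimal sketch of the general idea the Javadoc describes: comparing a region of the key in place instead of allocating a substring and hashing it. The class and method names are hypothetical and not taken from the PR; only `String.regionMatches` is a real JDK call.

```java
// Hypothetical sketch: SuffixMatcher and matchesAt are illustrative names, not code from this PR.
final class SuffixMatcher {

    private SuffixMatcher() {
    }

    // Returns true if the part of key that starts at offset is exactly expected.
    // No substring is allocated and no hashCode is computed.
    static boolean matchesAt(String key, int offset, String expected) {
        return key.length() - offset == expected.length()
                && key.regionMatches(offset, expected, 0, expected.length());
    }

    public static void main(String[] args) {
        // Assumption: the key is a prefix ("item_") followed by a metadata name.
        System.out.println(matchesAt("item_count", 5, "count"));
        System.out.println(matchesAt("item_counter", 5, "count"));
    }
}
```

The length check comes first so that a longer key with the same leading characters is rejected.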
Hm, I see what you mean but I wonder if it would make more sense to precompute the metadata keys, store them in an array (in a field of LoopSectionHelper) and replace this stuff with a simple if-then-else. I'll try to prepare something so that we can compare the results...
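A rough sketch of what precomputed keys plus a plain if-then-else could look like. Everything here is an assumption for illustration: the class name, the `alias + "_" + name` key format, and the three metadata names are not taken from the actual LoopSectionHelper.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Hypothetical sketch: not the real LoopSectionHelper, just the shape of the idea.
final class LoopMetadataKeys {

    private static final int COUNT = 0;
    private static final int INDEX = 1;
    private static final int HAS_NEXT = 2;

    // Built once per {#for} definition, because the alias can differ per loop.
    private final String[] keys;

    LoopMetadataKeys(String alias) {
        // Assumption: metadata keys have the form alias + "_" + name.
        String prefix = alias + "_";
        this.keys = new String[] { prefix + "count", prefix + "index", prefix + "hasNext" };
    }

    // Returns a completed stage for a metadata key, or null if the key is not a metadata key.
    CompletionStage<Object> resolve(String key, int index, boolean hasNext) {
        if (key.equals(keys[COUNT])) {
            return done(index + 1);
        } else if (key.equals(keys[INDEX])) {
            return done(index);
        } else if (key.equals(keys[HAS_NEXT])) {
            return done(hasNext);
        }
        return null;
    }

    private static CompletionStage<Object> done(Object value) {
        return CompletableFuture.completedFuture(value);
    }
}
```

With this shape the per-lookup work is a handful of `String.equals` calls against strings that were built once, so no prefix parsing happens on the hot path.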
If we could have something where the comparison is as cheap as comparing enums, that would be even better
The thing is that the alias (which is part of the prefix) can be different for a {#for} definition...
@mkouba, I hope to have the chance to try it tomorrow :( In the template benchmark it was pretty relevant (the 3 opts, I mean), clearly because it was memory bound, as we said in our call
Hi @mkouba TLDR:
This latter thing could be proved by running async-profiler 3.0 on JMH via the agent and providing [...] The last change alone, for the Loop15 benchmark, has delivered this improvement
after:
which seems worthwhile, especially because it also removes code that has to be maintained... There are still a few things I don't understand in the latest flamegraph collected on this PR, still for Loop15: why? Is it expected?
This is what messes up the benchmark results, because a costly megamorphic call while accessing a single field can be as costly as decoding a [...] I'll see what I can do for this second point, but having the previous one fixed would be better IMO
Regardless, here are the improved tests:
Not a huge margin, i.e. 5->17%, but still better. And the previous comment at #42909 (comment) is still relevant, and it seems to be related to the
@mkouba
Hm, this is a bit suspicious because those benchmarks are very similar. Both contain a simple loop, a few expressions accessing POJO properties, and 2 metadata properties.
That's a great improvement and really worthy IMO!
Ah, yes, it's the
We do cache the value resolver for each part of an output expression. See https://github.com/quarkusio/quarkus/blob/main/independent-projects/qute/core/src/main/java/io/quarkus/qute/EvaluatorImpl.java#L142-L154. But it's not set in stone, in the sense that there can be corner cases where the cached resolver does not apply and we have to iterate over all resolvers again...
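To make the caching behaviour described above concrete, here is a simplified sketch of a cached lookup with a fallback scan. The `Resolver` interface and the class name are stand-ins invented for illustration; they are not Qute's ValueResolver or the actual EvaluatorImpl logic, which may differ in the details.

```java
import java.util.List;

// Hypothetical sketch: simplified stand-ins, not Qute's ValueResolver or EvaluatorImpl.
final class CachedResolverLookup {

    interface Resolver {
        boolean appliesTo(Object base, String name);

        Object resolve(Object base, String name);
    }

    private final List<Resolver> resolvers;
    // Last resolver that matched for this expression part; it may stop applying later.
    private volatile Resolver cached;

    CachedResolverLookup(List<Resolver> resolvers) {
        this.resolvers = resolvers;
    }

    Object resolve(Object base, String name) {
        Resolver current = cached;
        if (current != null && current.appliesTo(base, name)) {
            // Fast path: the cached resolver still applies.
            return current.resolve(base, name);
        }
        // Corner case: nothing cached yet, or the cached resolver does not apply,
        // so iterate over all resolvers again and remember the match.
        for (Resolver candidate : resolvers) {
            if (candidate.appliesTo(base, name)) {
                cached = candidate;
                return candidate.resolve(base, name);
            }
        }
        return null; // Assumption: null signals "not found" in this sketch.
    }
}
```

The point of the sketch is only that the cache is an optimization, not a guarantee: every call still checks that the cached resolver applies before using it.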
I have some suggestions on how to do this "right" (according to how the JIT works), if you wish, because:
The suggestions here are:
In order to achieve it, we need:
I will soon prepare a branch with this proposal and show the benefit, wdyt?
@mkouba
Which of your benchmarks could I use to verify these?