Subpar Caffeine cache performance (vs Akka HTTP, Play2, ASP.NET Core) #28795
Comments
/cc @gwenneg
I’m not familiar enough with this project, but the first thing I do to investigate the performance of a long-running process is to profile. That typically offers enough insight to have a discussion and a proof-of-concept fix, though a proper fix might be different based on the owner’s knowledge of their project. jcmd <pid> JFR.start duration=?m filename=?.jfr settings=profile.jfc Then open the recording in JMC/JProfiler/YourKit and look at the hotspot methods. This requires the JVM, not a GraalVM native image.
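As an alternative to the jcmd invocation above, a flight recording can also be started programmatically from inside the process via the JDK's own jdk.jfr API (JDK 11+). A minimal sketch, with placeholder durations and a temp-file destination chosen just for illustration:

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

public class ProfileSketch {
    public static void main(String[] args) throws Exception {
        // Use the bundled "profile" settings, the in-process equivalent of
        // settings=profile.jfc in the jcmd invocation above.
        Configuration profile = Configuration.getConfiguration("profile");
        try (Recording recording = new Recording(profile)) {
            recording.setMaxAge(Duration.ofMinutes(5)); // keep at most 5 minutes of data
            recording.start();

            // ... exercise the code under load here ...
            Thread.sleep(100);

            recording.stop();
            Path out = Files.createTempFile("hotspots", ".jfr");
            recording.dump(out); // open this file in JMC/JProfiler/YourKit
            System.out.println("dumped " + Files.size(out) + " bytes to " + out);
        }
    }
}
```

This produces the same kind of .jfr file as the jcmd command, just without needing shell access to the host.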
As @graemerocher mentioned in his comment on the Micronaut issue, you seem to be using the general annotation-based method-result cache mechanism and expecting to get HTTP cache headers set as well - this is not how things work.
Furthermore, as @ben-manes says, profiling the application is essential for us to pinpoint any issues and find ways to address them. If you provide us with a sample application that exhibits the behavior you mention, we can certainly do the profiling.
Another thing I should add is that your throughput / latency comparison is likely not apples to apples, because in the Quarkus case the request is being offloaded and handled by a worker thread. If you want an apples-to-apples comparison of Quarkus with other stacks, you should be using RESTEasy Reactive and either annotate the class / method with
I wasn't aware of that, appreciate it. I didn't see it in the docs.
I am.
Of course, I'll push the repo to GitHub. I'd appreciate learning how you do the profiling too, and what exactly to look at primarily.
Oh, right, I missed the Mutiny wrapper with Quarkus. Thanks for pointing it out. But now I've made it more explicit with the four evaluation models described in the article you're referring to. The blocking variant proves best in performance here, and it properly locks the cache thread. Adding
Cache stampede on fibers for 10 concurrent connections (crashes with 100)
Thanks, I'll try to look into it. But first someone should really try to replicate it so we can make sure it's not a config issue or the runtime environment being peculiarly unfavorable to it. I'll push my repo and link it here.
@demming interesting. It would be wonderful if you are able to put together a sample we can try.
@geoand Hey, just pushed it, https://github.com/demming/cache-stampede-quarkus |
Great, thanks
I started looking at the issue, but I just want to clarify that case
I took a quick look at this and I believe there is some performance being left on the table in Mutiny. Notice the Uni-related entries taking up a large part of the flamegraph. @franz1981 I believe that when you have some time, this will be of a lot of interest to you 😉
@geoand it's interesting indeed, although a separate issue. Can you try using https://github.com/RedHatPerf/type-pollution-agent to check if it can point to some of the sources of type pollution? And, in addition, I would run an allocation profiling session because of quarkus/extensions/cache/runtime/src/main/java/io/quarkus/cache/runtime/CacheInterceptor.java Line 70 in d921f97
ArrayList's default capacity can be just too much for the actual number of elements needed there? NOTE: in the same file, quarkus/extensions/cache/runtime/src/main/java/io/quarkus/cache/runtime/CacheInterceptor.java Lines 73 to 77 in d921f97
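To illustrate the allocation concern: a bare new ArrayList<>() grows to a capacity-10 backing array on the first add, which is wasteful when an interceptor typically collects only one or two elements per invocation. A minimal sketch of the pre-sizing idea (hypothetical collectKeys helper, not the actual CacheInterceptor code):

```java
import java.util.ArrayList;
import java.util.List;

public class CapacitySketch {
    // Hypothetical stand-in for collecting cache-key parameters per invocation.
    static List<Object> collectKeys(Object[] params, int expectedKeyCount) {
        // Sizing the list to the known element count avoids allocating the
        // default capacity-10 backing array when 1-2 keys are typical.
        List<Object> keys = new ArrayList<>(expectedKeyCount);
        for (Object p : params) {
            keys.add(p);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Object> keys = collectKeys(new Object[] {"id", 42}, 2);
        System.out.println(keys); // prints [id, 42]
    }
}
```

On a hot path hit millions of times per second, those avoided array allocations can show up clearly in an allocation profile.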
That could be a usual (and sad) case of https://bugs.openjdk.org/browse/JDK-8295496
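For context, JDK-8295496 concerns the single-entry "secondary super cache" the JVM uses for interface type checks: when the same check site alternates between different interface types for the same class, that one-slot cache is rewritten constantly, causing repeated slow-path lookups and cache-line contention across cores. A contrived illustration of the alternating-check pattern (hypothetical types, not Quarkus code):

```java
public class TypePollutionSketch {
    interface A {}
    interface B {}
    static class Both implements A, B {}

    public static void main(String[] args) {
        Object o = new Both();
        int hits = 0;
        // The same object is checked against two different interfaces in
        // alternation; each check can evict the other's entry from the
        // class's single-slot secondary super cache, so both keep taking
        // the slow path (and ping-pong the cache line under concurrency).
        for (int i = 0; i < 1_000_000; i++) {
            if ((i & 1) == 0 ? o instanceof A : o instanceof B) {
                hits++;
            }
        }
        System.out.println(hits); // prints 1000000
    }
}
```

The type-pollution agent mentioned above reports exactly these alternating instanceof/checkcast sites.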
I already did, nothing came up :). But I'll investigate further.
Scratch the comment above... I ran the agent for something else I was looking at... Not for this use case. I'll do that now. |
Here are the results:
So our caching layer does suffer from this problem...
And not only that, @geoand: I see it's using a version of Netty that doesn't contain my fix, the one that's "tricking" the JIT (I know, it's unfair) and partially fixing it
Yeah, there are multiple things that are concerning, but for this specific use case I'm just focusing on the cache part.
@geoand :
Oh, I see, then I must have misunderstood what you were suggesting back then:
Oh, right, I'm not well-versed in Mutiny. Are
They should not be waiting so long, so we likely have some hidden performance issue that is not related to Caffeine
Would it make sense to try to initialize it to the initial size of the cache as per
It's unrelated
Alright, I see, thank you
@geoand FYI, looking at your report the top consumer is
Thanks @Sanne. I am aware of that fix
Cool - I see Roberto released it, and it's included in #28886 - looking forward to seeing that merged.
Excellent 👌. I have a similar fix that might help with the caching code.
The type pollution fix for cache does indeed give a small performance boost, but nothing earth shattering.
Just curious @geoand: did you test a version including smallrye/smallrye-common#190 as well? I see it's still not using the latest Netty version either: I suggest running something that fixes the issue Sanne mentioned above and forces the latest Netty version (which includes the scalability fixes for the channel pipeline, including IdleStateHandler)
Nope, just my fix
@franz1981 we'll need to test all the fixes together to see what kind of performance boost they all provide
I have sent a change to Vert.x addressing something similar too: eclipse-vertx/vert.x#4520 And the change I am proposing there will be a real one, i.e., the JIT won't optimize it away :(
I haven't forgotten this :) In the meantime @demming could run the reproducer over the latest version of Quarkus with
that should improve things a bit
I think we can close this, as it has pretty much turned into a general performance-improvement thread of the kind we already run anyway, led by @franz1981 :)
Describe the bug
I've been looking into cache stampede issues across a few microservices that I'm running. I've observed quite meager Caffeine performance on cached endpoints running on Quarkus and Spring WebFlux alike. As a baseline I'm taking my Akka HTTP cached implementation of the same simple microservice, which sanitizes upstream HTML data. My Akka HTTP implementation is based on Scala's sttp/tapir, otherwise pretty bare akka-http underneath. Play2 Framework is close to it but still slower at cached performance, and appears to suffer from cache stampede. While Quarkus is sophisticated enough to mitigate the stampede by locking cache access, the performance I've observed is just a fraction of what's feasible: about one tenth, i.e., an order of magnitude slower, see below.
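For readers unfamiliar with the term: a cache stampede happens when many concurrent requests miss the same key at once and each one triggers its own expensive load. A minimal, framework-free sketch of the difference between a naive check-then-load pattern and a per-key-locking compute (similar in spirit to what a Caffeine loading cache does; the expensiveLoad below is hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class StampedeSketch {
    static final AtomicInteger loads = new AtomicInteger();

    static String expensiveLoad(String key) {
        loads.incrementAndGet(); // count how often the upstream is actually hit
        try { Thread.sleep(50); } catch (InterruptedException ignored) {}
        return "value-for-" + key;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> cache = new ConcurrentHashMap<>();
        int threads = 100;
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);

        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    start.await();
                    // computeIfAbsent locks per key: only ONE thread runs the
                    // loader, the other 99 block briefly and reuse its result.
                    cache.computeIfAbsent("page", StampedeSketch::expensiveLoad);
                } catch (InterruptedException ignored) {
                } finally {
                    done.countDown();
                }
            }).start();
        }
        start.countDown();
        done.await();
        // A naive "if (!cache.containsKey(k)) cache.put(k, load(k))" pattern
        // could have hit the upstream up to 100 times here instead.
        System.out.println("upstream loads: " + loads.get()); // prints upstream loads: 1
    }
}
```

The stacks compared in this report differ precisely in whether (and how cheaply) they provide this single-loader guarantee under concurrency.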
For load and soak testing, among other tools, I've used bombardier, for a simple use case as follows:

bombardier -c 100 -d 10s -k -l "http://localhost:8090/website?address=http://localhost:8081"

which initiates 100 concurrent connections, runs them for 10s (or any other duration), and implicitly reuses open connections. The upstream endpoint can be anything you like to ingest HTML data from. Together with Spring WebFlux, my Quarkus microservices are the slowest at cached throughput among the services I've been running.
In addition, no Cache-Control HTTP headers are being set (only by Quarkus). I believe they should be available for being set implicitly from the annotation rather than at compile time from the config, like all other frameworks do.

The figures below are relative to the execution environment, but the relations among the individual framework figures are pretty much constant; that is, Quarkus and Spring WebFlux are 10 times slower at response caching than Akka HTTP, which is twice as efficient as the recent ASP.NET Core and about 50% faster than Play2.
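Until a framework sets it for you, the Cache-Control header has to be added explicitly in the response path. A framework-agnostic sketch using only the JDK's built-in HTTP server and client (hypothetical /website endpoint and max-age, not the Quarkus resource from this report):

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CacheControlSketch {
    public static void main(String[] args) throws Exception {
        // Bind to an ephemeral port so the sketch runs anywhere.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/website", exchange -> {
            byte[] body = "<html>sanitized</html>".getBytes(StandardCharsets.UTF_8);
            // Explicitly advertise cacheability to downstream clients/proxies.
            exchange.getResponseHeaders().set("Cache-Control", "public, max-age=60");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        int port = server.getAddress().getPort();
        HttpResponse<String> resp = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/website")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.headers().firstValue("Cache-Control").orElse("missing"));
        server.stop(0);
    }
}
```

With such a header in place, intermediaries and browsers can serve repeats without hitting the service at all, which is a separate win from the in-process Caffeine cache discussed here.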
Expected behavior
Since Akka HTTP too uses Caffeine under the hood, I'd expect it to have similar performance at response caching.
Akka HTTP underneath sttp/tapir over 10s:
With proper Cache-Control headers set.
Actual behavior
Quarkus native on GraalVM or JVM alike, production build. Same machine, same workload, randomized tests, 20 runs each in total.
This is as good as it gets over 10s (dev mode about 0.5 GB/s at best), with about 0.6 GB/s on average under the given environment.
Without any Cache-Control HTTP headers.
I'm looking into a way to improve it. Not sure how, though. On a side note, I'd like to have a way to choose or plug in other means of mitigation.
How to Reproduce?
Here's a very simple resource which, instead of invoking a remote HTTP call, could instead block the fiber for a while. The caching of the result of that endpoint is the subject matter here.
With the following cache config in application.yaml:

Output of uname -a or ver
macOS 16.2, arm64

Output of java -version
OpenJDK 19.0.1

GraalVM version (if different from Java)
graalvm-ce-java17-22.2.0

Quarkus version or git rev
2.13.1.Final

Build tool (ie. output of mvnw --version or gradlew --version)
Gradle 7.5.1
Additional information
The issues of mine that I linked in the first paragraph contain additional links and data points. I hope it's just me missing some obvious configuration in Quarkus that has led to this subpar performance at response caching.