RSS usage increase between Quarkus 3.11 and 3.13 in MP application in native mode #42506
I did some investigation on top of Quarkus main. My initial tips (OpenTelemetry bump, SmallRye Config bump) didn't prove to be the case; after that I tried several custom builds, but I wasn't able to pinpoint a concrete commit causing the issue. It's more like an accumulation of multiple changes. Here is the dump of my notes (the goal is to get from 65MB to 59MB); I didn't add it to the description intentionally, it's already quite heavy:
RSS is decreasing slowly, so this leads me to the conclusion that it's more an accumulation of multiple changes that leads to the increased RSS I see between 3.11 and 3.13.
cc @franz1981
Is this same trend confirmed on the CI with stable Linux machines as well? In case it is, and the native-image command doesn't reveal anything relevant (@zakkak could help there, I think), if the additional weight comes from a progressively increasing number of features, it's OK.
That's perf lab only, we do not have bare metal machines in our lab.
As mentioned in the description, JVM mode runs didn't reveal a noticeable difference. It's native mode where the diff manifests.
@rsvoboda I mean: the description here made me think it was a macOS run, but the difference is still the same when moving to Linux, right?
Yes, @mjurc tried the reproducer commands on his Fedora machine and he saw a similar trend; the absolute numbers were of course a bit different. Michal confirmed that RSS was higher for 3.13 compared to 3.11.
@franz1981 I don't know if JFR alloc events are in place (@roberttoyonaga?); I worked on the old object event, but that's more for tracking leaks, and this doesn't smell like a leak right now. @rsvoboda @mjurc if you run the test manually, can you see if the RSS increase is due to startup and/or the first request? Again, if you run the test manually, you should get heap dumps after startup and after the first request with the different versions, and see if we can spot some differences there. One more thing to maybe try here is @roberttoyonaga's NMT work, in case the increases are not spotted in the heap contents.
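The startup-vs-first-request comparison suggested above can be sketched with a tiny helper; this is not the test suite's actual measurement code, just a minimal way to sample RSS at the two points in question:

```shell
#!/bin/sh
# Print a process's RSS in kB; `ps -o rss=` works on both Linux and macOS.
rss_kb() {
  ps -o rss= -p "$1" | tr -d ' '
}

# Demo on the current shell. For the start-stop app you would pass the
# native binary's PID once after startup, then again after the first
# request, and compare the two numbers per Quarkus version.
echo "current shell RSS: $(rss_kb $$) kB"
```

For the real experiment you would start the (hypothetical) runner binary in the background and call `rss_kb` on its PID before and after the first `curl` hit.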
@franz1981 Ah yes, we now have … Finally, @rsvoboda, run through the steps in https://quarkus.io/blog/native-startup-rss-troubleshooting and see if you can get a better understanding of the cause. As we highlighted there, the way native handles memory can lead to bigger increases caused by small changes, compared to JVM mode.
Yes, Native Image JFR has both the … Yes, like Galder mentioned, you could also try "native memory tracking" to see where native allocations are happening off the "Java" heap. There's a blog post here: https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image#
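As a sketch of what the NMT setup from the linked article looks like for a Quarkus native build (the Maven invocation and binary name are placeholders, not the thread's exact commands):

```shell
#!/bin/sh
# NMT is enabled at image build time and reported at run time, per the
# linked article. The property name is Quarkus' standard way of passing
# extra native-image arguments.
NATIVE_ARGS="--enable-monitoring=nmt"    # add at image build time
RUNTIME_ARGS="-XX:+PrintNMTStatistics"   # print an NMT summary at shutdown

echo "mvn package -Dnative -Dquarkus.native.additional-build-args=$NATIVE_ARGS"
echo "./target/app-runner $RUNTIME_ARGS"
```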
Thanks for the pointers. I will try to look more into it, but my week is quite packed before I go on PTO next week.
@roberttoyonaga @rsvoboda We should mention that not all upstream features in master or 24.1 are yet in a released version of Mandrel, so feature availability may vary. AFAIK, the main Mandrel version in Quarkus is 23.1 (Mandrel for JDK 21).
Oh yes, that's true. |
FWIW, CE (or GraalVM community) early access builds, which we should be testing, are here:
I have tried reproducing this on my Fedora setup with both GraalVM CE 21.0.2 (from SDKMAN) and the default builder image (Mandrel 23.1.4.0), and the results are:
Given that the image size increased by ~1MB, I don't find the above increase in RSS usage surprising, but the RSS increase reported by @rsvoboda is quite a bit larger than that :/ @mjurc when reproducing this, did you notice an increase similar to what @rsvoboda reports, or a smaller one like the one in my measurements? @rsvoboda what are the thresholds set to in the CI? Can you also share the actual numbers from these runs? The default in the repository seems to be quite high at 90000 kB.
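For context, the kind of threshold check being discussed can be sketched like this; the baseline/measured numbers are illustrative (roughly the 59MB/65MB figures mentioned earlier), and 90000 kB is the repository default quoted above:

```shell
#!/bin/sh
baseline_kb=59000    # illustrative 3.11-era RSS from the notes above
measured_kb=65000    # illustrative 3.13-era RSS
threshold_kb=90000   # the repository default mentioned above

if [ "$measured_kb" -gt "$threshold_kb" ]; then
  echo "FAIL: RSS ${measured_kb} kB is above the ${threshold_kb} kB threshold"
else
  # percent increase over the baseline, integer arithmetic
  pct=$(( (measured_kb - baseline_kb) * 100 / baseline_kb ))
  echo "OK under threshold, but +${pct}% over the baseline"
fi
```

This illustrates why a 90000 kB absolute threshold can hide a ~10% regression: the check passes even though RSS grew noticeably over the baseline.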
Thanks @zakkak for checking; IIRC that benchmark does NOT set any limit on memory (Xms/Xmx) or on the number of cores; both things could affect RSS through the way allocations happen vs GC cycles (vs which malloc arenas get used).
@franz1981, how is the config-quickstart RSS? |
@radcortez is still on holiday and with no VPN access; @johnaohara can answer you... but last time I checked there was a super small increase.
@franz1981 If I get it right, it sets Xmx to 96m (see https://github.com/quarkus-qe/quarkus-startstop/blob/643dadc30f810333f8a0c8ef846e7aaac53f9a7e/testsuite/src/it/java/io/quarkus/ts/startstop/utils/MvnCmds.java#L24); regarding the rest, you are right, it doesn't control them.
Uh, I had missed it (my memory is failing me, sorry :/ I should have checked) - thanks for verifying it!
I did some experiments with JFR and NMT recording with the Quarkus MP app. As https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image suggests, I used an EA build of GraalVM. Here is recording-nmt-3.11-3.13.zip. Native binaries were produced from the https://github.com/quarkus-qe/quarkus-startstop repo using this procedure:
JFR was produced using:
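The exact commands are elided above; as a hedged sketch based on GraalVM's Native Image JFR support (binary name and recording options are placeholders, not the author's commands):

```shell
#!/bin/sh
# JFR support is compiled into the image at build time, then a recording
# is started via a runtime flag on the native binary.
BUILD_ARGS="--enable-monitoring=jfr"
JFR_ARGS="-XX:StartFlightRecording=filename=recording.jfr"

echo "mvn package -Dnative -Dquarkus.native.additional-build-args=$BUILD_ARGS"
echo "./target/app-runner $JFR_ARGS"
```

The resulting `recording.jfr` can then be opened in JDK Mission Control, as in the Event Browser screenshots discussed below.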
I noticed that the flame graph under Event Browser > Java Application has some additional bars for 3.13 compared to 3.11. At this stage I think I'm at my knowledge and investigation limits; any insight would be welcome. I shared my reproducer steps so anybody can play with this.
FWIW, I'm planning to have a closer look at the Jackson thing but got sidetracked by some reverting fun :). I'll have a look soon hopefully. |
@radcortez btw, I think it would be worth checking that all the interceptors are correctly protected. |
Thanks, but let's figure out the exact requirements before blindly patching the code. |
I have conducted some (naive and basic) profiling with async-profiler. I saw nothing special in the CPU profile. The alloc profile is a bit more interesting, even if I'm not entirely sure what to think of it. The initialization of the … Here is the alloc profile for … And here is the one I got with plain …
What's the Jackson thing in question? |
@gsmet both myself and @radcortez are using the methodology I have shown in #35406 (comment) - the first bullet point, which disables TLABs - but adding:
I would use async-profiler to stop the profiling collection before shutdown happens, so you won't include the allocations made while stopping. @radcortez can share his methodology and command line.
Yes, I confirm that I use what @franz1981 taught me :) For reference, these are the commands that I use:
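The exact command lines are elided above; as a sketch of what an async-profiler allocation run in JVM mode typically looks like (the agent path and jar name are placeholders, and `-XX:-UseTLAB` matches the "disable TLABs" step mentioned earlier):

```shell
#!/bin/sh
# Standard async-profiler agent options: start immediately, sample
# allocations, write a flame graph to alloc.html.
AGENT=/opt/async-profiler/lib/libasyncProfiler.so   # adjust locally
JAVA_OPTS="-XX:-UseTLAB -agentpath:$AGENT=start,event=alloc,file=alloc.html"

echo "java $JAVA_OPTS -jar target/quarkus-app/quarkus-run.jar"
```

Stopping the recording before shutdown (as suggested above) keeps the teardown allocations out of the profile.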
@radcortez I tried with your commands and I don't see much difference for Jackson - but it was with the full … But I found out something extremely odd: in the 3.13 … whereas for … I'm wondering if somehow OpenTelemetry could be disabled in 3.13 and somehow enabled with the new stuff introduced in … The configuration is identical and there's some OpenTelemetry stuff in the config:
but I haven't started a collector, I just started the app. In any case, it looks like the behavior is a bit different.
Strange... I've also executed with current …
Yeah, that's what I have in …
I did try with …
@gsmet is looking into the Quarkus REST + Jackson area. I've prepared https://github.com/rsvoboda/quarkus-startstop/tree/reactive-jackson and looked into the summary from … There is an increase in quarkus-runner.jar size plus the Netty stuff; IMHO this can have some implications for the runtime. The most interesting one for me is the increase in size of quarkus-runner.jar, which went from 987.96kB to 1.14MB - that's ~130 kB more for 999-SNAPSHOT. Here is a diff of …
https://www.diffchecker.com/T5ZNgETl/ is better - trimmed (only name + length) and sorted |
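The "trimmed (only name + length) and sorted" comparison can be reproduced locally; this is a sketch, where the two toy listings below stand in for real saved `unzip -l` output of the two runner jars:

```shell
#!/bin/sh
# Reduce an `unzip -l` listing to sorted "name size" pairs.
trim_listing() {
  # keep only entry lines: 4 fields where the first is a numeric size
  awk 'NF==4 && $1 ~ /^[0-9]+$/ {print $4, $1}' "$1" | sort
}

# Toy listings standing in for `unzip -l quarkus-runner.jar` output.
printf '  100  2024-01-01 00:00   io/quarkus/A.class\n' > old.list
printf '  100  2024-01-01 00:00   io/quarkus/A.class\n  200  2024-01-01 00:00   io/quarkus/tls/B.class\n' > new.list

trim_listing old.list > old.txt
trim_listing new.list > new.txt
diff old.txt new.txt || true   # `diff` exits 1 when the listings differ
```

On the real jars, lines prefixed with `>` show entries only in (or larger in) the newer runner jar, which is how the io/quarkus/tls/runtime addition below shows up.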
io/quarkus/tls/runtime is what @cescoffier found in the @rsvoboda diff
yes, the TLS stuff is an addition, and the generated Vert.x stuff is a bit bigger. The Jackson stuff doesn't seem to be problematic / much bigger; it's just 2.17.1 in 3.11.3 vs 2.17.2 in main.
@radcortez what's the state of this after #42814? |
I still need to do proper measurements. I also made some improvements on the SR Config side. I reopened the issue because this only fixes the OTel issue; there is still the other issue with Jackson.
@franz1981 @radcortez I believe we should put all this information (along with whatever else we have picked up) in …
Good point; I have held off on doing it for two reasons:
I will see if I can come up with the same info by switching to JFR (which collects the same event) - is that correct, @roberttoyonaga? Wdyt?
Yeah, those are valid, but if someone were to look into, say, Security allocations in a month, they would have to ask you all over again and/or weed through information scattered across various discussions/chats. We can certainly add a caution note in the doc explaining that parts of the methodology are subject to change :)
Hi @franz1981! Hmm, I don't think you can disable TLABs in Native Image. But even if you could, all JFR allocation events (…) … It is possible to use epsilon GC in Native Image though, using …
@radcortez does this still need to be open? @rsvoboda wdyt? Maybe we can track this with a more narrowly scoped issue, given that the OTel one seems fixed and, specifically, the failure in the regression test …
Yes, the problematic pieces from the OTel side are fixed. Not sure about the other ones. We can close this one and track any remaining pieces in separate issues. |
Describe the bug
After a recent update from Quarkus 3.11 to 3.13 in the QE StartStopTS (https://github.com/quarkus-qe/quarkus-startstop), we noticed an RSS usage increase / failures due to threshold hits in native mode for an application using MicroProfile / SmallRye extensions.
I was running my experiments on a local Apple MBP M3 Pro, limiting running applications to the minimum, using 21.0.2-graalce. JVM mode runs didn't reveal a noticeable difference.
I did checks on more versions, but these are the key ones:
These numbers are relevant to my system; you may see slightly different values when running the reproducer steps. But the key thing is the trend of increasing RSS usage with the same app after the first successful request. The issue manifests in native mode, but IMHO it's not related to native-image tooling. Just FYI, I noticed that RSS was slightly smaller when using Mandrel (23.1.4.r21-mandrel) to build the binary.
There is a slight increase between 3.11.3 and 3.12.3; the difference between 3.11.3 / 3.12.3 and 3.13.2 is more noticeable, ~5-7%.
Some details from native-image command:
3.13.2
3.12.3
3.11.3
Expected behavior
RSS usage stays similar between Quarkus 3.11 and 3.13 in MP application
Actual behavior
RSS usage increase between Quarkus 3.11 and 3.13 in MP application
How to Reproduce?
Output of uname -a or ver
macOS
Output of java -version
Java 21
Quarkus version or git rev
3.13.2
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response