RSS usage increase between Quarkus 3.11 and 3.13 in MP application in native mode #42506

Closed
rsvoboda opened this issue Aug 13, 2024 · 78 comments · Fixed by #42814

@rsvoboda
Member

rsvoboda commented Aug 13, 2024

Describe the bug

After the recent update from Quarkus 3.11 to 3.13 in the QE StartStopTS (https://github.com/quarkus-qe/quarkus-startstop) we noticed an RSS usage increase / failures due to hitting the threshold in native mode for an application using MicroProfile / SmallRye extensions.

I ran my experiments on a local Apple MBP M3 Pro, limiting the running applications to the minimum and using 21.0.2-graalce. JVM mode runs didn't reveal a noticeable difference.

I did checks on more versions, but these are the key ones:

3.13.2
AVG RSS (kB) without min and max values: 65446

3.12.3
AVG RSS (kB) without min and max values: 61893

3.11.3
AVG RSS (kB) without min and max values: 60969

These numbers are relevant to my system; you may see slightly different values when running the reproducer steps. The key thing is that there is a trend of increasing RSS usage for the same app after the first successful request. The issue manifests in native mode, but IMHO it's not related to the native-image tooling. Just FYI, I noticed that RSS was slightly smaller when using Mandrel (23.1.4.r21-mandrel) to build the binary.

There is a slight increase between 3.11.3 and 3.12.3; the difference between 3.11.3 / 3.12.3 and 3.13.2 is more noticeable, ~5-7 %.

Some details from native-image command:
3.13.2

  29.65MB (46.02%) for code area:    51,790 compilation units
  33.88MB (52.58%) for image heap:  377,317 objects and 78 resources
 921.53kB ( 1.40%) for other data
  64.43MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  12.10MB java.base                                           10.03MB byte[] for code metadata
   1.86MB quarkus-runner.jar                                   5.29MB byte[] for java.lang.String
   1.81MB c.f.jackson.core.jackson-databind-2.17.2.jar         4.07MB java.lang.Class
   1.34MB svm.jar (Native Image)                               3.49MB java.lang.String
   1.26MB modified-io.vertx.vertx-core-4.5.9.jar               1.39MB com.oracle.svm.core.hub.DynamicHubCompanion
 637.69kB io.netty.netty-buffer-4.1.111.Final.jar           1005.64kB byte[] for general heap data
 589.17kB com.fasterxml.jackson.core.jackson-core-2.17.2.jar 908.41kB byte[] for reflection metadata
 509.34kB io.netty.netty-common-4.1.111.Final.jar            704.97kB java.lang.String[]
 440.35kB io.netty.netty-codec-http-4.1.111.Final.jar        597.46kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
 401.02kB io.netty.netty-transport-4.1.111.Final.jar         510.38kB java.util.HashMap$Node
   8.47MB for 148 more packages                                5.97MB for 4166 more object types

3.12.3

  29.33MB (45.84%) for code area:    51,075 compilation units
  33.77MB (52.77%) for image heap:  374,656 objects and 76 resources
 910.64kB ( 1.39%) for other data
  63.99MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  12.09MB java.base                                            9.93MB byte[] for code metadata
   1.81MB c.f.jackson.core.jackson-databind-2.17.2.jar         5.25MB byte[] for java.lang.String
   1.73MB quarkus-runner.jar                                   4.02MB java.lang.Class
   1.41MB svm.jar (Native Image)                               3.47MB java.lang.String
   1.25MB modified-io.vertx.vertx-core-4.5.7.jar               1.37MB com.oracle.svm.core.hub.DynamicHubCompanion
 589.17kB com.fasterxml.jackson.core.jackson-core-2.17.2.jar  1002.65kB byte[] for general heap data
 588.80kB io.netty.netty-buffer-4.1.108.Final.jar            896.98kB byte[] for reflection metadata
 494.40kB io.netty.netty-common-4.1.108.Final.jar            699.94kB java.lang.String[]
 447.51kB io.netty.netty-codec-http-4.1.108.Final.jar        591.60kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
 399.07kB io.netty.netty-transport-4.1.108.Final.jar         510.56kB java.util.HashMap$Node
   8.28MB for 147 more packages                                6.12MB for 4139 more object types

3.11.3

  29.06MB (45.68%) for code area:    50,844 compilation units
  33.67MB (52.92%) for image heap:  373,003 objects and 76 resources
 912.34kB ( 1.40%) for other data
  63.62MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 origins of code area:                                Top 10 object types in image heap:
  12.09MB java.base                                            9.84MB byte[] for code metadata
   1.81MB c.f.jackson.core.jackson-databind-2.17.1.jar         5.21MB byte[] for java.lang.String
   1.61MB quarkus-runner.jar                                   4.01MB java.lang.Class
   1.40MB svm.jar (Native Image)                               3.45MB java.lang.String
   1.25MB modified-io.vertx.vertx-core-4.5.7.jar               1.37MB com.oracle.svm.core.hub.DynamicHubCompanion
 589.09kB com.fasterxml.jackson.core.jackson-core-2.17.1.jar  1001.41kB byte[] for general heap data
 588.80kB io.netty.netty-buffer-4.1.108.Final.jar            888.38kB byte[] for reflection metadata
 494.98kB io.netty.netty-common-4.1.108.Final.jar            696.34kB java.lang.String[]
 447.51kB io.netty.netty-codec-http-4.1.108.Final.jar        589.49kB c.o.svm.core.hub.DynamicHub$ReflectionMetadata
 399.07kB io.netty.netty-transport-4.1.108.Final.jar         510.19kB java.util.HashMap$Node
   8.16MB for 145 more packages                                6.20MB for 4134 more object types

Expected behavior

RSS usage stays similar between Quarkus 3.11 and 3.13 in MP application

Actual behavior

RSS usage increase between Quarkus 3.11 and 3.13 in MP application

How to Reproduce?

git clone https://github.com/quarkus-qe/quarkus-startstop
cd quarkus-startstop

sdk use java 21.0.2-graalce

mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=3.13.2 -Dstart-stop.iterations=25 | grep "AVG RSS"
mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=3.12.3 -Dstart-stop.iterations=25 | grep "AVG RSS"
mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=3.11.3 -Dstart-stop.iterations=25 | grep "AVG RSS"

Output of uname -a or ver

macOS

Output of java -version

Java 21

Quarkus version or git rev

3.13.2

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@rsvoboda rsvoboda added the kind/bug (Something isn't working) label Aug 13, 2024
@quarkus-bot

quarkus-bot bot commented Aug 13, 2024

/cc @Karm (mandrel), @galderz (mandrel), @zakkak (mandrel,native-image)

@rsvoboda
Member Author

I did some investigation on top of Quarkus main. My initial suspects (the OpenTelemetry bump, the SmallRye Config bump) didn't prove to be the cause, and after that I tried several custom builds, but I wasn't able to pinpoint a concrete commit causing the issue. It looks more like an accumulation of multiple changes.

Here is a dump of my notes (the goal is to get from 65MB to 59MB). I intentionally didn't add it to the description, which is already quite heavy:

mvn clean verify -pl testsuite -Dtest=StartStopTest#fullMicroProfileNative -Dquarkus.version=999-SNAPSHOT -Dstart-stop.iterations=25 | grep "AVG rssKb"

Quarkus main
2024-08-12 17:57:25.841 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 65532


Quarkus main without otel bump - https://github.com/quarkusio/quarkus/commit/b6f54993c43f955f2b20855b9c4d694af7c210f2
2024-08-12 18:02:20.956 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 65246


before MP Config update (HEAD detached at d7b84f6d01c)                                                  ~15.7.
  https://github.com/quarkusio/quarkus/commit/78045ea67413518f7c77271d08a70d03d1fffbbd
2024-08-12 18:17:15.291 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 64582

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+609   ~14.7.
git checkout a8dfd70cc8c916d0f1e7e5c0eea21832e90c8785
HEAD is now at a8dfd70cc8c resources, not resource for path
2024-08-12 19:49:55.790 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 64613

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+609   ~10.7.
git checkout aaf2960b388115bfd3e6fd903c71df8f6c1a8ba1
HEAD is now at aaf2960b388 Merge pull request #41793 from turing85/feature/parmeters-in-compile-plugin
2024-08-12 19:53:24.538 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 63120

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+679   ~9.7.
git checkout 37b1fb98322230a9165105b146c978e36afa93bd
HEAD is now at 37b1fb98322 WebSockets Next: Dev UI fixes
2024-08-12 19:59:05.640 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 62891

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+714   ~8.7.
git checkout eecc1ae55a45042e1fcb1bca36d70f8ad845a085
Previous HEAD position was 37b1fb98322 WebSockets Next: Dev UI fixes
2024-08-12 20:03:20.918 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61257

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+709   ~6.7.
git checkout 72d8bf6840b5aaf30392a69c03e460c2670b9c96
    Update telemetry-micrometer.adoc
2024-08-12 18:25:57.942 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61243
2024-08-12 20:06:16.126 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61376

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+779   ~3.7.
git checkout 0ae63d59354cfca3a32b4006be3ca4dcabeb9870
HEAD is now at 0ae63d59354 Support multiple recipe artifacts in quarkus update
2024-08-12 20:11:35.490 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61022

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+884   ~30.6.
git checkout b310b2699f2113276cf2f6a0ded72f7587f13693
HEAD is now at b310b2699f2 Bump ubi image to 1.19 for jib
2024-08-12 20:18:43.963 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 61079

https://github.com/quarkusio/quarkus/commits/main/?after=00b31c814afe25e3d15e0a99d639f7e4383a0341+954   ~26.6.
git checkout 872ffdcc60ac75a86c7e22258c9d3d1eb1ba2dba
HEAD is now at 872ffdcc60a Bump io.quarkus:quarkus-platform-bom-maven-plugin
2024-08-12 20:23:01.314 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG rssKb without min and max values: 60974

RSS is decreasing slowly as I go further back in history, which leads me to the conclusion that it's an accumulation of multiple changes that results in the increased RSS I see between 3.11 and 3.13.


@geoand
Contributor

geoand commented Aug 13, 2024

cc @franz1981

@franz1981
Contributor

franz1981 commented Aug 13, 2024

Is the same trend confirmed on the CI with stable Linux machines as well?
IIRC the start-stop test runs unconstrained, without any max heap settings: what happens if you cap it to some value (e.g. twice the max heap used while running unconstrained)?

If it is, and the native-image command doesn't reveal anything relevant (@zakkak could help there I think), and the additional weight comes from a progressively increasing amount of features, it's OK.
But if we believe we could make things better, or if it is unacceptable, we can:

  • take a heap dump with and without a GC run for the two versions and compare them (see the sketch after this list)
  • collect allocation flame graphs (@galderz native image should now be able to collect alloc JFR events, no?) and check what we can do to improve there
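
For the first bullet, a minimal sketch of how the heap dumps could be captured with the reproducer app, assuming the binary is rebuilt with heap-dump support (--enable-monitoring=heapdump) and that, as documented for GraalVM, a SIGUSR1 signal triggers the dump; the -Xmx cap and the output-file handling are illustrative only:

# rebuild the app with heap dump support (mirrors the reproducer build commands)
mvn clean package -f app-full-microprofile -Dnative \
  -Dquarkus.native.additional-build-args='--enable-monitoring=heapdump' \
  -Dquarkus.version=3.13.2

# run with a capped heap (illustrative value), wait for the first successful request,
# then ask the process for a heap dump via SIGUSR1
./app-full-microprofile/target/quarkus-runner -Xmx128m &
APP_PID=$!
until curl -s -f -o /dev/null "http://127.0.0.1:8080/q/health/ready"; do sleep 0.01; done
kill -SIGUSR1 "$APP_PID"   # writes an .hprof file into the working directory
sleep 1
kill -SIGTERM "$APP_PID"

Repeating this for 3.11.3 and comparing the two .hprof files (e.g. in VisualVM or Eclipse MAT) should show whether the extra RSS lives in the Java heap at all.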

@rsvoboda
Member Author

Is the same trend confirmed on the CI with stable Linux machines as well?

That's perf lab only, we do not have bare metal machines in our lab.

What about JVM mode?

As mentioned in the description, JVM mode runs didn't reveal a noticeable difference; it's native mode where the diff manifests.
It may be that, since JVM mode uses more memory, the diff simply isn't detected...

@franz1981
Contributor

@rsvoboda I mean: the description here made me think it was a macOS run, but the difference stays the same when moving to Linux, right?

@rsvoboda
Member Author

Yes, @mjurc tried the reproducer commands on his Fedora machine and saw a similar trend; the absolute numbers were of course a bit different. Michal confirmed that RSS was higher for 3.13 compared to 3.11.

@galderz
Member

galderz commented Aug 14, 2024

@franz1981 I don't know if JFR alloc events are in place (@roberttoyonaga?). I worked on the old object event, but that's more for tracking leaks, and this doesn't smell like a leak right now.

@rsvoboda @mjurc if you run the test manually, can you check whether the RSS increase is due to startup and/or the first request? Also, if you run the test manually, you could take heap dumps after startup and after the first request with the different versions and see if we can spot some differences there.

One more thing to maybe try here is @roberttoyonaga's NMT work, in case the increases are not visible in the heap contents.

@galderz
Member

galderz commented Aug 14, 2024

@franz1981 Ah yes, we now have ObjectAllocationInNewTLAB, definitely want to JFR that @rsvoboda @mjurc

Finally, @rsvoboda, run through the steps in https://quarkus.io/blog/native-startup-rss-troubleshooting and see if you can get a better understanding of the cause. As we highlighted there, the way native handles memory can lead to bigger increases from small changes compared to JVM mode.

@roberttoyonaga
Contributor

Yes, Native Image JFR has both the ObjectAllocationInNewTLAB event and the ObjectAllocationSample event. These events sample object allocations that result in requiring a new TLAB. They record stack traces, so you can generate flame graphs in JMC too. ObjectAllocationSample is basically the same as ObjectAllocationInNewTLAB but it's throttled to reduce overhead. It should be safe to enable that event in production, but ObjectAllocationInNewTLAB has very high overhead.

Yes, as Galder mentioned, you could also try "native memory tracking" to see where native allocations are happening off the "Java" heap. There's a blog post here: https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image#
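
A minimal sketch of what that could look like for the reproducer app, assuming an EA GraalVM build with NMT support; the JFR event names are taken from the linked article and may differ between builds:

# build with NMT (plus JFR, so the NMT data ends up in the recording)
mvn clean package -f app-full-microprofile -Dnative \
  -Dquarkus.native.additional-build-args='--enable-monitoring=nmt\,jfr' \
  -Dquarkus.version=3.13.2

# run with a JFR recording, hit the app once, then stop it so the recording is flushed
./app-full-microprofile/target/quarkus-runner \
  -XX:+FlightRecorder -XX:StartFlightRecording=filename=nmt-3.13.2.jfr &
APP_PID=$!
until curl -s -f -o /dev/null "http://127.0.0.1:8080/q/health/ready"; do sleep 0.01; done
kill -SIGTERM "$APP_PID" && wait "$APP_PID"

# inspect the native-memory events with the JDK's jfr tool
# (assumed event names: jdk.NativeMemoryUsage / jdk.NativeMemoryUsageTotal)
jfr print --events jdk.NativeMemoryUsage,jdk.NativeMemoryUsageTotal nmt-3.13.2.jfr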

@rsvoboda
Member Author

Thanks for the pointers. I will try to look into it more, but my week is quite packed before I go on PTO next week.

@jerboaa
Contributor

jerboaa commented Aug 16, 2024

@roberttoyonaga @rsvoboda We should mention that not all upstream features in master or 24.1 are in a released version of Mandrel yet, so feature availability may vary. AFAIK, the main Mandrel version in Quarkus is 23.1 (Mandrel for JDK 21).

@roberttoyonaga
Contributor

Oh yes, that's true. ObjectAllocationSample and native memory tracking are in the GraalVM EA build. But heap dumping and the ObjectAllocationInNewTLAB JFR event will be available in mandrel for JDK 21.

@jerboaa
Contributor

jerboaa commented Aug 16, 2024

Oh yes, that's true. ObjectAllocationSample and native memory tracking are in the GraalVM EA build.

FWIW, the CE (GraalVM community) early-access builds, which we should be testing, are here:
https://github.com/graalvm/graalvm-ce-dev-builds/releases. They're based on OpenJDK and open source code.

@zakkak
Contributor

zakkak commented Aug 20, 2024

I have tried reproducing this on my Fedora setup with both GraalVM CE 21.0.2 (from SDKMAN) and the default builder image (Mandrel 23.1.4.0), and the results are:

3.11.3.graal.txt:2024-08-20 16:26:28.538 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 72581

3.11.3.txt:2024-08-20 16:13:38.929 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 72692

3.13.2.graal.txt:2024-08-20 16:28:15.357 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 73427

3.13.2.txt:2024-08-20 16:11:12.395 INFO  [i.q.t.s.StartStopTest] (testRuntime) AVG RSS (kB) without min and max values: 73282

Given that the image size increased by ~1MB, I don't find the above increase in RSS usage surprising, but the RSS increase reported by @rsvoboda is quite a bit larger than that :/

@mjurc when reproducing this, did you notice an increase similar to what @rsvoboda reports, or a smaller one like the one in my measurements?

@rsvoboda what are the thresholds set to in the CI? Can you also share the actual numbers from these runs? The default in the repository seems quite high at 90000 kB.

@franz1981
Contributor

franz1981 commented Aug 20, 2024

Thanks @zakkak for checking; IIRC that benchmark does NOT set any limit on memory (Xms/Xmx) nor on the number of cores, both of which could affect RSS through the way allocations happen vs GC cycles (and how many malloc arenas can be used).
Maybe it would be beneficial to have both pieces of information while trying to reproduce it.

@radcortez
Member

@franz1981, how is the config-quickstart RSS?

@franz1981
Contributor

@radcortez I'm still on holiday and without VPN access, @johnaohara can answer you... but the last time I checked there was a super small increase.

@zakkak
Contributor

zakkak commented Aug 20, 2024

Thanks @zakkak for checking; IIRC that benchmark does NOT set any limit on memory (Xms/Xmx) nor on the number of cores, both of which could affect RSS through the way allocations happen vs GC cycles (and how many malloc arenas can be used). Maybe it would be beneficial to have both pieces of information while trying to reproduce it.

@franz1981 If I get it right, it sets Xmx to 96m, see https://github.com/quarkus-qe/quarkus-startstop/blob/643dadc30f810333f8a0c8ef846e7aaac53f9a7e/testsuite/src/it/java/io/quarkus/ts/startstop/utils/MvnCmds.java#L24; regarding the rest you are right, it doesn't control them.

@franz1981
Contributor

Uh, I missed it (my memory is failing me, sorry :"/ I should have checked) - thanks for verifying it.

@rsvoboda
Member Author

I did some experiments with JFR and NMT recording with the Quarkus MP app. As https://developers.redhat.com/articles/2024/05/21/native-memory-tracking-graalvm-native-image suggests, I used an EA build of GraalVM.

Here is recording-nmt-3.11-3.13.zip with recording-nmt-3.11.3.jfr and recording-nmt-3.13.2.jfr.
GitHub allows only "known file extensions", so I had to zip the files.

Native binaries were produced from https://github.com/quarkus-qe/quarkus-startstop repo using this procedure:

export PATH=/Users/rsvoboda/Downloads/graalvm-jdk-24+10.1/Contents/Home/bin:$PATH

mvn clean package -f app-full-microprofile -Dnative -Dquarkus.native.additional-build-args='--enable-monitoring=heapdump\,jfr\,nmt' -Dquarkus.version=3.13.2
mv app-full-microprofile/target/quarkus-runner ~/Downloads/app-full-microprofile-nmt-3.13.2 

mvn clean package -f app-full-microprofile -Dnative -Dquarkus.native.additional-build-args='--enable-monitoring=heapdump\,jfr\,nmt' -Dquarkus.version=3.11.3
mv app-full-microprofile/target/quarkus-runner ~/Downloads/app-full-microprofile-nmt-3.11.3 

JFR was produced using:

get_jfr () {
  # $1 = path to the native binary, $2 = output JFR file
  # start the app with a JFR recording enabled
  $1 -XX:+FlightRecorder -XX:StartFlightRecording=filename=$2 &
  APPLICATION_PID=$!

  # poll the readiness endpoint until the first request succeeds
  counter=0
  until curl -s -f -o /dev/null "http://127.0.0.1:8080/q/health/ready"
  do
    sleep 0.001
    counter=$((counter+1))
  done
  echo "Cycles: $counter"
  echo "RSS: `ps -p $APPLICATION_PID -o rss=`"
  sleep 0.5
  echo "RSS after 0.5s: `ps -p $APPLICATION_PID -o rss=`"

  # stop the app gracefully so the recording is flushed to disk
  kill -SIGTERM "$APPLICATION_PID"
  wait "$APPLICATION_PID"
}

get_jfr ${HOME}/Downloads/app-full-microprofile-nmt-3.13.2 ${HOME}/Downloads/recording-nmt-3.13.2.jfr
get_jfr ${HOME}/Downloads/app-full-microprofile-nmt-3.11.3 ${HOME}/Downloads/recording-nmt-3.11.3.jfr

I noticed that the flame graph under Event Browser - Java Application has some additional bars for 3.13 compared to 3.11:

[Screenshot: JMC flame graph comparison, 2024-08-22 16:57:34]

At this stage I think I'm at the limit of my knowledge and investigation; any insight would be welcome. I've shared my reproducer steps so anybody can play with this.

@gsmet
Member

gsmet commented Aug 27, 2024

FWIW, I'm planning to have a closer look at the Jackson thing but got sidetracked by some reverting fun :). I'll have a look soon hopefully.

@gsmet
Member

gsmet commented Aug 27, 2024

@radcortez btw, I think it would be worth checking that all the interceptors are correctly protected.

@radcortez
Member

I can work on a patch if you want.

Thanks, but let's figure out the exact requirements before blindly patching the code.

@gsmet
Member

gsmet commented Aug 27, 2024

I have conducted some (naive and basic) profiling with async-profiler on app-full-microprofile, focused on the RESTEasy Reactive startup.

I saw nothing special in the CPU profile. The alloc profile is a bit more interesting even if I'm not entirely sure what to think of it.

The initialization of the ObjectMapper seems to be a bit heavier in 3.14. But... all I can see in the profile is more bars where the class loader is loading bytes.
I have no idea whether this points to a new issue in the new RunnerClassLoader or whether it's just the newer version of Jackson loading more classes. I will try to downgrade Jackson to the version in 3.13.2, then try to revert the RunnerClassLoader changes, and see what I get.
(Also, it might just be an artifact of this particular run, we'll see.)

Here is the alloc profile for 3.13.2:

[Screenshot: alloc profile for 3.13.2, 2024-08-27 18:55:26]

And here is the one I got with plain main:

[Screenshot: alloc profile for main, 2024-08-27 18:55:35]

@geoand
Contributor

geoand commented Aug 27, 2024

What's the Jackson thing in question?

@franz1981
Contributor

franz1981 commented Aug 27, 2024

@gsmet both @radcortez and I are using the methodology I showed in #35406 (comment).

The first bullet point, which disables TLABs, but adding:

  • alloc=1, since that is a "new" async-profiler feature to decide the granularity at which allocations are collected
  • replacing EpsilonGC with G1, because the latter still captures all allocations while TLABs are disabled, similarly to EpsilonGC

I would use async-profiler to stop the profiling collection before the shutdown happens, so you won't include the allocations made while stopping.
For this latter point I can describe what the Quarkus start-stop test is doing, because it solves this problem automatically.
Indeed, my suggestion would be to pass the right configuration to it and just use the async-profiler integration I added there, which collects the profiling data right after the first successful request, without including any shutdown data.

@radcortez can share his methodology and command line.
This is probably the only way to measure allocation pressure when you have such small startup times, because allocation profiling can be very misleading and biased when there are so few samples. With this methodology it collects all allocations (becoming tracing rather than sampling) and you can match the allocated bytes directly.

@radcortez
Member

Yes, I confirm that I use what @franz1981 taught me :)

For reference, these are the commands that I use:

  • java -agentpath:[AGENT_PATH]/libasyncProfiler.dylib=start,alloc=1,total,event=alloc,file=alloc.jfr -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseTLAB -Xmx1G -Xms1G -XX:+AlwaysPreTouch -jar target/quarkus-app/quarkus-run.jar
  • java -cp [PROFILER_PATH]/async-profiler-converter-3.0.jar jfr2flame alloc.jfr --alloc --total alloc.html

@gsmet
Member

gsmet commented Aug 27, 2024

@radcortez I tried with your commands and I don't see much difference for Jackson - but it was with the full app-full-microprofile.

But I found out something extremely odd: in the 3.13 app-full-microprofile app, I see absolutely nothing OpenTelemetry: nothing at startup, nothing when a request is executed (I used the search engine and result is 0%).

Whereas for main, when I search for opentelemetry, the result is 75.74%, and I see the OpenTelemetry initialization as well as some OpenTelemetry gRPC activity when the request is executed (I executed a request).

I'm wondering whether OpenTelemetry could somehow be disabled in 3.13 and enabled by the new stuff introduced in main.

The configuration is identical and there's some OpenTelemetry stuff in the config:

# address of simplistic Vert.x collector started before 'app-full-microprofile' project is tested
quarkus.otel.exporter.otlp.traces.endpoint=http://localhost:4317/api/traces
# shorten waiting period for traces
quarkus.otel.bsp.schedule.delay=700ms
quarkus.otel.bsp.export.timeout=700ms

but I haven't started a collector, I just started the app.

In any case, it looks like the behavior is a bit different.

@radcortez
Member

But I found out something extremely odd: in the 3.13 app-full-microprofile app, I see absolutely nothing OpenTelemetry: nothing at startup, nothing when a request is executed (I used the search engine and result is 0%).

Strange... I've also executed with current main and I do see the OTel Recorder being executed during start.

@gsmet
Member

gsmet commented Aug 27, 2024

Yeah that's what I have in main. It's with 3.13.2 that I don't see OTel around.

@radcortez
Member

I did try with 3.13.2 and it seems ok to me (I see the OTel Recorder at init).

@rsvoboda
Member Author

Since @gsmet is looking into the Quarkus REST + Jackson area, I've prepared https://github.com/rsvoboda/quarkus-startstop/tree/reactive-jackson

I looked into the native-image summary for mvn -f app-full-microprofile clean package -Dnative -Dquarkus.version=999-SNAPSHOT vs mvn -f app-full-microprofile clean package -Dnative -Dquarkus.version=3.11.3

There is an increase in the quarkus-runner.jar size plus the Netty stuff; IMHO this can have some implications for the runtime.
Code metadata is slightly increased, which probably means more classes are loaded?

[Screenshot: native-image report comparison, 2024-08-28 08:16:39]

The most interesting one for me is the increase in the size of quarkus-runner.jar, which went from 987.96 kB to 1.14 MB, i.e. ~130 kB more for 999-SNAPSHOT.
The app has <finalName>quarkus</finalName>, so IMHO this relates to quarkus-native-image-source-jar/quarkus-runner.jar in the target directory of the app.

Here is a diff of unzip -l of that file between 3.11.3 and 999-SNAPSHOT: https://www.diffchecker.com/F8MhZqVr/
(the link will be available for 1 month)

@rsvoboda
Member Author

https://www.diffchecker.com/T5ZNgETl/ is better - trimmed (only name + length) and sorted
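
For reference, a small sketch of how such a trimmed and sorted listing can be produced locally; the jar file names are just placeholders for copies of quarkus-native-image-source-jar/quarkus-runner.jar taken from the two builds:

# print "name length" for every archive entry, sorted by name, so the listings diff cleanly
list_entries () {
  unzip -l "$1" | awk '$2 ~ /^[0-9][0-9]-[0-9][0-9]-[0-9]+$/ {print $4, $1}' | sort
}

list_entries quarkus-runner-3.11.3.jar > 3.11.3.txt
list_entries quarkus-runner-999-SNAPSHOT.jar > 999-SNAPSHOT.txt
diff -u 3.11.3.txt 999-SNAPSHOT.txt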

@franz1981
Contributor

io/quarkus/tls/runtime - @cescoffier found it in @rsvoboda's diff

@rsvoboda
Member Author

Yes, the TLS stuff is an addition, and the generated Vert.x stuff is a bit bigger.

The Jackson stuff doesn't seem to be problematic / much bigger; it's just 2.17.1 in 3.11.3 vs 2.17.2 in main.

@quarkus-bot quarkus-bot bot added this to the 3.16 - main milestone Aug 28, 2024
@radcortez radcortez reopened this Aug 28, 2024
@franz1981
Contributor

@radcortez what's the state of this after #42814?

@radcortez
Member

I still need to do proper measurements. I also made some improvements on the SmallRye Config side.

I reopened the issue because this only fixes the OTel issue; there is still the other issue with Jackson.

@gsmet gsmet modified the milestones: 3.16 - main, 3.14.2 Aug 30, 2024
@geoand
Contributor

geoand commented Sep 6, 2024

Yes, I confirm that I use what @franz1981 taught me :)

For reference, these are the commands that I use:

* `java -agentpath:[AGENT_PATH]/libasyncProfiler.dylib=start,alloc=1,total,event=alloc,file=alloc.jfr -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -XX:-UseTLAB -Xmx1G -Xms1G -XX:+AlwaysPreTouch -jar target/quarkus-app/quarkus-run.jar`

* `java -cp [PROFILER_PATH]/async-profiler-converter-3.0.jar jfr2flame alloc.jfr --alloc --total alloc.html`

@franz1981 @radcortez I believe we should put all this information (along with whatever else we have picked up) in TROUBLESHOOTING.md, otherwise it will be difficult and time-consuming for new folks to discover the proper methodology for fixing future issues (which will inevitably arise).

@franz1981
Contributor

franz1981 commented Sep 6, 2024

Good point; I held off on doing it for two reasons:

  1. It is an impl detail of how G1/EpsilonGC and OpenJDK work with the allocation events
  2. It is tied to a specific async-profiler version

I will see if I can come up with the same info by switching to JFR (which collects the same event) - wdyt @roberttoyonaga?
I would love to see native image benefit from the same methodology, i.e.
disabling TLABs and using EpsilonGC while configuring allocation events with 1-byte granularity, to be able to collect all allocations without any bias - effectively turning sampling into tracing.
If it works, we'll have a new toy to play with @radcortez ;)

Is that correct @roberttoyonaga?

@geoand
Contributor

geoand commented Sep 6, 2024

Yeah, those are valid points, but if someone were to look into, say, Security allocations in a month, they would have to ask you all over again and/or wade through information scattered across various discussions / chats.

We can certainly add a caution note in the doc explaining that parts of the methodology are subject to change :)

@roberttoyonaga
Contributor

I will see if I can come up with the same info by switching to JFR (which collects the same event) - wdyt @roberttoyonaga?
I would love to see native image benefit from the same methodology, i.e.
disabling TLABs and using EpsilonGC while configuring allocation events with 1-byte granularity, to be able to collect all allocations without any bias - effectively turning sampling into tracing.
If it works, we'll have a new toy to play with @radcortez ;)

Is that correct @roberttoyonaga?

Hi @franz1981 !

Hmm, I don't think you can disable TLABs in Native Image. But even if you could, all JFR allocation events (jdk.ObjectAllocationSample, jdk.ObjectAllocationInNewTLAB) essentially have sampling "built in": they only emit events on the allocation slow path, when a new TLAB is needed (to reduce overhead). So you wouldn't be able to use those JFR events to capture every allocation anyway (1-byte granularity is not possible). As far as I know, that's the case with JFR in both HotSpot and Native Image.

It is possible to use Epsilon GC in Native Image though, using --gc=epsilon.
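
For completeness, a sketch of how one might pass that through the Quarkus build of the reproducer app, assuming --gc=epsilon is accepted by the builder in use; since Epsilon never collects, this only makes sense for short measurement runs:

# build the native binary with Epsilon GC (no collections at runtime)
mvn clean package -f app-full-microprofile -Dnative \
  -Dquarkus.native.additional-build-args='--gc=epsilon' \
  -Dquarkus.version=999-SNAPSHOT

# run it, wait for the first successful request and record RSS; with no GC,
# RSS after the first request roughly reflects the allocation footprint so far
./app-full-microprofile/target/quarkus-runner &
APP_PID=$!
until curl -s -f -o /dev/null "http://127.0.0.1:8080/q/health/ready"; do sleep 0.01; done
ps -p "$APP_PID" -o rss=
kill -SIGTERM "$APP_PID"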

@franz1981
Contributor

franz1981 commented Sep 30, 2024

I reopened the issue because this only fixes the OTel issue; there is still the other issue with Jackson.

@radcortez Does this still need to stay open? @rsvoboda wdyt?

Maybe we can track this with a more narrowly scoped issue, given that the OTel one, and specifically the failure in the regression test, seems fixed.

@radcortez
Member

Yes, the problematic pieces from the OTel side are fixed. Not sure about the other ones. We can close this one and track any remaining pieces in separate issues.

@geoand geoand closed this as completed Sep 30, 2024