Memory usage increased between Quarkus 3.6 and 3.7 #38814
Comments
If you can publish a 3.6 heap dump and a 3.7 one, it would probably help us all work from the same baseline. |
I've got the dumps, and will attach them, but I don't think I have the 'right' dump. I had two options for collecting the dumps:
Dumps are here: https://drive.google.com/file/d/1gaGHo-qHu2GImdskJMiF-dNCocabGAfG/view?usp=sharing |
Some of what I'm seeing in the dumps also looks similar to what Guillaume investigated in #35280, so I think this issue may be in a similar domain to that one. |
See also #41156. Memory requirements went up again in Quarkus 3.12. |
I had a look at the issue and here are my findings for The first thing is that the OOM is triggered by the build process, it has nothing to do with CL leaks. Second thing is that there are two main culprits:
I wonder if we could improve that. I don't fully understand why we would need to keep a list of the full managed dependencies tree once the artifact has been resolved - but I suppose there are very few projects with that many managed dependencies. We would probably need to discuss this with @aloubyansky . That's for the first part of the work. Now I will have a look at a heap dump from an old version to see what could be the actual regression. |
So from what I can see, it's just Quarkus getting generally fatter:
The main contributors to the
For 3.6.1, that is the other version I tested:
From a quick analysis:
I think one thing we are learning with this is that keeping relocations around for ages has a cost so we need to drop the relocations at some point. |
How big an impact do they have? I would imagine that if we fix or improve how we deal with the full resolved set (which we shouldn't have to keep around forever), then relocations should be quite minimal and not a problem? |
I don't think we keep the Aether model in memory forever, but we need it at some point and we use a huge amount of memory for it (at least compared to our footprint: it's 1/3 of the memory footprint when we build/start dev mode). I will discuss with Alexey whether things can be improved in this area, but I wouldn't be surprised if we were pushing things to the limit with our huge BOM and the model wasn't really designed or optimized for such a use case. My wild guess, just a theory based on the heap dumps I checked, is that it's going to get even worse for a large project with a lot of Quarkus extensions. I'm not advocating that we drop relocations at a fast pace. I'm just saying that we didn't really think they had a cost, and they do. So I would rather have the following policy:
For instance, we could remove the Quarkus REST/Quarkus Messaging relocations introduced 4 months ago after the 3.15 release. |
I'm definitely curious about how/why we would have to retain everything forever; it wasn't designed or intended that a normal build should need to retain everything. Let's see what @aloubyansky sees. |
Again, we don't keep all around forever. But each |
Yes I get that - but even that shouldn't be necessary. |
We should probably distinguish between a "high tide" memory usage and a "steady state" memory usage. The symptom this work item was raised for was that our high tide memory usage was going up. Unloading things like the Aether memory model would/do help with steady state memory usage, but not with the high tide usage. Big high tide memory usage could prevent people being able to build their Quarkus apps in constrained environments, but steady state memory usage is probably the metric that gets more attention/slides/blogs, etc. |
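To make the distinction concrete, here is a minimal sketch (not taken from the Quarkus code base) that reads both numbers from a running JVM with the standard java.lang.management API. The class name HeapTideReport is made up for illustration. Summing per-pool peaks only gives an upper bound on the real high tide, since the pools can peak at different moments.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper, for illustration only.
final class HeapTideReport {

    // Sums either the current or the peak "used" value over all heap pools.
    private static long sumHeapPools(boolean peak) {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip metaspace, code cache, etc.
            }
            MemoryUsage usage = peak ? pool.getPeakUsage() : pool.getUsage();
            if (usage != null) { // null when the pool is no longer valid
                total += usage.getUsed();
            }
        }
        return total;
    }

    // "Steady state" proxy: heap in use right now.
    static long currentHeapBytes() {
        return sumHeapPools(false);
    }

    // "High tide" proxy: sum of per-pool peaks since JVM start (an upper bound).
    static long peakHeapBytes() {
        return sumHeapPools(true);
    }

    public static void main(String[] args) {
        byte[] spike = new byte[32 * 1024 * 1024]; // temporary allocation
        spike[0] = 1;
        spike = null;
        System.gc(); // only a hint; the JVM may ignore it
        System.out.printf("current: %d MB, peak: %d MB%n",
                currentHeapBytes() / (1024 * 1024),
                peakHeapBytes() / (1024 * 1024));
    }
}
```

A build that OOMs is limited by the second number, while a heap dump taken after startup mostly reflects the first.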
Not sure if this is relevant here, but concurrency (particularly concurrent startup) can affect the "high tide" in ways that can be hard to predict. |
From what I have seen in the heap dumps, the problem we have here has little to do with concurrency. |
So what makes the issue so problematic for us people with a huge dependency management is that each descriptor will have its own copy of the whole dependency management tree. See: And the Given our whole dependency management tree is 800 KB, it grows very fast. |
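As a rough back-of-the-envelope illustration of why this grows so fast, the sketch below multiplies the 800 KB figure quoted above by a few descriptor counts. The counts are hypothetical (the thread does not give the real number), and the "shared" case ignores the small per-descriptor reference overhead.

```java
// Hypothetical arithmetic helper, for illustration only.
final class DepMgmtCopyCost {

    // Size of the dependency management tree, as quoted in the comment above.
    static final long TREE_BYTES = 800L * 1024;

    // Retained bytes when every descriptor holds its own copy of the tree.
    static long copiedBytes(int descriptors) {
        return TREE_BYTES * descriptors;
    }

    // Retained bytes when all descriptors point at one shared tree.
    static long sharedBytes(int descriptors) {
        return descriptors == 0 ? 0 : TREE_BYTES;
    }

    public static void main(String[] args) {
        // Descriptor counts below are assumptions, not measurements.
        for (int n : new int[] {1, 100, 500}) {
            System.out.printf("%d descriptors: %d KB copied vs %d KB shared%n",
                    n, copiedBytes(n) / 1024, sharedBytes(n) / 1024);
        }
    }
}
```

The copied case grows linearly with the number of descriptors, while the shared case stays flat.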
Thanks, looked at dumps. The memory hog is in session (RepositoryCache). For start, I'd recommend to test If it does, we know we are at right track. |
@cstamas Thanks for having a look. I can share the dump with you. Either send me your Google account at gsmet at redhat dot com or a place to upload it for you. My initial idea with the patch above was to limit the managed dependencies being copied to only the ones that are important for a given descriptor. In our case, all the managed dependencies are copied to quite a few descriptors that don't really need them (or at least not all of them). |
Yes, and as I mentioned, doing this on the POM (removing "unused depmgt") is not feasible, as the POM lacks the context of where it will be used, hence there is no way to know which entries are "unused". Maybe we need to deconstruct POMs and cache depmgt separately, as in this case all the Q1, Q2, ... would literally share the same single instance. |
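The "share the same instance" idea can be sketched as a small intern pool. This is not Resolver code: ManagedDepsPool and ManagedDep are hypothetical names, and the sketch assumes Java 16+ for records. Equal lists of managed dependencies collapse to one canonical immutable instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical intern pool, for illustration only (not the Resolver API).
final class ManagedDepsPool {

    // Simplified stand-in for a managed dependency entry.
    record ManagedDep(String groupId, String artifactId, String version) {}

    private final ConcurrentHashMap<List<ManagedDep>, List<ManagedDep>> pool =
            new ConcurrentHashMap<>();

    // Returns the canonical instance for a list with these contents.
    List<ManagedDep> intern(List<ManagedDep> deps) {
        List<ManagedDep> candidate = List.copyOf(deps); // immutable snapshot
        List<ManagedDep> existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        ManagedDepsPool pool = new ManagedDepsPool();
        List<ManagedDep> q1 = pool.intern(
                List.of(new ManagedDep("org.example", "lib", "1.0")));
        List<ManagedDep> q2 = pool.intern(
                new ArrayList<>(List.of(new ManagedDep("org.example", "lib", "1.0"))));
        System.out.println("same instance: " + (q1 == q2));
        System.out.println("pool size: " + pool.size());
    }
}
```

The trade-off is that every lookup hashes and compares the whole list, so memory is saved at the cost of extra CPU work per descriptor.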
While it's not a complete fix, I created a PR to improve the situation a bit: apache/maven-resolver#534 |
So, two interesting PRs (for Resolver 1.9.x):
It would be good if someone could locally merge both and test the resulting Resolver 1.9.x build with Quarkus. Re timeline: I am out for the next 2 weeks, and Resolver 1.9.21 was just released last week... But when I am back, I could do Resolver 1.9.22 with the memory improvements, and Maven 3.9.9 is already scheduled next (so early Aug or thereabouts). |
Good news: the Maven IT suite is OK with the change above: apache/maven#1617 Bad news: interning all the I did not measure the most interesting question, the heap usage change, but I'd bet a six-pack of beer it is HUGE 😃 |
I will have a look next week. Thanks! |
I can see how it could be a good thing memory wise. I’m a bit worried about the additional work though. Especially since in most of the cases, we are doing the exact same thing over and over (typically the convert operation in ArtifactDescriptorReaderDelegate). Multiplied by 2300 multiplied by the number of artifacts affected, that’s a lot of useless work. |
Agreed. But Biggest problem is that |
@cstamas I created a small PR (for your 1.9 PR, the same patch should be applied to the Locally, when I ran the Maven ITS, I got:
I think the way I reduce allocations by not copying the map might help getting a speed up. It's not scientific but I think it would be worth pushing all this combined to the Apache Maven ITs and see if you get similar results. |
@cstamas and I can confirm the memory usage is greatly reduced. So if you end up confirming that all patches applied actually make things a bit faster, that would be a win on both sides. |
Back from vacation... but DK traveling still ahead. |
@cstamas maybe you could merge the two PRs for MRESOLVER-586 and rebase yours? This patch is an easy win. That way you could easily test the addition of all of them? |
@cstamas Glad to have you back! I'm around if you want to discuss this any further or need some additional help. |
Merged, rebased, Maven ITs running apache/maven#1617 |
Ok, so almost all merged, and MRESOLVER-587 looks good as well:
The only question here is the defaults of the two new config keys: for safety, I am using false/false (so interning does NOT kick in and it behaves as today), but this is user configurable. As we saw, true/true tremendously lowers memory consumption at the price of increased runtime. Still, as Guillaume measured (and those changes are picked up as well), there is some hope: #38814 (comment) The ultimate fix would be in ArtifactDescriptorReader, making the descriptor immutable, but sadly that is not doable in the scope of Resolver 1.x/2.0.x and will probably happen in Resolver 2.1.x. WDYT? Any opinion? |
@cstamas your proposal of interning only managed dependencies by default makes sense to me. |
The "wannabe" Maven 3.9.9 has all patches merged, CI built one (w/o versioned directories) can be downloaded: https://repository.apache.org/content/repositories/snapshots/org/apache/maven/apache-maven/3.9.9-SNAPSHOT/apache-maven-3.9.9-20240812.124731-17-bin.zip Please test! |
Awesome @cstamas I will have a look later today with the original project causing issues. |
@cstamas so we still have an OOM with low memory but a lot later and neither Maven nor the Maven Resolver are in the way anymore. I'll go address the other issues I could spot. |
Thanks a lot for this amazing cooperation :). |
FYI Maven 3.9.9 is out |
@manofthepeace would you be interested in preparing the PR with the upgrade? We will probably wait for the corresponding mvnd release to merge but it looks like a good idea to have it ready. |
Sure, will do. |
This has been quite a fight but I can confirm that with Maven 3.9.9 and I will close this one. Thanks to everyone involved. |
Hurray! I'll set a reminder to myself to reduce how much memory the tests are given in a couple of months, once this is in a release. That should allow us to spot creeping regressions. |
Describe the bug
I have an extension test in the quarkus-pact project that started OOMing in the ecosystem CI recently. I've worked around the issue by increasing the memory available to the test, but we may want to investigate why the memory requirements have increased.
I can reproduce just by starting the Quarkus application with constrained memory (it fails to start). The application has two pact extensions in it, so it does more classloading than a simpler application. With Quarkus 3.6, the app starts, and with 3.7, it fails to start. I can force a failure on Quarkus 3.6 by dropping the available memory down to 110mb (from 128mb), so even with Quarkus 3.6, the memory usage was fairly high - 3.7 just happens to be the release that tipped it over the edge into failure.
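When reproducing this kind of failure, it can help to confirm which heap cap the JVM actually sees. The sketch below is only an illustration: HeapCapCheck is a made-up class name, and it assumes the cap is applied with the standard -Xmx JVM flag (the issue does not say how the reproducer sets it).

```java
// Hypothetical check, for illustration only.
final class HeapCapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Heap currently in use (allocated minus free), in megabytes.
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MB, used: %d MB%n",
                maxHeapMb(), usedHeapMb());
    }
}
```

Running it as `java -Xmx128m HeapCapCheck.java` and again with `-Xmx110m` shows the two limits mentioned above; the reported maximum can be slightly below the requested value depending on the garbage collector.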
The ecosystem CI caught the change: quarkiverse/quarkiverse#94 (comment). This is the CI run where the problem first appeared, which might help us identify a specific change: https://github.com/quarkiverse/quarkus-pact/actions/runs/7082294368
My initial guess is that this is related to classloading, but I don't have firm evidence. It could also be test execution. If I give the application more memory, so that it doesn't OOM, and then trigger a dump once startup is finished, the 3.6 and 3.7 dumps both use about 81mb of memory, and look similar.
Expected behavior
No response
Actual behavior
No response
How to Reproduce?
I've attached a reproducer app. Unzip it, and then run
For more control, another way to reproduce is:
- Run mvn install. It should pass.
- Change the memory setting from 140m to 128m, the default. (I had to override the default memory allocation from the parent test class to get the test to pass.)
- cd cross-extension-integration-tests && mvn clean install should now show the failure.
Attachment: happy-everyone-all-together-processed.zip
run the test project standalone.
Output of uname -a or ver
No response
Output of java -version
No response
Quarkus version or git rev
No response
Build tool (ie. output of mvnw --version or gradlew --version)
No response
Additional information
No response