Allow product tests' framework to use more memory #16231
electrum merged 1 commit into trinodb:master
Doesn't this suggest we're making the GC work harder, either because the allocation rate increased or because we actually consume more memory? cc: @raunaqmorarka have you noticed any changes in recent benchmark runs? EDIT: It seems to be limited to the PT launcher and not the server itself? How did you verify this, @nineinchnick?
Great job on investigating and finding a possible fix, @nineinchnick.
I don't have much knowledge around this. I found the minimum value at which the issue stops appearing by trial and error, running the tests locally outside the tests' container. I hope we'll see the same results in CI for this PR.
BTW, the issue also stopped appearing when I only added the async profiler agent. It must have somehow affected the GC too.
That suggests that the product has changed. Should we also change the recommended JVM config for Trino?
It worked, I'm comparing the |
|
Can someone retry |
|
We don’t use ParallelGC for Trino. This is a very specific tuning for product tests that I actually don’t understand. Does anyone know why it was set up this way? The docs indicate that low settings here can impact performance. I wonder if we should remove these settings and use G1.
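One way to check which collector and flags a given JVM actually ends up with is to ask the JVM itself through the standard `java.lang.management` beans. The sketch below is only an illustration: the class name `GcInfo` and the `classify` helper are hypothetical and not part of Trino or the product tests launcher, and the name prefixes it matches ("G1 ", "PS ") are the HotSpot bean names for G1 and ParallelGC, so other JVMs or collectors fall through to "other".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Trino or the PT launcher.
public class GcInfo {
    // Maps HotSpot collector bean names to a short label.
    // Assumes HotSpot naming: "G1 Young Generation", "PS Scavenge", etc.
    static String classify(List<String> beanNames) {
        for (String name : beanNames) {
            if (name.startsWith("G1 ")) {
                return "G1";
            }
            if (name.startsWith("PS ")) {
                return "ParallelGC";
            }
        }
        return "other";
    }

    public static void main(String[] args) {
        List<String> names = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        System.out.println("collector: " + classify(names) + " " + names);

        // Print the input arguments the JVM reports, e.g. -XX:MaxHeapFreeRatio=...
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println("jvm arg: " + arg);
        }
    }
}
```

On Java 11 or newer this can be run as a single source file, for example `java -XX:+UseParallelGC GcInfo.java`, and then again with `-XX:+UseG1GC` to compare.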
There have been some reports in Slack about instability in 407, so it’s plausible there’s a real issue introduced in that version.
All green and good to go. Let's merge this to reduce the CI queue, and we can do more investigation later.
Agreed, going to merge this now.
@nineinchnick can you try these experiments:
|
|
Another idea: simply set a max heap size, for both G1 and ParallelGC. The intent seems to be to use as little memory as possible for the product tests framework, to leave more memory for the containers. As long as we don't have variable usage (i.e. some tests need more framework memory and less container memory, while others need more container memory), this should be simpler.
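If a fixed cap is tried, it may help to confirm that the framework JVM really runs under it. The following is a minimal sketch under assumptions: the class `MaxHeapCheck`, the `withinCap` helper, and the 512 MiB default are all hypothetical and chosen only for illustration. It compares `Runtime.maxMemory()` with an expected cap using `<=`, because the reported value can be somewhat lower than the `-Xmx` setting with some collectors.

```java
// Hypothetical helper for illustration; the 512 MiB default is an assumption,
// not a value taken from the product tests configuration.
public class MaxHeapCheck {
    static final long MIB = 1024L * 1024L;

    // True when the heap limit reported by the JVM does not exceed the cap.
    static boolean withinCap(long maxHeapBytes, long capMiB) {
        return maxHeapBytes <= capMiB * MIB;
    }

    public static void main(String[] args) {
        long capMiB = args.length > 0 ? Long.parseLong(args[0]) : 512;
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %d MiB, cap: %d MiB, within cap: %b%n",
                maxHeap / MIB, capMiB, withinCap(maxHeap, capMiB));
    }
}
```

For example, `java -Xmx512m MaxHeapCheck.java 512` runs the check with a 512 MiB cap on Java 11 or newer.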
Let's continue in #16306 |


Description
After #15833 we saw a big increase in the duration of product test jobs. Test startup got stuck for 2–3 minutes before loading the tempto configuration. This only happens on amd64 hosts with 7 GB of memory. I found this section that explains why we set MaxHeapFreeRatio to 10% before: https://docs.oracle.com/en/java/javase/17/gctuning/factors-affecting-garbage-collection-performance.html#GUID-7FB2D1D5-D75F-4AA1-A3B1-4A17F8FF97D0
I chose to increase MaxHeapFreeRatio to 50% because this is the lowest value at which tests didn't get stuck.
Additional context and related issues
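To make the quantity behind MaxHeapFreeRatio more concrete: the flag bounds the percentage of the heap that may stay free after a collection before the JVM tries to shrink the heap. The sketch below approximates that percentage from `Runtime.totalMemory()` and `Runtime.freeMemory()`. The class name `HeapFreeRatio` and the `freePercent` helper are hypothetical, and the number printed is only a rough snapshot, since how and when the ratio is applied depends on the collector in use.

```java
// Hypothetical helper for illustration; not part of the product tests launcher.
public class HeapFreeRatio {
    // Free space as a percentage of the currently committed heap.
    static double freePercent(long committedBytes, long freeBytes) {
        if (committedBytes <= 0) {
            throw new IllegalArgumentException("committedBytes must be positive");
        }
        return 100.0 * freeBytes / committedBytes;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long committed = rt.totalMemory(); // heap currently reserved by the JVM
        long free = rt.freeMemory();       // unused part of that committed heap
        System.out.printf("committed=%d MiB, free=%d MiB, free=%.1f%%%n",
                committed >> 20, free >> 20, freePercent(committed, free));
    }
}
```

Running it with different settings, for example `java -XX:+UseParallelGC -XX:MaxHeapFreeRatio=50 HeapFreeRatio.java` versus `-XX:MaxHeapFreeRatio=10`, gives a rough feel for how much free heap the JVM keeps around, although a short-lived program like this one does not reproduce the launcher's allocation pattern.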
Release notes
(x) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: