
Generate dynamic Presto configs with pbench#48

Merged
simoneves merged 43 commits into main from seves/HERC-82_use_pbench_for_presto_memory_configs
Oct 22, 2025
Conversation

@simoneves
Contributor

@simoneves simoneves commented Sep 17, 2025

This PR adds the ability to generate Presto config values dynamically for a given machine spec (CPU count and CPU RAM size).

The config files are expanded based on more recent manual testing (@tmostak and @patdevinwilson) and converted to templates for population by pbench genconfig.

Pending a better solution (Go is not installed on the lab machines), I have committed pbench binaries for Linux AMD64 and ARM64, along with a script to generate the config files. These files MUST exist before Presto will run, as the Docker recipe has been adapted to map them in place of the previous hardwired versions.

Finally, there is a simple script to generate the dynamic files, which are the ones the Presto containers will use.

This process first generates the config.json file required by pbench in the generated subdirectory, populating the CPU and RAM entries based on the current host (possibly too simplistic... discuss!). It then runs pbench genconfig which copies all files found in the template folder to the generated folder substituting computed values where required, based on the input values in config.json.
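A minimal sketch of that flow, under assumptions: the JSON key names other than vcpu_per_worker are illustrative, not the exact pbench schema, and the RAM probe here is one possible approach, not necessarily the one the script uses.

```shell
#!/bin/sh
# Sketch of the generate flow: probe the host, write the pbench input
# JSON, then expand the templates. Key names other than vcpu_per_worker
# are illustrative; the real schema is whatever pbench genconfig expects.
NPROC=$(nproc)
RAM_GB=$(awk '/^MemTotal:/ {printf "%d", $2/1024/1024}' /proc/meminfo)

mkdir -p generated
cat > generated/config.json <<EOF
{
  "vcpu_per_worker": ${NPROC},
  "cpu_ram_gb": ${RAM_GB}
}
EOF

# Then expand every file in template/ into generated/, substituting
# computed values (the script in this PR passes the input file via -p;
# the exact filename may differ):
# ../../pbench/pbench genconfig -p generated/config.json -t template generated
```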

The auto-generated values are NOT applied to the Java config files. These remain at the basic default settings.

Note the variant-specific behavior: vcpu_per_worker is set to 2 for GPU mode, and to the CPU count for CPU and Java modes. This logic lives in the generate script rather than in separate templates, although separate templates may still be required in the future.

"native_query_mem_percent_of_sys_mem": 0.95,
"join_max_bcast_size_percent_of_container_mem": 0.01,
"memory_push_back_start_below_limit_gb": 5
}
Contributor Author

This file comes with pbench. I'm sure the heuristics could be tweaked if needed.

hive.metastore.catalog.dir=file:/var/lib/presto/data/hive/metastore
hive.allow-drop-table=true

hive.file-splittable=false
Contributor Author

IIUC, this is basically always required. Discuss.

Contributor

@devavret asked me to add it. This is required.

-server
-Xmx24G
-Xmx{{ .HeapSizeGb }}G
-Xms{{ .HeapSizeGb }}G
Contributor Author

Templated

-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.nio.maxCachedBufferSize=2000000
Contributor Author

From @tmostak

query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
memory.heap-headroom-per-node={{ .HeadroomGb }}GB
Contributor Author

@simoneves Sep 17, 2025

THIS CONFIG WILL PROBABLY NOT WORK!

Contributor

If it does not work, should it be removed?

Contributor Author

Can't remove it. We need a config for Java mode. I just need to fix it.

Contributor Author

I removed the explicit memory configs for Java mode. It still works for the simple queries I tried. We can revisit this if we need to do comparisons with Java mode again.

query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
memory.heap-headroom-per-node={{ .HeadroomGb }}GB
Contributor Author

@simoneves Sep 17, 2025

THIS CONFIG WILL PROBABLY NOT WORK!

Contributor Author

(ditto)

fi
fi

$EXE "$@"
Contributor Author

Standard pbench wrapper script to detect OS and ARCH and not much else.

Contributor Author

I did not add a license header, as it's copied directly from the pbench repo.

Contributor Author

Not sure why this annotation isn't now obsolete, as this file is no longer new in this PR but added by a merge-out from main after it was landed in a different PR.

presto-base-volumes:
volumes:
- ./config/etc_common:/opt/presto-server/etc
- ./config/generated/etc_common:/opt/presto-server/etc
Contributor Author

File structure is expected to be the same as before, just moved down into generated.

native-execution-enabled=true
optimizer.optimize-hash-generation=false
regex-library=RE2J
use-alternative-function-signatures=true
Contributor Author

This file was changed a lot, based on @tmostak's adaptation of the IBM version.

runtime-metrics-collection-enabled=true
system-mem-pushback-enabled=true
system-mem-limit-gb={{ sub .ContainerMemoryGb .GeneratorParameters.MemoryPushBackStartBelowLimitGb }}
system-mem-shrink-gb=20
Contributor Author

This file also changed based on @tmostak's adaptation of the IBM version.

@simoneves simoneves changed the title [HERC-82] Generate dynamic Presto configs Generate dynamic Presto configs with pbench Sep 17, 2025
pushd ../docker/config > /dev/null

# always move back even on failure
trap "popd > /dev/null" EXIT
Contributor Author

Not actually sure this is necessary. I think pops happen automatically on exit.

# get host values
NPROC=`nproc`
# lsmem will report in SI. Make sure we get values in GB.
RAM_GB=$(( $(lsmem -b | grep "Total online memory" | awk '{print $4}') / (1024*1024*1024) ))
Contributor Author

This can't be right!

Contributor Author

OK, it IS right, but would be simpler if the divide was just in the awk expression.

Also, something is generating a docker/config/1 file containing a pbench usage message.

Also, no files are generated!

Contributor Author

This was due to the typo in the stderr redirection (see below). Fixed.

I moved the divide inside the awk expression to avoid the need for the extra layer of bash math.
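A sketch of the simplified form, demonstrated against sample lsmem -b output (the byte count below is a made-up 64 GiB example, not taken from a real lab machine):

```shell
#!/bin/sh
# Do the bytes-to-GB divide inside awk instead of wrapping the pipeline
# in an extra $(( ... )) layer. lsmem -b reports a byte count.
to_gb='/Total online memory/ {printf "%d", $4/(1024*1024*1024)}'

# On a real host: RAM_GB=$(lsmem -b | awk "$to_gb")
# Demo with sample output for a 64 GiB machine:
RAM_GB=$(printf 'Total online memory:      68719476736\n' | awk "$to_gb")
echo "$RAM_GB"    # 64
```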


# run pbench to generate the config files
# hide default pbench logging which goes to stderr so we only see any errors
../../pbench/pbench genconfig -p params.json -t template generated 2&>1 grep -v '\{\"level'
Contributor Author

OK, this is bogus and that's my bad for not testing it properly

Contributor Author

It should be 2>&1 | not 2&>1. Fixed.

Fix pbench output redirection
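A small demonstration of the difference, using a stand-in sh -c command in place of pbench (my guess at the failure mode is in the comments; it would also explain the stray docker/config/1 file reported above):

```shell
#!/bin/bash
# bash parses `cmd 2&>1 grep -v pat` as: run cmd with the extra words
# 2, grep, -v, and pat as arguments, redirecting stdout+stderr (`&>`)
# to a file named 1 -- likely why a file called 1 appeared containing a
# pbench usage message. The intended form merges stderr into the pipe
# so grep can filter pbench's JSON log lines:
out=$(sh -c 'echo kept; echo "{\"level\":\"info\"}" >&2' 2>&1 | grep -v '^{"level')
echo "$out"    # kept
```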

# get host values
NPROC=`nproc`
RAM_GB=`lsmem | awk '/Total online/ { print $4 }'`
Contributor

RAM_GB=$(cat /sys/devices/system/node/node0/meminfo | awk '/MemTotal/ {printf $4/1024/1024}')

Contributor

This seems to be more accurate :)

Contributor Author

@patdevinwilson my comment was out of date anyway. @misiugodfrey came up with a better one. The issue with yours is that it only considers one node of a NUMA system such as a dual GH200. It's still pretty fragile, though. I can't believe that getting a definitive "how many total GB of CPU RAM does this thing have" is so difficult.
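One hedged option that avoids the per-node pitfall: /proc/meminfo's MemTotal is system-wide, so it already sums all NUMA nodes. Caveat: it is in kB and excludes memory reserved by the kernel, so it reads slightly under the nominal size.

```shell
#!/bin/sh
# MemTotal in /proc/meminfo is the system-wide figure in kB, covering
# every NUMA node, unlike /sys/devices/system/node/node0/meminfo which
# covers only node0 (wrong on e.g. a dual GH200).
RAM_GB=$(awk '/^MemTotal:/ {printf "%d", $2/1024/1024}' /proc/meminfo)
echo "$RAM_GB"
```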

@simoneves
Contributor Author

@paul-aiyedun this is still blocked by your earlier request-for-changes. Do you still have any concerns?

@paul-aiyedun
Contributor

@paul-aiyedun this is still blocked by your earlier request-for-changes. Do you still have any concerns?

The config_native_cpu.properties and config_native_gpu.properties duplication is still a concern. Also, the Java Presto configurations appear to be incomplete.

Comment on lines +8 to +14
# Memory auto-configuration is not wired for Java engine in this template.
# Uncomment and set values if using Java workers with memory governance.
#query.max-total-memory-per-node={{ .JavaQueryMaxTotalMemPerNodeGb }}GB
#query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
#query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
#query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
#memory.heap-headroom-per-node={{ .HeadroomGb }}GB
Contributor

Delete commented out configurations.

Same comment applies to other configuration files.

Contributor Author

Removed two commented-out optimizer configs from shared native Coordinator config. Also removed all the commented-out Java memory configs as part of reinstating the original Java configs (see below).

Comment on lines +19 to +25
# Memory auto-configuration is not wired for Java engine in this template.
# Uncomment and set values if using Java workers/coordinator end-to-end.
#query.max-total-memory-per-node={{ .JavaQueryMaxTotalMemPerNodeGb }}GB
#query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
#query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
#query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
#memory.heap-headroom-per-node={{ .HeadroomGb }}GB
Contributor

Are some of these configurations not required for Presto Java?

Contributor Author

The automatic process generates an invalid config for Java, and the server will not start. I asked the group about this some time ago and nobody had an opinion. I guess I can set them to some nominal fixed but valid values.

Contributor Author

Reinstated the original fixed Java configs (Coordinator and Worker) from main.

@simoneves
Contributor Author

@paul-aiyedun I recombined the CPU and GPU Coordinator config (although I left them individually mapped to the same file in the Dockerfiles)

@paul-aiyedun
Contributor

@paul-aiyedun I recombined the CPU and GPU Coordinator config

I believe this also applies to the worker config files (see #48 (comment)).

(although I left them individually mapped to the same file in the Dockerfiles)

What is the reason for this?

@simoneves
Contributor Author

@paul-aiyedun I recombined the CPU and GPU Coordinator config

I believe this also applies to the worker config files (see #48 (comment)).

I don't want to do it that way. That would require two passes of pbench with separate config files. Sorry.

(although I left them individually mapped to the same file in the Dockerfiles)

What is the reason for this?

Fine.

@simoneves
Contributor Author

(although I left them individually mapped to the same file in the Dockerfiles)

What is the reason for this?

Fine.

Actually there is no change here. The previous CPU and GPU Dockerfiles already mapped the file in the leaf native services.

@paul-aiyedun
Contributor

@paul-aiyedun I recombined the CPU and GPU Coordinator config

I believe this also applies to the worker config files (see #48 (comment)).

I don't want to do it that way. That would require two passes of pbench with separate config files. Sorry.

Unless I am missing something, this would require setting vcpu_per_worker to 2 if VARIANT_TYPE is gpu and to NPROC otherwise ("vcpu_per_worker": ${NPROC},). I don't think separate config files or multiple passes are needed.
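 
A sketch of that suggestion, with the variant check factored into a small helper (the helper name is mine, not from the PR):

```shell
#!/bin/sh
# Pick vcpu_per_worker from the variant before writing the pbench input,
# so a single template serves both modes. Helper name is illustrative.
vcpus_for_variant() {
  # $1 = VARIANT_TYPE (gpu/cpu/java), $2 = host CPU count (nproc)
  if [ "$1" = "gpu" ]; then
    echo 2
  else
    echo "$2"
  fi
}

echo "$(vcpus_for_variant gpu 64)"   # 2
echo "$(vcpus_for_variant cpu 64)"   # 64
```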

@simoneves
Contributor Author

Unless I am missing something, this would require setting vcpu_per_worker to 2 if VARIANT_TYPE is gpu and to NPROC otherwise. I don't think separate config files or multiple passes are needed.

You are not wrong. I get it now. Will do.

@simoneves
Contributor Author

@paul-aiyedun worker config unsplit done

@simoneves simoneves merged commit 76c8162 into main Oct 22, 2025
@simoneves simoneves deleted the seves/HERC-82_use_pbench_for_presto_memory_configs branch October 22, 2025 18:51
simoneves added a commit that referenced this pull request Oct 29, 2025
This reinstates the two query optimizer flags which were removed from the config during the dev of #48.

These have been reconfirmed to be needed for optimum performance on GPU only.
6 participants