Merged
Changes from 35 commits
43 commits
1b7b175
Move into template subdir
simoneves Sep 16, 2025
1dded38
Convert to templates
simoneves Sep 16, 2025
13f8907
Add config.json example
simoneves Sep 16, 2025
44f4141
Use generated config
simoneves Sep 16, 2025
37b74d5
Remove jvm.config additions
simoneves Sep 16, 2025
fccb6bf
Add params.json
simoneves Sep 16, 2025
92d707e
add memory.heap-headroom-per-node templates
simoneves Sep 17, 2025
cc1c2f1
Replace Java and Native configs with Todd's, converted to templates
simoneves Sep 17, 2025
6b237cc
Add pbench executables (Linux amd64 and arm64)
simoneves Sep 17, 2025
ab93762
Add generate.sh
simoneves Sep 17, 2025
5df3d35
Genericize the machine description
simoneves Sep 17, 2025
c0439cd
Renamed pbench binaries
Sep 17, 2025
4f32959
Remove two optimizer configs, per Todd
simoneves Sep 22, 2025
acd3f76
Simple automatic CPU/RAM config
simoneves Sep 22, 2025
c665b18
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Sep 24, 2025
e06e120
Add single-node-execution-enabled=true to coordinator and worker native
simoneves Sep 24, 2025
80869cc
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Sep 25, 2025
b5e4377
Generate config automatically on Presto startup
simoneves Sep 26, 2025
7cac1a4
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Sep 26, 2025
4f799fd
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Sep 27, 2025
7ae7488
Do the final popd in a trap
simoneves Sep 27, 2025
0f97b92
Split coordinator and worker config files
simoneves Sep 30, 2025
cec02b3
Split file mapping into containers
simoneves Sep 30, 2025
dc5d2e6
Remove name reference
simoneves Oct 3, 2025
6928ee4
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Oct 3, 2025
c980842
Hide memory configs for Java, they do not work, come back to this
simoneves Oct 3, 2025
697efb3
Change spill-path to /tmp
simoneves Oct 3, 2025
1cf021a
Suppress pbench genconfig output (concerns)
simoneves Oct 6, 2025
1c9b7f1
Make sure lsmem reports in GB
misiugodfrey Oct 15, 2025
d5beb9d
Merge branch 'seves/HERC-82_use_pbench_for_presto_memory_configs' of …
misiugodfrey Oct 15, 2025
fcc2e94
Don't hide any errors from pbench genconfig
simoneves Oct 16, 2025
d1df541
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Oct 16, 2025
ac2faea
Git ignore generated config files
simoneves Oct 18, 2025
993dab9
Commented properfies files
misiugodfrey Oct 17, 2025
e37f16f
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Oct 18, 2025
9b75f2b
Tidy memory math
simoneves Oct 18, 2025
d0df70d
Recombine CPU and GPU coordinator config
simoneves Oct 21, 2025
73a40d4
Reinstate original fixed Java configs (with comments)
simoneves Oct 21, 2025
a78fd7a
Remove commented-out optimizer configs
simoneves Oct 21, 2025
46f67ee
Set vcpu_per_worker differently on CPU and GPU
simoneves Oct 22, 2025
e689d4c
Unsplit worker config
simoneves Oct 22, 2025
67bd9a4
Merge branch 'main' into seves/HERC-82_use_pbench_for_presto_memory_c…
simoneves Oct 22, 2025
70f4e15
Enhance comment
simoneves Oct 22, 2025
3 changes: 3 additions & 0 deletions .gitignore
@@ -15,3 +15,6 @@ __pycache__/

# Default benchmark output directory
benchmark_output

# Generated Presto Config
presto/docker/config/generated/

Contributor Author

Thanks, @johallar

8 changes: 0 additions & 8 deletions presto/docker/config/etc_common/catalog/hive.properties

This file was deleted.

2 changes: 0 additions & 2 deletions presto/docker/config/etc_common/catalog/tpch.properties

This file was deleted.

1 change: 0 additions & 1 deletion presto/docker/config/etc_common/log.properties

This file was deleted.

15 changes: 0 additions & 15 deletions presto/docker/config/etc_coordinator/config_java.properties

This file was deleted.

23 changes: 0 additions & 23 deletions presto/docker/config/etc_coordinator/config_native.properties

This file was deleted.

4 changes: 0 additions & 4 deletions presto/docker/config/etc_coordinator/node.properties

This file was deleted.

3 changes: 0 additions & 3 deletions presto/docker/config/etc_worker/config_java.properties

This file was deleted.

10 changes: 0 additions & 10 deletions presto/docker/config/etc_worker/config_native.properties

This file was deleted.

4 changes: 0 additions & 4 deletions presto/docker/config/etc_worker/node.properties

This file was deleted.

15 changes: 15 additions & 0 deletions presto/docker/config/params.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"sys_reserved_mem_percent": 0.05,
"sys_reserved_mem_cap_gb": 2,
"heap_size_percent_of_container_mem": 0.9,
"headroom_percent_of_heap": 0.2,
"query_max_total_mem_per_node_percent_of_heap": 0.8,
"query_max_mem_per_node_percent_of_total": 0.9,
"proxygen_mem_per_worker_gb": 0.125,
"proxygen_mem_cap_gb": 2,
"native_buffer_mem_percent": 0.05,
"native_buffer_mem_cap_gb": 32,
"native_query_mem_percent_of_sys_mem": 0.95,
"join_max_bcast_size_percent_of_container_mem": 0.01,
"memory_push_back_start_below_limit_gb": 5
}

Contributor Author

This file comes with pbench. I'm sure the heuristics could be tweaked if needed.

Contributor

Why do we include non-templated files in the template directory?

@simoneves, Oct 2, 2025 (Contributor Author)

Because otherwise they don't get copied to generated.
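Since pbench is a Go tool, a minimal sketch of how the `params.json` heuristics could be loaded in Go may help reviewers see the shape of the data. The `MemoryParams` struct and `parseParams` helper are hypothetical illustrations, not pbench's actual types; only the JSON keys and values come from the file above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// MemoryParams mirrors the keys in params.json (hypothetical type).
type MemoryParams struct {
	SysReservedMemPercent                 float64 `json:"sys_reserved_mem_percent"`
	SysReservedMemCapGb                   float64 `json:"sys_reserved_mem_cap_gb"`
	HeapSizePercentOfContainerMem         float64 `json:"heap_size_percent_of_container_mem"`
	HeadroomPercentOfHeap                 float64 `json:"headroom_percent_of_heap"`
	QueryMaxTotalMemPerNodePercentOfHeap  float64 `json:"query_max_total_mem_per_node_percent_of_heap"`
	QueryMaxMemPerNodePercentOfTotal      float64 `json:"query_max_mem_per_node_percent_of_total"`
	ProxygenMemPerWorkerGb                float64 `json:"proxygen_mem_per_worker_gb"`
	ProxygenMemCapGb                      float64 `json:"proxygen_mem_cap_gb"`
	NativeBufferMemPercent                float64 `json:"native_buffer_mem_percent"`
	NativeBufferMemCapGb                  float64 `json:"native_buffer_mem_cap_gb"`
	NativeQueryMemPercentOfSysMem         float64 `json:"native_query_mem_percent_of_sys_mem"`
	JoinMaxBcastSizePercentOfContainerMem float64 `json:"join_max_bcast_size_percent_of_container_mem"`
	MemoryPushBackStartBelowLimitGb       float64 `json:"memory_push_back_start_below_limit_gb"`
}

// paramsJSON is a verbatim copy of params.json from this PR.
const paramsJSON = `{
  "sys_reserved_mem_percent": 0.05,
  "sys_reserved_mem_cap_gb": 2,
  "heap_size_percent_of_container_mem": 0.9,
  "headroom_percent_of_heap": 0.2,
  "query_max_total_mem_per_node_percent_of_heap": 0.8,
  "query_max_mem_per_node_percent_of_total": 0.9,
  "proxygen_mem_per_worker_gb": 0.125,
  "proxygen_mem_cap_gb": 2,
  "native_buffer_mem_percent": 0.05,
  "native_buffer_mem_cap_gb": 32,
  "native_query_mem_percent_of_sys_mem": 0.95,
  "join_max_bcast_size_percent_of_container_mem": 0.01,
  "memory_push_back_start_below_limit_gb": 5
}`

// parseParams decodes the JSON and rejects unknown keys, so a
// misspelled key fails loudly instead of leaving a field at zero.
func parseParams(data []byte) (MemoryParams, error) {
	var p MemoryParams
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()
	err := dec.Decode(&p)
	return p, err
}

func main() {
	p, err := parseParams([]byte(paramsJSON))
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap fraction %.2f, headroom fraction %.2f\n",
		p.HeapSizePercentOfContainerMem, p.HeadroomPercentOfHeap)
}
```

Using `DisallowUnknownFields` is a design choice for the sketch: if the heuristics are tweaked later, a typo in a key becomes a decode error rather than a silently zeroed parameter.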

@@ -0,0 +1,21 @@
# Select the connector implementation. "hive-hadoop2" uses the Hive connector
# backed by Hadoop 2.x libraries, which is the default for Presto's Hive support.
connector.name=hive-hadoop2

# Configure the metastore implementation. "file" enables a simple file-based
# metastore suitable for local testing without an external Hive Metastore (HMS).
# See https://prestodb.io/docs/current/installation/deployment.html#configuring-a-file-based-metastore for more details.
hive.metastore=file
# Root directory where the file-based metastore stores table and partition
# metadata. This path is inside the container volume so state persists across
# server restarts during tests.
hive.metastore.catalog.dir=file:/var/lib/presto/data/hive/metastore
# Allow DROP TABLE statements. Enabled to make smoke/perf tests able to reset
# state and clean up artifacts without manual intervention.
hive.allow-drop-table=true

# Control whether Presto can split files for parallel reads. Disable when the
# file compression/format isn't splittable to avoid read failures. TPCH Parquet
# test data commonly uses SNAPPY compression that isn't splittable at the file
# level here, hence this must be false.
hive.file-splittable=false

Contributor Author

IIUC, this is basically always required. Discuss.

Contributor

@devavret asked me to add it. This is required.

Contributor

Can we add a comment that explains why this configuration has to be set?

Same comment applies to all additional configuration updates.

Contributor Author

We'll have to ask @tmostak and @patdevinwilson. They just told me to add them and didn't give much context.

@@ -0,0 +1,7 @@
# Select the built-in TPCH connector that generates synthetic datasets on the
# fly. Used for functional and performance testing without external storage.
connector.name=tpch

# Choose the column naming convention for generated tables. "STANDARD" matches
# the canonical TPC-H schema so queries from benchmarks run unmodified.
tpch.column-naming=STANDARD
@@ -1,12 +1,26 @@
# Enable JVM server mode for better JIT optimization on long-running servers.
-server
-Xmx24G
# Maximum Java heap size; templated to match container memory.
-Xmx{{ .HeapSizeGb }}G

Contributor

I may be missing something, but we seem to basically be using pbench as a (Go) templating engine, i.e., pbench is not actually generating the config keys?

Contributor Author

Yup! I was surprised too!

# Initial Java heap size; equal to max to avoid heap resizing pauses.
-Xms{{ .HeapSizeGb }}G

Contributor Author

Templated

# Use the G1 garbage collector for predictable pause times.
-XX:+UseG1GC
# Tune G1 region size to balance GC throughput and fragmentation.
-XX:G1HeapRegionSize=32M
# Throw OutOfMemoryError when GC overhead becomes excessive, rather than stalling.
-XX:+UseGCOverheadLimit
# Make System.gc() invoke concurrent collections to reduce pauses.
-XX:+ExplicitGCInvokesConcurrent
# Create heap dumps on OOM for postmortem analysis.
-XX:+HeapDumpOnOutOfMemoryError
# Exit the JVM on OOM so orchestration can restart the process.
-XX:+ExitOnOutOfMemoryError
# Do not cache per-thread temporary NIO direct buffers larger than ~2 MB, limiting retained off-heap memory.
-Djdk.nio.maxCachedBufferSize=2000000

Contributor Author

From @tmostak

# Allow the JVM to attach agents to itself (disallowed by default since JDK 9); some libraries and tools rely on self-attach.
-Djdk.attach.allowAttachSelf=true
# Open JDK internals for reflection required by Presto and dependencies under Java 11+ modules.
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.ref=ALL-UNNAMED
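The review thread on this file observes that pbench is effectively acting as a Go templating engine here. A minimal sketch with the standard `text/template` package shows how the `{{ .HeapSizeGb }}` placeholders render. The `jvmParams` type is hypothetical, and the value 24 is only an example (it matches the old fixed `-Xmx24G`).

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// jvmTemplate reproduces the two templated heap flags from this file.
const jvmTemplate = `-Xmx{{ .HeapSizeGb }}G
-Xms{{ .HeapSizeGb }}G
`

// jvmParams is a hypothetical data type; pbench's own type may differ.
type jvmParams struct {
	HeapSizeGb int
}

// renderJVM parses the template and executes it against p.
func renderJVM(p jvmParams) (string, error) {
	tmpl, err := template.New("jvm").Parse(jvmTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderJVM(jvmParams{HeapSizeGb: 24})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Executing against a struct (rather than a map) means a misspelled placeholder such as `{{ .HeapSizeGB }}` fails at render time; with a plain map, the same typo would render as `<no value>` and produce an invalid `-Xmx` flag.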
4 changes: 4 additions & 0 deletions presto/docker/config/template/etc_common/log.properties
@@ -0,0 +1,4 @@
# Root logger level for all Presto server components. INFO provides useful
# operational diagnostics while keeping logs compact for tests. Increase to
# DEBUG when deep troubleshooting is needed.
com.facebook.presto=INFO
@@ -0,0 +1,25 @@
# Java coordinator configuration. Avoid adding Presto Native-only properties
# here, as they are unsupported by the Java engine and may prevent startup.
# Run this node as the cluster coordinator; it schedules and manages queries.
coordinator=true
# Do not schedule worker tasks on the coordinator to avoid resource contention.
node-scheduler.include-coordinator=false
# Coordinator REST/HTTP port for clients and workers.
http-server.http.port=8080
# Embedded service that provides node discovery for workers.
discovery-server.enabled=true
# Address workers use to register with the discovery service.
discovery.uri=http://presto-coordinator:8080

# Min workers before query starts; keep minimal for quick tests.
query-manager.required-workers=1
# Maximum wait for required workers to join.
query-manager.required-workers-max-wait=10s

# Memory auto-configuration is not wired for the Java engine in this template.
# Uncomment and set values if using Java workers/coordinator end-to-end.
#query.max-total-memory-per-node={{ .JavaQueryMaxTotalMemPerNodeGb }}GB
#query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
#query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
#query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
#memory.heap-headroom-per-node={{ .HeadroomGb }}GB

Contributor

Are some of these configurations not required for Presto Java?

Contributor Author

The automatic process generates an invalid config for Java, and the server will not start. I asked the group about this some time ago and nobody had an opinion. I guess I can set them to some nominal fixed but valid values.

Contributor Author

Reinstated the original fixed Java configs (Coordinator and Worker) from main.
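The commented-out memory lines derive cluster-wide limits with a `mul` template function. `mul` is not one of `text/template`'s built-in functions, so whatever renders these templates has to register it (for example through a `template.FuncMap`, or a helper library such as Sprig). The sketch below registers a hypothetical integer `mul` and renders two of those lines; the inputs (40 GB per node, 4 workers) are made up, and integer-typed fields are an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// memTemplate reproduces two of the templated memory lines.
const memTemplate = `query.max-total-memory-per-node={{ .JavaQueryMaxTotalMemPerNodeGb }}GB
query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
`

// memParams is a hypothetical data type; field types are assumed to be int.
type memParams struct {
	JavaQueryMaxTotalMemPerNodeGb int
	NumberOfWorkers               int
}

// funcs registers a hypothetical integer mul helper.
var funcs = template.FuncMap{
	"mul": func(a, b int) int { return a * b },
}

// renderMem registers funcs BEFORE Parse; parsing a call to an
// unregistered function fails with `function "mul" not defined`.
func renderMem(p memParams) (string, error) {
	tmpl, err := template.New("coordinator").Funcs(funcs).Parse(memTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderMem(memParams{JavaQueryMaxTotalMemPerNodeGb: 40, NumberOfWorkers: 4})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

If the real values are floats rather than ints, a `func(a, b int) int` helper would fail at execution with a type error, so the helper's signature has to match the data passed in.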

@@ -0,0 +1,112 @@
# Run this node as the cluster coordinator; it schedules and manages queries.
coordinator=true
# Do not schedule worker tasks on the coordinator to avoid resource contention.
node-scheduler.include-coordinator=false
# Coordinator REST/HTTP port for clients and workers.
http-server.http.port=8080
# Embedded service that provides node discovery for workers.
discovery-server.enabled=true
# Address workers use to register with the discovery service.
discovery.uri=http://presto-coordinator:8080

# Set Presto version string to match workers for compatibility in tests.
presto.version=testversion

# Keep up to 30 rolled log files to bound disk usage.
log.max-history=30
# Rotate logs at ~100MB per file for manageable artifacts.
log.max-size=104857600B
# Reserve heap headroom per node to reduce full GC and OOM risk.
memory.heap-headroom-per-node={{ .HeadroomGb }}GB

# Limit pending splits per task to avoid excessive memory usage.
node-scheduler.max-pending-splits-per-task=2000
# Cap concurrent splits per node for balanced scheduling.
node-scheduler.max-splits-per-node=2000

# Optimizer flags
#optimizer.joins-not-null-inference-strategy=USE_FUNCTION_METADATA
#optimizer.default-filter-factor-enabled=true
# Use known constraints to simplify plan and filters.
optimizer.exploit-constraints=true
# Rewrite large IN lists as joins for performance in some cases.
optimizer.in-predicates-as-inner-joins-enabled=true
# Allow partial aggregations to reduce data shuffled across stages.
optimizer.partial-aggregation-strategy=automatic
# Prefer partial aggregations when beneficial.
optimizer.prefer-partial-aggregation=true
# Default selectivity heuristic for joins when stats are missing.
optimizer.default-join-selectivity-coefficient=0.1
# Infer additional range predicates to improve filtering.
optimizer.infer-inequality-predicates=true
# Support complex equi-join patterns in the optimizer.
optimizer.handle-complex-equi-joins=true
# Add dynamic domain filters to reduce scanned data.
optimizer.generate-domain-filters=true
# Upper limit for broadcasted table size to avoid memory blowups.
# See: https://github.com/prestodb/presto/issues/22161#issuecomment-1994128619
join-max-broadcast-table-size={{ .JoinMaxBroadcastTableSizeMb }}MB

# Abandon a query if the client stops polling for results for this long.
query.client.timeout=30m
# Use phased execution policy for improved large query scheduling.
query.execution-policy=phased
# Kill queries based on total reservation on blocked nodes to recover memory.
query.low-memory-killer.policy=total-reservation-on-blocked-nodes
# Upper limit on query wall time to keep tests bounded.
query.max-execution-time=30m
# Keep metadata of up to 1000 queries for UI and debugging.
query.max-history=1000
# Memory quotas per node and cluster to protect stability.
query.max-total-memory-per-node={{ .JavaQueryMaxTotalMemPerNodeGb }}GB
query.max-total-memory={{ mul .JavaQueryMaxTotalMemPerNodeGb .NumberOfWorkers }}GB
query.max-memory-per-node={{ .JavaQueryMaxMemPerNodeGb }}GB
query.max-memory={{ mul .JavaQueryMaxMemPerNodeGb .NumberOfWorkers }}GB
# Allow deep stage DAGs required by certain benchmark queries.
query.max-stage-count=1300
# Retain query info at least this long for diagnostics.
query.min-expire-age=120.00m
# Larger scheduling batches for better throughput in benchmarks.
query.min-schedule-split-batch-size=2000
# Raise warning threshold to align with higher max stage count.
query.stage-count-warning-threshold=150
# Raise the maximum SQL query text length (in characters) for complex benchmark queries.
query.max-length=2000000

# Disable dynamic filtering for deterministic benchmarking.
experimental.enable-dynamic-filtering=false
# Cap revocable memory per node to avoid overcommit.
experimental.max-revocable-memory-per-node=50GB
# Limit disk spill usage per node to bound IO and disk usage.
experimental.max-spill-per-node=50GB
# Enable repartitioning improvements for shuffle efficiency.
experimental.optimized-repartitioning=true
# Enable dereference and subfield pushdown to reduce scanned data.
experimental.pushdown-dereference-enabled=true
experimental.pushdown-subfields-enabled=true
# Maximum spill space a single query may use on each node.
experimental.query-max-spill-per-node=50GB
# Disable reserved memory pool to simplify test behavior.
experimental.reserved-pool-enabled=false
# Stop spilling when disk usage exceeds this fraction.
experimental.spiller-max-used-space-threshold=0.7
# Directory for spill files during execution.
experimental.spiller-spill-path=/tmp


# Min workers before query starts; keep minimal for quick tests.
query-manager.required-workers=1
# Maximum wait for required workers to join.
query-manager.required-workers-max-wait=10s

# Set required configuration for Presto C++ workers as indicated in https://prestodb.io/docs/current/presto_cpp/properties.html#coordinator-properties
native-execution-enabled=true
# Disable Java-side hash generation optimizations not used by native workers.
optimizer.optimize-hash-generation=false
# Use RE2J regex engine for performance and determinism.
regex-library=RE2J
# Enable alternative function signatures for native compatibility.
use-alternative-function-signatures=true

# Optimize for queries that can run entirely on a single worker.
single-node-execution-enabled=true
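The heap-derived placeholders in this file (`{{ .HeadroomGb }}`, `{{ .JavaQueryMaxTotalMemPerNodeGb }}`, `{{ .JavaQueryMaxMemPerNodeGb }}`) presumably come from the `params.json` fractions. The sketch below reads those parameter names literally, as plain multipliers; this is an assumption, since pbench's actual formulas (system reservations, caps, rounding) are not shown in this PR. The `deriveMem` function and the 100 GB container size are hypothetical.

```go
package main

import "fmt"

// derivedMem holds per-node values in GB; names echo the template
// placeholders, but the type itself is hypothetical.
type derivedMem struct {
	HeapSizeGb                    float64
	HeadroomGb                    float64
	JavaQueryMaxTotalMemPerNodeGb float64
	JavaQueryMaxMemPerNodeGb      float64
}

// deriveMem applies the params.json fractions literally, by name.
// ASSUMPTION: pbench may subtract system reservations first, apply
// caps, or round to whole GB; none of that is modeled here.
func deriveMem(containerMemGb float64) derivedMem {
	const (
		heapSizePercentOfContainerMem        = 0.9 // from params.json
		headroomPercentOfHeap                = 0.2 // from params.json
		queryMaxTotalMemPerNodePercentOfHeap = 0.8 // from params.json
		queryMaxMemPerNodePercentOfTotal     = 0.9 // from params.json
	)
	heap := containerMemGb * heapSizePercentOfContainerMem
	total := heap * queryMaxTotalMemPerNodePercentOfHeap
	return derivedMem{
		HeapSizeGb:                    heap,
		HeadroomGb:                    heap * headroomPercentOfHeap,
		JavaQueryMaxTotalMemPerNodeGb: total,
		JavaQueryMaxMemPerNodeGb:      total * queryMaxMemPerNodePercentOfTotal,
	}
}

func main() {
	m := deriveMem(100) // hypothetical 100 GB container
	fmt.Printf("heap=%.1f headroom=%.1f maxTotal=%.1f max=%.1f\n",
		m.HeapSizeGb, m.HeadroomGb, m.JavaQueryMaxTotalMemPerNodeGb, m.JavaQueryMaxMemPerNodeGb)
}
```

Under this literal reading, headroom plus total per-node query memory uses exactly the whole heap, because `headroom_percent_of_heap` (0.2) and `query_max_total_mem_per_node_percent_of_heap` (0.8) sum to 1.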