feat(helm): Add Presto query engine support to the Helm chart. #2004

junhaoliao wants to merge 15 commits into y-scope:main
Conversation

Note: Reviews paused. It looks like this branch is under active development, so CodeRabbit has automatically paused this review to avoid overwhelming the author with comments from an influx of new commits. This behavior can be configured in the settings.
Walkthrough

Adds Presto integration across CLP: new Kubernetes and Docker Compose docs, Helm values and templates for the Presto coordinator and workers, extensive conditional Presto ConfigMap entries, gating of legacy query components when query_engine == "presto", and setup script updates for node labeling, ports, and worker scaling.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Helm
    participant KubeAPI as Kubernetes API
    participant PCoord as Presto Coordinator
    participant PWorker as Presto Worker
    participant CLP as CLP Components

    Client->>Helm: Install chart with clpConfig.presto
    Helm->>KubeAPI: Create ConfigMaps, Deployments, Services
    KubeAPI->>PCoord: Start coordinator pod (port 8889)
    KubeAPI->>PWorker: Start worker pods
    PWorker->>PCoord: Register with coordinator
    Client->>PCoord: Submit query (NodePort 8889)
    PCoord->>PWorker: Distribute query tasks
    PWorker->>CLP: Read catalog / split-filter
    PWorker-->>PCoord: Return results
    PCoord-->>Client: Return aggregated results
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ✅ 3 passed
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/user-docs/guides-k8s-deployment.md`:
- Around line 359-362: Update the note text in the docs to correct the list of
components replaced when `query_engine` = "presto": remove "results cache" from
the excluded list and add "api-server"; reference the Helm gating logic when
editing the sentence so it reads that Presto replaces the "query scheduler,
query workers, reducers, and api-server" (since `query_scheduler`,
`query_workers`, and `reducers` are gated by `{{- if ne
.Values.clpConfig.package.query_engine \"presto\" }}` and the `api-server`
deployment is gated by `{{- if and .Values.clpConfig.api_server (ne
.Values.clpConfig.package.query_engine \"presto\") }}`), while `results-cache-*`
manifests are not conditionally removed.
- Around line 311-315: The prestoCoordinator and prestoWorker image definitions
currently use mutable tag values (tag: "dev"); update these to pinned/stable
release tags (e.g., tag: "vX.Y.Z") for both prestoCoordinator and prestoWorker
so deployments pull a fixed image; locate the tag fields under the
prestoCoordinator and prestoWorker blocks and replace "dev" with the chosen
semantic version string (or a chart value reference like {{
.Values.image.prestoCoordinator.tag }} and {{ .Values.image.prestoWorker.tag }}
if you prefer templating) to ensure reproducible deployments.
In `@docs/src/user-docs/guides-using-presto.md`:
- Around line 66-70: The docs show only a helm install command which will fail
when CLP is already deployed; update the step under the Presto install to
include the alternative helm upgrade invocation so users can perform an upgrade
when a release exists—specifically add guidance to run `helm upgrade clp clp/clp
DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml` as the alternative to `helm
install` (referencing the existing `helm install clp clp/clp
DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml` and the `presto-values.yaml`
values file).
In `@tools/deployment/package-helm/templates/configmap.yaml`:
- Around line 369-378: The JVM Xmx in presto-coordinator-config-jvm.config is
hardcoded as "-Xmx4G"; change it to render a Helm value so operators can
configure it: replace the literal "-Xmx4G" with a template that reads
.Values.clpConfig.presto.coordinator.jvm_max_heap (with a sensible default like
"4G"), ensure consumers pass the unit (e.g., "4G") or document it, and add the
new key plus a descriptive default and comment to values.yaml so the chart is
configurable and documented.
- Around line 347-348: The ConfigMap is embedding sensitive values
(clp.metadata-db-user, clp.metadata-db-password, clp.s3-access-key-id,
clp.s3-secret-access-key); remove these keys from templates/configmap.yaml and
instead create a Helm Secret template that stores those values (e.g.,
templates/secret.yaml with .Values.credentials.database and
.Values.credentials.s3 entries), then update consumers to read them from the
Secret (use env valueFrom.secretKeyRef, secret projection into a volume, or
template lookup/include to inject non-plain text values) so credentials are
stored as Kubernetes Secrets rather than literal ConfigMap entries.
- Line 397: The ConfigMap currently hardcodes AWS hostnames for the Presto S3
path via the clp.s3-end-point entry using {{ .s3_config.bucket }} and {{
.s3_config.region_code }}, which breaks S3-compatible stores; change the
template to prefer the configured endpoint_url on .s3_config (e.g.,
archive_output.storage.s3_config.endpoint_url / .s3_config.endpoint_url) when
present and fall back to the existing AWS-format URL only if endpoint_url is
empty, so Presto uses the explicit endpoint for MinIO/other S3-compatible
services.
In `@tools/deployment/package-helm/templates/presto-coordinator-deployment.yaml`:
- Around line 33-63: The presto-coordinator container lacks Kubernetes resource
requests/limits; add a configurable resources block for the container named
"presto-coordinator" (e.g., read values from a new or existing Helm key such as
.Values.prestoCoordinator.resources or .Values.resources.prestoCoordinator) and
populate both requests and limits for cpu and memory so they can be tuned
per-environment; update the chart values.yaml to include sensible defaults and
reference those values in the presto-coordinator deployment template.
In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml`:
- Around line 62-93: Add Kubernetes resource requests and limits to the
presto-worker container (container name "presto-worker") so CPU/memory are
enforced; update the deployment template to include a resources: block under
that container using Helm values (e.g. reference values like
.Values.resources.prestoWorker.requests.cpu,
.Values.resources.prestoWorker.requests.memory,
.Values.resources.prestoWorker.limits.cpu,
.Values.resources.prestoWorker.limits.memory or similar keys used elsewhere) and
ensure the memory request/limit aligns reasonably with the Presto config
(system_memory_gb/query_memory_gb) to avoid OOMKills; keep the block next to
ports/volumeMounts in the "presto-worker" container spec.
- Around line 38-55: The startup script that sets DISCOVERY_URI, calls wget into
VERSION and extracts PRESTO_VERSION can produce an empty PRESTO_VERSION (e.g.,
coordinator not ready) and currently would write a blank presto.version; update
the boot script (the block using set -e, DISCOVERY_URI, VERSION, PRESTO_VERSION)
to validate the coordinator response: implement a short retry loop with a
timeout that retries wget on ${DISCOVERY_URI}/v1/info a few times (or at least
reject empty output), extract PRESTO_VERSION only after non-empty VERSION, and
if PRESTO_VERSION is still empty log an error and exit non-zero (do not append a
blank presto.version); ensure the script does not proceed to echo
"presto.version=${PRESTO_VERSION}" unless PRESTO_VERSION is non-empty.
- Around line 69-86: The volume mount ordering for the Presto worker container
is subtle: the emptyDir mount named "presto-etc" at /opt/presto-server/etc must
be mounted before the child mounts ("presto-catalog" at
/opt/presto-server/etc/catalog and "presto-config" via subPath at
/opt/presto-server/etc/velox.properties) so init-container writes to the
emptyDir (config.properties, node.properties) remain visible; add a clear inline
comment near the volumeMounts block explaining this dependency and that
Kubernetes mounts parents before children (so do not reorder or replace
"presto-etc" with a deeper mount), referencing the mount names presto-etc,
presto-catalog, presto-config and the target path /opt/presto-server/etc.
- Around line 33-55: The init container "setup-configs" is using an unpinned
busybox image and a brittle sed-based JSON parse for PRESTO_VERSION; change the
image (in the "setup-configs" container spec) from busybox:latest to a pinned
image that includes jq (or make the tag configurable via values.yaml), and
replace the sed pipeline that sets PRESTO_VERSION (the commands that assign
VERSION and PRESTO_VERSION) to use a robust JSON parser like jq (e.g., fetch
DISCOVERY_URI "/v1/info" and pipe to jq -r .version) and add a simple check to
fail or default if PRESTO_VERSION is empty so the echo "presto.version=..." step
is reliable.
In `@tools/deployment/package-helm/values.yaml`:
- Around line 20-27: The Helm values set mutable dev tags for Presto images:
update prestoCoordinator.tag and prestoWorker.tag to use a stable release tag
convention (e.g., match clpPackage.tag which uses "main" or a semantic version)
so defaults are deterministic; modify the values for prestoCoordinator and
prestoWorker to a stable tag string and document in values.yaml comment that
operators should override tag for development workflows, ensuring consistency
across clpPackage, prestoCoordinator, and prestoWorker.
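For the `clp.s3-end-point` finding above, one possible template shape is the following sketch. The exact value path (`.s3_config.endpoint_url`) and the rendered key format are assumptions that would need to match the chart's existing ConfigMap entry:

```yaml
# configmap.yaml (sketch): prefer an explicit endpoint for MinIO/other
# S3-compatible stores, falling back to the AWS-format URL only when
# endpoint_url is unset.
{{- if .s3_config.endpoint_url }}
clp.s3-end-point={{ .s3_config.endpoint_url }}
{{- else }}
clp.s3-end-point=https://{{ .s3_config.bucket }}.s3.{{ .s3_config.region_code }}.amazonaws.com
{{- end }}
```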
> :::{note}
> When `query_engine` is set to `"presto"`, the chart deploys a Presto coordinator and Presto
> worker(s) instead of the query scheduler, query workers, reducers, and results cache.
> :::
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Search for conditional logic around results_cache, api-server, and other components
# in the Helm templates to determine what's excluded when query_engine="presto"
echo "=== Searching for results_cache conditional logic ==="
rg -n 'results.cache|resultsCache' tools/deployment/package-helm/templates/ \
    -A 5 -B 5 --type yaml
echo -e "\n=== Searching for api-server conditional logic ==="
rg -n 'api.server|apiServer|api_server' tools/deployment/package-helm/templates/ \
    -A 5 -B 5 --type yaml
echo -e "\n=== Searching for query_engine conditional logic ==="
rg -n 'query_engine|queryEngine' tools/deployment/package-helm/templates/ \
    -A 5 -B 5 --type yaml
```

Repository: y-scope/clp
Length of output: 50367
Note is inaccurate: results_cache should not be listed as excluded, and api-server is missing.

The note states Presto replaces "the query scheduler, query workers, reducers, and results cache", but:

- The Helm templates show no conditional gating on `query_engine` for any `results-cache-*` manifests (service, statefulset, indices-creator job), confirming it remains deployed when Presto is enabled, only with retention disabled.
- The api-server deployment is actually gated by `{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}`, so it is excluded and should be mentioned.
- The query scheduler, query workers, and reducers are all gated by `{{- if ne .Values.clpConfig.package.query_engine "presto" }}`, confirming they are excluded.

Update the note to list: query scheduler, query workers, reducers, and api-server, not results cache.
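Based on the gating described above, the corrected note could read:

```md
:::{note}
When `query_engine` is set to `"presto"`, the chart deploys a Presto coordinator and Presto
worker(s) instead of the query scheduler, query workers, reducers, and api-server.
:::
```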
> 2. Install (or upgrade) the Helm chart with the Presto values:
>
>    ```bash
>    helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
>    ```
helm install will fail for users with an existing CLP deployment.

The requirement at line 33 explicitly states the user already has "a running CLP Kubernetes deployment," yet step 2 only shows `helm install`, which would error out with "release already exists". The parenthetical "(or upgrade)" acknowledges the upgrade path but never shows the corresponding command.
💡 Suggested fix

````diff
-2. Install (or upgrade) the Helm chart with the Presto values:
+2. If you haven't installed CLP yet, install it with Presto enabled:

    ```bash
    helm install clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
    ```
+
+   If CLP is already installed, upgrade the release instead:
+
+   ```bash
+   helm upgrade clp clp/clp DOCS_VAR_HELM_VERSION_FLAG -f presto-values.yaml
+   ```
````
> clp.metadata-db-user={{ $.Values.credentials.database.username }}
> clp.metadata-db-password={{ $.Values.credentials.database.password }}
Credentials stored in plaintext in a ConfigMap — use a Secret instead.
clp.metadata-db-password (and clp.metadata-db-user) are written directly into the ConfigMap. ConfigMaps are stored unencrypted in etcd by default, are readable by any principal with get/list on ConfigMaps in the namespace, and are fully visible in kubectl get configmap -o yaml. Kubernetes Secrets are the appropriate resource for sensitive values: they can be encrypted at rest, carry separate RBAC controls, and are omitted from audit logs by default.
The same concern applies to the S3 credentials at lines 396/398 (clp.s3-access-key-id, clp.s3-secret-access-key).
Consider rendering these files from a Kubernetes Secret (or multiple projected volume mounts) rather than embedding credentials in the ConfigMap.
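A minimal Secret template along the lines suggested might look like this sketch. The resource name, key names, and file path are assumptions; the `clp.fullname` helper is used elsewhere in the chart:

```yaml
# templates/secret.yaml (sketch): move credentials out of the ConfigMap.
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "clp.fullname" . }}-presto-credentials
type: Opaque
stringData:
  metadata-db-user: {{ $.Values.credentials.database.username | quote }}
  metadata-db-password: {{ $.Values.credentials.database.password | quote }}
  s3-access-key-id: {{ $.Values.credentials.s3.access_key_id | quote }}
  s3-secret-access-key: {{ $.Values.credentials.s3.secret_access_key | quote }}
```

Consumers would then read these via `env.valueFrom.secretKeyRef` or a projected volume rather than from the ConfigMap-rendered properties file.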
> ```yaml
> presto-coordinator-config-jvm.config: |
>   -server
>   -Xmx4G
>   -XX:+UseG1GC
>   -XX:G1HeapRegionSize=32M
>   -XX:+UseGCOverheadLimit
>   -XX:+ExplicitGCInvokesConcurrent
>   -XX:+HeapDumpOnOutOfMemoryError
>   -XX:+ExitOnOutOfMemoryError
>   -Djdk.attach.allowAttachSelf=true
> ```
🧹 Nitpick | 🔵 Trivial
Hardcoded -Xmx4G should be a configurable value.
The coordinator JVM heap is fixed at 4 GB regardless of the resource limits/requests set on the coordinator pod. If the pod is scheduled with less memory, it will OOM-kill; if scheduled with more, the heap cannot grow to use it. Expose this as a value (e.g., .Values.clpConfig.presto.coordinator.jvm_max_heap) so operators can tune it alongside query.max-memory and query.max-memory-per-node.
♻️ Suggested approach

```diff
 presto-coordinator-config-jvm.config: |
   -server
-  -Xmx4G
+  -Xmx{{ .coordinator.jvm_max_heap_gb }}G
   -XX:+UseG1GC
```
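A values.yaml addition matching this suggestion might look like the following sketch. The key name and default are suggestions, not existing chart values:

```yaml
# values.yaml (proposed addition)
clpConfig:
  presto:
    coordinator:
      # Maximum JVM heap for the Presto coordinator, unit included (e.g. "4G").
      # Keep this below the pod's memory limit to leave headroom for non-heap
      # (metaspace, threads, native) memory.
      jvm_max_heap: "4G"
```

The jvm.config line in the ConfigMap would then render as `-Xmx{{ .Values.clpConfig.presto.coordinator.jvm_max_heap }}`.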
```yaml
- name: "setup-configs"
  image: "busybox:latest"
  command:
    - "/bin/sh"
    - "-c"
    - |
      set -e
      cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
      cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

      # Query coordinator for version and add it to config
      DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
      VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
      PRESTO_VERSION=$(echo "$VERSION" \
        | sed 's/.*"version":"//' \
        | sed 's/".*//')
      echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

      # Set node identity from hostname
      HOSTNAME=$(hostname)
      HOST_IP=$(hostname -i)
      echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
      echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
Fragile JSON parsing with sed and unpinned busybox:latest image.

Two concerns in this init container:

- `busybox:latest`: using the `latest` tag hurts reproducibility and can introduce unexpected breakages. Pin to a specific version (e.g., `busybox:1.37`), or make it configurable via values.
- JSON parsing via `sed` (lines 46-48): extracting `version` from a JSON response using `sed 's/.*"version":"//' | sed 's/".*//'` is brittle. If the JSON structure changes (e.g., field ordering, nested `version` keys, or whitespace variations), parsing will silently produce incorrect results. Since `busybox` doesn't include `jq`, consider switching to an image that has `jq`, or using a more robust `awk`/`grep` pattern.
Proposed fix for image tag pinning and safer parsing

```diff
   - name: "setup-configs"
-    image: "busybox:latest"
+    image: "busybox:1.37"
     command:
       - "/bin/sh"
       - "-c"
       - |
         set -e
         cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
         cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties
         # Query coordinator for version and add it to config
         DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
         VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
-        PRESTO_VERSION=$(echo "$VERSION" \
-          | sed 's/.*"version":"//' \
-          | sed 's/".*//')
+        PRESTO_VERSION=$(echo "$VERSION" \
+          | grep -o '"version":"[^"]*"' \
+          | head -1 \
+          | sed 's/"version":"//;s/"//')
         echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties
```
```yaml
- |
  set -e
  cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
  cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

  # Query coordinator for version and add it to config
  DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
  VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
  PRESTO_VERSION=$(echo "$VERSION" \
    | sed 's/.*"version":"//' \
    | sed 's/".*//')
  echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

  # Set node identity from hostname
  HOSTNAME=$(hostname)
  HOST_IP=$(hostname -i)
  echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
  echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
```
No error handling if the coordinator version query fails.
If wget on Line 45 fails (e.g., coordinator not yet ready, network issue), $VERSION will be empty and $PRESTO_VERSION will be an empty string. This would write presto.version= to the config, likely causing the worker to fail with a confusing error. While set -e is present, wget returning an empty body with HTTP 200 would not trigger it.
Consider adding a validation check:
Proposed validation

```diff
   VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
   PRESTO_VERSION=$(echo "$VERSION" \
     | sed 's/.*"version":"//' \
     | sed 's/".*//')
+  if [ -z "$PRESTO_VERSION" ]; then
+    echo "ERROR: Failed to retrieve Presto version from coordinator" >&2
+    exit 1
+  fi
   echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties
```
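Combining the retry and empty-output suggestions above, the version fetch could be restructured as in the sketch below. Helper names, retry counts, and the sleep interval are illustrative, and the parsing sticks to busybox-compatible `grep`/`sed`:

```shell
# Sketch only: helper names and retry parameters are illustrative choices.

# Extract the "version" field from the coordinator's /v1/info JSON using
# busybox-compatible tools; prints nothing if the field is absent.
parse_presto_version() {
    printf '%s' "$1" \
        | grep -o '"version":"[^"]*"' \
        | head -n 1 \
        | sed 's/"version":"//; s/"$//'
}

# Retry the coordinator a few times and fail loudly instead of appending an
# empty presto.version to config.properties.
fetch_presto_version() {
    uri="$1"
    attempts="${2:-5}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        body=$(wget -q -O - "${uri}/v1/info" 2>/dev/null || true)
        version=$(parse_presto_version "$body")
        if [ -n "$version" ]; then
            printf '%s\n' "$version"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "ERROR: could not retrieve Presto version from ${uri}" >&2
    return 1
}
```

The init script would then call `PRESTO_VERSION=$(fetch_presto_version "$DISCOVERY_URI")` and, with `set -e` in effect, exit non-zero before writing a blank `presto.version`.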
```yaml
containers:
  - name: "presto-worker"
    image: "{{ .Values.image.prestoWorker.repository }}:{{ .Values.image.prestoWorker.tag }}"
    imagePullPolicy: "{{ .Values.image.prestoWorker.pullPolicy }}"
    ports:
      - name: "presto-worker"
        containerPort: 8080
    volumeMounts:
      - name: "presto-catalog"
        mountPath: "/opt/presto-server/etc/catalog"
        readOnly: true
      - name: "presto-config"
        mountPath: "/opt/presto-server/etc/velox.properties"
        subPath: "presto-worker-config-velox.properties"
        readOnly: true
      - name: "presto-etc"
        mountPath: "/opt/presto-server/etc"
      {{- if eq .Values.clpConfig.archive_output.storage.type "fs" }}
      - name: {{ include "clp.volumeName" (dict
          "component_category" "shared-data"
          "name" "archives"
        ) | quote }}
        mountPath: "/var/data/archives"
        readOnly: true
      {{- end }}
    readinessProbe:
      {{- include "clp.readinessProbeTimings" . | nindent 12 }}
      tcpSocket: &presto-worker-health-check
        port: "presto-worker"
    livenessProbe:
      {{- include "clp.livenessProbeTimings" . | nindent 12 }}
      tcpSocket: *presto-worker-health-check
```
🧹 Nitpick | 🔵 Trivial
No resources (requests/limits) defined on the worker container.
Same concern as the coordinator: the worker container has no CPU/memory resource requests or limits. This is especially important for Presto workers, which can be memory-intensive. The system_memory_gb and query_memory_gb values are set in the Presto config but not enforced at the Kubernetes level, which could lead to OOMKills or overcommitment.
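A hedged sketch of the suggested values and template wiring follows. The key names are placeholders, and the numbers should be sized against the chart's system_memory_gb / query_memory_gb settings:

```yaml
# values.yaml (sketch): defaults operators can override per environment.
resources:
  prestoWorker:
    requests:
      cpu: "2"
      memory: "16Gi"   # should cover system_memory_gb plus JVM/native overhead
    limits:
      cpu: "4"
      memory: "16Gi"

# presto-worker-deployment.yaml, inside the "presto-worker" container spec,
# next to ports/volumeMounts:
#   resources:
#     {{- toYaml .Values.resources.prestoWorker | nindent 12 }}
```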
```yaml
volumeMounts:
  - name: "presto-catalog"
    mountPath: "/opt/presto-server/etc/catalog"
    readOnly: true
  - name: "presto-config"
    mountPath: "/opt/presto-server/etc/velox.properties"
    subPath: "presto-worker-config-velox.properties"
    readOnly: true
  - name: "presto-etc"
    mountPath: "/opt/presto-server/etc"
  {{- if eq .Values.clpConfig.archive_output.storage.type "fs" }}
  - name: {{ include "clp.volumeName" (dict
      "component_category" "shared-data"
      "name" "archives"
    ) | quote }}
    mountPath: "/var/data/archives"
    readOnly: true
  {{- end }}
```
🧹 Nitpick | 🔵 Trivial

Volume mount layering at /opt/presto-server/etc — verify correctness.

The main container mounts:

- `presto-catalog` (configMap) at `/opt/presto-server/etc/catalog`
- `presto-config` (configMap, subPath) at `/opt/presto-server/etc/velox.properties`
- `presto-etc` (emptyDir) at `/opt/presto-server/etc`

Kubernetes sorts mounts by path length (parent before child), so the emptyDir is mounted first, then the catalog and velox.properties are overlaid. The init container's writes (config.properties, node.properties) live in the emptyDir and should remain visible. This layering should work, but it's subtle and easy to break during future refactors.

Consider adding a comment in the template explaining the mount ordering dependency.
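The suggested inline comment could take roughly this shape in the template (a sketch of the existing mounts with the explanatory comment added):

```yaml
volumeMounts:
  # NOTE: Kubernetes mounts parent paths before children, so the "presto-etc"
  # emptyDir at /opt/presto-server/etc is mounted first, and the init
  # container's config.properties / node.properties stay visible beneath the
  # "presto-catalog" and "presto-config" overlays. Do not reorder these or
  # replace "presto-etc" with a deeper mount.
  - name: "presto-etc"
    mountPath: "/opt/presto-server/etc"
  - name: "presto-catalog"
    mountPath: "/opt/presto-server/etc/catalog"
    readOnly: true
  - name: "presto-config"
    mountPath: "/opt/presto-server/etc/velox.properties"
    subPath: "presto-worker-config-velox.properties"
    readOnly: true
```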
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/deployment/package-helm/Chart.yaml`:
- Line 3: The Chart.yaml pre-release version was mistakenly advanced to
"0.1.4-dev.4", skipping "0.1.4-dev.3"; update the version field in Chart.yaml
from "0.1.4-dev.4" to "0.1.4-dev.3" so the pre-release sequence remains
contiguous (look for the existing version: "0.1.4-dev.4" and replace it with
"0.1.4-dev.3").
```diff
 apiVersion: "v2"
 name: "clp"
-version: "0.1.4-dev.2"
+version: "0.1.4-dev.4"
```
🧩 Analysis chain
🏁 Script executed:

#!/bin/bash
# Check whether 0.1.4-dev.3 is used in any branch or tag
git grep "0.1.4-dev.3" $(git for-each-ref --format='%(refname)' refs/heads refs/remotes refs/tags) -- "tools/deployment/package-helm/Chart.yaml" 2>/dev/null || echo "Not found on any ref"

Repository: y-scope/clp
Length of output: 75
Change version to 0.1.4-dev.3 to maintain sequential pre-release versioning.
The version skips from 0.1.4-dev.2 directly to 0.1.4-dev.4, but 0.1.4-dev.3 does not exist on any branch or tag in the repository. This skip appears unintentional and breaks the contiguous pre-release sequence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/Chart.yaml` at line 3, The Chart.yaml
pre-release version was mistakenly advanced to "0.1.4-dev.4", skipping
"0.1.4-dev.3"; update the version field in Chart.yaml from "0.1.4-dev.4" to
"0.1.4-dev.3" so the pre-release sequence remains contiguous (look for the
existing version: "0.1.4-dev.4" and replace it with "0.1.4-dev.3").
@@ -1,4 +1,4 @@
-{{- if .Values.clpConfig.api_server }}
+{{- if and .Values.clpConfig.api_server (ne .Values.clpConfig.package.query_engine "presto") }}
why do we disable api server? Shouldn't we also disable query scheduler and worker if clp search is not needed?
currently, the API server does not work with Presto.
we do have this specified in tools/deployment/package-helm/templates/query-scheduler-deployment.yaml as well:

{{- if ne .Values.clpConfig.package.query_engine "presto" }}

though, given some users may want to use both the API Server and Presto, i think we may discard the .Values.clpConfig.package.query_engine values and simply check whether query_scheduler / query_worker / API is null before launching the services. what do you think?
cc @kirkrodrigues for discussion since this is interface related
@@ -17,6 +17,14 @@ image:
   repository: "ghcr.io/y-scope/clp/clp-package"
I'm getting a Presto internal error, possibly due to the recent timestamp changes (f82e611). Can you reproduce the error?
🤦 right. i can reproduce once i merge main into the current branch.
@coderabbitai note #1788 is a BREAKING change that broke CLP Presto, as in the CLP Presto Connector cannot handle the new Timestamp column type at the moment. submit an issue to track
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Get details about PR `#1788` to understand the breaking change
gh pr view 1788 --json title,url,mergedAt,body --repo y-scope/clp
# Create an issue to track the breaking change
gh issue create \
--repo y-scope/clp \
--title "Presto Connector cannot handle new Timestamp column type from PR `#1788`" \
--body "## Issue Description
The CLP Presto Connector currently cannot handle the new Timestamp column type introduced in PR `#1788`, causing internal errors when querying with Presto.
## Context
- **Breaking PR**: `#1788` (commit f82e6114160a6addd4727259906bcf621ac9912c)
- **Affected Component**: CLP Presto Connector
- **Impact**: Presto queries fail with internal errors
The issue was discovered during testing of PR `#2004` (Presto Helm chart support). The error can be reproduced after merging main into the Presto feature branch.
## References
- PR `#1788`: https://github.com/y-scope/clp/pull/1788
- PR `#2004`: https://github.com/y-scope/clp/pull/2004
- Comment: https://github.com/y-scope/clp/pull/2004#discussion_r2821243815
Reported by: `@junhaoliao`" \
--assignee junhaoliao
Length of output: 5337
@junhaoliao, I've created issue #2008 to track this breaking change. The issue documents that PR #1788 (which introduced the new Timestamp column type to replace DateString and bumped the archive version to 0.5.0) broke the CLP Presto Connector's ability to handle the new column type. The issue has been assigned to you and includes references to both PRs and the relevant commit.
🐰✨
/_/
( o.o )
^ <
"Issue tracked!"
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.
      - "/bin/sh"
      - "-c"
      - |
        set -e
        cp /etc/presto-config/presto-worker-config-config.properties /opt/presto-server/etc/config.properties
        cp /etc/presto-config/presto-worker-config-node.properties /opt/presto-server/etc/node.properties

        # Query coordinator for version and add it to config
        DISCOVERY_URI="http://{{ include "clp.fullname" . }}-presto-coordinator:8080"
        VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info")
        PRESTO_VERSION=$(echo "$VERSION" \
          | sed 's/.*"version":"//' \
          | sed 's/".*//')
        echo "presto.version=${PRESTO_VERSION}" >> /opt/presto-server/etc/config.properties

        # Set node identity from hostname
        HOSTNAME=$(hostname)
        HOST_IP=$(hostname -i)
        echo "node.internal-address=${HOST_IP}" >> /opt/presto-server/etc/node.properties
        echo "node.id=${HOSTNAME}" >> /opt/presto-server/etc/node.properties
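The sed-based version extraction can be tried locally against a stand-in payload (the JSON below is an assumed shape of the coordinator's `/v1/info` response, not captured output; a later revision in this thread swaps the sed pipeline for `jq -r '.version'`):

```shell
# Stand-in for the coordinator response; the real payload has more fields.
VERSION='{"environment":"production","version":"0.293"}'

# Same extraction as the init container: strip everything up to the last
# "version":" occurrence, then strip from the next quote onward.
PRESTO_VERSION=$(echo "$VERSION" \
    | sed 's/.*"version":"//' \
    | sed 's/".*//')
echo "presto.version=${PRESTO_VERSION}"
```

Note that the first `sed` pattern is greedy, so it anchors on the last `"version":"` in the payload.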
This script looks scary. Does it mimic the functionality of tools/deployment/presto-clp/worker/scripts/generate-configs.sh? I guess we don't use the script directly because of missing dependencies?
we can't reuse generate-configs.sh directly because it also does apt-get, env-var substitution, and catalog management, all of which are unnecessary in Helm since configmap/volume mounts handle that
extracting the shared bits is possible but the indirection cost (.Files.Get, symlinks into the chart dir) outweighs the benefit for such a small surface
- Replace `query_engine` conditions with modular flags in Helm templates.
- Refactor Presto configurations to use dedicated flags and scripts.
- Enhance worker and reducer node scheduling logic.
- Update documentation to reflect Presto integration and better configuration flexibility.
TODO: address #1957 since we're updating the Presto setup docs in this PR.
Resolve conflicts:
- Chart.yaml: Accept main's version (0.2.1-dev.0)
- values.yaml: Keep prestoWorker section, drop storage.storageClassName
(host-path PVs removed on main)
- set-up-multi-{dedicated,shared}-test.sh: Combine Presto node support with
main's get_image_helm_args
- guides-k8s-deployment.md: Keep Presto link references
- Accept deletion of all host-path PV/PVC files (query-scheduler-logs,
query-worker-logs, query-worker-staged-streams, reducer-logs)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Actionable comments posted: 6
♻️ Duplicate comments (2)
docs/src/user-docs/guides-using-presto.md (1)
33-36: ⚠️ Potential issue | 🟡 Minor

The Helm flow still contradicts itself for existing releases.

This section says the Kubernetes path requires a running CLP deployment, but the only command shown here is `helm install`, which will fail once that release already exists. Either drop the prerequisite for fresh installs, or show the matching `helm upgrade` command beside this step.

Also applies to: 83-87
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/src/user-docs/guides-using-presto.md` around lines 33 - 36, The "Requirements" section currently states a prerequisite of "A running CLP Kubernetes deployment" but the docs only show a `helm install` command which fails for existing releases; update the documentation under the "Requirements" heading (and the similar block at lines referenced for the second occurrence) to either remove the running-deployment prerequisite for fresh installs or, preferably, replace/augment the shown `helm install` with the idempotent Helm command and an explicit upgrade example — e.g., show `helm upgrade --install <release> <chart> ...` (or show both `helm install ...` and the corresponding `helm upgrade` example) next to the existing `helm install` so the guide works for both fresh and existing releases.

tools/deployment/package-helm/templates/presto-worker-deployment.yaml (1)

33-35: ⚠️ Potential issue | 🟠 Major

Pin the setup image and stop relying on `:latest` for the shell toolchain.

`setup-configs` no longer uses `kubectl`, but it does depend on whatever `bitnami/kubectl:latest` happens to ship for `sh`, `cp`, `wget`, and `jq`. That makes worker start-up non-reproducible and easy to break on upstream image changes. Please move this image/tag into values and pin it to a known build that includes the exact tools the script needs.

Does the `bitnami/kubectl` container image include `jq` and `wget`, and which pinned tags are currently published so Helm charts can avoid using `:latest`?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml` around lines 33 - 35, Replace the hardcoded image "bitnami/kubectl:latest" for the "setup-configs" container with a chart value (e.g. .Values.setupConfig.image and .Values.setupConfig.tag or a single .Values.setupConfig.imageWithTag) and set the values.yaml default to a pinned tag/digest that you have verified contains /bin/sh, cp, wget and jq; keep the command as ["/bin/sh", "/scripts/presto-worker-setup-configs.sh"]. Update chart README/values description to note the image must include jq and wget and add a comment reminding future updaters to pin to a specific tag or digest rather than :latest. Ensure the template uses the new values symbol names so upgrades can change the image via values.yaml.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/src/user-docs/guides-k8s-deployment.md`:
- Line 664: Remove the unused markdown link reference "[logging-infra-issue]"
from the document: locate the link definition line that reads
"[logging-infra-issue]: https://github.com/y-scope/clp/issues/1760" and delete
it (and verify there are no remaining usages of the identifier in the file such
as "[logging-infra-issue]" inline links); this resolves the MD053 markdownlint
warning.
In `@tools/deployment/package-helm/set-up-multi-dedicated-test.sh`:
- Around line 13-17: NUM_PRESTO_NODES is defaulting to 2 and is counted in
total_workers even when Presto mode is not enabled, leaving unused Presto nodes;
change the default to 0 (NUM_PRESTO_NODES="${NUM_PRESTO_NODES:-0}") or,
alternatively, when NUM_PRESTO_NODES or PRESTO_WORKER_REPLICAS is set >0 switch
the Helm install/values into Presto mode so pods can be scheduled; update the
same defaults/logic for the other occurrences (lines referenced around 37-38 and
58-78) and ensure total_workers calculation and kind node pool labeling only
include Presto nodes when Presto mode is enabled.
In `@tools/deployment/package-helm/set-up-multi-shared-test.sh`:
- Line 15: The PRESTO_WORKER_REPLICAS variable is defined but never applied to
the Helm values, so setting it only changes output; update
set-up-multi-shared-test.sh to pass Presto-specific values when
PRESTO_WORKER_REPLICAS is set by explicitly setting
clpConfig.package.query_engine=presto and ensuring clpConfig.presto is non-null
and prestoWorker.replicas uses PRESTO_WORKER_REPLICAS in the Helm
install/upgrade values block (the install command that currently relies on chart
defaults). Locate references to PRESTO_WORKER_REPLICAS, prestoWorker.replicas,
clpConfig.package.query_engine, and clpConfig.presto in the script and add the
corresponding Helm --set entries (or values YAML injection) so the chart
actually deploys Presto with the requested replica count.
In `@tools/deployment/package-helm/templates/configmap.yaml`:
- Around line 129-130: The config mix allows mismatched flags causing
split-brain: introduce a single predicate variable (e.g., $prestoEnabled := eq
.Values.clpConfig.package.query_engine "presto") at the top of the Helm template
and use that variable everywhere that currently mixes
.Values.clpConfig.package.query_engine, .Values.clpConfig.query_*, and
.Values.clpConfig.presto; replace the existing conditionals in the CLP-S legacy
blocks (the sections referencing ClpQueryEngine, PrestoHost, presto/PrestoHost)
and the Presto blocks with checks against $prestoEnabled so both the configmap
entries and the Presto deployment templates are gated by the same boolean.
Ensure every place noted in the review (around package.query_engine and the
blocks spanning lines ~132-158, ~227-233, ~326-334, and the single-line block
~337) uses that single $prestoEnabled predicate.
- Around line 388-395: The S3 block under the with
$.Values.clpConfig.archive_output.storage blindly dereferences
.s3_config.aws_authentication.credentials.{access_key_id,secret_access_key}
which breaks for non-static auth shapes; update the template around the eq .type
"s3" branch to validate that .s3_config.aws_authentication is the
static/credentials form (e.g. credentials exist and no session_token/profile
type) and call the Helm fail function with a clear message if it is not, or
alternatively only render the clp.s3-access-key-id/clp.s3-secret-access-key and
clp.s3-auth-provider=clp_package properties when the credentials fields are
present; reference the $.Values.clpConfig.archive_output.storage /
.s3_config.aws_authentication and the clp.s3-access-key-id /
clp.s3-secret-access-key symbols when making the change.
In `@tools/deployment/package-helm/templates/query-scheduler-deployment.yaml`:
- Line 1: The deployment/service templates currently only check
.Values.clpConfig.query_scheduler which stays truthy when package.query_engine
is changed to "presto"; update the top-level conditional in both
query-scheduler-deployment.yaml and query-scheduler-service.yaml to also ensure
.Values.clpConfig.package.query_engine is not "presto" (e.g., add an AND check that
.Values.clpConfig.package.query_engine != "presto" using Helm's and/ne helpers) so the
scheduler is not rendered when query_engine is set to "presto".
---
Duplicate comments:
In `@docs/src/user-docs/guides-using-presto.md`:
- Around line 33-36: The "Requirements" section currently states a prerequisite
of "A running CLP Kubernetes deployment" but the docs only show a `helm install`
command which fails for existing releases; update the documentation under the
"Requirements" heading (and the similar block at lines referenced for the second
occurrence) to either remove the running-deployment prerequisite for fresh
installs or, preferably, replace/augment the shown `helm install` with the
idempotent Helm command and an explicit upgrade example — e.g., show `helm
upgrade --install <release> <chart> ...` (or show both `helm install ...` and
the corresponding `helm upgrade` example) next to the existing `helm install` so
the guide works for both fresh and existing releases.
In `@tools/deployment/package-helm/templates/presto-worker-deployment.yaml`:
- Around line 33-35: Replace the hardcoded image "bitnami/kubectl:latest" for
the "setup-configs" container with a chart value (e.g. .Values.setupConfig.image
and .Values.setupConfig.tag or a single .Values.setupConfig.imageWithTag) and
set the values.yaml default to a pinned tag/digest that you have verified
contains /bin/sh, cp, wget and jq; keep the command as ["/bin/sh",
"/scripts/presto-worker-setup-configs.sh"]. Update chart README/values
description to note the image must include jq and wget and add a comment
reminding future updaters to pin to a specific tag or digest rather than
:latest. Ensure the template uses the new values symbol names so upgrades can
change the image via values.yaml.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 480ad11d-e1af-46f0-a62d-35394755466a
📒 Files selected for processing (11)
- docs/src/user-docs/guides-k8s-deployment.md
- docs/src/user-docs/guides-using-presto.md
- tools/deployment/package-helm/.set-up-common.sh
- tools/deployment/package-helm/Chart.yaml
- tools/deployment/package-helm/set-up-multi-dedicated-test.sh
- tools/deployment/package-helm/set-up-multi-shared-test.sh
- tools/deployment/package-helm/templates/configmap.yaml
- tools/deployment/package-helm/templates/presto-coordinator-deployment.yaml
- tools/deployment/package-helm/templates/presto-coordinator-service.yaml
- tools/deployment/package-helm/templates/presto-worker-deployment.yaml
- tools/deployment/package-helm/templates/query-scheduler-deployment.yaml
 [kind]: https://kind.sigs.k8s.io/
 [kubeadm]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
 [kubectl]: https://kubernetes.io/docs/tasks/tools/
+[logging-infra-issue]: https://github.com/y-scope/clp/issues/1760
Remove the stale link reference.
[logging-infra-issue] is not referenced anywhere in this document, so markdownlint will keep flagging MD053 until it is deleted.
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)
[warning] 664-664: Link and image reference definitions should be needed
Unused link or image reference definition: "logging-infra-issue"
(MD053, link-image-reference-definitions)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/src/user-docs/guides-k8s-deployment.md` at line 664, Remove the unused
markdown link reference "[logging-infra-issue]" from the document: locate the
link definition line that reads "[logging-infra-issue]:
https://github.com/y-scope/clp/issues/1760" and delete it (and verify there are
no remaining usages of the identifier in the file such as
"[logging-infra-issue]" inline links); this resolves the MD053 markdownlint
warning.
NUM_PRESTO_NODES="${NUM_PRESTO_NODES:-2}"
COMPRESSION_WORKER_REPLICAS="${COMPRESSION_WORKER_REPLICAS:-2}"
QUERY_WORKER_REPLICAS="${QUERY_WORKER_REPLICAS:-2}"
REDUCER_REPLICAS="${REDUCER_REPLICAS:-2}"
PRESTO_WORKER_REPLICAS="${PRESTO_WORKER_REPLICAS:-2}"
The default dedicated test now allocates unused Presto nodes.
NUM_PRESTO_NODES is included in total_workers, but this script still installs the chart without enabling Presto mode. In the default path that grows the kind cluster and labels a Presto node pool that no pod can use. Default the Presto node count to 0, or switch the Helm install into Presto mode when these settings are present.
Also applies to: 37-38, 58-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/set-up-multi-dedicated-test.sh` around lines 13
- 17, NUM_PRESTO_NODES is defaulting to 2 and is counted in total_workers even
when Presto mode is not enabled, leaving unused Presto nodes; change the default
to 0 (NUM_PRESTO_NODES="${NUM_PRESTO_NODES:-0}") or, alternatively, when
NUM_PRESTO_NODES or PRESTO_WORKER_REPLICAS is set >0 switch the Helm
install/values into Presto mode so pods can be scheduled; update the same
defaults/logic for the other occurrences (lines referenced around 37-38 and
58-78) and ensure total_workers calculation and kind node pool labeling only
include Presto nodes when Presto mode is enabled.
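The suggested fix could look like this (a sketch only; the variable names come from the script under review, but the total_workers arithmetic is an assumption about what the script computes):

```shell
# Default Presto nodes to 0 so a default run doesn't grow the kind cluster
# with nodes no pod can use.
NUM_PRESTO_NODES="${NUM_PRESTO_NODES:-0}"
COMPRESSION_WORKER_REPLICAS="${COMPRESSION_WORKER_REPLICAS:-2}"
QUERY_WORKER_REPLICAS="${QUERY_WORKER_REPLICAS:-2}"

# Only count Presto nodes (and label a Presto node pool) when Presto mode
# is actually requested.
total_workers=$((COMPRESSION_WORKER_REPLICAS + QUERY_WORKER_REPLICAS))
if [ "${NUM_PRESTO_NODES}" -gt 0 ]; then
    total_workers=$((total_workers + NUM_PRESTO_NODES))
fi
echo "total_workers=${total_workers}"
```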
COMPRESSION_WORKER_REPLICAS="${COMPRESSION_WORKER_REPLICAS:-2}"
QUERY_WORKER_REPLICAS="${QUERY_WORKER_REPLICAS:-2}"
REDUCER_REPLICAS="${REDUCER_REPLICAS:-2}"
PRESTO_WORKER_REPLICAS="${PRESTO_WORKER_REPLICAS:-2}"
This Presto knob is inert unless the script also enables Presto mode.
The install command still uses the chart defaults, so prestoWorker.replicas has no effect unless this script also sets clpConfig.package.query_engine=presto and a non-null clpConfig.presto. As written, the new variable only changes the banner output, not the workload this script actually exercises.
Also applies to: 28-28, 41-46
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/set-up-multi-shared-test.sh` at line 15, The
PRESTO_WORKER_REPLICAS variable is defined but never applied to the Helm values,
so setting it only changes output; update set-up-multi-shared-test.sh to pass
Presto-specific values when PRESTO_WORKER_REPLICAS is set by explicitly setting
clpConfig.package.query_engine=presto and ensuring clpConfig.presto is non-null
and prestoWorker.replicas uses PRESTO_WORKER_REPLICAS in the Helm
install/upgrade values block (the install command that currently relies on chart
defaults). Locate references to PRESTO_WORKER_REPLICAS, prestoWorker.replicas,
clpConfig.package.query_engine, and clpConfig.presto in the script and add the
corresponding Helm --set entries (or values YAML injection) so the chart
actually deploys Presto with the requested replica count.
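A sketch of how the script could actually apply the knob (the value paths follow the review's suggestion; the release name and chart path are placeholders, and the command is printed rather than executed so the sketch runs without a cluster):

```shell
PRESTO_WORKER_REPLICAS="${PRESTO_WORKER_REPLICAS:-2}"

# Build the Helm invocation so the Presto values are actually passed through.
helm_cmd="helm upgrade --install clp tools/deployment/package-helm"
helm_cmd="${helm_cmd} --set clpConfig.package.query_engine=presto"
helm_cmd="${helm_cmd} --set prestoWorker.replicas=${PRESTO_WORKER_REPLICAS}"

# Print instead of running; in the real script this would be executed.
echo "${helm_cmd}"
```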
package:
  query_engine: {{ .Values.clpConfig.package.query_engine | quote }}
Gate both engine branches off the same flag.
This file now mixes `package.query_engine`, `with .Values.clpConfig.query_*`, and `if .Values.clpConfig.presto`. That can render split-brain config: `ClpQueryEngine: "presto"` while `presto`/`PrestoHost` are null, or the CLP-S sections still exist because their values are present. Please derive one `$prestoEnabled := eq .Values.clpConfig.package.query_engine "presto"` flag and use it consistently for both the legacy query blocks and the Presto blocks. The same predicate should also gate the Presto deployment templates.
Possible template shape

+{{- $prestoEnabled := eq .Values.clpConfig.package.query_engine "presto" }}
 ...
-  {{- with .Values.clpConfig.query_scheduler }}
+  {{- if not $prestoEnabled }}
+  {{- with .Values.clpConfig.query_scheduler }}
   query_scheduler:
   ...
   {{- end }}
+  {{- end }}
 ...
-  {{- if .Values.clpConfig.presto }}
+  {{- if $prestoEnabled }}
   presto:
     host: "{{ include "clp.fullname" . }}-presto-coordinator"
     port: 8889
   {{- else }}
   presto: null
   {{- end }}
 ...
-{{- with .Values.clpConfig.presto }}
+{{- if $prestoEnabled }}
+{{- with .Values.clpConfig.presto }}
 presto-coordinator-catalog-clp.properties: |
 ...
+{{- end }}
+{{- end }}

Also applies to: 132-158, 227-233, 326-334, 337-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 129 -
130, The config mix allows mismatched flags causing split-brain: introduce a
single predicate variable (e.g., $prestoEnabled := eq
.Values.clpConfig.package.query_engine "presto") at the top of the Helm template
and use that variable everywhere that currently mixes
.Values.clpConfig.package.query_engine, .Values.clpConfig.query_*, and
.Values.clpConfig.presto; replace the existing conditionals in the CLP-S legacy
blocks (the sections referencing ClpQueryEngine, PrestoHost, presto/PrestoHost)
and the Presto blocks with checks against $prestoEnabled so both the configmap
entries and the Presto deployment templates are gated by the same boolean.
Ensure every place noted in the review (around package.query_engine and the
blocks spanning lines ~132-158, ~227-233, ~326-334, and the single-line block
~337) uses that single $prestoEnabled predicate.
{{- with $.Values.clpConfig.archive_output.storage }}
{{- if eq .type "s3" }}
clp.storage-type=s3
clp.s3-auth-provider=clp_package
clp.s3-access-key-id={{ .s3_config.aws_authentication.credentials.access_key_id }}
clp.s3-end-point=https://{{ .s3_config.bucket }}.s3.{{ .s3_config.region_code }}.amazonaws.com/
clp.s3-secret-access-key={{ .s3_config.aws_authentication.credentials.secret_access_key }}
{{- end }}{{/* if eq .type "s3" */}}
Fail fast when Presto S3 is given a non-static auth config.
This block always dereferences aws_authentication.credentials.access_key_id/secret_access_key, but the same chart still allows other auth shapes elsewhere (type: profile, optional session_token). In Presto mode those inputs will either render broken connector config or silently drop required auth data. Either reject non-static auth here with fail, or plumb the supported auth mode explicitly into the connector properties.
Possible guard

 {{- with $.Values.clpConfig.archive_output.storage }}
 {{- if eq .type "s3" }}
+  {{- $auth := .s3_config.aws_authentication | default dict }}
+  {{- if not $auth.credentials }}
+  {{- fail "Presto S3 mode currently requires archive_output.storage.s3_config.aws_authentication.credentials" }}
+  {{- end }}
 clp.storage-type=s3
 clp.s3-auth-provider=clp_package
-clp.s3-access-key-id={{ .s3_config.aws_authentication.credentials.access_key_id }}
+clp.s3-access-key-id={{ $auth.credentials.access_key_id }}
 clp.s3-end-point=https://{{ .s3_config.bucket }}.s3.{{ .s3_config.region_code }}.amazonaws.com/
-clp.s3-secret-access-key={{ .s3_config.aws_authentication.credentials.secret_access_key }}
+clp.s3-secret-access-key={{ $auth.credentials.secret_access_key }}
 {{- end }}{{/* if eq .type "s3" */}}
 {{- end }}{{/* with $.Values.clpConfig.archive_output.storage */}}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/configmap.yaml` around lines 388 -
395, The S3 block under the with $.Values.clpConfig.archive_output.storage
blindly dereferences
.s3_config.aws_authentication.credentials.{access_key_id,secret_access_key}
which breaks for non-static auth shapes; update the template around the eq .type
"s3" branch to validate that .s3_config.aws_authentication is the
static/credentials form (e.g. credentials exist and no session_token/profile
type) and call the Helm fail function with a clear message if it is not, or
alternatively only render the clp.s3-access-key-id/clp.s3-secret-access-key and
clp.s3-auth-provider=clp_package properties when the credentials fields are
present; reference the $.Values.clpConfig.archive_output.storage /
.s3_config.aws_authentication and the clp.s3-access-key-id /
clp.s3-secret-access-key symbols when making the change.
@@ -1,3 +1,4 @@
+{{- if .Values.clpConfig.query_scheduler }}
🧩 Analysis chain
🏁 Script executed:

# Find and inspect the values.yaml to confirm clpConfig.query_scheduler default
fd -t f "values.yaml" tools/deployment/package-helm/

Repository: y-scope/clp
Length of output: 96

🏁 Script executed:

# Read the values.yaml file to check query_scheduler and query_engine defaults
cat -n tools/deployment/package-helm/values.yaml | grep -A 10 -B 2 "query_scheduler"

Repository: y-scope/clp
Length of output: 602

🏁 Script executed:

# Also check the query_engine setting location
cat -n tools/deployment/package-helm/values.yaml | grep -A 5 -B 5 "query_engine"

Repository: y-scope/clp
Length of output: 326

🏁 Script executed:

# Read the actual query-scheduler-deployment.yaml to see current gating
cat -n tools/deployment/package-helm/templates/query-scheduler-deployment.yaml | head -20

Repository: y-scope/clp
Length of output: 764

🏁 Script executed:

# Read the query-scheduler-service.yaml to check its gating
cat -n tools/deployment/package-helm/templates/query-scheduler-service.yaml | head -20

Repository: y-scope/clp
Length of output: 638
Also gate the scheduler off when query_engine is set to "presto".
clpConfig.query_scheduler is a non-null map by default, so this condition stays truthy when a user only flips package.query_engine to "presto". The scheduler still renders in Presto mode, breaking the query-engine switch. The same fix is needed in query-scheduler-service.yaml.
Suggested fix
-{{- if .Values.clpConfig.query_scheduler }}
+{{- if and .Values.clpConfig.query_scheduler (ne .Values.clpConfig.package.query_engine "presto") }}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

-{{- if .Values.clpConfig.query_scheduler }}
+{{- if and .Values.clpConfig.query_scheduler (ne .Values.clpConfig.package.query_engine "presto") }}
🧰 Tools
🪛 YAMLlint (1.38.0)
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/deployment/package-helm/templates/query-scheduler-deployment.yaml` at
line 1, The deployment/service templates currently only check
.Values.clpConfig.query_scheduler which stays truthy when package.query_engine
is changed to "presto"; update the top-level conditional in both
query-scheduler-deployment.yaml and query-scheduler-service.yaml to also ensure
.Values.clpConfig.package.query_engine is not "presto" (e.g., add an AND check that
.Values.clpConfig.package.query_engine != "presto" using Helm's and/ne helpers) so the
scheduler is not rendered when query_engine is set to "presto".
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/deployment/package-helm/values.yaml (1)
68-75: ⚠️ Potential issue | 🟠 Major

Keep the reducer toleration aligned with the new query node selector.

Line 70 moves reducers onto `yscope.io/nodeType: "query"`, but Line 74 still shows `yscope.io/dedicated: "reducer"`. If operators copy this example onto tainted query nodes, reducer pods will stay Pending because they will not tolerate the taint.

Proposed fix

 # tolerations:
 #   - key: "yscope.io/dedicated"
 #     operator: "Equal"
-#     value: "reducer"
+#     value: "query"
 #     effect: "NoSchedule"
Verify each finding against the current code and only fix it if needed. In `@tools/deployment/package-helm/values.yaml` around lines 68 - 75, The toleration block is mismatched with the nodeSelector: nodeSelector uses yscope.io/nodeType: "query" but tolerations still reference yscope.io/dedicated: "reducer", causing reducer pods to stay Pending; update the tolerations under scheduling to match the taint key/value used on query nodes (e.g., change the toleration key to "yscope.io/nodeType" and value to "query", keep operator "Equal" and effect "NoSchedule"), or alternatively align the nodeSelector back to reducer — modify the keys/values in the tolerations stanza (tolerations, key, value, operator, effect) to be consistent with the nodeSelector.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tools/deployment/package-helm/values.yaml`:
- Around line 196-209: The Presto values block is missing documentation that
users must also set clpConfig.package.query_engine to "presto" (instead of the
default "clp-s") to switch the CLP search flow; update the Presto section in
values.yaml to add a short note instructing users to set
clpConfig.package.query_engine: "presto" (or to change the existing
clpConfig.package.query_engine value) when enabling presto: <...> so that
clpConfig.presto is not treated as self-contained and the Web UI uses the Presto
search flow rather than the default "clp-s".
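A values excerpt of the pairing the note should spell out (a sketch; the key names follow this thread, and the Presto sub-fields are deliberately elided):

```yaml
clpConfig:
  package:
    # Must be flipped from the default "clp-s" for the Web UI to use the
    # Presto search flow.
    query_engine: "presto"
  # Must also be non-null; the actual Presto fields are elided here.
  presto: {}
```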
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4b9dff88-375b-4629-a788-f3f9dc195653
📒 Files selected for processing (1)
tools/deployment/package-helm/values.yaml
presto-coordinator-catalog-clp.properties: |
  connector.name=clp
  clp.metadata-provider-type=mysql
  clp.metadata-db-url=jdbc:mysql://{{ include "clp.fullname" $ }}-database:3306
Should we use {{ include "clp.databaseHost" . }} here to support an un-bundled database?
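If a databaseHost helper exists (or is added) in the chart's _helpers.tpl, the catalog entry could resolve the host through it instead of hard-coding the in-cluster service name. A sketch, where `clp.databaseHost` is the reviewer's suggested include and is assumed rather than confirmed:

```yaml
# Hypothetical: resolve the metadata DB host via a helper so an un-bundled
# (external) database can be configured through values.
presto-coordinator-catalog-clp.properties: |
  connector.name=clp
  clp.metadata-provider-type=mysql
  clp.metadata-db-url=jdbc:mysql://{{ include "clp.databaseHost" . }}:3306
```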
        {{- include "clp.labels" . | nindent 8 }}
        app.kubernetes.io/component: "presto-worker"
    spec:
      serviceAccountName: {{ include "clp.fullname" . }}-job-watcher
::::

### Using Presto as the query engine
This section duplicates guides-using-presto.md. Shall we use a reference link instead?
[kind]: https://kind.sigs.k8s.io/
[kubeadm]: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
[kubectl]: https://kubernetes.io/docs/tasks/tools/
[logging-infra-issue]: https://github.com/y-scope/clp/issues/1760
PRESTO_VERSION=$(wget -q -O - "${DISCOVERY_URI}/v1/info" | jq -r '.version')
echo "presto.version=${PRESTO_VERSION}" >> "${PRESTO_CONFIG_DIR}/config.properties"
bitnami/kubectl:latest doesn't include wget, so this command fails on my machine and, as a result, the worker doesn't boot correctly. Not sure if I misconfigured something.
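One possible workaround, since the init step only needs the version string: swap wget for another fetcher and, if jq is missing as well, extract the field with sed. This is a sketch; whether curl (or any other HTTP client) is present in bitnami/kubectl is an assumption to verify, so the sed-based extraction is demonstrated on a hard-coded sample response:

```shell
# If curl is available in the init-container image (an assumption), the fetch
# could become:
#   PRESTO_VERSION=$(curl -fsS "${DISCOVERY_URI}/v1/info" | jq -r '.version')
# If jq is also unavailable, a sed fallback can pull the version field out of
# the JSON. Demonstrated here on a hard-coded sample response:
sample_info='{"version":"0.289","starting":false}'
PRESTO_VERSION=$(printf '%s' "$sample_info" | sed -n 's/.*"version":"\([^"]*\)".*/\1/p')
echo "presto.version=${PRESTO_VERSION}"
```

For the sample response above, the snippet prints presto.version=0.289; an image that bundles curl and jq for this init container would avoid the fallback entirely.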
Description

Add Presto coordinator and worker deployments to the CLP Helm chart, gated by `clpConfig.presto` (non-null enables Presto). When `clpConfig.presto` is configured, the chart deploys a Presto coordinator (single replica) and a configurable number of Presto workers. Presto runs alongside the existing compression pipeline; the two query engines are not mutually exclusive.

The `query_engine` setting controls which search interface the Web UI displays (clp-s search vs. Presto SQL queries) but does not gate any infrastructure deployments. The clp-s query pipeline components (query scheduler, query workers, reducers) and the API server can optionally be disabled by setting their config keys to `null` to save resources.

Checklist
breaking change.
Validation performed
1. Helm template renders correctly (Presto mode)
Task: Verify the Helm chart renders all Presto resources and excludes query/reducer/api-server
templates when Presto is enabled and query pipeline components are set to `null`.

Command:
Output:
Explanation: Presto coordinator, service, and worker templates are rendered. Api-server, query
scheduler, query worker, and reducer templates are correctly excluded.
2. Helm template renders correctly (clp-s mode)
Task: Verify the clp-s query pipeline resources are rendered when `query_engine` is
`"clp-s"`, and Presto resources are excluded.
Command:
Output:
Explanation: Api-server, query scheduler, query worker, reducer, and results cache templates are
rendered. Presto templates are correctly excluded.
3. Helm template renders correctly using the docs' presto-values.yaml
Task: Verify that a `presto-values.yaml` file created by following the
`guides-k8s-deployment.md` documentation renders without errors.

`presto-values.yaml` (following the docs, with placeholders replaced):
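The file contents are elided in this excerpt. As a rough sketch, a values file matching the settings described in this PR might look like the following; every key path and placeholder here is an illustrative assumption, not the docs' literal file:

```yaml
# Hypothetical presto-values.yaml sketch, inferred from the behaviour this PR
# describes (key names are assumptions).
clpConfig:
  package:
    query_engine: "presto"
  presto:
    # ... coordinator/worker settings, placeholders replaced ...
  query_scheduler: null   # disable the clp-s query pipeline to save resources
  query_worker: null
  reducer: null
  api_server: null
  results_cache:
    retention_period: null  # no retention period => no garbage-collector
```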
Command:
Output:
Explanation: The template renders without errors using the exact values file from the docs
(with placeholders filled in). The garbage-collector is correctly excluded because
`results_cache.retention_period` is set to `null` and `archive_output.retention_period` defaults to
`null`, so there is nothing to garbage-collect.

4. Multi-node cluster: clp-s mode (default)
Task: Verify that the Helm chart deploys correctly in clp-s mode on a multi-node kind cluster
with dedicated worker nodes.
Command:
Output (final pod status):
Explanation: All clp-s pipeline pods are running (api-server, query-scheduler, 2x query-worker,
2x reducer, results-cache, etc.). No Presto pods are present.
Node scheduling verification:
Compression workers are scheduled on compression-labeled nodes and query workers on query-labeled
nodes.
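For a kind cluster, labels like these can be declared directly in the cluster config so the scheduling can be exercised without manual kubectl label steps. A sketch, with the label key/values assumed from the chart's scheduling defaults discussed earlier:

```yaml
# Hypothetical kind cluster config with pre-labeled worker nodes.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
    labels:
      yscope.io/nodeType: "compression"
  - role: worker
    labels:
      yscope.io/nodeType: "query"
```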
5. Multi-node cluster: Presto mode
Task: Verify that the Helm chart deploys correctly in Presto mode on a multi-node kind cluster.
The test script was temporarily modified to add `-f /tmp/presto-values.yaml` to the
`helm install` command and remove the query worker/reducer `--set` flags (since those
components are not deployed in Presto mode).
Command:
Output (final pod status):
Explanation: Presto coordinator and worker pods are running. Api-server, query-scheduler,
query-worker, and reducer pods are correctly absent. The Presto worker init container successfully
detected the coordinator version and initialized its config.
Presto pod verification:
Compressed the sample log file postgresql.jsonl, performed queries in the web UI, and observed that results were correctly returned.