Conversation

@andrej1991 (Collaborator) commented Sep 19, 2025

Updating the queries in several evaluation tests so the instructions are more straightforward.
Adding a post-test script to the test which creates a cluster, which deletes the created cluster - this reduces the number of clusters in the pool and makes the test environment more uniform.
Updating the cluster_id_from_name test with a setup script that creates a cluster, because when the chat itself is used to create the cluster, it mostly just remembers the UUID instead of having to look it up.
Also adding a check that verifies the cluster was really created, because the proper tool call is only half of the acceptance criteria.
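
In rough strokes, the lifecycle this PR sets up looks like the sketch below (a minimal local approximation, assuming the script paths introduced here and an already-exported OCM_TOKEN; the eval harness wires these hooks in itself via setup_script/eval_verify_script/cleanup_script):

export UNIQUE_ID=$(tr -dc 0-9a-z < /dev/urandom | head -c 8)  # tags every resource for this run
./test/scripts/cluster_create_test_setup.sh                   # setup: create a known cluster via the Assisted API
python test/evals/eval.py --eval_data_yaml test/evals/eval_data.yaml
./test/scripts/cluster_created_check.sh                       # verify: the cluster really exists, not just the tool call
./test/scripts/delete_cluster.sh                              # cleanup: delete clusters named *-${UNIQUE_ID}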

Summary by CodeRabbit

  • New Features

    • Enforced UNIQUE_ID precondition, clearer success banner, and stricter failure detection for evaluations.
  • Tests

    • Expanded evaluation scenarios: action evaluations, setup/verify/cleanup hooks, updated cluster naming/network flows, refined cloud-refusal behavior and verification steps.
  • Chores

    • CI/test tooling improvements: safer token handling, isolated temp-dir runs, executable test scripts, added env vars for OCM endpoints and pull-secret access, new Makefile test targets, and cluster lifecycle helper scripts.
  • Documentation

    • Clarified cluster-name-to-ID matching workflow and provider guidance.

@openshift-ci openshift-ci bot requested review from carbonin and keitwb September 19, 2025 15:04
coderabbitai bot commented Sep 19, 2025

Walkthrough

Require UNIQUE_ID at runtime; stage a modified eval_data.yaml into a temp dir; tighten CI token handling; run the evaluator earlier and fail on any evaluation failure; add setup/verify/cleanup hooks and action_eval types in eval data; add OCM helper scripts and pod env vars for staged cluster tests.

Changes

Cohort / File(s) Summary
Evaluation runner
test/evals/eval.py
Import os; require UNIQUE_ID env var (exit 1 if missing); call evaluator.run_evaluation() immediately after instantiation; treat any non-zero failed_evals_count as failure; print success message and exit with appropriate code.
Evaluation data
test/evals/eval_data.yaml
Add setup_script, eval_verify_script, cleanup_script; add action_eval to eval_types; introduce/standardize uniq-cluster-name substitution; update expected tool calls and add platform fields; revise SNO/cluster/host/network and cluster-id flows.
CI prow files
test/prow/Dockerfile, test/prow/entrypoint.sh, test/prow/template.yaml
Dockerfile: install vim and chmod test/scripts/*.sh; entrypoint: validate UNIQUE_ID, use mktemp TEMP_DIR, robustly obtain/export OCM_TOKEN, stage token & eval_data.yaml into temp dir, invoke eval with staged YAML; template: add OCM_BASE_URL parameter/env.
Makefile & k8s eval script
Makefile, scripts/eval_k8s.sh
Add .ONESHELL and new test-eval target; both targets use strict shell flags, create UNIQUE_ID and TEMP_DIR, copy & sed-substitute eval_data.yaml into temp, persist OCM token to staged path, and run evaluator with staged YAML; eval_k8s.sh exports UNIQUE_ID/TEMP_DIR and updates invocation.
Assisted API helper scripts
test/scripts/cluster_create_test_setup.sh, test/scripts/cluster_created_check.sh, test/scripts/delete_cluster.sh
New scripts to create infra via OCM/Assisted APIs (fetch pull secret), poll/check cluster creation, and delete clusters matching -<UNIQUE_ID>; use strict bash, curl -sSf/jq, and honor OCM_TOKEN/OCM_BASE_URL.
Pod manifest
assisted-chat-pod.yaml
Add INVENTORY_URL and PULL_SECRET_URL environment variables to assisted-service-mcp container.
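
The staging mechanism these targets share is small enough to sketch (assuming the uniq-cluster-name token from this PR and an already-exported UNIQUE_ID):

WORK_DIR=$(pwd)
TEMP_DIR=$(mktemp -d)                    # mktemp creates the dir owner-only (0700)
cp test/evals/eval_data.yaml "$TEMP_DIR/eval_data.yaml"
# Bake the per-run ID into the staged copy, leaving the checked-in YAML untouched;
# the hook-script paths get an analogous sed substitution (the exact placeholder
# varied across revisions of this PR).
sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
python test/evals/eval.py --eval_data_yaml "$TEMP_DIR/eval_data.yaml"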

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CI as CI / Makefile
  participant Entrypoint as entrypoint.sh
  participant FS as TEMP_DIR (staged files)
  participant Eval as test/evals/eval.py
  participant Hooks as setup/verify/cleanup scripts
  participant OCM as OCM / Assisted API

  CI->>Entrypoint: start job (generate or receive UNIQUE_ID)
  Entrypoint->>FS: mktemp, copy & sed eval_data.yaml (uniq-cluster-name→UNIQUE_ID), write ocm_token.txt
  Entrypoint->>Eval: python test/evals/eval.py --eval_data_yaml FS/eval_data.yaml
  Eval->>Eval: require UNIQUE_ID (exit 1 if missing)
  Eval->>Hooks: run setup_script (optional)
  Eval->>OCM: perform tool/action calls
  OCM-->>Eval: responses/tool outputs
  Eval->>Hooks: run eval_verify_script (optional)
  Eval->>Eval: compute failed_evals_count (fail if >0)
  alt success
    Eval-->>Entrypoint: exit 0 ("🎉 All evaluations passed!")
  else failure
    Eval-->>Entrypoint: exit 1 (print summary)
  end
  Entrypoint->>Hooks: run cleanup_script (optional)
sequenceDiagram
  autonumber
  participant Setup as cluster_create_test_setup.sh
  participant API as OCM/Assisted APIs
  participant Verify as cluster_created_check.sh
  participant Cleanup as delete_cluster.sh

  Setup->>API: GET pull secret, POST /clusters, POST /infra-envs
  Verify->>API: poll GET /clusters for UNIQUE_ID
  alt Found
    Verify-->>Setup: success
  else Timeout
    Verify-->>Setup: fail (exit 1)
  end
  Cleanup->>API: GET /clusters -> DELETE /clusters/{id} where name endswith -UNIQUE_ID

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested labels

ok-to-test

Suggested reviewers

  • keitwb
  • maorfr
  • omertuc

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title "making the evaluation tests more stable" directly captures the primary goal of this pull request: enhancing the reliability of evaluation tests through script updates, new setup and cleanup steps, and added verification.
  • Docstring Coverage: ✅ Passed. No functions found in the changes. Docstring coverage check skipped.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
test/prow/entrypoint.sh (2)

8-12: Address shellcheck warning for variable assignment.

While the current code works, shellcheck recommends separating declaration and assignment to avoid masking return values, which could help debug potential issues with the OCM token retrieval.

Apply this diff to address the shellcheck warning:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+OCM_TOKEN_RAW=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
   -H "Content-Type: application/x-www-form-urlencoded" \
   -d "grant_type=client_credentials" \
   -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+  -d "client_secret=$CLIENT_SECRET")
+export OCM_TOKEN=$(echo "$OCM_TOKEN_RAW" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")

14-16: Address shellcheck warnings for variable assignments.

Similar to the OCM_TOKEN, separating declaration and assignment would improve error handling visibility.

Apply this diff to address the shellcheck warnings:

-export WORK_DIR=$(pwd)
+WORK_DIR=$(pwd)
+export WORK_DIR
 TEST_DIR="${WORK_DIR}/test/evals"
-export TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+export TEMP_DIR
test/prow/cluster_create_test_setup.sh (1)

3-3: Fix potential data handling issue with pull secret.

The sed command that escapes quotes in the pull secret response may not handle all edge cases correctly, especially if the response contains nested JSON or special characters.

Apply this diff to use jq for more robust JSON handling:

-PULL_SECRET=$(curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${OCM_TOKEN}" https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token | sed 's/"/\\"/g')
+PULL_SECRET=$(curl -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${OCM_TOKEN}" https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token | jq -c .)

This change uses jq -c . to compact the JSON response, which is more reliable than sed for JSON manipulation; note that it compacts but does not string-escape, so the payload must embed the value accordingly (the jq -Rs variant suggested later in this review produces a JSON-escaped string directly).
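
The distinction matters because pull_secret is ultimately embedded as a JSON string value; these one-liners (illustrative inputs) show the difference:

echo '{"auths":{}}' | jq -c .   # -> {"auths":{}}           compact, but still raw JSON
echo '{"auths":{}}' | jq -Rs .  # -> "{\"auths\":{}}\n"     JSON-escaped string, embeddable as a field value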

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cdf477 and 1fd460a.

📒 Files selected for processing (5)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (4 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 14-14: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 16-16: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (11)
test/prow/Dockerfile (1)

4-4: LGTM!

Adding vim to the installed packages makes sense for the enhanced test environment, especially when combined with the cluster setup scripts that may benefit from text editing capabilities during testing workflows.

test/evals/eval.py (1)

91-91: Make the failure condition more strict for better test reliability.

The change from failed_evals_count > 2 to failed_evals_count (any failure) makes the tests more strict. This aligns with the PR objective of creating "more stable eval tests" by catching any evaluation failures immediately rather than allowing up to 2 failures.

test/evals/eval_data.yaml (5)

43-44: LGTM!

The change from keyword-based evaluation to accuracy-based evaluation for the static networking support test provides more reliable results by comparing the exact response rather than just checking for specific keywords.


91-91: Minor: Trailing period improves query consistency.

Adding a trailing period to the eval query improves consistency with natural language formatting in the test data.


99-100: Update parameter values for better test consistency.

The changes to use explicit string values ("x86_64" and "") instead of None for cpu_architecture and ssh_public_key provide clearer test expectations and avoid potential null handling issues in the evaluation system.


121-123: LGTM!

The simplified accuracy-based evaluation for host boot status provides clearer test expectations and eliminates the complexity of tool call verification for this specific scenario.


180-189: Approve: setup_script path substitution verified. test/prow/cluster_create_test_setup.sh exists; test/evals/eval_data.yaml contains the entry; test/prow/entrypoint.sh runs sed -i 's|<workdir-placeholder>|${WORK_DIR}|g' on $TEMP_DIR/eval_data.yaml, so the placeholder is correctly substituted.

test/prow/entrypoint.sh (1)

25-25: LGTM!

The workdir placeholder substitution correctly replaces the placeholder in the eval_data.yaml with the actual working directory, enabling the setup scripts to be found at runtime.

test/prow/cluster_create_test_setup.sh (3)

5-7: LGTM!

The cluster creation API call correctly uses the Assisted Installer v2 endpoints and includes all required parameters. The JSON structure matches the official API specification, and the jq '.id' extraction properly captures the cluster ID for subsequent use.


10-12: LGTM!

The infrastructure environment creation follows the correct API pattern, linking the infra-env to the cluster using the captured CLUSTER_ID. The comment on line 9 correctly notes that CLUSTER_ID is already quoted from the jq extraction, which is important for proper JSON formatting.


1-12: Verify OCM_TOKEN/UNIQUE_ID are set and CLUSTER_ID quoting is correct

  • Confirmed: script references api.stage.openshift.com and uses OCM_TOKEN and UNIQUE_ID (test/prow/cluster_create_test_setup.sh).
  • Action: ensure UNIQUE_ID and OCM_TOKEN are provided at runtime and verify that jq '.id' returns a quoted string; if it returns an unquoted value, wrap cluster_id in quotes in the infra-env payload or adjust jq output.
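
For reference, the quoting difference is easy to check (illustrative values):

echo '{"id":"abc-123"}' | jq '.id'     # -> "abc-123"  (quoted; safe to splice into a JSON payload)
echo '{"id":"abc-123"}' | jq -r '.id'  # -> abc-123    (raw; would need quoting before embedding)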

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
test/evals/eval_data.yaml (2)

113-113: Fix duplicated wording in expected_response.

“The SSH public key is set for the cluster for cluster” → drop the duplicate “for cluster”.

Apply:

-expected_response: The SSH public key is set for the cluster for cluster
+expected_response: The SSH public key is set for the cluster

123-126: Grammar nit: plural agreement.

“hosts hasn't been discovered yet.” → “hosts haven't been discovered yet.”

Apply:

-expected_response: "hosts hasn't been discovered yet."
+expected_response: "hosts haven't been discovered yet."
test/prow/entrypoint.sh (3)

8-12: Harden token fetch: avoid SC2155, quote data, and fail on HTTP errors.

Separate assignment from export, use jq -r, add --fail-with-body and retries, and URL-encode form data.

Apply:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN=$(
+  curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+    -X POST 'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token' \
+    -H 'Content-Type: application/x-www-form-urlencoded' \
+    --data-urlencode "grant_type=client_credentials" \
+    --data-urlencode "client_id=${CLIENT_ID}" \
+    --data-urlencode "client_secret=${CLIENT_SECRET}" \
+  | jq -r '.access_token'
+)
+export OCM_TOKEN
+[ -n "$OCM_TOKEN" ] || { echo "ERROR: failed to obtain OCM token"; exit 1; }

14-16: Avoid SC2155; also quote cd, and clean up temp dir.

Export after assigning; add trap to remove TEMP_DIR.

Apply:

-export WORK_DIR=$(pwd)
+WORK_DIR=$(pwd)
+export WORK_DIR
 TEST_DIR="${WORK_DIR}/test/evals"
-export TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+export TEMP_DIR
+trap 'rm -rf -- "$TEMP_DIR"' EXIT
@@
-cd $TEMP_DIR
+cd "$TEMP_DIR"

23-27: Quote paths; secure secrets on disk.

Quote cp/sed paths; restrict file perms for ocm_token.txt and .env to avoid accidental leakage.

Apply:

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" "$TEMP_DIR/eval_data.yaml"
@@
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+umask 077
+echo "$OCM_TOKEN" > ocm_token.txt
+echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fd460a and a8146fa.

📒 Files selected for processing (6)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/prow/Dockerfile
  • test/prow/cluster_create_test_setup.sh
  • test/evals/eval.py
🧰 Additional context used
🪛 Shellcheck (0.10.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 14-14: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 16-16: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (4)
test/evals/eval_data.yaml (3)

62-62: Confirm absolute cleanup_script path handling.

You substitute <workdir-placeholder> at runtime. Verify the harness resolves absolute paths for cleanup_script when the YAML from $TEMP_DIR is used. If not, prefer relative paths from repo root.


93-103: Empty ssh_public_key is now expected; confirm tool accepts it.

Setting ssh_public_key: "" assumes create_cluster permits an empty key for multi-node. Ensure the tool doesn’t reject empty strings or silently coerce to null.


180-181: Setup/cleanup lifecycle order.

Ensure the group-level setup_script runs before all tests in the group and cleanup_script runs even on failure, to avoid leftover clusters if a prior step fails.

test/prow/entrypoint.sh (1)

27-27: Double YAML args: confirm precedence/merge behavior.

You pass both repo and temp YAMLs. Ensure eval.py uses the latter to override values (like substituted paths). If not guaranteed, drop the first arg.

Comment on lines 9 to 18
curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
  id=$(echo "$item" | jq -r '.id')
  name=$(echo "$item" | jq -r '.name')
  if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
    echo "The cluster '${name}', ${id} is going to be deleted"
    curl -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
  fi
done

⚠️ Potential issue

Make deletion robust: fail on non-2xx, retry, and guard required env vars.

Right now DELETE requests don’t use --fail, so HTTP errors won’t abort, risking leaked clusters. Also add retries/timeouts and validate OCM_TOKEN/UNIQUE_ID are set. Use read -r to avoid backslash mangling.

Apply:

@@
-set -o nounset
+set -o nounset
 set -o errexit
 set -o pipefail
@@
-ASSISTED_SERVICE_URL="https://api.stage.openshift.com/api/assisted-install/v2"
+ASSISTED_SERVICE_URL="https://api.stage.openshift.com/api/assisted-install/v2"
+
+# Required env vars
+: "${OCM_TOKEN:?OCM_TOKEN is required}"
+: "${UNIQUE_ID:?UNIQUE_ID is required}"
@@
-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+  "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '[.[] | {id, name}][]' \
+| while IFS= read -r item; do
   id=$(echo "$item" | jq -r '.id')
   name=$(echo "$item" | jq -r '.name')
   if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
     echo "The cluster '${name}', ${id} is going to be deleted"
-    curl -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})"
   fi
 done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
  id=$(echo "$item" | jq -r '.id')
  name=$(echo "$item" | jq -r '.name')
  if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
    echo "The cluster '${name}', ${id} is going to be deleted"
    curl -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
  fi
done

curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
  "${ASSISTED_SERVICE_URL}/clusters" \
| jq -c '[.[] | {id, name}][]' \
| while IFS= read -r item; do
  id=$(echo "$item" | jq -r '.id')
  name=$(echo "$item" | jq -r '.name')
  if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
    echo "The cluster '${name}', ${id} is going to be deleted"
    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
    || echo "WARN: failed to delete cluster ${id} (${name})"
  fi
done
🤖 Prompt for AI Agents
In test/prow/delete_cluster.sh around lines 9 to 16, the DELETE loop is brittle:
it doesn't validate required env vars, uses read without -r, and issues curl
without failing on non-2xx or retrying; update the script to first check that
OCM_TOKEN and UNIQUE_ID are set (exit non-zero with a clear message if not),
change the while read to while read -r to avoid backslash mangling, and call
curl with --fail and sensible timeout/retry flags (e.g., --retry, --retry-delay
and --max-time) so failures return non-zero and transient errors are retried;
ensure the script exits non-zero on any unrecoverable HTTP error so cluster
deletions cannot silently fail.

@andrej1991 (Collaborator, Author) commented:

/hold

coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/prow/entrypoint.sh (1)

8-12: Harden token retrieval; fix SC2155; fail fast on HTTP/parse errors.

  • Avoid export with command substitution (SC2155). Assign first, then export.
  • Add curl -sSf + retries/timeouts; use jq -er instead of sed; validate non-empty token.
  • Note: --data-urlencode still exposes CLIENT_SECRET in the process list. If runners are multi-tenant, prefer piping body via stdin (heredoc) to avoid arglist exposure.

Apply this diff:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN="$(
+  curl -sSf --retry 3 --retry-all-errors --max-time 20 \
+    -H 'Content-Type: application/x-www-form-urlencoded' \
+    --data-urlencode grant_type=client_credentials \
+    --data-urlencode "client_id=${CLIENT_ID}" \
+    --data-urlencode "client_secret=${CLIENT_SECRET}" \
+    'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token' \
+  | jq -er '.access_token'
+)"
+[ -n "${OCM_TOKEN}" ] || { echo "access_token not found in response"; exit 1; }
+export OCM_TOKEN

Alternative (avoids secret in argv; recommended on shared runners):

OCM_TOKEN="$(
  curl -sSf --retry 3 --retry-all-errors --max-time 20 \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    --data @- \
    'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token' <<EOF \
  | jq -er '.access_token'
grant_type=client_credentials&client_id=${CLIENT_ID}&client_secret=${CLIENT_SECRET}
EOF
)"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8146fa and 92fabeb.

📒 Files selected for processing (6)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/prow/Dockerfile
  • test/prow/cluster_create_test_setup.sh
  • test/prow/delete_cluster.sh
  • test/evals/eval.py
  • test/evals/eval_data.yaml
🧰 Additional context used
🪛 Shellcheck (0.10.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request

Comment on lines 23 to 31
cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" $TEMP_DIR/eval_data.yaml

python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml

⚠️ Potential issue

🧩 Analysis chain

Quote paths; secure secret files; add cleanup trap.

  • Quote $TEMP_DIR/$TEST_DIR usages to survive spaces.
  • Write secrets with 0600 perms; prefer printf over echo.
  • Add umask 077 and trap to remove the mktemp dir.
  • Verify eval.py truly accepts repeated --eval_data_yaml.

Apply this diff:

-TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
@@
-cd $TEMP_DIR
+cd "$TEMP_DIR"
@@
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"
@@
-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" "$TEMP_DIR/eval_data.yaml"
@@
-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEST_DIR/eval_data.yaml" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

Run to verify necessity of exporting OCM_TOKEN and duplicate flag support:


🏁 Script executed:

#!/bin/bash
# Search for OCM_TOKEN consumption and eval_data_yaml handling
rg -nC2 '\bOCM_TOKEN\b' test/prow || true
rg -nC2 --type=python '\-\-eval_data_yaml' || true

Length of output: 3393


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== test/prow/entrypoint.sh (head) ==="
sed -n '1,240p' test/prow/entrypoint.sh || true
echo
echo "=== OCM_TOKEN occurrences in test/prow ==="
rg -nC2 '\bOCM_TOKEN\b' test/prow || true
echo
echo "=== test/prow/eval.py (if present) ==="
if [ -f test/prow/eval.py ]; then
  sed -n '1,400p' test/prow/eval.py
else
  echo "test/prow/eval.py not found"
fi
echo
echo "=== occurrences of --eval_data_yaml in repo ==="
rg -n -- '--eval_data_yaml' || true

Length of output: 4803


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== test/evals/eval.py (first 400 lines) ==="
sed -n '1,400p' test/evals/eval.py || true
echo
echo "=== add_argument lines that mention eval_data_yaml ==="
rg -n --type=python "add_argument.*eval_data_yaml" test/evals || true
echo
echo "=== all occurrences of eval_data_yaml in test/evals ==="
rg -n --type=python "eval_data_yaml" test/evals || true

Length of output: 2763


Quote vars, secure secret files, add cleanup trap; keep OCM_TOKEN export; make --eval_data_yaml intent explicit.

  • Quote all variable/path usages (e.g., cd "$TEMP_DIR", cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml", quote the python invocation).
  • Secure secrets: create files with 0600 and write with printf (avoid plain echo). Add umask 077 and trap to remove the temp dir.
  • Do not remove/export change OCM_TOKEN: test/prow/entrypoint.sh exports it (line 8) and it is consumed by other scripts (test/prow/cluster_create_test_setup.sh, test/prow/delete_cluster.sh).
  • test/evals/eval.py defines a single --eval_data_yaml (around line 26); passing it twice will let the last value win. Either change eval.py to accept multiple files (action='append') or pass a single merged file — do not assume argparse will merge both.

Apply this diff:

-TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
@@
-cd $TEMP_DIR
+cd "$TEMP_DIR"
@@
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"
@@
-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|<workdir-placeholder>|${WORK_DIR}|g" "$TEMP_DIR/eval_data.yaml"
@@
-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEST_DIR/eval_data.yaml" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test/prow/entrypoint.sh around lines 23 to 27, quote all variable and path
usages (e.g., use quoted cd, cp and sed args and quote the python invocation and
its args), ensure secrets are created securely by setting umask 077, creating
token files with mode 0600 and writing via printf (not echo), add a trap to
remove the temp dir on exit for cleanup, keep the existing OCM_TOKEN export
unchanged, and fix the duplicate --eval_data_yaml usage by passing a single
merged/effective YAML (either merge files before invocation or only pass the
temp copy) rather than supplying the same flag twice so argparse semantics are
correct.

@carbonin (Collaborator) left a comment

I like the idea, but I'm worried the script changes are making these tests very hard to run locally, which I think is something we want to preserve.

So unless I'm missing something and it's actually handling running these locally, I think we need some changes before this can go in.

@andrej1991 (Collaborator, Author) commented:

/test eval-test

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
test/evals/eval.py (1)

84-86: Fix the typo in the error message.

There's a spelling error in the error message that should be corrected.

Apply this diff:

-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environment variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
test/prow/entrypoint.sh (1)

8-8: Fix shellcheck warning: declare and assign separately.

The current approach of exporting and assigning in the same line can mask return values from the command substitution, potentially hiding curl failures.

Apply this diff to address the shellcheck warning:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
   -H "Content-Type: application/x-www-form-urlencoded" \
   -d "grant_type=client_credentials" \
   -d "client_id=$CLIENT_ID" \
   -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+export OCM_TOKEN
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92fabeb and ddf0082.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/prow/cluster_create_test_setup.sh
  • test/prow/Dockerfile
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (14)
assisted-chat-pod.yaml (1)

50-53: LGTM! Environment variables properly configured for stage deployment.

The addition of INVENTORY_URL and PULL_SECRET_URL environment variables for the stage OpenShift API endpoints aligns well with the test infrastructure setup. The new INVENTORY_URL corresponds to the stage Assisted Install API used by the test setup scripts, and the new PULL_SECRET_URL aligns with the accounts_mgmt/v1/access_token endpoint the test scripts use to retrieve a pull secret.

test/evals/eval.py (3)

6-6: LGTM! Import added for environment variable checking.

The os import is correctly added to support the environment variable validation.


95-97: More strict failure condition improves test reliability.

The change from failed_evals_count > 2 to any non-zero failed_evals_count makes the evaluation more strict, which aligns with the PR objective of creating more stable evaluation tests. This ensures that any failure is treated as a test failure rather than allowing up to 2 failures to pass.


99-100: LGTM! Success message improves user feedback.

The success message and explicit exit(0) provide clear feedback and proper exit handling.

test/prow/delete_cluster.sh (2)

3-5: LGTM! Proper error handling configured.

The strict Bash error handling options are correctly set to ensure the script fails fast on errors.


9-16: Make deletion robust: fail on non-2xx, retry, and guard required env vars.

The current implementation has several reliability and security issues:

  • No validation of required environment variables (OCM_TOKEN, UNIQUE_ID)
  • curl doesn't fail on HTTP errors, risking leaked clusters
  • No retry logic for transient failures
  • Uses read without -r flag, risking backslash interpretation issues
  • Missing timeouts and proper error handling

Apply this diff to make the script more robust:

+# Required env vars
+: "${OCM_TOKEN:?OCM_TOKEN is required}"
+: "${UNIQUE_ID:?UNIQUE_ID is required}"
+
-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+  "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '[.[] | {id, name}][]' \
+| while IFS= read -r item; do
   id=$(echo "$item" | jq -r '.id')
   name=$(echo "$item" | jq -r '.name')
   if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
     echo "The cluster '${name}', ${id} is going to be deleted"
-    curl -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})"
   fi
 done
test/prow/entrypoint.sh (1)

16-27: Security and robustness improvements needed for temp file handling.

Multiple security and robustness issues need to be addressed:

  1. Unquoted variables: Risk of word splitting and glob expansion
  2. Insecure secret handling: Using echo for secrets can expose them in process lists
  3. No cleanup: Temp directory left behind on script exit
  4. Missing error handling: No validation that required environment variables are set
  5. Duplicate argument handling: Using --eval_data_yaml twice may not work as expected

The mktemp utility, by default, sets the permissions to be only read/writable by the owner. This removes the need to make umask changes or use chmod in your script, but additional security measures are still recommended.

Apply this diff to improve security and robustness:

+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
+
-cd $TEMP_DIR
+cd "$TEMP_DIR"

-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env" 
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../prow|: ${WORK_DIR}|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../prow|: ${WORK_DIR}|g" "$TEMP_DIR/eval_data.yaml"

-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

Note: I removed one of the duplicate --eval_data_yaml arguments since the argument parser expects only one value, and using the processed temp file should be sufficient.

test/evals/eval_data.yaml (7)

43-44: Improved evaluation approach with accuracy-based testing.

The change from keyword-based evaluation to accuracy-based evaluation with a specific expected response makes the test more reliable and precise. This aligns well with the PR objective of creating more stable evaluation tests.


62-62: LGTM! Cleanup scripts properly added for resource management.

The addition of cleanup_script: ../prow/delete_cluster.sh to both SNO and multi-node cluster workflows ensures proper cleanup of test resources, preventing cluster accumulation in the test environment.

Also applies to: 89-89


93-93: Minor: Add period for consistency.

Line 93 adds a period at the end of the query for consistency with other queries in the file.


101-102: Good architectural defaults for test stability.

Setting cpu_architecture: "x86_64" and ssh_public_key: "" provides sensible defaults that make the tests more predictable and less dependent on external configuration. The x86_64 architecture specification addresses the previous review comment about adding architecture to make tests more reliable.


123-125: Simplified evaluation improves test reliability.

The change from complex tool call expectations to a simple accuracy-based response check makes this test more focused and reliable. The expected response "hosts hasn't been discovered yet." is straightforward to validate.


180-181: Setup and cleanup scripts enhance test isolation.

Adding both setup_script and cleanup_script to the cluster_id_from_name test group ensures proper test isolation by creating a known cluster state before the test and cleaning up afterwards. The updated query and expected tool calls (list_clusters and cluster_info) provide better test coverage of the cluster lookup functionality.

Also applies to: 184-192


192-192: Updated expected keywords match new cluster naming convention.

The updated expected keywords referencing eval-testcluster-uniq-cluster-name align with the new cluster naming convention established by the setup scripts, ensuring test consistency.

Makefile (Outdated)

	./scripts/mcphost.sh

test-eval: ## Run agent evaluation tests
	@export UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)

⚠️ Potential issue

Fix shell portability issue with /dev/urandom.

Using head /dev/urandom is not portable across all systems and may not behave consistently. There are two further pitfalls: $(...) in a Makefile recipe is expanded by Make itself rather than the shell, so the command substitution needs $$(...), and without .ONESHELL each recipe line runs in its own shell, so the exported variable is not visible to subsequent lines.

Apply this diff to use a more portable and reliable approach:

-	@export UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+	@export UNIQUE_ID=$$(openssl rand -hex 4 | tr '[:upper:]' '[:lower:]'); \
+	 echo "UNIQUE_ID set to: $$UNIQUE_ID"

If OpenSSL is not available, use this alternative:

-	@export UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+	@export UNIQUE_ID=$$(date +%s | sha256sum | head -c 8); \
+	 echo "UNIQUE_ID set to: $$UNIQUE_ID"

@andrej1991 (Collaborator, Author) commented:

/test eval-test

1 similar comment
@andrej1991 (Collaborator, Author) commented:

/test eval-test

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
test/evals/eval.py (1)

84-86: Fix the typo and improve error message clarity.

There's a typo in "varialbe" and the error message could be clearer about the intended behavior.

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
-    sys.exit(1)
+if os.getenv('UNIQUE_ID') is None:
+    print("ERROR: UNIQUE_ID environment variable must be set for proper cluster creation and cleanup.")
+    sys.exit(1)
test/prow/entrypoint.sh (2)

8-12: Address the shellcheck warning for masked return values.

The export assignment in a single line can mask the curl command's return value, making error handling less reliable.

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+  -H "Content-Type: application/x-www-form-urlencoded" \
+  -d "grant_type=client_credentials" \
+  -d "client_id=$CLIENT_ID" \
+  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+export OCM_TOKEN

31-31: Remove duplicate --eval_data_yaml flag; pass one explicit YAML path.
Argparse uses the last value for duplicate flags (confirmed in test/evals/eval.py and by a small argparse test), so the second --eval_data_yaml overrides the first — update test/prow/entrypoint.sh (line 31) to pass only the intended file (e.g. either $TEST_DIR/eval_data.yaml or $TEMP_DIR/eval_data.yaml).
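
The last-wins behavior is easy to reproduce with a stand-in parser (hypothetical demo, not the project's code):

python3 - --eval_data_yaml a.yaml --eval_data_yaml b.yaml <<'EOF'
import argparse
p = argparse.ArgumentParser()
p.add_argument("--eval_data_yaml")
print(p.parse_args().eval_data_yaml)  # prints b.yaml: the second flag overrides the first
EOF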

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ddf0082 and da08af1.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/prow/delete_cluster.sh
  • test/prow/Dockerfile
  • Makefile
  • test/prow/cluster_create_test_setup.sh
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (11)
test/evals/eval.py (2)

6-6: LGTM!

The import addition is necessary for the environment variable check.


95-100: LGTM! Stricter failure detection improves test reliability.

The change from failed_evals_count > 2 to if failed_evals_count: makes the test more strict by failing on any failure, which is appropriate for evaluation tests. The success message provides clear feedback.

test/prow/entrypoint.sh (2)

18-31: Duplicate comment: Apply security and quoting fixes from previous review.

The security concerns around temp directory cleanup, secret file permissions, and path quoting that were identified in the previous review remain unaddressed.

Apply the security improvements suggested in the previous review to ensure proper cleanup and secure handling of secrets.


25-29: Verified — sed substitution pattern matches eval_data.yaml

test/evals/eval_data.yaml contains "../prow" at lines 62, 89, 180, 181 (cleanup_script/setup_script), so s|: ../prow|: ${WORK_DIR}|g will correctly replace those entries.

test/evals/eval_data.yaml (6)

43-44: LGTM! Switching to accuracy evaluation improves reliability.

The change from keyword-based evaluation to accuracy-based evaluation with a specific expected response makes the test more deterministic and reliable than keyword matching.


62-62: LGTM! Adding cleanup scripts ensures proper test isolation.

The cleanup scripts will help maintain test environment hygiene by removing created clusters after test completion.

Also applies to: 89-89


101-102: Address the previous review comment about x86_64 architecture.

As mentioned in the previous review, pinning x86_64 in the prompt would make the evaluation more deterministic, since the architecture choice doesn't meaningfully change what the test exercises.

Consider updating the eval_query to include architecture specification for consistency.


123-125: LGTM! Simplified accuracy check improves test reliability.

The simplified approach using accuracy evaluation instead of complex tool call expectations makes the test more maintainable and less brittle.


180-181: LGTM! Setup and cleanup scripts provide proper test isolation.

The addition of setup and cleanup scripts ensures each test starts with a known state and cleans up after itself.


187-188: LGTM! Adding list_clusters call improves test accuracy.

The expectation that list_clusters is called first before cluster_info reflects the realistic workflow for finding a cluster by name.

assisted-chat-pod.yaml (1)

50-53: Do not deploy hardcoded stage endpoints to production — verify and parameterize.

Hardcoded stage URLs were found; confirm these are intentional for testing and either parameterize them (ConfigMap/Secret/templating) or switch to production endpoints for production deployments.

Locations:

  • assisted-chat-pod.yaml (lines 51–53): INVENTORY_URL / PULL_SECRET_URL → https://api.stage.openshift.com/...
  • test/prow/delete_cluster.sh and test/prow/cluster_create_test_setup.sh: ASSISTED_SERVICE_URL set to stage (PULL_SECRET fetched from stage).
  • scripts/query.sh: contains both "stage" and "prod" BASE_URL mappings (assisted-chat.api.stage.openshift.com / assisted-chat.api.openshift.com).
  • template.yaml: ROUTE_HOST = "api.openshift.com" (prod host).
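
Since the template already introduces an OCM_BASE_URL parameter, one way to keep the scripts environment-agnostic is an env override with a stage default (a sketch, not the committed code):

# Scripts default to stage; a deployment can override via the template parameter.
: "${OCM_BASE_URL:=https://api.stage.openshift.com}"
ASSISTED_SERVICE_URL="${OCM_BASE_URL}/api/assisted-install/v2"
PULL_SECRET_URL="${OCM_BASE_URL}/api/accounts_mgmt/v1/access_token"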

coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
test/prow/entrypoint.sh (2)

8-12: Avoid SC2155; URL‑encode form fields; hard‑fail on token fetch.

Split assignment/export, use --data-urlencode for secrets, and bail out on errors.

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN="$(
+  curl -sSf -X POST "https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token" \
+    -H "Content-Type: application/x-www-form-urlencoded" \
+    --data-urlencode "grant_type=client_credentials" \
+    --data-urlencode "client_id=${CLIENT_ID}" \
+    --data-urlencode "client_secret=${CLIENT_SECRET}" \
+  | jq -r '.access_token'
+)"
+export OCM_TOKEN
+if [[ -z "${OCM_TOKEN:-}" ]]; then
+  echo "Failed to obtain OCM access token." >&2
+  exit 1
+fi

16-22: Secure temp dir usage and secrets; quote paths.

Set umask, cleanup trap, and write secret files with 0600 using printf.

-TEMP_DIR=$(mktemp -d)
-
-cd $TEMP_DIR
-
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+TEMP_DIR="$(mktemp -d)"
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
+
+cd "$TEMP_DIR"
+
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"
🧹 Nitpick comments (8)
test/evals/eval.py (3)

61-65: Help text mismatches default value for judge_model.

Help mentions gemini-2.0-flash but default is gemini-2.5-flash. Align to avoid confusion.

-    parser.add_argument(
-        "--judge_model",
-        default="gemini-2.5-flash",
-        help="Judge model for LLM evaluation (default: gemini-2.0-flash)",
-    )
+    parser.add_argument(
+        "--judge_model",
+        default="gemini-2.5-flash",
+        help="Judge model for LLM evaluation (default: gemini-2.5-flash)",
+    )

26-31: Consider supporting multiple --eval_data_yaml flags (entrypoint passes it twice).

Argparse currently takes a single value; repeated flags will just override. If desired, accept multiple and handle accordingly in AgentGoalEval.

-    parser.add_argument(
-        "--eval_data_yaml",
-        default="eval_data.yaml",
-        help="Path to evaluation data YAML file (default: eval_data.yaml)",
-    )
+    parser.add_argument(
+        "--eval_data_yaml",
+        action="append",
+        default=["eval_data.yaml"],
+        help="Path(s) to evaluation data YAML file(s); can be passed multiple times.",
+    )

If AgentGoalEval expects a single path, either merge upstream (in entrypoint) or extend AgentGoalEval to accept a list. Please confirm expected behavior.


84-86: Tighten env check; fix typos and stderr.

Guard should treat empty string as unset, print to stderr, and correct spelling.

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
-    sys.exit(1)
+if not os.environ.get("UNIQUE_ID"):
+    print(
+        "Environment variable UNIQUE_ID must be set so cluster creation/removal work (e.g., export UNIQUE_ID=$(date +%s)).",
+        file=sys.stderr,
+    )
+    sys.exit(1)
test/prow/entrypoint.sh (1)

26-29: Reduce noisy logs; quote variables.

Keep logs minimal on CI; quoting avoids globbing/splitting surprises.

-cat $TEMP_DIR/eval_data.yaml
-ls $WORK_DIR
-echo "---"
-echo $WORK_DIR
+cat "$TEMP_DIR/eval_data.yaml"
+ls -la "$WORK_DIR"
+echo "---"
+echo "$WORK_DIR"
test/prow/cluster_create_test_setup.sh (2)

9-11: Safer JSON escaping for pull secret.

Use jq -Rs to JSON‑escape rather than sed.

-PULL_SECRET=$(curl -sSf -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${OCM_TOKEN}" \
-              https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token | sed 's/"/\\"/g')
+PULL_SECRET_RAW="$(curl -sSf -X POST \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer ${OCM_TOKEN}" \
+  "https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token")"
+PULL_SECRET_ESCAPED="$(printf '%s' "$PULL_SECRET_RAW" | jq -Rs .)"

And use "pull_secret":${PULL_SECRET_ESCAPED} below.


21-29: Polling window likely too short; add max attempts param.

3 retries (≈30s) may be flaky. Consider ~2 minutes.

-COUNTER=0
-while ! curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | grep "eval-testcluster-${UNIQUE_ID}"; do
-    if [[ $COUNTER -gt 3 ]]; then
+COUNTER=0
+MAX_RETRIES="${MAX_RETRIES:-12}" # ~2 minutes at 10s
+while ! curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | grep -F "eval-testcluster-${UNIQUE_ID}" >/dev/null; do
+    if [[ $COUNTER -ge $MAX_RETRIES ]]; then
         echo "Cluster creation timed out"
         exit 1
     fi
     ((COUNTER++))
     sleep 10
 done
test/evals/eval_data.yaml (2)

113-114: Fix duplicated phrase in expected_response.

Minor text nit; duplicate “for cluster”.

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key is set for the cluster

123-126: Grammar tweak in expected_response (plural subject).

If judged by exact match, this will matter.

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da08af1 and edb5580.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/prow/Dockerfile
  • test/prow/delete_cluster.sh
  • assisted-chat-pod.yaml
  • Makefile
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (7)
test/evals/eval.py (1)

95-96: Stricter gating: any failure now fails the run. Verify intended threshold.

Previously only >2 failed evals failed the job (per PR summary). Confirm downstream jobs expect fail-on-first semantics.

test/prow/entrypoint.sh (2)

23-25: Quote paths; sed replacements on the temp copy only.

Use quotes to survive spaces and ensure correct file targeting.

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" "$TEMP_DIR/eval_data.yaml"

31-31: Duplicate --eval_data_yaml flag; either pass one file or teach eval.py to accept multiple.

Right now argparse takes one value; the last flag wins. Prefer passing only the temp‑patched YAML.

-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

If you adopt action=append in eval.py, keep both flags and rely on append.

test/evals/eval_data.yaml (3)

101-103: Explicit cpu_architecture and empty ssh_public_key look consistent. LGTM.


179-193: Setup/cleanup integration and tool-call expectations look coherent.

The name, domain, and version align with cluster_create_test_setup.sh payload. No issues.

Please confirm the evaluator tolerates additional tool calls (e.g., list_clusters before cluster_info) in any order for this test.


43-45: Switch to accuracy eval for static networking refusal looks good.

Text is unambiguous and should reduce judge flapping.

test/prow/cluster_create_test_setup.sh (1)

1-8: Validate required environment early.

Explicitly check UNIQUE_ID and OCM_TOKEN; fail fast with a clear message.

 #!/bin/bash
 
 set -o nounset
 set -o errexit
 set -o pipefail
 
+if [[ -z "${UNIQUE_ID:-}" ]]; then
+  echo "UNIQUE_ID is required (e.g., export UNIQUE_ID=\$(date +%s))." >&2
+  exit 1
+fi
+if [[ -z "${OCM_TOKEN:-}" ]]; then
+  echo "OCM_TOKEN is required in the environment." >&2
+  exit 1
+fi

Comment on lines 12 to 15
CLUSTER_ID=$(curlcurl -sSf -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${OCM_TOKEN}" \
    -d "{\"name\":\"eval-testcluster-${UNIQUE_ID}\",\"control_plane_count\":1,\"openshift_version\":\"4.18.22\",\"pull_secret\":\"${PULL_SECRET}\",\"base_dns_domain\":\"test.local\"}" \
    "${ASSISTED_SERVICE_URL}/clusters" | jq '.id')


⚠️ Potential issue

Typos break cluster creation (curlcurl); quote and parse ID robustly.

Fix doubled command, use jq -r, and handle empty ID.

-CLUSTER_ID=$(curlcurl -sSf -X POST -H "Content-Type: application/json" -H "Authorization: Bearer ${OCM_TOKEN}" \
-    -d "{\"name\":\"eval-testcluster-${UNIQUE_ID}\",\"control_plane_count\":1,\"openshift_version\":\"4.18.22\",\"pull_secret\":\"${PULL_SECRET}\",\"base_dns_domain\":\"test.local\"}" \
-    "${ASSISTED_SERVICE_URL}/clusters" | jq '.id')
+CLUSTER_ID="$(
+  curl -sSf -X POST \
+    -H "Content-Type: application/json" \
+    -H "Authorization: Bearer ${OCM_TOKEN}" \
+    -d "{\"name\":\"eval-testcluster-${UNIQUE_ID}\",\"control_plane_count\":1,\"openshift_version\":\"4.18.22\",\"pull_secret\":${PULL_SECRET_ESCAPED},\"base_dns_domain\":\"test.local\"}" \
+    "${ASSISTED_SERVICE_URL}/clusters" | jq -r '.id'
+)"
+if [[ -z "${CLUSTER_ID:-}" ]]; then
+  echo "Failed to create cluster or parse cluster ID." >&2
+  exit 1
+fi

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In test/prow/cluster_create_test_setup.sh around lines 12 to 15, the cluster
creation call has a typo and fragile parsing: change the doubled "curlcurl" to
"curl", parse the returned JSON id with jq -r to get the raw string (not
quoted), and add a check that the resulting CLUSTER_ID is non-empty (log an
error and exit if empty). Prefer capturing the full response to a variable
before parsing so you can log it on failure and ensure the POST payload quoting
remains intact.
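
A capture-then-parse sketch along those lines (untested; assumes the request body has already been assembled into CLUSTER_PAYLOAD, as later revisions of the script do):

# Deliberately without -f: a 4xx/5xx body is kept so it can be logged below.
RESPONSE="$(curl -sS -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OCM_TOKEN}" \
  -d "${CLUSTER_PAYLOAD}" \
  "${ASSISTED_SERVICE_URL}/clusters")" || { echo "Cluster creation request failed" >&2; exit 1; }
CLUSTER_ID="$(printf '%s' "$RESPONSE" | jq -r '.id // empty')"
if [[ -z "$CLUSTER_ID" ]]; then
  echo "No cluster ID in response: $RESPONSE" >&2
  exit 1
fi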

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/prow/entrypoint.sh (1)

8-12: Harden token fetch; fix SC2155; fail fast on HTTP; avoid sed hack.

  • Split assignment/export to satisfy SC2155; add curl --fail-with-body -sS --retry; use jq -r to drop sed.
  • URL-encode client_id/secret to be safe.

Apply:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN="$(
+  curl -sS --fail-with-body -X POST \
+    -H "Content-Type: application/x-www-form-urlencoded" \
+    --retry 3 --retry-connrefused --connect-timeout 10 --max-time 30 \
+    --data-urlencode "grant_type=client_credentials" \
+    --data-urlencode "client_id=${CLIENT_ID}" \
+    --data-urlencode "client_secret=${CLIENT_SECRET}" \
+    "https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token" \
+  | jq -r '.access_token'
+)"
+export OCM_TOKEN
🧹 Nitpick comments (5)
test/prow/entrypoint.sh (2)

23-25: Quote paths; prefer YAML-safe edits (yq) over sed.

  • Quote cp/sed args.
  • Consider yq to rewrite cleanup/setup paths robustly.

Apply minimal quoting:

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" "$TEMP_DIR/eval_data.yaml"

Optional yq-based edit (if yq available):

yq -i '(.. | select(tag == "!!map" and has("cleanup_script")) | .cleanup_script) |= (strenv(WORK_DIR) + "/test/prow/" + (split("/") | .[-1]))' "$TEMP_DIR/eval_data.yaml"

26-29: Avoid noisy/unnecessary logging; quote vars.

  • Quote file/vars; consider gating cat/ls/echo behind VERBOSE.

Apply:

-cat $TEMP_DIR/eval_data.yaml
-ls $WORK_DIR
+cat "$TEMP_DIR/eval_data.yaml"
+ls -la "$WORK_DIR"
@@
-echo $WORK_DIR
+echo "$WORK_DIR"
test/evals/eval_data.yaml (3)

43-45: Consistency/naming nit in expected_response.

Use “Assisted Installer” capitalization for consistency with other tests.

Apply:

-expected_response: "I do not support creating clusters with static networking, however, you can use the assisted-installer web-based wizard to configure a cluster with static networking."
+expected_response: "I do not support creating clusters with static networking; however, you can use the Assisted Installer web-based wizard to configure a cluster with static networking."

111-114: Duplicate words in expected_response.

“The SSH public key is set for the cluster for cluster” has “for cluster” twice; fix to avoid accidental failure.

Apply:

-expected_response: The SSH public key is set for the cluster for cluster
+expected_response: The SSH public key is set for the cluster

123-125: Grammar tweak to reduce brittleness.

Prefer plural/aux agreement or relax to substring to avoid exact-match brittleness.

Apply:

-expected_response: "hosts hasn't been discovered yet."
+expected_response: "Hosts haven't been discovered yet."

Or switch to sub-string eval with expected_keywords: ["discovered", "yet"].

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between edb5580 and 2b3b9a7.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/prow/delete_cluster.sh
  • test/evals/eval.py
  • test/prow/Dockerfile
  • test/prow/cluster_create_test_setup.sh
  • assisted-chat-pod.yaml
  • Makefile
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (6)
test/prow/entrypoint.sh (2)

16-21: Secure temp dir and secrets; quote paths; add cleanup trap.

  • Add umask 077 and trap; quote cd; write secrets with 0600 perms via install/printf, not echo.

Apply:

-TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
@@
-cd $TEMP_DIR
+cd "$TEMP_DIR"
@@
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"

31-31: Duplicate --eval_data_yaml; quote args; pass a single effective file.

  • Passing the same flag twice likely overrides prior; either make eval.py accept append or pass only the temp file.

Apply:

-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

Verify argparse behavior:

#!/bin/bash
rg -nC2 -t py 'add_argument\(.+eval_data_yaml' test/evals || true
test/evals/eval_data.yaml (4)

62-62: LGTM: cleanup_script added.

Hooking delete script here aligns with the entrypoint substitution.


93-104: LGTM: clearer prompt; explicit arch; empty SSH key.

These tighten the tool-call expectations without over-constraining response text.


101-103: LGTM: explicit cpu_architecture and ssh_public_key.

Matches intent to set SSH later in the flow.


179-193: LGTM — add setup/cleanup and explicit list_clusters + cluster_info; confirm evaluator ordering/extra-call behavior.

expected_tool_calls live in test/evals/eval_data.yaml, and the runner imports AgentGoalEval from lsc_agent_eval in test/evals/eval.py; no evaluator implementation exists in this repo to confirm strictness. Confirm that AgentGoalEval permits extra tool calls and non-strict ordering, or update the tests to express the intended ordering.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/prow/entrypoint.sh (1)

8-12: Avoid SC2155; fail fast on HTTP errors; use jq -r.

Apply:

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+OCM_TOKEN=$(
+  curl -sS --fail-with-body -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+    -H "Content-Type: application/x-www-form-urlencoded" \
+    -d "grant_type=client_credentials" \
+    -d "client_id=$CLIENT_ID" \
+    -d "client_secret=$CLIENT_SECRET" \
+  | jq -r '.access_token'
+)
+export OCM_TOKEN
🧹 Nitpick comments (6)
Makefile (1)

109-115: Optional: run each recipe in a single shell with .ONESHELL so exported variables persist across recipe lines.

Add near the top:

+.ONESHELL:
test/evals/eval.py (2)

84-86: Fix typo and write error to stderr.

Apply:

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+if os.getenv("UNIQUE_ID") is None:
+    print("The environment variable 'UNIQUE_ID' must be set for cluster creation and cleanup.", file=sys.stderr)
     sys.exit(1)

61-65: Help text mismatch for judge_model default.

Apply:

-        help="Judge model for LLM evaluation (default: gemini-2.0-flash)",
+        help="Judge model for LLM evaluation (default: gemini-2.5-flash)",
test/prow/entrypoint.sh (1)

8-12: Guard required env vars for clarity.

Add early in the script:

+: "${CLIENT_ID:?CLIENT_ID is required}"
+: "${CLIENT_SECRET:?CLIENT_SECRET is required}"
+: "${GEMINI_API_KEY:?GEMINI_API_KEY is required}"
+: "${UNIQUE_ID:?UNIQUE_ID is required}"
test/evals/eval_data.yaml (2)

123-125: Grammar nit (may affect accuracy eval).

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."

Update any corresponding logic/messages accordingly.


111-114: Typo in expected response ("for cluster for cluster").

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key is set for the cluster
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b3b9a7 and 3840300.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/prow/Dockerfile
  • assisted-chat-pod.yaml
  • test/prow/cluster_create_test_setup.sh
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (6)
Makefile (1)

109-115: UNIQUE_ID doesn’t persist across recipe lines; /dev/urandom usage is broken.

The export runs in its own shell and won't be visible to later lines, so eval.py will exit(1) on missing UNIQUE_ID. Also, line-oriented head /dev/urandom is a fragile way to read random bytes; prefer a fixed-size read such as openssl rand.

Apply:

 test-eval: ## Run agent evaluation tests
-	@export UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
-	@echo "Refreshing OCM token..."
-	@. utils/ocm-token.sh && get_ocm_token && echo "$$OCM_TOKEN" > test/evals/ocm_token.txt
-	@echo "Running agent evaluation tests..."
-	@cd test/evals && python eval.py
+	@UNIQUE_ID=$$(openssl rand -hex 4 2>/dev/null | tr '[:upper:]' '[:lower:]' || date +%s | sha256sum | head -c 8); \
+	  export UNIQUE_ID; \
+	  echo "UNIQUE_ID set to: $$UNIQUE_ID"; \
+	  echo "Refreshing OCM token..."; \
+	  . utils/ocm-token.sh && get_ocm_token && printf "%s" "$$OCM_TOKEN" > test/evals/ocm_token.txt; \
+	  echo "Running agent evaluation tests..."; \
+	  cd test/evals && env UNIQUE_ID="$$UNIQUE_ID" python eval.py
test/prow/entrypoint.sh (1)

16-23: Quote paths, secure secrets, and pass a single eval_data file.

Apply:

-TEMP_DIR=$(mktemp -d)
-
-cd $TEMP_DIR
-
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
+
+cd "$TEMP_DIR"
+
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"
@@
-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" $TEMP_DIR/eval_data.yaml
-cat $TEMP_DIR/eval_data.yaml
-ls $WORK_DIR
-echo "---"
-echo $WORK_DIR
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i.bak "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i.bak "s|: ../prow|: ${WORK_DIR}/test/prow|g" "$TEMP_DIR/eval_data.yaml"
+rm -f "$TEMP_DIR/eval_data.yaml.bak"
@@
-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

Also applies to: 31-31

test/evals/eval_data.yaml (3)

43-45: Switch to accuracy eval: confirm exact wording stability.

If this string is sourced from UX copy, lock it here; otherwise consider sub-string to reduce flakiness.


60-63: Good: add cleanup script to prevent leaked clusters.

Ensure delete_cluster.sh runs with OCM_TOKEN and UNIQUE_ID in the environment in CI and local runs.


179-193: Setup/cleanup wiring looks right; verify placeholders resolve.

Confirm cluster_create_test_setup.sh creates:

  • name: eval-testcluster-${UNIQUE_ID}
  • base_domain: test.local
  • version: 4.18.22
    so that expected_keywords align.
test/prow/delete_cluster.sh (1)

1-16: Harden deletions: validate env, fail on HTTP errors, retry, and read safely.

Apply:

 #!/bin/bash

 set -o nounset
 set -o errexit
 set -o pipefail

 ASSISTED_SERVICE_URL="https://api.stage.openshift.com/api/assisted-install/v2"

-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+: "${OCM_TOKEN:?OCM_TOKEN is required}"
+: "${UNIQUE_ID:?UNIQUE_ID is required}"
+
+curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+  "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '[.[] | {id, name}][]' \
+| while IFS= read -r item; do
-  id=$(echo "$item" | jq -r '.id')
-  name=$(echo "$item" | jq -r '.name')
+  id=$(echo "$item" | jq -r '.id')
+  name=$(echo "$item" | jq -r '.name')
   if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
     echo "The cluster '${name}', ${id} is going to be deleted"
-    curl -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})"
   fi
 done

Comment on lines 93 to 103
      eval_query: Create a multi-node cluster named 'eval-test-multinode-uniq-cluster-name' with OpenShift 4.18.22 and domain test.local.
      expected_tool_calls:
        - tool_name: create_cluster
          arguments:
            name: "eval-test-multinode-uniq-cluster-name"
            version: "4\\.18\\.22"
            base_domain: "test\\.local"
            single_node: "(?i:false)"
-           cpu_architecture: None
-           ssh_public_key: None
+           cpu_architecture: "x86_64"
+           ssh_public_key: ""
      expected_keywords: ["eval-test-multinode-uniq-cluster-name", "ID", "Discovery ISO", "cluster"]

⚠️ Potential issue

Don’t require an empty ssh_public_key argument.

Some tools omit empty fields; forcing ssh_public_key: "" will fail matching. Remove it or make it optional.

           arguments:
             name: "eval-test-multinode-uniq-cluster-name"
             version: "4\\.18\\.22"
             base_domain: "test\\.local"
             single_node: "(?i:false)"
-            cpu_architecture: "x86_64"
-            ssh_public_key: ""
+            cpu_architecture: "x86_64"

If you must assert absence, consider a custom matcher that treats missing or empty as equivalent.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

      eval_query: Create a multi-node cluster named 'eval-test-multinode-uniq-cluster-name' with OpenShift 4.18.22 and domain test.local.
      expected_tool_calls:
        - tool_name: create_cluster
          arguments:
            name: "eval-test-multinode-uniq-cluster-name"
            version: "4\\.18\\.22"
            base_domain: "test\\.local"
            single_node: "(?i:false)"
            cpu_architecture: "x86_64"
            ssh_public_key: ""
      expected_keywords: ["eval-test-multinode-uniq-cluster-name", "ID", "Discovery ISO", "cluster"]

After:

      eval_query: Create a multi-node cluster named 'eval-test-multinode-uniq-cluster-name' with OpenShift 4.18.22 and domain test.local.
      expected_tool_calls:
        - tool_name: create_cluster
          arguments:
            name: "eval-test-multinode-uniq-cluster-name"
            version: "4\\.18\\.22"
            base_domain: "test\\.local"
            single_node: "(?i:false)"
            cpu_architecture: "x86_64"
      expected_keywords: ["eval-test-multinode-uniq-cluster-name", "ID", "Discovery ISO", "cluster"]
🤖 Prompt for AI Agents
In test/evals/eval_data.yaml around lines 93 to 103, the expected_tool_calls
block is forcing ssh_public_key: "" which breaks matching when tools omit empty
fields; remove the ssh_public_key entry from expected arguments (or change it to
an optional/matcher that accepts missing or empty) so the test does not require
an explicit empty string—update the YAML to either omit the ssh_public_key key
entirely or replace it with a matcher that treats absent and empty as
equivalent.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (7)
assisted-chat-pod.yaml (1)

50-53: Parameterize stage URLs instead of hard-coding (keep local/prod parity).

Be consistent with other envs in this manifest (e.g., GEMINI_API_KEY, AIUI_OCM_REFRESH_TOKEN) and allow overrides via envsubst/kustomize.

Apply this diff:

         - name: INVENTORY_URL
-          value: "https://api.stage.openshift.com/api/assisted-install/v2"
+          value: ${INVENTORY_URL}
         - name: PULL_SECRET_URL
-          value: "https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token"
+          value: ${PULL_SECRET_URL}

Verify your deploy path runs envsubst or equivalent so the ${…} placeholders are expanded before kubectl apply.
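
For example, with gettext's envsubst, restricting substitution to the two names so any other $ tokens in the manifest survive (the kubectl invocation is illustrative; use whatever applies this pod spec in your setup):

export INVENTORY_URL="https://api.stage.openshift.com/api/assisted-install/v2"
export PULL_SECRET_URL="https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token"
envsubst '${INVENTORY_URL} ${PULL_SECRET_URL}' < assisted-chat-pod.yaml | kubectl apply -f -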

test/prow/entrypoint.sh (2)

23-25: Quote cp/sed args to handle spaces; keep consistent.

Apply this diff:

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../prow|: ${WORK_DIR}/test/prow|g" "$TEMP_DIR/eval_data.yaml"

25-29: Debug prints can leak context; gate under DEBUG.

Apply this diff:

-cat $TEMP_DIR/eval_data.yaml
-ls $WORK_DIR
-echo "---"
-echo $WORK_DIR
+if [[ -n "${DEBUG:-}" ]]; then
+  cat "$TEMP_DIR/eval_data.yaml"
+  ls "$WORK_DIR"
+  echo "---"
+  echo "$WORK_DIR"
+fi
test/evals/eval.py (1)

84-86: UNIQUE_ID guard misses empty-string; fix and tighten message.

Empty env var bypasses the check; also fix typo and print to stderr.

Apply this diff:

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
-    sys.exit(1)
+unique_id = os.getenv("UNIQUE_ID")
+if not unique_id:
+    print("The environment variable 'UNIQUE_ID' must be set so cluster creation and cleanup work properly.", file=sys.stderr)
+    sys.exit(1)
test/evals/eval_data.yaml (3)

113-113: Fix duplicated wording in expected response.

Current text is self-referential and likely won’t match.

Apply this diff:

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key is set for the cluster.

123-125: Grammar fix to improve exact-match reliability.

Plural subject requires “haven’t”.

Apply this diff:

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."

176-176: Minor terminology nit (optional): “on‑premises” vs “on‑premise”.

Apply this diff:

-      expected_response: I can help you install OpenShift on-premise using the Assisted Installer, either on bare metal servers or virtual machines (e.g., vSphere, KVM, libvirt). I do not support public cloud platforms like AWS, Azure, or GCP.
+      expected_response: I can help you install OpenShift on-premises using the Assisted Installer, either on bare metal servers or virtual machines (e.g., vSphere, KVM, libvirt). I do not support public cloud platforms like AWS, Azure, or GCP.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3840300 and 6c11fee.

📒 Files selected for processing (8)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/cluster_create_test_setup.sh (1 hunks)
  • test/prow/delete_cluster.sh (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/prow/delete_cluster.sh
  • test/prow/Dockerfile
  • test/prow/cluster_create_test_setup.sh
  • Makefile
🧰 Additional context used
🪛 Shellcheck (0.11.0)
test/prow/entrypoint.sh

[warning] 8-8: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (8)
test/prow/entrypoint.sh (3)

16-21: Quote paths; secure secret files; cleanup temp dir.

  • Quote cd/paths.
  • Create secret files with 0600, write via printf.
  • Add umask 077 and trap to remove the temp dir.

Apply this diff:

 TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
 
-cd $TEMP_DIR
+cd "$TEMP_DIR"
 
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"

31-31: Pass a single --eval_data_yaml (argparse doesn’t merge duplicates).

Only the last wins; drop the original to avoid confusion. Alternatively, change eval.py to action='append'. Prior feedback noted this.

Apply this diff:

-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

8-12: Avoid export-with-command-subst; add retries/robustness; support local OCM_TOKEN

File: test/prow/entrypoint.sh (lines 8-12): replace the inline export with guarded logic that honors an existing $OCM_TOKEN, uses curl -sSf with retries, parses with jq -r, and fails if empty (addresses SC2155).

-export OCM_TOKEN=$(curl -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq '.access_token' | sed "s/^['\"]*//; s/['\"]*$//")
+if [[ -z "${OCM_TOKEN:-}" ]]; then
+  resp="$(curl -sSsf -X POST 'https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token' \
+    -H 'Content-Type: application/x-www-form-urlencoded' \
+    --retry 3 --retry-connrefused \
+    -d "grant_type=client_credentials" \
+    -d "client_id=${CLIENT_ID}" \
+    -d "client_secret=${CLIENT_SECRET}")"
+  OCM_TOKEN="$(printf '%s' "$resp" | jq -r '.access_token // empty')"
+  if [[ -z "$OCM_TOKEN" ]]; then
+    echo "Failed to obtain OCM token" >&2
+    exit 1
+  fi
+fi
+export OCM_TOKEN

Confirmed: --eval_data_yaml is defined only once in test/evals/eval.py (parser.add_argument at lines 26–30).

test/evals/eval.py (1)

95-96: LGTM on stricter failure gating.

Failing on any FAIL/ERROR is appropriate for stability.

test/evals/eval_data.yaml (4)

43-45: LGTM: clearer, deterministic expectation for static networking.


62-62: Good addition: cleanup_script to avoid resource leaks.


87-90: LGTM: add cleanup_script for MNO flow.


101-103: Don’t require ssh_public_key: "" in expected tool args.

Some toolchains omit empty fields; forcing an explicit empty string breaks matching. Remove it or make it optional.

Apply this diff:

               single_node: "(?i:false)"
-              cpu_architecture: "x86_64"
-              ssh_public_key: ""
+              cpu_architecture: "x86_64"

@andrej1991
Collaborator Author

/test eval-test

@andrej1991 andrej1991 marked this pull request as draft September 23, 2025 18:01
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
test/scripts/cluster_create_test_setup.sh (2)

55-63: Extend polling timeout and suppress noisy output.

The polling loop attempts only 4 times (≈40 seconds total), which may be insufficient for API propagation delays. Also, the successful curl output pollutes logs. Based on author feedback that 30 seconds should suffice for cluster definition, consider increasing to 6 attempts (60s) for safety margin and redirect successful output.

Apply this diff:

 COUNTER=0
-while ! curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${CLUSTER_ID}"; do
-    if [[ $COUNTER -gt 3 ]]; then
+MAX_ATTEMPTS=6
+while ! curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
+         "${ASSISTED_SERVICE_URL}/clusters/${CLUSTER_ID}" >/dev/null 2>&1; do
+    if [[ $COUNTER -ge $MAX_ATTEMPTS ]]; then
         echo "Cluster creation timed out"
         exit 1
     fi
     ((COUNTER++))
     sleep 10
 done
+echo "Cluster ${CLUSTER_ID} verified successfully"

Based on past review comments.


7-8: Document UNIQUE_ID requirements for local usage.

Per past reviewer feedback, it's unclear whether UNIQUE_ID will be set when running eval tests locally. Consider adding a comment explaining how developers should set this variable for local execution.

Add documentation:

 : "${OCM_TOKEN:?OCM_TOKEN is required}"
+# UNIQUE_ID should be set by CI (e.g., via Makefile) or manually for local runs
 : "${UNIQUE_ID:?UNIQUE_ID is required}"

Based on past review comments.
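
A hedged example of a local run (assumes the ocm CLI is logged in; any other token source works the same way):

export UNIQUE_ID="$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)"
export OCM_TOKEN="$(ocm token)"
./test/scripts/cluster_create_test_setup.sh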

Makefile (1)

128-130: Simplify strict mode flags.

The recipe uses separate set -e and set -o pipefail statements. While functional under .ONESHELL: with bash, consolidating into a single line improves readability and is more conventional.

Apply this diff:

 	#!/bin/bash
-	set -e
-	set -o pipefail
+	set -euo pipefail
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b366673 and 0b4383a.

📒 Files selected for processing (11)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • scripts/eval_k8s.sh (1 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (7 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
  • test/prow/template.yaml (2 hunks)
  • test/scripts/cluster_create_test_setup.sh (1 hunks)
  • test/scripts/cluster_created_check.sh (1 hunks)
  • test/scripts/delete_cluster.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • assisted-chat-pod.yaml
  • test/prow/template.yaml
  • test/prow/Dockerfile
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/eval_k8s.sh

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (28)
test/evals/eval.py (2)

95-95: Good change: stricter failure threshold improves test reliability.

Changing from > 2 to any non-zero count means the test will fail on the first error, making the evaluation more deterministic and catching issues earlier.


84-86: Fix typo and print to stderr.

The guard message contains a typo ("varialbe" → "variable") and should print to stderr for clearer UX when the check fails.

Apply this diff:

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+if os.getenv("UNIQUE_ID") is None:
+    print("Environment variable 'UNIQUE_ID' must be set so cluster creation and cleanup can proceed.", file=sys.stderr)
     sys.exit(1)
scripts/eval_k8s.sh (2)

23-27: Good additions: cleanup trap and proper quoting.

The trap ensures temp directory cleanup on exit, and all paths are now properly quoted. These changes improve reliability and avoid word-splitting issues.


20-22: Fix UNIQUE_ID generation and SC2155 warning.

head /dev/urandom is line-based, so how much random data it consumes depends on where newline bytes happen to fall in the stream. Use a byte-safe read instead. Also, separate assignment and export on line 22 to avoid masking return values (SC2155).

Apply this diff:

-UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
-export UNIQUE_ID
-export TEMP_DIR=$(mktemp -d)
+UNIQUE_ID="$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)"
+export UNIQUE_ID
+TEMP_DIR="$(mktemp -d)"
+export TEMP_DIR
test/prow/entrypoint.sh (3)

17-28: Add security hardening: quote paths, secure secrets, and add cleanup trap.

Multiple issues remain: unquoted variable expansions risk word-splitting, secrets are written insecurely with echo, and the temp directory isn't cleaned up on exit.

Apply this diff:

 TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT
 
-cd $TEMP_DIR
+cd "$TEMP_DIR"
 
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"
 
-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" "$TEMP_DIR/eval_data.yaml"
 
-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"

28-28: Good fix: removed duplicate --eval_data_yaml argument.

The previous version passed --eval_data_yaml twice, which would cause argparse to use only the last value. Now it correctly passes only the processed YAML from TEMP_DIR.


8-13: Separate token assignment and export to avoid SC2155.

Inline export masks curl failures and makes error handling difficult. Assign to a variable first, validate it's non-empty, then export.

Apply this diff:

-OCM_TOKEN=$(curl -sSf -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+OCM_TOKEN="$(curl -sS --fail-with-body --connect-timeout 10 --max-time 60 \
+  -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
   -H "Content-Type: application/x-www-form-urlencoded" \
   -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq -r '.access_token')
+  -d "client_id=${CLIENT_ID}" \
+  -d "client_secret=${CLIENT_SECRET}" | jq -r '.access_token'
+)"
+if [[ -z "${OCM_TOKEN}" || "${OCM_TOKEN}" == "null" ]]; then
+  echo "Failed to obtain OCM_TOKEN" >&2
+  exit 1
+fi
 export OCM_TOKEN
test/scripts/delete_cluster.sh (1)

11-18: Add robustness: use --fail-with-body, retries, and read -r.

The curl commands lack error handling and the read is susceptible to backslash mangling. Without --fail-with-body on the DELETE request, HTTP errors won't abort the script, risking leaked clusters.

Apply this diff:

-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+  "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '[.[] | {id, name}][]' \
+| while IFS= read -r item; do
   id=$(echo "$item" | jq -r '.id')
   name=$(echo "$item" | jq -r '.name')
   if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
     echo "The cluster '${name}', ${id} is going to be deleted"
-    curl -fsS -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})" >&2
   fi
 done
test/scripts/cluster_created_check.sh (1)

11-19: Harden polling: use --fail-with-body, grep -q, and make timeout configurable.

The polling loop doesn't distinguish HTTP errors from "not found", uses noisy grep, and has a hardcoded timeout. Make the max attempts configurable and improve curl error handling.

Apply this diff:

+MAX_ATTEMPTS=${MAX_ATTEMPTS:-30}
+SLEEP_SECONDS=${SLEEP_SECONDS:-10}
+ATTEMPTS=0
-COUNTER=0
-while ! curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | grep "${UNIQUE_ID}"; do
-    if [[ $COUNTER -gt 3 ]]; then
-        echo "Cluster creation timed out"
-        exit 1
-    fi
-    ((COUNTER++))
-    sleep 10
-done
+while true; do
+  if curl -sS --fail-with-body --retry 3 --connect-timeout 10 --max-time 60 \
+      -H "Authorization: Bearer ${OCM_TOKEN}" \
+      -H "Accept: application/json" \
+      "${ASSISTED_SERVICE_URL}/clusters" | grep -q "${UNIQUE_ID}"; then
+    break
+  fi
+  if [[ ${ATTEMPTS} -ge ${MAX_ATTEMPTS} ]]; then
+    echo "Cluster creation timed out after $((MAX_ATTEMPTS * SLEEP_SECONDS)) seconds" >&2
+    exit 1
+  fi
+  ((ATTEMPTS++))
+  sleep "${SLEEP_SECONDS}"
+done
test/scripts/cluster_create_test_setup.sh (5)

1-6: LGTM! Strict shell mode properly configured.

The shebang and error-handling flags (nounset, errexit, pipefail) provide robust error detection for this setup script.


12-17: LGTM! Pull secret fetched securely.

Using jq-based JSON construction (addressed in previous commits) prevents injection issues. The token is passed via header rather than embedded in the payload.


19-27: LGTM! Cluster payload constructed safely with jq.

Using jq -n with --arg and --argjson properly escapes all variables and prevents JSON injection vulnerabilities.


37-44: LGTM! Infra-env payload safely constructed.

Consistent use of jq -n for JSON construction maintains security and correctness.


29-35: Validate CLUSTER_ID before proceeding.

The script extracts CLUSTER_ID from the API response but doesn't verify it's valid before using it in subsequent API calls. If the cluster creation fails or returns malformed JSON, the infra-env creation will fail with a cryptic error.

Apply this diff:

 CLUSTER_ID="$(
   curl -sSf -X POST \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer ${OCM_TOKEN}" \
     -d "${CLUSTER_PAYLOAD}" \
     "${ASSISTED_SERVICE_URL}/clusters" | jq -r '.id'
 )"
+
+if [[ -z "${CLUSTER_ID}" || "${CLUSTER_ID}" == "null" ]]; then
+  echo "Failed to create cluster: missing or null CLUSTER_ID" >&2
+  exit 1
+fi

Based on past review comments.

Makefile (4)

131-133: UNIQUE_ID generation is adequate for test isolation.

While past comments suggested using openssl rand or sha256sum for portability, the current approach (/dev/urandom | tr | head) is widely supported on Linux CI environments and generates sufficient entropy for test cluster naming. The 8-character base-36 ID spans 36^8 ≈ 2.8 × 10^12 values, so by the birthday bound even hundreds of concurrent runs collide with negligible probability.


134-139: LGTM! Eval data preparation workflow is sound.

The script correctly sources the token utility, copies eval_data.yaml to a temp directory, substitutes the UNIQUE_ID placeholder, and runs eval.py with the updated path. The trap ensures cleanup on exit.


141-155: LGTM! Token handling properly secured in test-eval-k8s.

Lines 147-148 correctly apply umask 077 before writing the token file, ensuring mode 0600. The conditional logic handles both pre-set OCM_TOKEN and sourcing from the utility script.


126-139: Set restrictive permissions when writing OCM_TOKEN to disk.

Line 136 writes the sensitive OCM_TOKEN to a file without explicitly setting restrictive permissions. Although the file may inherit safe umask defaults in CI, explicitly ensuring mode 0600 prevents accidental credential exposure in local dev environments.

Apply this diff:

 	export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)
 	. utils/ocm-token.sh
 	get_ocm_token
+	umask 077
 	echo "$$OCM_TOKEN" > test/evals/ocm_token.txt

Based on past review comments.

test/evals/eval_data.yaml (10)

55-55: LGTM! Cleanup script added for test stability.

Adding the cleanup script ensures clusters created during this test are removed, reducing cluster pool pollution and improving test environment uniformity as stated in the PR objectives.


59-70: LGTM! Comprehensive evaluation with verification.

Adding action_eval type and the eval_verify_script ensures the cluster was actually created (not just that the tool was invoked), directly addressing one of the PR objectives. The addition of the platform: "none" parameter clarifies deployment intent.


84-99: LGTM! Multi-node cluster flow enhanced with verification and cleanup.

Adding action_eval, eval_verify_script, and cleanup_script brings the multi-node test to parity with SNO tests. The explicit cpu_architecture and platform parameters make the test more deterministic.


101-101: LGTM! Enhanced expected response improves test clarity.

The updated response now explicitly mentions DHCP as the default and offers static network configuration, making the multi-node workflow more realistic and instructive.


177-189: LGTM! Setup script ensures cluster exists for name-based lookup.

Adding setup_script addresses the case where chat-created clusters might retain UUIDs in their names, ensuring a known cluster exists for the test to query. The list_clusters followed by cluster_info flow correctly tests name-to-ID resolution.


164-165: LGTM! Improved refusal response with helpful guidance.

The updated response now uses action_eval and provides clearer guidance about on-prem scope and alternatives for Azure deployments, improving the user experience while maintaining appropriate boundaries.


194-209: LGTM! Static network config test enhanced with verification.

Adding action_eval, eval_verify_script, and cleanup_script ensures the cluster is actually created and subsequently cleaned up, directly supporting the PR's stability objectives.


120-122: Fix grammatical error in expected response.

"hosts hasn't" is grammatically incorrect; should be "hosts haven't" (plural).

Apply this diff:

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."

Based on past review comments.


223-227: NMState validation will fail: literal IP doesn't match full YAML payload.

The alter_static_network_config_nmstate_for_host tool submits a complete NMState YAML document (with interfaces, VLANs, DNS, etc.), but the expected new_nmstate_yaml is set to the literal string "10.0.0.5". This guarantees the action_eval will flag every correct invocation as a failure.

Apply this diff to match the actual payload structure:

               cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
               index: null
-              new_nmstate_yaml: "10.0.0.5"
+              new_nmstate_yaml: '(?s).*10\.0\.0\.5.*'

This regex will match any NMState YAML containing the expected IP address while tolerating the full document structure.

Based on past review comments.


97-97: Unable to verify the concern about empty ssh_public_key field matching.

The test data at lines 97 and 208 expects ssh_public_key: "" when no SSH key is provided during cluster creation. The actual tool argument matching is performed by the external lsc_agent_eval library (from the lightspeed-evaluation repository), which is not available in this codebase for inspection.

Key findings:

  • No evidence found of this issue being "flagged in past reviews" as claimed
  • Git history shows eval stability improvements but no mention of ssh_public_key matching problems
  • Other test cases (line 68) successfully use regex patterns for ssh_public_key when provided
  • The empty string "" is a valid JSON value, distinct from an omitted field

Without access to the matching implementation or evidence of actual failures, the concern remains speculative. The test data appears intentionally structured this way to handle cases where SSH keys are not provided.

Comment on lines +46 to +50
curl -sSf -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OCM_TOKEN}" \
  -d "${INFRA_PAYLOAD}" \
  "${ASSISTED_SERVICE_URL}/infra-envs"

⚠️ Potential issue | 🟠 Major

Capture and validate infra-env creation response.

The infra-env creation result is not captured or validated. If this step fails silently, the polling loop may succeed (finding the cluster) but the environment will be incomplete for testing.

Apply this diff:

-curl -sSf -X POST \
+INFRA_ENV_RESPONSE="$(curl -sSf -X POST \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer ${OCM_TOKEN}" \
   -d "${INFRA_PAYLOAD}" \
-  "${ASSISTED_SERVICE_URL}/infra-envs"
+  "${ASSISTED_SERVICE_URL}/infra-envs")"
+
+if ! echo "${INFRA_ENV_RESPONSE}" | jq -e '.id' >/dev/null 2>&1; then
+  echo "Failed to create infra-env: ${INFRA_ENV_RESPONSE}" >&2
+  exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:

curl -sSf -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OCM_TOKEN}" \
  -d "${INFRA_PAYLOAD}" \
  "${ASSISTED_SERVICE_URL}/infra-envs"

After:

INFRA_ENV_RESPONSE="$(curl -sSf -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OCM_TOKEN}" \
  -d "${INFRA_PAYLOAD}" \
  "${ASSISTED_SERVICE_URL}/infra-envs")"

if ! echo "${INFRA_ENV_RESPONSE}" | jq -e '.id' >/dev/null 2>&1; then
  echo "Failed to create infra-env: ${INFRA_ENV_RESPONSE}" >&2
  exit 1
fi
🤖 Prompt for AI Agents
In test/scripts/cluster_create_test_setup.sh around lines 46 to 50, the script
posts INFRA_PAYLOAD to create an infra-env but does not capture or validate the
response; modify the curl call to capture the HTTP response and body, check the
HTTP status code (ensure 2xx), parse the returned JSON to extract required
fields (e.g., infraEnv id or name) using jq (or plain shell parsing), verify the
extracted values are non-empty, and if the creation failed or values are missing
print an error and exit non-zero so the test setup stops early instead of
proceeding with an incomplete environment.
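
A status-plus-body capture along those lines (a sketch, not exercised against the stage API):

HTTP_BODY="$(mktemp)"
HTTP_CODE="$(curl -sS -o "$HTTP_BODY" -w '%{http_code}' -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OCM_TOKEN}" \
  -d "${INFRA_PAYLOAD}" \
  "${ASSISTED_SERVICE_URL}/infra-envs")"
if [[ "$HTTP_CODE" != 2* ]] || ! jq -e '.id' "$HTTP_BODY" >/dev/null 2>&1; then
  echo "infra-env creation failed (HTTP ${HTTP_CODE}): $(cat "$HTTP_BODY")" >&2
  exit 1
fi
rm -f "$HTTP_BODY"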

@andrej1991
Collaborator Author

/test eval-test

2 similar comments
@andrej1991
Collaborator Author

/test eval-test

@andrej1991
Collaborator Author

/test eval-test

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Makefile (1)

143-153: Guard the optional OCM_TOKEN access under set -u.
With set -u in effect, referencing "$OCM_TOKEN" when the variable is unset aborts the target before the fallback path can run. Use a defaulted expansion (or ${OCM_TOKEN+x}) so the test exercises the intended branch.

-	if [ -n "$$OCM_TOKEN" ]; then
+	if [ -n "${OCM_TOKEN:-}" ]; then

Remember to apply the same pattern to any other direct $OCM_TOKEN reads in this block.
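
A quick demonstration of the pitfall and the fix:

set -u
# [ -n "$OCM_TOKEN" ]   # aborts with "OCM_TOKEN: unbound variable" when unset
if [ -n "${OCM_TOKEN:-}" ]; then
  echo "token present"
else
  echo "falling back to get_ocm_token"
fi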

test/evals/eval_data.yaml (1)

223-228: Match the full NMState payload, not just the IP literal.
alter_static_network_config_nmstate_for_host sends the entire nmstate document; expecting the argument to equal "10.0.0.5" guarantees an action_eval failure even when the assistant responds correctly. Please switch this to a regex that tolerates the full multi-line YAML while asserting the critical fields.

-              new_nmstate_yaml: "10.0.0.5"
+              new_nmstate_yaml: '(?s).*mac_address:\s*c5:d6:bc:f0:05:20.*vlan_id:\s*400.*10\.0\.0\.5/24.*8\.8\.8\.8.*'
♻️ Duplicate comments (8)
assisted-chat-pod.yaml (1)

50-53: Stop hardcoding stage endpoints; align on OCM_BASE_URL and ensure assisted-service-mcp actually reads these envs.

These vars bake in stage URLs and may be unused if assisted-service-mcp doesn’t read them. Prefer a single OCM_BASE_URL and derive subpaths in the service; keep these as fallbacks only if the service consumes them. Also align with test/prow/template.yaml’s OCM_BASE_URL.

Suggested change (YAML side) if you want to keep the pod lean and derive in code:

-        - name: INVENTORY_URL
-          value: "https://api.stage.openshift.com/api/assisted-install/v2"
-        - name: PULL_SECRET_URL
-          value: "https://api.stage.openshift.com/api/accounts_mgmt/v1/access_token"
+        - name: OCM_BASE_URL
+          value: ${OCM_BASE_URL:-https://api.stage.openshift.com}

Then, in assisted-service-mcp, read OCM_BASE_URL and build:

  • $OCM_BASE_URL/api/assisted-install/v2
  • $OCM_BASE_URL/api/accounts_mgmt/v1/access_token
#!/bin/bash
set -euo pipefail
# Check assisted-service-mcp for env usage
echo "Searching assisted-service-mcp for INVENTORY_URL/PULL_SECRET_URL consumption…"
gh search code 'INVENTORY_URL' --repo rh-ecosystem-edge/assisted-service-mcp --json path,url -q '.[]|.path+" <- "+.url'
gh search code 'PULL_SECRET_URL' --repo rh-ecosystem-edge/assisted-service-mcp --json path,url -q '.[]|.path+" <- "+.url'
gh search code 'Getenv("INVENTORY_URL")' --repo rh-ecosystem-edge/assisted-service-mcp --json path,url -q '.[]|.path+" <- "+.url'
gh search code 'Getenv("PULL_SECRET_URL")' --repo rh-ecosystem-edge/assisted-service-mcp --json path,url -q '.[]|.path+" <- "+.url'
scripts/eval_k8s.sh (1)

20-27: Fix flaky UNIQUE_ID generation and SC2155; keep quoting/trap.

head /dev/urandom | … | head -c 8 is fragile (a line-oriented read of a random stream); and export TEMP_DIR=$(mktemp -d) triggers SC2155. Generate bytes directly via redirection and split assign/export.

Apply:

-UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
-export UNIQUE_ID
-export TEMP_DIR=$(mktemp -d)
+UNIQUE_ID="$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)"
+export UNIQUE_ID
+TEMP_DIR="$(mktemp -d)"
+export TEMP_DIR
 trap 'rm -rf "$TEMP_DIR"' EXIT
-cp test/evals/eval_data.yaml "${TEMP_DIR}/eval_data.yaml"
+cp "test/evals/eval_data.yaml" "${TEMP_DIR}/eval_data.yaml"
 sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "${TEMP_DIR}/eval_data.yaml"
test/evals/eval.py (1)

84-86: UNIQUE_ID guard: fix typo and print to stderr.

Minor polish for clearer CI failures.

-if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+if os.getenv("UNIQUE_ID") is None:
+    print("Environment variable 'UNIQUE_ID' must be set so cluster creation and cleanup can proceed.", file=sys.stderr)
     sys.exit(1)
test/scripts/delete_cluster.sh (1)

1-18: Harden deletion: retries, fail-with-body, timeouts, and safe read.

Current loop may miss errors and leak clusters. Add retries/timeouts, fail on non‑2xx, and use read -r.

-ASSISTED_SERVICE_URL="${OCM_BASE_URL:-https://api.stage.openshift.com}/api/assisted-install/v2"
+ASSISTED_SERVICE_URL="${OCM_BASE_URL:-https://api.stage.openshift.com}/api/assisted-install/v2"
@@
-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+  "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '.[] | {id, name}' \
+| while IFS= read -r item; do
@@
-    curl -fsS -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -sS --fail-with-body --retry 5 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" -H 'Accept: application/json' \
+      "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})"
test/prow/entrypoint.sh (2)

17-28: Quote paths, secure secret files, and clean up the temp dir.
This is the same set of hardening steps called out earlier: quote $TEMP_DIR/$TEST_DIR usages, create the token/env files with 0600 perms (or a umask 077), and register a trap to remove the mktemp directory on exit. Right now a workspace with spaces still breaks, the token/env files inherit default world-readable perms, and /tmp accumulates leftovers. Please apply the earlier fix wholesale.


8-13: Fail fast when the OCM token fetch returns empty/null.
If SSO ever responds without an access_token, jq -r will happily emit null (exit 0), so the script keeps going with an unusable token and the evaluator later fails with 401s. Please validate the value immediately and abort on empty/null so we never run the expensive test suite with bad credentials.

-OCM_TOKEN=$(curl -sSf -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
-  -H "Content-Type: application/x-www-form-urlencoded" \
-  -d "grant_type=client_credentials" \
-  -d "client_id=$CLIENT_ID" \
-  -d "client_secret=$CLIENT_SECRET" | jq -r '.access_token')
-export OCM_TOKEN
+OCM_TOKEN=$(curl -sSf -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+  -H "Content-Type: application/x-www-form-urlencoded" \
+  -d "grant_type=client_credentials" \
+  -d "client_id=$CLIENT_ID" \
+  -d "client_secret=$CLIENT_SECRET" | jq -r '.access_token')
+if [ -z "${OCM_TOKEN}" ] || [ "${OCM_TOKEN}" = "null" ]; then
+  echo "Failed to obtain OCM access token" >&2
+  exit 1
+fi
+export OCM_TOKEN
Makefile (2)

1-140: Run the recipes under bash so set -o pipefail works.
With .ONESHELL the recipe still executes in /bin/sh, so set -o pipefail throws “bad option” and the whole target exits immediately. Please make the target (or the whole Makefile) use bash explicitly and collapse the strict-mode lines accordingly.

+# Use bash for recipes that rely on pipefail/[[...]]
+SHELL := /usr/bin/env bash
@@
-.ONESHELL:
-test-eval: ## Run agent evaluation tests
-	#!/bin/bash
-	set -e
-	set -o pipefail
+.ONESHELL:
+test-eval: ## Run agent evaluation tests
+	set -euo pipefail

133-139: Write the cached token with 0600 perms.
echo "$$OCM_TOKEN" > test/evals/ocm_token.txt respects the caller’s umask (022 in many environments), leaving the token world-readable. Please lock it down the same way you already do in the k8s target.

-	echo "$$OCM_TOKEN" > test/evals/ocm_token.txt
+	umask 077
+	printf '%s\n' "$$OCM_TOKEN" > test/evals/ocm_token.txt
🧹 Nitpick comments (1)
test/prow/template.yaml (1)

46-47: OCM_BASE_URL usage verified; strip trailing slash before use
Scripts in test/scripts (e.g. cluster_create_test_setup.sh lines 9–10) consume OCM_BASE_URL to build ASSISTED_SERVICE_URL but don’t remove trailing slashes. Add normalization (e.g. OCM_BASE_URL=${OCM_BASE_URL%/}) before constructing URLs to avoid double slashes.
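
A minimal sketch of that normalization, assuming the same variable names as the scripts:

OCM_BASE_URL="${OCM_BASE_URL:-https://api.stage.openshift.com}"
OCM_BASE_URL="${OCM_BASE_URL%/}"   # drop a single trailing slash if present
ASSISTED_SERVICE_URL="${OCM_BASE_URL}/api/assisted-install/v2"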

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b4383a and d49faeb.

📒 Files selected for processing (12)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • scripts/eval_k8s.sh (1 hunks)
  • template.yaml (2 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (7 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
  • test/prow/template.yaml (2 hunks)
  • test/scripts/cluster_create_test_setup.sh (1 hunks)
  • test/scripts/cluster_created_check.sh (1 hunks)
  • test/scripts/delete_cluster.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/scripts/cluster_created_check.sh
  • test/prow/Dockerfile
  • test/scripts/cluster_create_test_setup.sh
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/eval_k8s.sh

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request

@andrej1991
Collaborator Author

/test eval-test

@carbonin
Collaborator

/retest


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (5)
scripts/eval_k8s.sh (1)

20-20: Fix UNIQUE_ID generation to avoid hanging.

The current pipeline head /dev/urandom | tr -dc 0-9a-z | head -c 8 can hang because head without -c reads line-by-line and /dev/urandom may not produce newlines quickly. Use a byte-safe approach:

Apply this diff:

-UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+UNIQUE_ID="$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)"

Alternatively, use openssl rand -hex 4 for a more portable solution.
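
For illustration, the openssl variant would look like this (4 random bytes rendered as 8 lowercase hex characters):

UNIQUE_ID="$(openssl rand -hex 4)"   # e.g. "a3f09c1e"
export UNIQUE_ID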

Makefile (3)

128-131: Set SHELL to bash for pipefail compatibility.

The .ONESHELL: directive doesn't override Make's default shell, so set -o pipefail may fail under /bin/sh.

Add at the top of the file or before the target:

+SHELL := /usr/bin/env bash
+
 .ONESHELL:
 test-eval: ## Run agent evaluation tests
-	#!/bin/bash
-	set -e
-	set -o pipefail
+	set -euo pipefail

133-133: Use portable UNIQUE_ID generation.

The head /dev/urandom | tr -dc 0-9a-z | head -c 8 approach is not portable and can fail on some systems.

Apply this diff:

-	export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+	export UNIQUE_ID=$$(date +%s%N | sha256sum | head -c 8)

Or if OpenSSL is available:

-	export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+	export UNIQUE_ID=$$(openssl rand -hex 4)

136-136: Restrict token file permissions to 0600.

The OCM token is written without restrictive permissions, exposing credentials on disk. The test-eval-k8s target (lines 147-148, 151-152) already uses umask 077 correctly.

Apply this diff to match the secure approach in test-eval-k8s:

+	umask 077
-	echo "$$OCM_TOKEN" > test/evals/ocm_token.txt
+	printf '%s\n' "$$OCM_TOKEN" > test/evals/ocm_token.txt
test/evals/eval_data.yaml (1)

223-227: Fix NMState YAML payload matching.

The alter_static_network_config_nmstate_for_host tool submits a full NMState YAML document, but the expected new_nmstate_yaml value matches only the literal IP "10.0.0.5". This will cause the action_eval to fail on every correct invocation.

Apply this diff to match the full payload:

-              new_nmstate_yaml: "10.0.0.5"
+              new_nmstate_yaml: '(?s).*mac_address:\s*c5:d6:bc:f0:05:20.*vlan_id:\s*400.*10\.0\.0\.5/24.*'

This regex asserts the key elements (MAC address, VLAN ID, IP with CIDR) are present in the full YAML document.
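
One quick way to sanity-check such a pattern locally, assuming GNU grep with PCRE support (-P); the sample payload below is hypothetical:

pattern='(?s).*mac_address:\s*c5:d6:bc:f0:05:20.*vlan_id:\s*400.*10\.0\.0\.5/24.*'
sample='mac_address: c5:d6:bc:f0:05:20
vlan_id: 400
address: 10.0.0.5/24'
# -z treats the input as one NUL-delimited record, so (?s) lets . span the newlines.
printf '%s' "$sample" | grep -Pzq "$pattern" && echo "pattern matches"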

🧹 Nitpick comments (3)
scripts/eval_k8s.sh (1)

22-22: Separate variable assignment and export to avoid masking return values.

Inline export VAR=$(command) masks the exit status of command, making it impossible to detect if mktemp fails.

Apply this diff:

-export TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+export TEMP_DIR

This resolves ShellCheck SC2155 and ensures mktemp failures will cause the script to exit immediately due to set -e.

test/scripts/delete_cluster.sh (1)

11-18: Consider adding retry/timeout flags and read -r for robustness.

The deletion logic is functional but could be more resilient:

  1. read without -r can mangle backslashes in cluster names
  2. No retry/timeout flags on curl calls could cause transient failures
  3. DELETE failures are not caught or logged

Consider applying:

-curl -fsS -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" | jq '[.[] |{id, name}]' | jq -c '.[]' | while read item; do
+curl -fsS --retry 3 --retry-all-errors --connect-timeout 10 --max-time 60 \
+  -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters" \
+| jq -c '[.[] | {id, name}][]' \
+| while IFS= read -r item; do
   id=$(echo "$item" | jq -r '.id')
   name=$(echo "$item" | jq -r '.name')
   if [[ "$name" == *"-${UNIQUE_ID}" ]]; then
     echo "The cluster '${name}', ${id} is going to be deleted"
-    curl -fsS -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"
+    curl -fsS --retry 3 --retry-all-errors --connect-timeout 10 --max-time 60 \
+      -X DELETE -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}" \
+    || echo "WARN: failed to delete cluster ${id} (${name})" >&2
   fi
 done
test/prow/entrypoint.sh (1)

17-28: Add quotes, secure secret handling, and cleanup trap for robustness.

Several robustness improvements are needed:

  1. Missing quotes: Variables at lines 19, 24, 25, 26, 28 should be quoted to handle spaces/special chars
  2. Insecure secrets: Lines 21-22 write tokens with default permissions; use install -m 600 and printf
  3. No cleanup: TEMP_DIR is not removed on exit; add trap 'rm -rf "$TEMP_DIR"' EXIT
  4. No umask: Set umask 077 before creating files

Apply:

 TEMP_DIR=$(mktemp -d)
+umask 077
+trap 'rm -rf "$TEMP_DIR"' EXIT

-cd $TEMP_DIR
+cd "$TEMP_DIR"

-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+install -m 600 /dev/null "$TEMP_DIR/ocm_token.txt"
+printf '%s\n' "$OCM_TOKEN" > "$TEMP_DIR/ocm_token.txt"
+install -m 600 /dev/null "$TEMP_DIR/.env"
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > "$TEMP_DIR/.env"

-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" "$TEMP_DIR/eval_data.yaml"

-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d49faeb and fdd645f.

📒 Files selected for processing (12)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • scripts/eval_k8s.sh (1 hunks)
  • template.yaml (2 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (7 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
  • test/prow/template.yaml (2 hunks)
  • test/scripts/cluster_create_test_setup.sh (1 hunks)
  • test/scripts/cluster_created_check.sh (1 hunks)
  • test/scripts/delete_cluster.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • assisted-chat-pod.yaml
  • test/evals/eval.py
  • test/scripts/cluster_create_test_setup.sh
  • test/prow/Dockerfile
  • test/scripts/cluster_created_check.sh
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/eval_k8s.sh

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (28)
template.yaml (2)

210-210: LGTM: Documentation now covers ROSA/ARO scope boundaries.

The updated behavior documentation appropriately references ROSA and ARO alongside the OpenShift Installer, clarifying the assistant's scope for users who might request cloud-managed installations.


325-329: LGTM: Exact-match cluster name resolution workflow is clear and robust.

The new multi-branch exact-match logic is well-specified:

  • Exact string match search only (no partial matches)
  • Handle single match (proceed), multiple matches (ask user), no matches (request Cluster ID)

This eliminates ambiguity in cluster name resolution and aligns with the PR's goal to make tests more stable.
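
A rough sketch of that three-branch logic, expressed against a list_clusters-style JSON dump (clusters.json and the jq filters are illustrative, not the assistant's actual mechanism):

name="my-cluster"
matches=$(jq -r --arg n "$name" '[.[] | select(.name == $n)] | length' clusters.json)
case "$matches" in
  0) echo "No exact match; please provide the Cluster ID." ;;
  1) id=$(jq -r --arg n "$name" '.[] | select(.name == $n) | .id' clusters.json)
     echo "Proceeding with cluster ${id}" ;;
  *) echo "Multiple clusters named '${name}'; which one did you mean?" ;;
esac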

test/prow/template.yaml (1)

46-47: LGTM: OCM_BASE_URL parameterization enables flexible endpoint configuration.

The addition of OCM_BASE_URL as both a parameter and environment variable allows test jobs to target different OCM environments (stage, production, etc.) without code changes. The default value appropriately points to the staging environment for CI tests.

Also applies to: 69-70

scripts/eval_k8s.sh (1)

23-27: Good: Cleanup trap and proper quoting are now in place.

The trap ensures TEMP_DIR is cleaned up on exit, and all variable expansions are properly quoted. These address previous review concerns.

test/scripts/delete_cluster.sh (1)

7-9: LGTM: Environment variable guards and URL derivation are robust.

The required variable checks will fail fast if OCM_TOKEN or UNIQUE_ID are missing, and the ASSISTED_SERVICE_URL derivation from OCM_BASE_URL with a sensible default enables flexible environment configuration.

test/prow/entrypoint.sh (1)

7-13: Good: UNIQUE_ID guard and improved OCM_TOKEN fetch.

The required UNIQUE_ID check (line 7) ensures the test flow has a unique identifier. The OCM_TOKEN fetch (lines 8-13) now uses jq -r to strip quotes and separates assignment from export, addressing SC2155. However, consider adding validation that the token is non-empty:

Consider adding after line 12:

if [[ -z "${OCM_TOKEN}" || "${OCM_TOKEN}" == "null" ]]; then
  echo "ERROR: Failed to obtain OCM_TOKEN" >&2
  exit 1
fi
Makefile (3)

131-132: LGTM: Proper temp directory cleanup.

The trap ensures temporary files are removed on exit, even if the eval fails.


134-135: LGTM: OCM token sourcing.

The token sourcing logic correctly delegates to the utility script.


137-139: LGTM: YAML staging and substitution logic.

The approach of copying eval_data.yaml to a temp directory and substituting uniq-cluster-name with ${UNIQUE_ID} before running the eval correctly addresses the PR objective to make tests more stable with unique cluster names.

test/evals/eval_data.yaml (19)

55-55: LGTM: Cleanup script addition.

Adding cleanup_script for the SNO creation workflow supports the PR objective to reduce cluster pool size and make tests more stable.


59-59: LGTM: Action evaluation coverage.

Adding action_eval extends validation to confirm the cluster creation action was performed, not just that the tool was invoked.


69-70: LGTM: Platform parameter and verification script.

The platform: "none" parameter clarifies cluster deployment type, and eval_verify_script confirms actual cluster creation per PR objectives.


84-84: LGTM: Cleanup for multi-node workflow.

Consistent cleanup script usage across workflows reduces cluster accumulation.


87-88: LGTM: Enhanced validation and clearer query.

Adding action_eval and rewording the query makes the test requirements more explicit.


96-99: LGTM: Platform parameter and verification additions.

The platform: "baremetal" correctly specifies the multi-node cluster type, and eval_verify_script ensures the cluster was created.


97-97: Verify empty SSH key is intentional for test coverage.

The ssh_public_key: "" appears to test cluster creation without an SSH key (followed by setting one in eval_id set_ssh_key_eval_test_ssh). Past comments suggested removing this, but it seems intentional for testing the two-step workflow.

Confirm this empty value is required to test:

  1. Cluster creation without SSH key
  2. Subsequent SSH key addition via set_cluster_ssh_key

If this is the intended test flow, consider adding a comment in the YAML to clarify the test design.


101-101: LGTM: Updated expected response.

The new response text correctly reflects the multi-node cluster workflow, offering static network configuration before ISO download and mentioning DHCP as the default.


120-122: LGTM: Grammar correction applied.

The expected response now uses the grammatically correct "hosts haven't been discovered yet."


148-148: LGTM: Clearer query for error handling test.

The updated query text makes the invalid SSH key test more explicit and straightforward.


164-165: LGTM: Enhanced Azure refusal validation.

Switching to action_eval with an accuracy-based expected response ensures the assistant properly refuses Azure requests and provides helpful on-prem guidance.


177-178: LGTM: Setup script for name-based lookup test.

The setup_script ensures a cluster exists before testing name-based retrieval, addressing the UUID vs. name issue mentioned in PR objectives.


181-181: LGTM: Query text updated for consistency.

The query now uses the substitution-ready cluster name format.


184-189: LGTM: Correct tool call sequence for name-based lookup.

The sequence list_clusters → cluster_info correctly implements the workflow to resolve a cluster name to its ID before retrieving details. This aligns with the typical pattern where the assistant must first list clusters to find the ID corresponding to a given name.
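
Sketched directly against the Assisted Service API (the endpoint shape mirrors the calls in test/scripts/delete_cluster.sh; curl, jq, and the exported OCM_TOKEN/ASSISTED_SERVICE_URL/UNIQUE_ID are assumed, and the cluster name is illustrative):

# Step 1: list clusters and resolve the name to an ID.
id=$(curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" \
  "${ASSISTED_SERVICE_URL}/clusters" \
  | jq -r --arg n "my-cluster-${UNIQUE_ID}" '.[] | select(.name == $n) | .id')
# Step 2: fetch the details for that ID.
curl -sSf -H "Authorization: Bearer ${OCM_TOKEN}" "${ASSISTED_SERVICE_URL}/clusters/${id}"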


194-195: LGTM: Cleanup script for static network test.

Consistent cleanup reduces cluster accumulation across all test workflows.


197-199: LGTM: Query update and action validation.

The updated query text is clearer, and adding action_eval with eval_verify_script ensures the cluster is actually created.


208-209: LGTM: Empty SSH key and platform specification.

Consistent with the test design pattern where SSH key is optional (or added later), and platform is correctly specified.


212-212: LGTM: Comprehensive static network configuration query.

The updated query provides all necessary details in a single, clear instruction for static VLAN configuration.


220-220: LGTM: Clearer cluster reference.

The query now explicitly refers to "the cluster you just created" for better context.

get_ocm_token
echo "$$OCM_TOKEN" > test/evals/ocm_token.txt
cp test/evals/eval_data.yaml $$TEMP_DIR/eval_data.yaml
sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml

🛠️ Refactor suggestion | 🟠 Major

Use portable sed syntax for in-place editing.

The sed -i flag behaves differently on BSD (macOS) vs GNU (Linux) systems. BSD sed requires -i '' while GNU accepts -i alone.

Apply this diff for portability:

-	sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml
+	sed "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml > $$TEMP_DIR/eval_data.yaml.tmp
+	mv $$TEMP_DIR/eval_data.yaml.tmp $$TEMP_DIR/eval_data.yaml

Or use a portable wrapper:

-	sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml
+	if sed --version 2>&1 | grep -q GNU; then \
+		sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml; \
+	else \
+		sed -i '' "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml; \
+	fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml
# Replace in-place sed (non-portable) with portable redirection + mv
sed "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml > $$TEMP_DIR/eval_data.yaml.tmp
mv $$TEMP_DIR/eval_data.yaml.tmp $$TEMP_DIR/eval_data.yaml
🤖 Prompt for AI Agents
In Makefile around line 138, the sed -i usage is not portable between GNU and
BSD (macOS) sed; replace the in-place edit with a portable approach such as
either detecting sed variant and using -i '' on macOS and -i on Linux, or avoid
in-place flags entirely by writing output to a temporary file and moving it back
(e.g., run sed 's/uniq-cluster-name/$${UNIQUE_ID}/g' $$TEMP_DIR/eval_data.yaml >
$$TEMP_DIR/eval_data.yaml.tmp && mv $$TEMP_DIR/eval_data.yaml.tmp
$$TEMP_DIR/eval_data.yaml) so the Makefile works on both macOS and Linux.

@andrej1991
Collaborator Author

/test eval-test

1 similar comment
@andrej1991
Collaborator Author

/test eval-test


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/prow/entrypoint.sh (1)

17-28: Quote paths, secure secrets, and add cleanup trap.

Multiple issues remain from previous reviews:

  • Unquoted variable expansions risk word-splitting
  • Secrets written with echo lack restrictive permissions
  • No trap to cleanup TEMP_DIR on exit

Apply this diff:

 TEMP_DIR=$(mktemp -d)
+trap 'rm -rf "$TEMP_DIR"' EXIT
 
-cd $TEMP_DIR
+cd "$TEMP_DIR"
 
-echo "$OCM_TOKEN" > ocm_token.txt
-echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env
+umask 077
+printf '%s\n' "$OCM_TOKEN" > ocm_token.txt
+printf 'GEMINI_API_KEY=%s\n' "${GEMINI_API_KEY}" > .env
 
-cp $TEST_DIR/eval_data.yaml $TEMP_DIR/eval_data.yaml
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEMP_DIR/eval_data.yaml
-sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" $TEMP_DIR/eval_data.yaml
+cp "$TEST_DIR/eval_data.yaml" "$TEMP_DIR/eval_data.yaml"
+sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "$TEMP_DIR/eval_data.yaml"
+sed -i "s|: ../scripts|: ${WORK_DIR}/test/scripts|g" "$TEMP_DIR/eval_data.yaml"
 
-python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEMP_DIR/eval_data.yaml
+python "$TEST_DIR/eval.py" \
+  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
+  --agent_auth_token_file "$TEMP_DIR/ocm_token.txt" \
+  --eval_data_yaml "$TEMP_DIR/eval_data.yaml"
♻️ Duplicate comments (9)
test/evals/eval.py (1)

84-86: Fix typo in error message.

The error message contains "varialbe" instead of "variable". This was previously flagged but remains unaddressed.

Apply this diff:

 if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("Environment variable 'UNIQUE_ID' must be set for cluster creation and cleanup.", file=sys.stderr)
     sys.exit(1)
Makefile (2)

126-130: Set SHELL to bash for pipefail compatibility.

The .ONESHELL directive is appropriate, but the shebang in the recipe is ignored by make. Without explicitly setting SHELL := /bin/bash, set -o pipefail may fail on systems where /bin/sh is not bash.

Add at the top of the Makefile (or before this target):

+SHELL := /bin/bash
+
 .ONESHELL:
 test-eval: ## Run agent evaluation tests
-	#!/bin/bash
-	set -e
-	set -o pipefail
+	set -euo pipefail

136-136: Secure token file permissions.

Writing the token with echo does not set restrictive permissions, exposing credentials.

Apply this diff:

+	umask 077
 	echo "$$OCM_TOKEN" > test/evals/ocm_token.txt

Or better, use printf and explicit permissions:

-	echo "$$OCM_TOKEN" > test/evals/ocm_token.txt
+	umask 077
+	printf '%s\n' "$$OCM_TOKEN" > test/evals/ocm_token.txt
scripts/eval_k8s.sh (1)

20-21: UNIQUE_ID generation may hang.

Using head /dev/urandom is line-based and can block waiting for newlines from a binary stream.

Apply this diff:

-UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+UNIQUE_ID=$(LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)
 export UNIQUE_ID
test/prow/entrypoint.sh (1)

8-13: Harden token fetch with timeout and validation.

The curl command now uses -f to fail on HTTP errors, but lacks timeouts and doesn't validate the token is non-empty or non-null.

Apply this diff:

-OCM_TOKEN=$(curl -sSf -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
+OCM_TOKEN=$(curl -sSf --connect-timeout 10 --max-time 60 \
+  -X POST https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token \
   -H "Content-Type: application/x-www-form-urlencoded" \
   -d "grant_type=client_credentials" \
   -d "client_id=$CLIENT_ID" \
   -d "client_secret=$CLIENT_SECRET" | jq -r '.access_token')
+if [[ -z "${OCM_TOKEN}" || "${OCM_TOKEN}" == "null" ]]; then
+  echo "Failed to obtain OCM_TOKEN" >&2
+  exit 1
+fi
 export OCM_TOKEN
test/evals/eval_data.yaml (4)

96-97: Remove or relax the empty ssh_public_key assertion.

Some tools omit empty fields entirely; requiring ssh_public_key: "" will fail when the tool call doesn't include the field at all. Consider removing this line from expected arguments or using a matcher that accepts both missing and empty values.

Apply this diff:

               cpu_architecture: "x86_64"
-              ssh_public_key: ""
               platform: "baremetal"

120-122: Fix grammar: use "haven't" (plural).

The response uses "hosts hasn't" but should be "hosts haven't" for grammatical correctness.

Apply this diff:

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."

207-209: Remove or relax the empty ssh_public_key assertion.

As with line 97, requiring ssh_public_key: "" can fail when tools omit empty fields. Remove this line or use a matcher that accepts both missing and empty values.

Apply this diff:

               cpu_architecture: "x86_64"
-              ssh_public_key: ""
               platform: "none"

223-227: Match the full NMState YAML document, not just the IP literal.

The alter_static_network_config_nmstate_for_host tool submits a complete NMState YAML document containing MAC address, VLAN ID, interface names, DNS config, etc. Expecting it to equal just "10.0.0.5" will cause every correct call to fail evaluation.

Apply this diff to use a regex that matches the full document while asserting key details:

               cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
               index: null
-              new_nmstate_yaml: "10.0.0.5"
+              new_nmstate_yaml: '(?s).*mac_address:\s*c5:d6:bc:f0:05:20.*vlan_id:\s*400.*10\.0\.0\.5/24.*'

This pattern uses (?s) (dotall mode) to match across newlines and asserts the presence of the MAC address, VLAN ID, and IP address from the query.

🧹 Nitpick comments (2)
Makefile (1)

138-138: Improve sed portability.

sed -i behaves differently on GNU (Linux) vs BSD (macOS). The current usage may fail on macOS.

Apply this diff for portability:

-	sed -i "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml
+	sed "s/uniq-cluster-name/$${UNIQUE_ID}/g" $$TEMP_DIR/eval_data.yaml > $$TEMP_DIR/eval_data.yaml.tmp
+	mv $$TEMP_DIR/eval_data.yaml.tmp $$TEMP_DIR/eval_data.yaml
scripts/eval_k8s.sh (1)

25-25: Improve sed portability.

sed -i behaves differently on GNU vs BSD systems.

Apply this diff:

-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" "${TEMP_DIR}/eval_data.yaml"
+sed "s/uniq-cluster-name/${UNIQUE_ID}/g" "${TEMP_DIR}/eval_data.yaml" > "${TEMP_DIR}/eval_data.yaml.tmp"
+mv "${TEMP_DIR}/eval_data.yaml.tmp" "${TEMP_DIR}/eval_data.yaml"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fdd645f and 88b3e00.

📒 Files selected for processing (12)
  • Makefile (1 hunks)
  • assisted-chat-pod.yaml (1 hunks)
  • scripts/eval_k8s.sh (1 hunks)
  • template.yaml (2 hunks)
  • test/evals/eval.py (3 hunks)
  • test/evals/eval_data.yaml (6 hunks)
  • test/prow/Dockerfile (1 hunks)
  • test/prow/entrypoint.sh (2 hunks)
  • test/prow/template.yaml (2 hunks)
  • test/scripts/cluster_create_test_setup.sh (1 hunks)
  • test/scripts/cluster_created_check.sh (1 hunks)
  • test/scripts/delete_cluster.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • template.yaml
  • test/prow/Dockerfile
  • test/scripts/cluster_create_test_setup.sh
  • assisted-chat-pod.yaml
  • test/scripts/cluster_created_check.sh
  • test/scripts/delete_cluster.sh
🧰 Additional context used
🪛 Shellcheck (0.11.0)
scripts/eval_k8s.sh

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (6)
test/evals/eval.py (2)

95-95: Stricter failure threshold improves test reliability.

Changing from > 2 to any non-zero failed_evals_count makes the evaluation stricter and more aligned with the PR objective of stabilizing tests.
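
In practice this means callers can rely on the exit code alone; a sketch of the CI-side contract, reusing the flags already passed in entrypoint.sh:

python test/evals/eval.py \
  --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" \
  --agent_auth_token_file "${TEMP_DIR}/ocm_token.txt" \
  --eval_data_yaml "${TEMP_DIR}/eval_data.yaml" \
  || { echo "at least one evaluation failed" >&2; exit 1; }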


99-100: Success message improves user experience.

Clear success feedback when all evaluations pass.

test/prow/entrypoint.sh (1)

7-7: UNIQUE_ID guard is appropriate.

Using parameter expansion to require UNIQUE_ID ensures the script fails fast if the variable is unset, which aligns with the new validation in eval.py.
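
A minimal sketch of such a guard (the exact message in the script may differ):

# Expands to an error and exits non-zero when UNIQUE_ID is unset or empty.
: "${UNIQUE_ID:?UNIQUE_ID must be set for cluster creation and cleanup}"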

test/prow/template.yaml (1)

46-47: OCM_BASE_URL parameter enables environment flexibility.

Adding OCM_BASE_URL as a configurable parameter with a sensible stage default allows the evaluation tests to target different OCM environments without code changes.

Also applies to: 69-70

test/evals/eval_data.yaml (2)

53-72: LGTM: Good addition of test lifecycle hooks and action verification.

The addition of cleanup_script, eval_verify_script, and action_eval ensures that clusters are actually created (not just tool-invoked) and properly cleaned up afterward. The explicit platform: "none" parameter clarifies the expected platform type for SNO clusters.


176-190: LGTM: Proper test setup and name-to-ID resolution flow.

The addition of setup_script ensures the test cluster exists before querying, and the two-step flow (list_clusters → cluster_info) correctly resolves cluster name to ID. The cleanup_script ensures proper teardown.

set -o pipefail
export TEMP_DIR=$(shell mktemp -d)
trap 'rm -rf "$$TEMP_DIR"' EXIT
export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)

⚠️ Potential issue | 🟠 Major

UNIQUE_ID generation may hang.

Using head /dev/urandom is line-based and can block indefinitely waiting for newlines. Use a byte-safe approach.

Apply this diff:

-	export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)
+	export UNIQUE_ID=$(shell LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export UNIQUE_ID=$(shell head /dev/urandom | tr -dc 0-9a-z | head -c 8)
export UNIQUE_ID=$(shell LC_ALL=C tr -dc 'a-z0-9' </dev/urandom | head -c 8)
🤖 Prompt for AI Agents
In Makefile around line 133, the UNIQUE_ID assignment uses `head /dev/urandom`
which can block; replace it with a byte-safe read (use `head -c` to read a fixed
number of bytes or use a reliable tool like `openssl rand` or `uuidgen`) and
then filter/trim to an 8-character alphanumeric lowercase ID. Update the line to
read a fixed number of bytes from /dev/urandom (or use openssl/uuidgen), run the
existing tr and head/fold pipeline to produce exactly 8 characters, and ensure
the command cannot block waiting for newlines.

# Run evaluation
UNIQUE_ID=$(head /dev/urandom | tr -dc 0-9a-z | head -c 8)
export UNIQUE_ID
export TEMP_DIR=$(mktemp -d)

⚠️ Potential issue | 🟠 Major

Fix inline export assignment (SC2155).

Combining export with command substitution masks failures from mktemp.

Apply this diff:

-export TEMP_DIR=$(mktemp -d)
+TEMP_DIR=$(mktemp -d)
+export TEMP_DIR
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export TEMP_DIR=$(mktemp -d)
TEMP_DIR=$(mktemp -d)
export TEMP_DIR
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
In scripts/eval_k8s.sh around line 22, the inline export with command
substitution (export TEMP_DIR=$(mktemp -d)) masks mktemp failures; change it to
assign first (TEMP_DIR=$(mktemp -d)) then check the command succeeded (e.g., if
[ -z "$TEMP_DIR" ] || [ ! -d "$TEMP_DIR" ]; then echo "mktemp failed" >&2; exit
1; fi) and then export the variable (export TEMP_DIR) so failures are detectable
and handled.

@andrej1991
Collaborator Author

/test eval-test

@openshift-ci openshift-ci bot added the lgtm label Sep 30, 2025
@openshift-ci

openshift-ci bot commented Sep 30, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrej1991, carbonin, maorfr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@carbonin
Collaborator

/retest

@carbonin
Collaborator

@andrej1991 unhold if you're done making changes. Otherwise I can re-lgtm after

@openshift-ci

openshift-ci bot commented Sep 30, 2025

@andrej1991: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name:     ci/prow/eval-test
Commit:        88b3e00
Details:       link
Required:      false
Rerun command: /test eval-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@andrej1991
Collaborator Author

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit 53a0112 into rh-ecosystem-edge:main Sep 30, 2025
6 of 7 checks passed
