Merged
45 changes: 21 additions & 24 deletions test/evals/eval_data.yaml
@@ -30,12 +30,11 @@
conversation:
- eval_id: available_operators
eval_query: What operators are available?
- eval_types: [response_eval:accuracy, tool_eval, response_eval:sub-string]
+ eval_types: [response_eval:accuracy, tool_eval]
expected_response: "The operators that can be installed onto clusters are OpenShift AI and OpenShift Virtualization."
expected_tool_calls:
- - tool_name: list_operator_bundles
arguments: {}
- expected_keywords: ["operator bundles", "Virtualization", "OpenShift AI"]

- conversation_group: static_networking_support_conv
conversation:
@@ -62,21 +61,21 @@
description: Create SNO and then retrieve Discovery ISO in two steps with all the information provided
conversation:
- eval_id: create_eval_test_sno
- eval_query: create a new single node cluster named eval-test-singlenode-ClustER-NAme, running on version 4.19.7 with the x86_64 CPU architecture, configured under the base domain example.com, using the provided SSH key "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCmeaBFhSJ/MLECmqUaKweRgo10ABpwdvJ7v76qLYfP0pzfzYsF3hGP/fH5OQfHi9pTbWynjaEcPHVfaTaFWHvyMtv8PEMUIDgQPWlBSYzb+3AgQ5AsChhzTJCYnRdmCdzENlV+azgtb3mVfXiyCfjxhyy3QAV4hRrMaVtJGuUQfQ== [email protected]".
+ eval_query: create a new single node cluster named eval-test-singlenode-uniq-cluster-name, running on version 4.19.7 with the x86_64 CPU architecture, configured under the base domain example.com, using the provided SSH key "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCmeaBFhSJ/MLECmqUaKweRgo10ABpwdvJ7v76qLYfP0pzfzYsF3hGP/fH5OQfHi9pTbWynjaEcPHVfaTaFWHvyMtv8PEMUIDgQPWlBSYzb+3AgQ5AsChhzTJCYnRdmCdzENlV+azgtb3mVfXiyCfjxhyy3QAV4hRrMaVtJGuUQfQ== [email protected]".
eval_types: [tool_eval, response_eval:sub-string, response_eval:accuracy]
expected_tool_calls:
- - tool_name: create_cluster
arguments:
- name: "eval-test-singlenode-ClustER-NAme"
+ name: "eval-test-singlenode-uniq-cluster-name"
version: "4\\.19\\.7"
base_domain: "example\\.com"
single_node: "(?i:true)"
cpu_architecture: "x86_64"
ssh_public_key: 'ssh-rsa\s+[A-Za-z0-9+/]+[=]{0,3}(\s+.+)?\s*'
- expected_keywords: ["eval-test-singlenode-ClustER-NAme", "ID", "Discovery ISO", "download", "cluster"]
- expected_response: I have created a cluster with name eval-test-singlenode-ClustER-NAme. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
+ expected_keywords: ["eval-test-singlenode-uniq-cluster-name", "ID", "Discovery ISO", "download", "cluster"]
+ expected_response: I have created a cluster with name eval-test-singlenode-uniq-cluster-name. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
- eval_id: get_iso_eval_test_sno
- eval_query: Using the ID of the cluster you just created, get the Discovery ISO download URL for cluster 'eval-test-singlenode-ClustER-NAme'
+ eval_query: Using the ID of the cluster you just created, get the Discovery ISO download URL for cluster 'eval-test-singlenode-uniq-cluster-name'
eval_types: [tool_eval, response_eval:sub-string]
expected_tool_calls:
- - tool_name: cluster_iso_download_url
@@ -89,18 +88,18 @@
conversation:
- eval_id: create_eval_test_multinode
eval_types: [tool_eval, response_eval:accuracy, response_eval:sub-string]
- eval_query: Create a multi-node cluster named 'eval-test-multinode-ClustER-NAme' with OpenShift 4.18.22 and domain test.local
+ eval_query: Create a multi-node cluster named 'eval-test-multinode-uniq-cluster-name' with OpenShift 4.18.22 and domain test.local
expected_tool_calls:
- - tool_name: create_cluster
arguments:
- name: "eval-test-multinode-ClustER-NAme"
+ name: "eval-test-multinode-uniq-cluster-name"
version: "4\\.18\\.22"
base_domain: "test\\.local"
single_node: "(?i:false)"
- cpu_architecture: "x86_64"
- ssh_public_key: ""
- expected_keywords: ["eval-test-multinode-ClustER-NAme", "ID", "Discovery ISO", "cluster"]
- expected_response: I have created a cluster with name eval-test-multinode-ClustER-NAme. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
+ cpu_architecture: None
+ ssh_public_key: None
+ expected_keywords: ["eval-test-multinode-uniq-cluster-name", "ID", "Discovery ISO", "cluster"]
+ expected_response: I have created a cluster with name eval-test-multinode-uniq-cluster-name. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
- eval_id: set_ssh_key_eval_test_ssh
eval_query: Set the SSH key for the cluster you just created to "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAAAgQCmeaBFhSJ/MLECmqUaKweRgo10ABpwdvJ7v76qLYfP0pzfzYsF3hGP/fH5OQfHi9pTbWynjaEcPHVfaTaFWHvyMtv8PEMUIDgQPWlBSYzb+3AgQ5AsChhzTJCYnRdmCdzENlV+azgtb3mVfXiyCfjxhyy3QAV4hRrMaVtJGuUQfQ== [email protected]"
eval_types: [tool_eval, response_eval:accuracy]
@@ -111,7 +110,7 @@
ssh_public_key: 'ssh-rsa\s+[A-Za-z0-9+/]+[=]{0,3}(\s+.+)?\s*'
expected_response: The SSH public key is set for the cluster for cluster
- eval_id: get_iso_eval_test_multinode
- eval_query: Using the ID of the cluster you just created, get the Discovery ISO for cluster 'eval-test-multinode-ClustER-NAme'
+ eval_query: Using the ID of the cluster you just created, get the Discovery ISO for cluster 'eval-test-multinode-uniq-cluster-name'
eval_types: [tool_eval, response_eval:sub-string]
expected_tool_calls:
- - tool_name: cluster_iso_download_url
@@ -131,14 +130,14 @@
- conversation_group: cluster_info_conv
conversation:
- eval_id: cluster_info_tool_call
- eval_query: Give me details about cluster named 'abc123'
+ eval_query: Give me details about cluster named 'abc123abc'
eval_types: [tool_eval, response_eval:accuracy]
expected_tool_calls:
# It should list the clusters to try to match up the name
- - tool_name: list_clusters
arguments: {}
- description: Test error handling for non-existent cluster ID/Name
- expected_response: Retrieval failed for cluster 'abc123' because the resource was not found.
+ description: Test handling for non-existent cluster ID/Name
+ expected_response: Retrieval failed for cluster 'abc123abc' because the resource was not found.

- conversation_group: error_handling_conv
description: Validate graceful handling of invalid SSH key format
@@ -175,18 +174,16 @@
- conversation_group: cluster_id_from_name
conversation:
- eval_id: create_single_node_cluser
- eval_query: Create a multi-node cluster named 'eval-test-ClustER-NAme' with OpenShift 4.18.22 and domain test.local. I do not have an SSH key to provide.
+ eval_query: Create a multi-node cluster named 'eval-test2-uniq-cluster-name' with OpenShift 4.18.22 and domain test.local. I do not have an SSH key to provide.
eval_types: [response_eval:accuracy, response_eval:sub-string]
- expected_keywords: ["eval-test-ClustER-NAme", "ID", "Discovery ISO", "download", "cluster"]
- expected_response: I have created a cluster with name eval-test-ClustER-NAme. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
+ expected_keywords: ["eval-test2-uniq-cluster-name", "ID", "Discovery ISO", "download", "cluster"]
+ expected_response: I have created a cluster with name eval-test-uniq-cluster-name. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
- eval_id: cluster_name_tool_call
- eval_query: Show me information on cluster eval-test-ClustER-NAme
+ eval_query: Show me information on cluster eval-test2-uniq-cluster-name
eval_types: [tool_eval, response_eval:sub-string]
expected_tool_calls:
- - - tool_name: list_clusters
- arguments: {}
- - tool_name: cluster_info
arguments:
cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
- expected_keywords: ["cluster", "eval-test-ClustER-NAme", "test.local", "4.18.22"]
+ expected_keywords: ["cluster", "eval-test2-uniq-cluster-name", "test.local", "4.18.22"]
description: Test handling requesting a cluster by name
2 changes: 1 addition & 1 deletion test/prow/entrypoint.sh
@@ -20,6 +20,6 @@ cd $TEMP_DIR
echo "$OCM_TOKEN" > ocm_token.txt
echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env

- sed -i "s/ClustER-NAme/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
💡 Verification agent

🧩 Analysis chain

Make sed replacement robust to special chars in UNIQUE_ID and quote paths.

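In a sed replacement, an unescaped “&” expands to the matched text, so a UNIQUE_ID containing “&” reinserts the placeholder instead of a literal ampersand. An unquoted $TEST_DIR is also word-split if the path contains spaces. Quote the path and escape “&” in the replacement.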

Apply:

- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"

Alternatively, use perl for fully safe replacement:

perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"

🏁 Script executed:

#!/bin/bash
# Dry-run preview of substitutions to ensure only the intended tokens change
rg -n "uniq-cluster-name" test/evals/eval_data.yaml

Length of output: 2417


Fix sed replacement to handle '&' in UNIQUE_ID and quote $TEST_DIR

File: test/prow/entrypoint.sh:23. rg shows multiple occurrences of "uniq-cluster-name" in test/evals/eval_data.yaml, all of which this sed rewrites. The current command expands any '&' in UNIQUE_ID to the matched text and can break if $TEST_DIR contains spaces.

Apply:

- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"

Alternatively, for a replacement that tolerates arbitrary characters, use perl (UNIQUE_ID must be exported so that perl can read it through %ENV):

perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"
🤖 Prompt for AI Agents
In test/prow/entrypoint.sh around line 23, the sed invocation is not robust:
$TEST_DIR is unquoted, so a path containing spaces is word-split, and sed
expands any '&' in $UNIQUE_ID to the matched text. Fix by quoting
"$TEST_DIR/eval_data.yaml" and either (a) escaping '&', backslashes, and the
sed delimiter character in $UNIQUE_ID before passing it to sed, or (b)
replacing the sed step with the suggested perl approach, which uses \Q...\E to
match the placeholder literally and $ENV{UNIQUE_ID} to insert the value
verbatim. Use the temp-file-then-mv pattern so the file is updated atomically.
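
Option (a) could look roughly like the sketch below. It assumes a POSIX shell and a UNIQUE_ID without newlines; the helper name escape_sed_replacement, the sample ID, and the throwaway demo file are hypothetical and not part of the PR. It escapes backslash, '&', and the '|' delimiter, because escaping '&' alone still leaves the other two characters able to break the expression.

```shell
#!/bin/sh
# Hypothetical helper (not in the PR): escape the characters that are special
# on the replacement side of sed's s|...|...| command: backslash, '&', and the
# '|' delimiter. Assumes the value contains no newlines.
escape_sed_replacement() {
    printf '%s' "$1" | sed 's/[\\&|]/\\&/g'
}

# Demo on a throwaway file rather than the real eval_data.yaml.
demo_file=$(mktemp)
printf '%s\n' 'name: "eval-test-uniq-cluster-name"' > "$demo_file"

# Sample value containing all three special characters.
UNIQUE_ID='a&b|c\d'
escaped=$(escape_sed_replacement "$UNIQUE_ID")

# Write to a temporary file and move it into place instead of relying on sed -i.
sed "s|uniq-cluster-name|${escaped}|g" "$demo_file" > "$demo_file.patched" &&
    mv "$demo_file.patched" "$demo_file"

result=$(cat "$demo_file")
rm -f "$demo_file"
printf '%s\n' "$result"
```

With the sample value, the placeholder is replaced by the ID exactly as written, so the printed line is name: "eval-test-a&b|c\d". In entrypoint.sh the same helper would be applied to the real $UNIQUE_ID and "$TEST_DIR/eval_data.yaml".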


python $TEST_DIR/eval.py --agent_endpoint "${AGENT_URL}:${AGENT_PORT}" --agent_auth_token_file $TEMP_DIR/ocm_token.txt --eval_data_yaml $TEST_DIR/eval_data.yaml