@eranco74
Collaborator

@eranco74 eranco74 commented Sep 14, 2025

  • Replaced "ClustER-NAme" with a valid cluster name so that the eval tests can run locally.
  • Removed response_eval:sub-string and expected_keywords because response_eval:accuracy already performs an exact match, making the substring check unnecessary.
  • The cluster name in the cluster_id_from_name conversation was changed from eval-test-uniq-cluster-name to eval-test2-uniq-cluster-name to avoid potential name conflicts with other tests.
  • Removed list_clusters from the expected tool calls in cluster_id_from_name, since the cluster name and ID are already part of the current conversation context.

Summary by CodeRabbit

  • Tests
    • Standardized the cluster name placeholder to “uniq-cluster-name” across all eval scenarios, updating queries, expected responses, and keyword checks.
    • Simplified specific evaluations by removing substring-level checks and dropping a redundant expected-keyword assertion.
    • Adjusted evaluation steps to reflect a reduced set of expected tool-call checks.
  • Chores
    • Updated the test runner substitution to use the new placeholder during test execution.

@openshift-ci openshift-ci bot requested review from carbonin and omertuc September 14, 2025 17:38
@coderabbitai

coderabbitai bot commented Sep 14, 2025

Walkthrough

Renamed the cluster placeholder to "uniq-cluster-name" across eval blocks and updated all corresponding queries, tool-call refs, IDs, and expected responses/keywords; removed a substring-level eval and trimmed expected_tool_calls in one block; updated prow entrypoint sed to target the new placeholder.

Changes

Cohort / File(s) | Summary
Eval data renames and expectation updates (test/evals/eval_data.yaml) | Replaced all cluster-name variants with uniq-cluster-name across eval IDs, eval_query text, tool call args, get_iso/cluster_info references, expected_response and expected_keywords; removed substring eval type and its expected_keywords in available_operators_conv; removed list_clusters from expected_tool_calls in cluster_id_from_name.
Prow entrypoint substitution (test/prow/entrypoint.sh) | Updated sed target to replace uniq-cluster-name with ${UNIQUE_ID} in test/evals/eval_data.yaml; no other script logic changed.
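
The substitution step summarized above can be sketched roughly as follows. This is a minimal illustration, not the contents of test/prow/entrypoint.sh: the function name `substitute_placeholder` is hypothetical, and the sketch assumes the ID contains no characters that are special to sed (`/`, `&`, `\`). It also adds a check, not present in the original script, that no placeholder is left behind.

```shell
#!/bin/bash
# Hypothetical helper; not part of test/prow/entrypoint.sh.
# Replaces every occurrence of the placeholder in a file with a per-run ID,
# then fails if any placeholder is still present.
# Assumption: the ID has no sed-special characters (/, &, \).
substitute_placeholder() {
  local file="$1" unique_id="$2" placeholder="uniq-cluster-name"
  local tmp
  tmp="$(mktemp)" || return 1
  # Write to a temp file and move it into place instead of relying on sed -i.
  sed "s/${placeholder}/${unique_id}/g" "$file" > "$tmp" && mv "$tmp" "$file" || return 1
  # grep -q exits 0 when a match exists, so invert it.
  ! grep -q "$placeholder" "$file"
}
```

A call such as `substitute_placeholder test/evals/eval_data.yaml "$UNIQUE_ID"` would then exit non-zero if any placeholder survived the rewrite.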

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

lgtm

Suggested reviewers

  • omertuc
  • carbonin

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check | ✅ Passed | The title "Fix(test): Update evaluation data" is concise and accurately summarizes the primary purpose of the changeset: modifying evaluation test data and expectations. The PR's edits (replacing an invalid cluster name, adjusting expected responses/keywords, and removing unnecessary eval steps) are all encompassed by "Update evaluation data." This makes the title appropriate for reviewers scanning project history.
Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between faa8c30 and 23221f2.

📒 Files selected for processing (2)
  • test/evals/eval_data.yaml (6 hunks)
  • test/prow/entrypoint.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/evals/eval_data.yaml
  • test/prow/entrypoint.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request

@openshift-ci

openshift-ci bot commented Sep 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
test/evals/eval_data.yaml (1)

91-102: Same brittleness warning for multi-node workflow accuracy checks.

Ensure the agent’s exact phrasing matches expected_response and the keywords after UNIQUE_ID substitution; otherwise accuracy evals will flap.

Use the same script above to verify these blocks too.

Also applies to: 112-120

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between de2e8a0 and faa8c30.

📒 Files selected for processing (2)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/entrypoint.sh (1 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)

33-37: Good change: switched to exact-match eval and removed keyword check.

Moving to response_eval:accuracy with tool_eval and dropping expected_keywords aligns with the stated objective and removes redundant substring checks.


64-76: Accuracy eval is brittle — confirm exact agent wording or relax to substring

  • expected_response is an exact-match; punctuation or casing differences will fail. Confirm the agent returns this identical sentence after UNIQUE_ID templating, or change eval_types to response_eval:sub-string (recommended).

Location: test/evals/eval_data.yaml lines 74–76 (SNO creation expected_response/expected_keywords). Also review other response_eval:accuracy entries in this file (e.g., lines 50, 57, 148, 170, 178).
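
The brittleness concern can be illustrated with a small sketch. It is not the eval framework's implementation; it only assumes, as the comment above states, that an accuracy check compares the whole response exactly while a substring check looks for a fragment. The function names `exact_match` and `contains_substring` are hypothetical.

```shell
#!/bin/bash
# Illustration only: hypothetical helpers, not the eval framework's code.

# Succeeds only when both strings are byte-for-byte identical.
exact_match() {
  [ "$1" = "$2" ]
}

# Succeeds when the second string appears anywhere inside the first.
contains_substring() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# A trailing period is enough to break the exact comparison,
# while the substring check still finds the cluster name:
#   exact_match "Created cluster abc." "Created cluster abc"    -> fails
#   contains_substring "Created cluster abc." "cluster abc"     -> succeeds
```

Under that assumption, any drift in the agent's punctuation or casing fails the exact comparison even when the relevant fact is present in the response.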

test/prow/entrypoint.sh

  echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env

- sed -i "s/ClustER-NAme/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
💡 Verification agent

🧩 Analysis chain

Make sed replacement robust to special chars in UNIQUE_ID and quote paths.

If UNIQUE_ID contains “&” or the path has spaces, the current sed may misbehave. Quote variables and escape “&” in the replacement.

Apply:

- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"

Alternatively, use perl for fully safe replacement:

perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"

🏁 Script executed:

#!/bin/bash
# Dry-run preview of substitutions to ensure only the intended tokens change
rg -n "uniq-cluster-name" test/evals/eval_data.yaml

Length of output: 2417


Fix sed replacement to handle '&' in UNIQUE_ID and quote $TEST_DIR

File: test/prow/entrypoint.sh:23 — rg shows multiple occurrences of "uniq-cluster-name" in test/evals/eval_data.yaml; the current sed will mis-handle '&' in UNIQUE_ID and can break if $TEST_DIR contains spaces.

Apply:

- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"

Alternatively, for a fully robust replacement (handles arbitrary chars), use perl:

perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"
🤖 Prompt for AI Agents
In test/prow/entrypoint.sh around line 23, the sed invocation isn't robust:
$TEST_DIR should be quoted to handle spaces, and sed's replacement will
mis-handle '&' characters in $UNIQUE_ID. Fix by quoting
"$TEST_DIR/eval_data.yaml" and either (a) escape any '&' (and sed delimiter
chars) in $UNIQUE_ID before passing it to sed, or (b) replace the sed step with
the suggested perl approach that uses \Q...\E (or $ENV{UNIQUE_ID}) to perform a
safe, global replacement that tolerates arbitrary characters; ensure the
temp-file->mv pattern is used to atomically update the file.
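
One way to act on option (a) is sketched below. It is a minimal sketch under stated assumptions, not the change merged in this PR: the helper names `escape_sed_replacement` and `replace_placeholder_safely` are hypothetical, the ID is assumed to contain no newlines, and `mktemp` is assumed to be available. The sketch escapes backslash, slash, and ampersand, which are the characters special in a sed replacement when `/` is the delimiter.

```shell
#!/bin/bash
# Hypothetical helpers; a sketch, not the change merged in this PR.

# Escape the characters that are special in a sed replacement when "/" is
# the delimiter: backslash, slash, and ampersand. Newlines are not handled.
escape_sed_replacement() {
  printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

# Replace the placeholder in a file with an arbitrary single-line ID,
# writing to a temp file first and then moving it into place.
replace_placeholder_safely() {
  local file="$1" unique_id="$2"
  local escaped tmp
  escaped="$(escape_sed_replacement "$unique_id")" || return 1
  tmp="$(mktemp)" || return 1
  sed "s/uniq-cluster-name/${escaped}/g" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

Quoting the file argument at the call site, for example `replace_placeholder_safely "$TEST_DIR/eval_data.yaml" "$UNIQUE_ID"`, covers the path-with-spaces case raised in the comment.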

- Replaced "ClustER-NAme" with a valid cluster name so that the eval tests can run locally.
- Removed `response_eval:sub-string` and `expected_keywords` because `response_eval:accuracy` already performs an exact match, making the substring check unnecessary.
- The cluster name in the `cluster_id_from_name` conversation was changed from `eval-test-uniq-cluster-name` to `eval-test2-uniq-cluster-name` to avoid potential name conflicts with other tests.
- Removed `list_clusters` from the expected tool calls in `cluster_id_from_name`, since the cluster name and ID are already part of the current conversation context.

Signed-off-by: Eran Cohen <[email protected]>
@maorfr
Collaborator

maorfr commented Sep 15, 2025

/lgtm

@eranco74
Collaborator Author

/retest required

@openshift-ci

openshift-ci bot commented Sep 15, 2025

@eranco74: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test images

The following commands are available to trigger optional jobs:

/test eval-test

Use /test all to run all jobs.

In response to this:

/retest required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@eranco74
Collaborator Author

/retest
