-
Notifications
You must be signed in to change notification settings - Fork 21
Fix(test): Update evaluation data #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRenamed the cluster placeholder to "uniq-cluster-name" across eval blocks and updated all corresponding queries, tool-call refs, IDs, and expected responses/keywords; removed a substring-level eval and trimmed expected_tool_calls in one block; updated prow entrypoint sed to target the new placeholder. Changes
Sequence Diagram(s)Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eranco74 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (1)
test/evals/eval_data.yaml (1)
91-102: Same brittleness warning for multi-node workflow accuracy checks.Ensure the agent’s exact phrasing matches expected_response and the keywords after UNIQUE_ID substitution; otherwise accuracy evals will flap.
Use the same script above to verify these blocks too.
Also applies to: 112-120
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
test/evals/eval_data.yaml(5 hunks)test/prow/entrypoint.sh(1 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)
33-37: Good change: switched to exact-match eval and removed keyword check.Moving to response_eval:accuracy with tool_eval and dropping expected_keywords aligns with the stated objective and removes redundant substring checks.
64-76: Accuracy eval is brittle — confirm exact agent wording or relax to substring
- expected_response is an exact-match; punctuation or casing differences will fail. Confirm the agent returns this identical sentence after UNIQUE_ID templating, or change eval_types to response_eval:sub-string (recommended).
Location: test/evals/eval_data.yaml lines 74–76 (SNO creation expected_response/expected_keywords). Also review other response_eval:accuracy entries in this file (e.g., lines 50, 57, 148, 170, 178).
| echo "GEMINI_API_KEY=${GEMINI_API_KEY}" > .env | ||
|
|
||
| sed -i "s/ClustER-NAme/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml | ||
| sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Make sed replacement robust to special chars in UNIQUE_ID and quote paths.
If UNIQUE_ID contains “&” or the path has spaces, the current sed may misbehave. Quote variables and escape “&” in the replacement.
Apply:
- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"Alternatively, use perl for fully safe replacement:
perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"🏁 Script executed:
#!/bin/bash
# Dry-run preview of substitutions to ensure only the intended tokens change
rg -n "uniq-cluster-name" test/evals/eval_data.yamlLength of output: 2417
Fix sed replacement to handle '&' in UNIQUE_ID and quote $TEST_DIR
File: test/prow/entrypoint.sh:23 — rg shows multiple occurrences of "uniq-cluster-name" in test/evals/eval_data.yaml; the current sed will mis-handle '&' in UNIQUE_ID and can break if $TEST_DIR contains spaces.
Apply:
- sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+ UNIQUE_ESCAPED=${UNIQUE_ID//&/\\&}
+ sed -i "s|uniq-cluster-name|${UNIQUE_ESCAPED}|g" "$TEST_DIR/eval_data.yaml"Alternatively, for a fully robust replacement (handles arbitrary chars), use perl:
perl -0777 -pe 's/\Quniq-cluster-name\E/$ENV{UNIQUE_ID}/g' "$TEST_DIR/eval_data.yaml" > "$TEMP_DIR/eval_data.patched" && mv "$TEMP_DIR/eval_data.patched" "$TEST_DIR/eval_data.yaml"🤖 Prompt for AI Agents
In test/prow/entrypoint.sh around line 23, the sed invocation isn't robust:
$TEST_DIR should be quoted to handle spaces, and sed's replacement will
mis-handle '&' characters in $UNIQUE_ID. Fix by quoting
"$TEST_DIR/eval_data.yaml" and either (a) escape any '&' (and sed delimiter
chars) in $UNIQUE_ID before passing it to sed, or (b) replace the sed step with
the suggested perl approach that uses \Q...\E (or $ENV{UNIQUE_ID}) to perform a
safe, global replacement that tolerates arbitrary characters; ensure the
temp-file->mv pattern is used to atomically update the file.
- Replaced "ClustER-NAme" with a valid cluster name to allow local eval tests run - Removed `response_eval:sub-string` and `expected_keywords` because `response_eval:accuracy` already performs an exact match, making the substring check unnecessary. - The cluster name in the `cluster_id_from_name` conversation was changed from `eval-test-uniq-cluster-name` to `eval-test2-uniq-cluster-name` to avoid potential name conflicts with other tests. - Removed list_clusters from the expected tool calls in cluster_id_from_name since the cluster name and ID are part of the current conversation context Signed-off-by: Eran Cohen <[email protected]>
|
/lgtm |
|
/retest required |
|
@eranco74: The The following commands are available to trigger optional jobs: Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
ced2c4a
into
rh-ecosystem-edge:main
response_eval:sub-stringandexpected_keywordsbecauseresponse_eval:accuracyalready performs an exact match, making the substring check unnecessary.cluster_id_from_nameconversation was changed fromeval-test-uniq-cluster-nametoeval-test2-uniq-cluster-nameto avoid potential name conflicts with other tests.Summary by CodeRabbit