-
Notifications
You must be signed in to change notification settings - Fork 21
MGMT-21240: improve eval test coverage #182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MGMT-21240: improve eval test coverage #182
Conversation
|
@zszabo-rh: This pull request references MGMT-21240 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughExpanded and restructured test/evals/eval_data.yaml to add tool-driven, multi-step evaluation flows: SNO and multinode cluster creation with ISO retrieval, cluster listing and info (including error handling), operator bundles listing, SSH key validation, and non-disclosure checks. Updated evaluations to use tool_eval and substring response checks with revised descriptions and keywords. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant A as Assistant
participant T as Tools
rect rgb(230,245,255)
note over U,A: SNO creation with ISO retrieval
U->>A: Request SNO cluster and ISO
A->>T: create_cluster(name, version, base_domain, single_node, cpu_arch, ssh_public_key)
T-->>A: cluster_id
A->>T: cluster_iso_download_url(cluster_id)
T-->>A: download_url
A-->>U: Return cluster_id and ISO URL
end
sequenceDiagram
autonumber
participant U as User
participant A as Assistant
participant T as Tools
rect rgb(235,255,235)
note over U,A: Multinode cluster workflow
U->>A: Create multinode cluster, set SSH, get ISO
A->>T: create_cluster(name, version, base_domain)
T-->>A: cluster_id
A->>T: set_cluster_ssh_key(cluster_id, ssh_public_key)
T-->>A: status: updated
A->>T: cluster_iso_download_url(cluster_id)
T-->>A: download_url
A-->>U: Summarize ID, SSH update, ISO URL
end
sequenceDiagram
autonumber
participant U as User
participant A as Assistant
participant T as Tools
rect rgb(255,240,240)
note over U,A: Cluster info with not-found handling
U->>A: Get info for cluster abc123
A->>T: cluster_info(cluster_id="abc123")
T-->>A: error: not found
A-->>U: Report not found for abc123
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
@zszabo-rh: This pull request references MGMT-21240 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
eranco74
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
/lgtm
| - conversation_group: cluster_creation_with_iso_conv | ||
| description: Test sequential tool calling for cluster creation and ISO retrieval | ||
| - conversation_group: sno_creation_with_all_info_conv | ||
| description: Create SNO and then retrieve Discovery ISO in two steps with all the information provided |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (10)
test/evals/eval_data.yaml (10)
33-38: Drop brittle accuracy check; rely on tool_eval + substringsExact-match
response_eval:accuracywill flap on harmless wording/casing changes. Keep this test resilient by removing it and the hardcoded sentence.- eval_types: [response_eval:accuracy, tool_eval, response_eval:sub-string] - expected_response: "The operators that can be installed onto clusters are OpenShift AI and OpenShift Virtualization." + eval_types: [tool_eval, response_eval:sub-string]
54-69: Scope the SNO create-step keywords to creation onlyRequiring "Discovery ISO" and "download" at creation time over-constrains the agent; those belong to the next step.
- expected_keywords: ["eval-test-sno", "4.19.7", "ID", "Discovery ISO", "download"] + expected_keywords: ["eval-test-sno", "4.19.7", "ID"]
70-78: Anchor and case-normalize UUID regex; move "download" keyword hereCurrent UUID regex can overmatch and excludes uppercase. Anchor it and make it case-insensitive. Add "download" here.
- arguments: - cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" - expected_keywords: ["Discovery ISO"] + arguments: + cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$" + expected_keywords: ["Discovery ISO", "download"]
82-94: Don’t enforce empty ssh_public_key on createSome agents omit fields rather than sending empty strings; enforcing
ssh_public_key: ""may cause false negatives. Prefer not asserting the field at all here.arguments: name: "eval-test-multinode" version: "4\\.18\\.22" base_domain: "test\\.local" single_node: "(?i:false)" cpu_architecture: "x86_64" - ssh_public_key: ""
95-103: Anchor and case-normalize UUID regex for set_ssh_keyTighten the pattern to avoid partial matches and allow uppercase hex.
- cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" + cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
104-111: Anchor and case-normalize UUID regex for ISO retrievalSame UUID tightening as above.
- cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" + cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
123-134: Cluster info negative-path looks good; consider surfacing status codeOptional: include keywords like "404" or "not found" reason to make the check more explicit about HTTP failure semantics. Current setup is acceptable.
135-142: Replace exact message with substring checks for invalid SSH keyAn exact
expected_responseis brittle; wording varies (e.g., "ECDSA" vs "ecdsa-sha2-nistp256"). Prefer robust substrings.- eval_types: [response_eval:accuracy] - expected_response: The cluster creation failed because the provided SSH key is not in a supported format. Supported formats are ssh-rsa, ssh-ed25519, and ecdsa. Please provide the SSH key in one of these formats. + eval_types: [response_eval:sub-string] + expected_keywords: ["not in a supported format", "ssh-rsa", "ssh-ed25519", "ecdsa"]
146-149: Broaden refusal keywords to reduce phrasing sensitivityRelying on "internal system details" plus one helper phrase might miss valid refusals (e.g., "can't share", "cannot disclose"). Consider adding refusal tokens like "cannot", "can't", "share/disclose" to tolerate stylistic variance.
149-149: Add trailing newlineYAML lint flagged missing newline at EOF; add one to keep linters happy.
- expected_keywords: ["internal system details", "help you with your OpenShift task"] + expected_keywords: ["internal system details", "help you with your OpenShift task"] +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
test/evals/eval_data.yaml(2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml
[error] 149-149: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)
113-121: LGTM: succinct list_clusters tool evalSolid, deterministic tool call with supportive substrings.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eranco74, zszabo-rh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
3fe1ae5
into
rh-ecosystem-edge:main
just a few more tests, focusing mainly on tool calls and using the new regex capabilities for validating arguments
Summary by CodeRabbit