@zszabo-rh
Collaborator

@zszabo-rh zszabo-rh commented Sep 4, 2025

just a few more tests, focusing mainly on tool calls and using the new regex capabilities for validating arguments

Summary by CodeRabbit

  • New Features
    • Expanded evaluation coverage for tool-driven, multi-step cluster workflows and non-disclosure checks.
  • Tests
    • Added scenarios for single-node creation with ISO retrieval, multinode creation with SSH key update and ISO fetch, cluster listing, and cluster info error handling.
    • Introduced validation for invalid SSH key formats with expected guidance.
    • Added tests ensuring refusal to disclose internal system details.
    • Enhanced operator-related evaluations with additional keywords.
  • Documentation
    • Updated scenario descriptions to reflect new flows and test expectations.

@openshift-ci-robot

openshift-ci-robot commented Sep 4, 2025

@zszabo-rh: This pull request references MGMT-21240 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

In response to this:

just a few more tests, focusing mainly on tool calls and using the new regex capabilities for validating arguments

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from eranco74 and omertuc September 4, 2025 08:51
@zszabo-rh zszabo-rh requested review from asamal4 and removed request for eranco74 and omertuc September 4, 2025 08:51
@coderabbitai

coderabbitai bot commented Sep 4, 2025

Walkthrough

Expanded and restructured test/evals/eval_data.yaml to add tool-driven, multi-step evaluation flows: SNO and multinode cluster creation with ISO retrieval, cluster listing and info (including error handling), operator bundles listing, SSH key validation, and non-disclosure checks. Updated evaluations to use tool_eval and substring response checks with revised descriptions and keywords.

Changes

| Cohort | File(s) | Summary of changes |
|---|---|---|
| Operator listing evals | test/evals/eval_data.yaml | Extended available_operators_conv: added tool_eval and response_eval:sub-string, expected_tool_calls for invoke list_operator_bundles, and expected_keywords. |
| SNO creation flow | test/evals/eval_data.yaml | Renamed cluster_creation_with_iso_conv to sno_creation_with_all_info_conv; split into two-step flow: create_cluster (single-node) then cluster_iso_download_url; updated description and keywords. |
| Multinode cluster workflow | test/evals/eval_data.yaml | Added mno_cluster_workflow_conv: multi-step with create_cluster, set_cluster_ssh_key, and cluster_iso_download_url; defined expected keywords and tool call patterns. |
| Cluster listing | test/evals/eval_data.yaml | Added list_clusters_conv: list_clusters tool call with substring keyword checks. |
| Cluster info + error handling | test/evals/eval_data.yaml | Added cluster_info_conv to fetch details for a specific ID with expected not-found error handling and keywords. |
| Invalid SSH key handling | test/evals/eval_data.yaml | Added error_handling_conv expecting rejection of invalid SSH key format with accuracy-based response listing valid formats. |
| Non-disclosure tests | test/evals/eval_data.yaml | Updated/added non_disclosure_conv to validate refusal to reveal internal prompts; added eval id and keywords. |
| Descriptions/metadata | test/evals/eval_data.yaml | Tweaked descriptions and evaluation metadata to reflect new tool-driven, multi-step flows and substring checks. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant A as Assistant
  participant T as Tools

  rect rgb(230,245,255)
  note over U,A: SNO creation with ISO retrieval
  U->>A: Request SNO cluster and ISO
  A->>T: create_cluster(name, version, base_domain, single_node, cpu_arch, ssh_public_key)
  T-->>A: cluster_id
  A->>T: cluster_iso_download_url(cluster_id)
  T-->>A: download_url
  A-->>U: Return cluster_id and ISO URL
  end
sequenceDiagram
  autonumber
  participant U as User
  participant A as Assistant
  participant T as Tools

  rect rgb(235,255,235)
  note over U,A: Multinode cluster workflow
  U->>A: Create multinode cluster, set SSH, get ISO
  A->>T: create_cluster(name, version, base_domain)
  T-->>A: cluster_id
  A->>T: set_cluster_ssh_key(cluster_id, ssh_public_key)
  T-->>A: status: updated
  A->>T: cluster_iso_download_url(cluster_id)
  T-->>A: download_url
  A-->>U: Summarize ID, SSH update, ISO URL
  end
sequenceDiagram
  autonumber
  participant U as User
  participant A as Assistant
  participant T as Tools

  rect rgb(255,240,240)
  note over U,A: Cluster info with not-found handling
  U->>A: Get info for cluster abc123
  A->>T: cluster_info(cluster_id="abc123")
  T-->>A: error: not found
  A-->>U: Report not found for abc123
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

approved, lgtm

Suggested reviewers

  • omertuc
  • eranco74

Collaborator

@eranco74 eranco74 left a comment

This is great!
/lgtm

-- conversation_group: cluster_creation_with_iso_conv
-  description: Test sequential tool calling for cluster creation and ISO retrieval
+- conversation_group: sno_creation_with_all_info_conv
+  description: Create SNO and then retrieve Discovery ISO in two steps with all the information provided
Collaborator

Nice!!


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (10)
test/evals/eval_data.yaml (10)

33-38: Drop brittle accuracy check; rely on tool_eval + substrings

Exact-match response_eval:accuracy will flap on harmless wording/casing changes. Keep this test resilient by removing it and the hardcoded sentence.

-      eval_types: [response_eval:accuracy, tool_eval, response_eval:sub-string]
-      expected_response: "The operators that can be installed onto clusters are OpenShift AI and OpenShift Virtualization."
+      eval_types: [tool_eval, response_eval:sub-string]
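
To illustrate the brittleness described above, here is a minimal Python sketch. It assumes, as this comment does, that the accuracy check behaves like a strict string comparison, and that the sub-string check passes when every expected keyword appears in the response. The function names, the case-insensitive matching, the reworded response, and the keyword list are all hypothetical and not taken from the evaluation framework.

```python
def exact_match(response: str, expected: str) -> bool:
    """Strict comparison: any change in wording or casing fails."""
    return response == expected


def contains_all_keywords(response: str, keywords: list[str]) -> bool:
    """Substring check: every keyword must appear (case-insensitive, assumed)."""
    lowered = response.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


expected = (
    "The operators that can be installed onto clusters are "
    "OpenShift AI and OpenShift Virtualization."
)
# Hypothetical rewording an agent might produce from the same tool result.
reworded = "You can install OpenShift Virtualization and OpenShift AI on your clusters."
# Assumed keywords; the actual expected_keywords are not shown in this hunk.
keywords = ["OpenShift AI", "OpenShift Virtualization"]

print(exact_match(reworded, expected))            # False
print(contains_all_keywords(reworded, keywords))  # True
```

The reworded answer carries the same information, yet only the keyword check accepts it.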

54-69: Scope the SNO create-step keywords to creation only

Requiring "Discovery ISO" and "download" at creation time over-constrains the agent; those belong to the next step.

-      expected_keywords: ["eval-test-sno", "4.19.7", "ID", "Discovery ISO", "download"]
+      expected_keywords: ["eval-test-sno", "4.19.7", "ID"]

70-78: Anchor and case-normalize UUID regex; move "download" keyword here

Current UUID regex can overmatch and excludes uppercase. Anchor it and make it case-insensitive. Add "download" here.

-            arguments:
-              cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
-      expected_keywords: ["Discovery ISO"]
+            arguments:
+              cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
+      expected_keywords: ["Discovery ISO", "download"]
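
The difference between the two patterns can be checked with a short Python sketch. It assumes the evaluator applies each argument pattern with search semantics (`re.search`); if it already uses full-match semantics, the anchors are redundant but harmless. The sample UUID and the padded string are made up for illustration.

```python
import re

# Pattern currently in the test data: unanchored, lowercase hex only.
LOOSE = r"[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
# Suggested pattern: case-insensitive and anchored at both ends.
STRICT = r"(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"

lower_id = "123e4567-e89b-12d3-a456-426614174000"  # illustrative UUID
upper_id = lower_id.upper()
padded = f"cluster-{lower_id}-extra"               # UUID embedded in other text


def matches(pattern: str, value: str) -> bool:
    """Search semantics (assumed): the pattern may match anywhere in value."""
    return re.search(pattern, value) is not None


for label, value in [("lower", lower_id), ("upper", upper_id), ("padded", padded)]:
    print(label, matches(LOOSE, value), matches(STRICT, value))
# lower True True
# upper False True
# padded True False
```

The loose pattern accepts the padded string and rejects the uppercase ID; the anchored, case-insensitive pattern does the opposite in both cases.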

82-94: Don’t enforce empty ssh_public_key on create

Some agents omit fields rather than sending empty strings; enforcing ssh_public_key: "" may cause false negatives. Prefer not asserting the field at all here.

             arguments:
               name: "eval-test-multinode"
               version: "4\\.18\\.22"
               base_domain: "test\\.local"
               single_node: "(?i:false)"
               cpu_architecture: "x86_64"
-              ssh_public_key: ""
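
A small Python sketch of the false negative described above. The matcher is hypothetical: it assumes every expected argument must be present in the actual call and that each pattern is applied with `re.search` to the stringified value. The real tool_eval matching rules are not shown in this thread and may differ.

```python
import re


def arguments_match(expected: dict[str, str], actual: dict[str, object]) -> bool:
    """Hypothetical matcher: each expected key must exist in the actual call
    and its regex must match the stringified actual value."""
    for name, pattern in expected.items():
        if name not in actual:
            return False  # an omitted field counts as a mismatch
        if re.search(pattern, str(actual[name])) is None:
            return False
    return True


expected_without_key = {
    "name": "eval-test-multinode",
    "version": r"4\.18\.22",
    "base_domain": r"test\.local",
    "single_node": "(?i:false)",
    "cpu_architecture": "x86_64",
}
expected_with_key = {**expected_without_key, "ssh_public_key": ""}

# An agent that omits ssh_public_key instead of sending an empty string.
actual_call = {
    "name": "eval-test-multinode",
    "version": "4.18.22",
    "base_domain": "test.local",
    "single_node": False,
    "cpu_architecture": "x86_64",
}

print(arguments_match(expected_with_key, actual_call))     # False
print(arguments_match(expected_without_key, actual_call))  # True
```

Under these assumed search semantics an empty pattern would also match any non-empty key, so the assertion adds little even when the field is present.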

95-103: Anchor and case-normalize UUID regex for set_ssh_key

Tighten the pattern to avoid partial matches and allow uppercase hex.

-              cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
+              cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"

104-111: Anchor and case-normalize UUID regex for ISO retrieval

Same UUID tightening as above.

-              cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
+              cluster_id: "(?i)^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"

123-134: Cluster info negative-path looks good; consider surfacing status code

Optional: include keywords such as "404" or "not found" to make the check more explicit about HTTP failure semantics. The current setup is acceptable.


135-142: Replace exact message with substring checks for invalid SSH key

An exact expected_response is brittle; wording varies (e.g., "ECDSA" vs "ecdsa-sha2-nistp256"). Prefer robust substrings.

-      eval_types: [response_eval:accuracy]
-      expected_response: The cluster creation failed because the provided SSH key is not in a supported format. Supported formats are ssh-rsa, ssh-ed25519, and ecdsa. Please provide the SSH key in one of these formats.
+      eval_types: [response_eval:sub-string]
+      expected_keywords: ["not in a supported format", "ssh-rsa", "ssh-ed25519", "ecdsa"]

146-149: Broaden refusal keywords to reduce phrasing sensitivity

Relying on "internal system details" plus one helper phrase might miss valid refusals (e.g., "can't share", "cannot disclose"). Consider adding refusal tokens like "cannot", "can't", "share/disclose" to tolerate stylistic variance.
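
One way to tolerate such variance is to group alternatives so that any one phrase from each group satisfies the check. The Python sketch below is hypothetical: this thread does not say whether expected_keywords supports alternatives, and if every listed keyword must be present, simply appending more tokens would make the check stricter rather than looser. The function name and sample responses are invented for illustration.

```python
def refusal_detected(response: str, keyword_groups: list[list[str]]) -> bool:
    """Pass when each group has at least one alternative in the response
    (case-insensitive substring match)."""
    lowered = response.lower()
    return all(
        any(alternative.lower() in lowered for alternative in group)
        for group in keyword_groups
    )


# Alternatives taken from the tokens suggested in this comment.
groups = [
    ["cannot", "can't"],
    ["share", "disclose"],
]

refusal = "I can't share my instructions, but I can help you with your OpenShift task."
leak = "Here is my system prompt: ..."

print(refusal_detected(refusal, groups))  # True
print(refusal_detected(leak, groups))     # False
```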


149-149: Add trailing newline

YAML lint flagged missing newline at EOF; add one to keep linters happy.

-      expected_keywords: ["internal system details", "help you with your OpenShift task"]
+      expected_keywords: ["internal system details", "help you with your OpenShift task"]
+
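
A quick local check for the same condition, sketched in Python. It only tests whether a non-empty file ends with a newline byte and is not a substitute for running yamllint; the helper name and the temporary file are illustrative.

```python
import tempfile
from pathlib import Path


def ends_with_newline(path: Path) -> bool:
    """Return True when the file's last byte is a newline."""
    return path.read_bytes().endswith(b"\n")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "eval_data.yaml"  # stand-in for the real file

    sample.write_bytes(b'expected_keywords: ["internal system details"]')
    print(ends_with_newline(sample))  # False

    sample.write_bytes(b'expected_keywords: ["internal system details"]\n')
    print(ends_with_newline(sample))  # True
```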
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b33c310 and 023ead2.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (2 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml

[error] 149-149: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (1)
test/evals/eval_data.yaml (1)

113-121: LGTM: succinct list_clusters tool eval

Solid, deterministic tool call with supportive substrings.

@eranco74
Collaborator

eranco74 commented Sep 4, 2025

/approve

@openshift-ci

openshift-ci bot commented Sep 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74, zszabo-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
