
Conversation


@ItzikEzra-rh ItzikEzra-rh commented Nov 9, 2025

Summary

This PR integrates the QE evaluation tests into the dev suite and adds a tag-based filtering system to allow users to run specific subsets of tests (e.g., smoke tests on every PR) instead of running all tests.

test/evals/eval_data.yaml

  • Added tags: [smoke] to all dev suite tests

test/evals/eval.py

  • Added --tags command-line argument (optional) to filter tests by tags; a usage sketch follows this list
  • Implemented filter_by_tags() function that:
    • Filters YAML evaluation data based on provided tags
    • Creates a temporary filtered YAML file when tags are specified
    • Returns the original path when no tags are provided (runs all tests)
    • Exits with an error message if no tests match the specified tags
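
For orientation, a minimal usage sketch (the data-file flag --eval_data_yaml is inferred from args.eval_data_yaml mentioned later in this thread, and the UNIQUE_ID value is a placeholder):

# run only the smoke-tagged groups; eval.py requires UNIQUE_ID (see the review below)
export UNIQUE_ID=dev-run-1
python test/evals/eval.py --eval_data_yaml test/evals/eval_data.yaml --tags smoke

# omit --tags to run every test group
python test/evals/eval.py --eval_data_yaml test/evals/eval_data.yaml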

Summary by CodeRabbit

  • New Features

    • CLI tag-based filtering for targeted evaluation runs, with notices when no tests match and summarized test counts.
    • Improved evaluation reporting with a result summary and non-zero exit when failures occur.
  • Tests

    • Many new evaluation groups added (platforms, integrations, workflows, guided assistance, troubleshooting).
    • Tests annotated with tags, descriptions, and expected intents for intent-focused scenarios.
  • Bug Fixes

    • Test data contains unresolved merge conflict markers that need resolution.


@openshift-ci openshift-ci bot requested review from carbonin and eranco74 November 9, 2025 14:39

coderabbitai bot commented Nov 9, 2025

Walkthrough

Adds a --tags CLI option and a filter_by_tags(path, tags) helper that writes a temporary filtered YAML and reassigns args.eval_data_yaml. Extends evaluation flow to compute a result summary, count FAIL/ERROR results, and exit non‑zero on failures. Enriches eval_data.yaml with tags, descriptions, expected intents, and many new groups.

Changes

  • Evaluation script enhancements (test/evals/eval.py): Added --tags CLI option; imported yaml and tempfile; implemented filter_by_tags(path, tags) to load YAML, select conversation_group entries whose tags intersect the requested tags, error if none match, write filtered YAML to a temporary file and return its path; apply filtering to args.eval_data_yaml before evaluation; compute evaluator.get_result_summary(), count FAIL/ERROR, and exit non-zero when failures exist (a sketch of this flow follows).
  • Test data updates (test/evals/eval_data.yaml): Added tags: [smoke], description, and expected_intent fields to many conversation_group entries; expanded eval_types (e.g., response_eval:intent); appended numerous new conversation groups covering manifests, platform integrations, guided assistance, troubleshooting, and non-destructive actions. Merge conflict markers are present in the file.
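
A minimal sketch of that flow, assuming get_result_summary() returns a dict keyed by status name (the library's exact contract is questioned later in this review):

args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)

evaluator = AgentGoalEval(args)
evaluator.run_evaluation()

summary = evaluator.get_result_summary()  # e.g. {"PASS": 30, "FAIL": 2, "ERROR": 0} (assumed shape)
failures = summary.get("FAIL", 0) + summary.get("ERROR", 0)
sys.exit(1 if failures else 0)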

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script as eval.py
    participant YAML as eval_data.yaml
    participant Evaluator

    User->>Script: run eval.py --tags=smoke
    Script->>Script: parse_args()
    rect rgb(220,235,255)
        Note over Script,YAML: Tag filtering (new)
        Script->>YAML: filter_by_tags(path, tags)
        YAML-->>Script: filtered_yaml_path
        Script->>Script: args.eval_data_yaml = filtered_yaml_path
    end
    Script->>Evaluator: run evaluation with filtered YAML
    Evaluator->>Evaluator: evaluate tests
    rect rgb(235,225,245)
        Note over Evaluator,Script: Result summary (new)
        Evaluator-->>Script: get_result_summary()
        Script->>Script: count FAIL/ERROR
    end
    alt Failures present
        Script->>User: exit(1)
    else All pass
        Script->>User: exit(0)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect filter_by_tags() YAML parsing, tag-intersection logic, and error handling when no matches exist.
  • Verify temporary file creation, write correctness, and any cleanup implications.
  • Confirm reassignment of args.eval_data_yaml integrates with evaluator paths.
  • Validate get_result_summary() usage and correct detection/counting of FAIL/ERROR statuses.
  • Review test/evals/eval_data.yaml for schema consistency and address merge conflict markers.

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • carbonin

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately captures the two main changes: integrating QE tests into the dev suite and implementing a tag-based filtering system for test execution.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


openshift-ci bot commented Nov 9, 2025

Hi @ItzikEzra-rh. Thanks for your PR.

I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
test/evals/eval.py (1)

91-103: Consider cleaning up the temporary file.

The temporary file created with delete=False is never explicitly removed, which will accumulate files in the temp directory over time.

Consider using atexit to register cleanup or refactoring to use a context manager:

+import atexit
+
 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
+    atexit.register(lambda: os.unlink(tmp.name) if os.path.exists(tmp.name) else None)
     return tmp.name

Alternatively, track created files in a module-level list and remove them all at exit:

import atexit
import os

_temp_files = []

def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️  No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    _temp_files.append(tmp.name)
    return tmp.name

def cleanup_temp_files():
    for f in _temp_files:
        try:
            os.unlink(f)
        except OSError:
            pass

atexit.register(cleanup_temp_files)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ed4149 and d762d46.

📒 Files selected for processing (2)
  • test/evals/eval.py (2 hunks)
  • test/evals/eval_data.yaml (14 hunks)
🔇 Additional comments (5)
test/evals/eval_data.yaml (2)

2-186: LGTM! Clean smoke test tagging.

The addition of tags: [smoke] to existing conversation groups is consistent and enables effective smoke test filtering via the new CLI option.


204-378: LGTM! Comprehensive intent test coverage.

The new intent-focused conversation groups effectively extend test coverage for high-level response validation. The use of response_eval:intent with descriptive expected outcomes is appropriate for LLM-based evaluation.

test/evals/eval.py (3)

7-8: LGTM! Required imports for filtering.

The yaml and tempfile imports support the new tag-based filtering functionality.


81-86: LGTM! Well-defined CLI argument.

The --tags argument is properly configured with clear help text and appropriate defaults.


112-126: LGTM! Clean integration and proper exit handling.

The filtering is seamlessly integrated, and the evaluation flow properly exits with appropriate status codes based on test results.

Comment on lines 108 to 123
if os.getenv('UNIQUE_ID') is None:
    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)

⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

Line 109 contains a typo: "varialbe" should be "variable".

 if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)
Suggested change
if os.getenv('UNIQUE_ID') is None:
    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)


@eranco74 eranco74 left a comment


Please go over the tests and remove duplicates

      eval_types: [response_eval:intent]
      expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."

- conversation_group: sno_hardware_requirements_intent_conv


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
test/evals/eval.py (1)

81-99: Consider externalizing tag documentation to prevent staleness.

While the verbose tag descriptions address the reviewer's request, hardcoding them in the argument help can become stale as tags evolve. If new tags are added to eval_data.yaml, this help text won't automatically reflect them.

Consider one of these approaches:

Option 1: Reference external documentation

     parser.add_argument(
         "--tags",
         nargs="+",
         default=None,
-        help=(
-            "Filter tests by tags. Optional - if not provided, all tests will be run. "
-            "Available tags: "
-            "'smoke' - Basic smoke tests that verify core functionality and should run quickly "
-            "to catch fundamental issues (e.g., cluster creation requests, version listing, "
-            "basic queries). "
-            "'troubleshooting' - Tests that verify the assistant's ability to help diagnose and "
-            "explain common issues users encounter (e.g., ignition download failures, degraded "
-            "cluster states, console access problems). "
-            "'non-destructive' - Tests that verify the assistant correctly refuses or handles "
-            "destructive operations without actually performing them (e.g., refusing to delete "
-            "clusters, declining to create deletion scripts). "
-            "Example: --tags smoke troubleshooting"
-        ),
+        help=(
+            "Filter tests by tags. Optional - if not provided, all tests will be run. "
+            "See eval_data.yaml or docs/test-tags.md for available tags and descriptions. "
+            "Example: --tags smoke troubleshooting"
+        ),
     )

Option 2: Dynamically extract tags from eval_data.yaml (more complex but stays in sync)

Add a function to read available tags:

def get_available_tags(yaml_path):
    """Extract unique tags from eval data YAML."""
    try:
        with open(yaml_path) as f:
            data = yaml.safe_load(f)
            tags = set()
            for group in data:
                tags.update(group.get('tags', []))
            return sorted(tags)
    except Exception:
        return []

Then update the help text:

available_tags = get_available_tags("eval_data.yaml")
tag_list = ", ".join(available_tags) if available_tags else "check eval_data.yaml"

parser.add_argument(
    "--tags",
    nargs="+",
    default=None,
    help=(
        f"Filter tests by tags. Available tags: {tag_list}. "
        "Example: --tags smoke troubleshooting"
    ),
)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0a6584 and 3905a74.

📒 Files selected for processing (1)
  • test/evals/eval.py (2 hunks)
🔇 Additional comments (3)
test/evals/eval.py (3)

7-8: LGTM! Imports support the new filtering functionality.

The yaml and tempfile imports are appropriately used for loading, filtering, and creating temporary YAML files.


125-125: LGTM! Clean integration of tag filtering.

The filter is appropriately applied before evaluation, and the path reassignment is straightforward.


130-139: Manually verify the behavior of AgentGoalEval.get_result_summary() before deciding on defensive checks.

The external lsc_agent_eval library is not publicly documented and cannot be inspected from the codebase. While the review comment's suggestion to add defensive checks is sound defensive programming practice, whether it's strictly necessary depends on the library's actual behavior:

  1. Check the library's source code or tests to confirm whether get_result_summary() always returns a non-None dict with "FAIL" and "ERROR" keys.
  2. If the library guarantees this contract, the current code is acceptable.
  3. If the library can return None or incomplete dicts, apply the defensive checks suggested in the review comment (sketched below).
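
For illustration, a defensive variant might look like this sketch (the summary shape here is an assumption, not a documented contract of lsc_agent_eval):

summary = evaluator.get_result_summary() or {}   # guard against a None return
fail_count = summary.get("FAIL", 0) or 0         # tolerate missing keys
error_count = summary.get("ERROR", 0) or 0
sys.exit(1 if (fail_count + error_count) else 0)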

Comment on lines +104 to +117
def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️  No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    return tmp.name


⚠️ Potential issue | 🔴 Critical

Critical: Temporary file is never cleaned up.

The temporary file created with delete=False is never removed, causing a resource leak. Each evaluation run with --tags will leave an orphaned temp file in the system's temp directory.

Consider using a context manager or ensuring cleanup. Here are two solutions:

Solution 1: Use context manager and clean up after evaluation

 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
     return tmp.name


 # Parse command line arguments
 args = parse_args()
 if os.getenv('UNIQUE_ID') is None:
     print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)
 
-args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+filtered_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+original_yaml = args.eval_data_yaml
+args.eval_data_yaml = filtered_yaml
 
 evaluator = AgentGoalEval(args)
-# Run Evaluation
-evaluator.run_evaluation()
+try:
+    # Run Evaluation
+    evaluator.run_evaluation()
+finally:
+    # Clean up temp file if we created one
+    if args.tags and filtered_yaml != original_yaml:
+        try:
+            os.unlink(filtered_yaml)
+        except Exception:
+            pass

Solution 2: Use delete=True and keep file open (simpler if supported by AgentGoalEval)

Check if AgentGoalEval can accept an already-open file object or reads the path only once. If so:

 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
-    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
+    # Note: On Windows, you may need delete=False with manual cleanup
+    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=True)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
-    tmp.close()
+    tmp.flush()  # Ensure data is written
     return tmp.name

However, Solution 2 may have issues on Windows or if the file needs to be reopened by AgentGoalEval.

Committable suggestion skipped: line range outside the PR's diff.


@ItzikEzra-rh
Contributor Author

@eranco74 @omertuc done


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
test/evals/eval_data.yaml (2)

246-300: Inconsistent tag coverage: new test groups lack tags while the filtering mechanism depends on them.

The new test groups (lines 246–300) have descriptions but no tags, yet the PR adds tag-based filtering to eval.py. Tests like unable_to_create_cluster_with_custom_manifests_conv, create_vsphere_platform_clusters_conv, and others are orphaned from the tagging scheme, making them inaccessible via tag-based filtering. Either assign appropriate tags (e.g., smoke, capabilities, or domain-specific tags) or clarify why these tests are excluded from the filtering system.

Consider organizing new tests with semantic tags that align with their purpose (an example follows the list):

  • Lines 246–300: Could use tags like smoke, capabilities, or integration depending on scope
  • Lines 302–321: Already tagged [troubleshooting]
  • Lines 329–363: Already tagged [non-destructive]
  • Lines 389+: Role protection tests also lack tags; determine appropriate classification
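
For illustration, one of the untagged groups with a hypothetical tag applied (the tag name is a placeholder; pick whichever fits the scheme above):

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
  tags: [integration]  # hypothetical tag; could equally be smoke or capabilities
  conversation:
    - eval_id: create_vsphere_platform_clusters
      eval_query: Can you create a cluster with vsphere platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with vSphere platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."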

389-466: Role protection test groups lack tags.

The role protection test groups (direct-roleplaying, tone-manipulation, off-topic, mixed-requests, boundary-testing) starting at line 389 have no tags assigned. These should be included in the tagging scheme for consistency with the filtering system introduced in this PR. Consider assigning tags like [smoke], [security], or [role-protection] to enable targeting these tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3905a74 and 923431b.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)

302-363: Past review suggestions successfully integrated.

The [troubleshooting] tag has been correctly applied to ignition, degraded cluster, and console access tests (lines 302–321), and [non-destructive] tags applied to deletion and script tests (lines 329–363), aligning with eranco74's prior feedback. ✓

@eranco74
Collaborator

/test ?


openshift-ci bot commented Nov 17, 2025

@eranco74: The following commands are available to trigger required jobs:

/test images

The following commands are available to trigger optional jobs:

/test eval-test
/test local-dev-test

Use /test all to run all jobs.

In response to this:

/test ?


@eranco74
Collaborator

/test eval-test
/test local-dev-test

@eranco74
Collaborator

pull secret issue
/test eval-test

@eranco74
Collaborator

/ok-to-test

      eval_types: [response_eval:accuracy]
      eval_types: [response_eval:accuracy, response_eval:intent]
      expected_response: "Control plane nodes in a multi-node cluster require 4 CPU cores 16 GB of RAM and 100 GB of storage, the compute nodes in this cluster require 2 CPU cores, 8 GB of RAM and 100GB of storage"
      expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."
Collaborator


The question is about a multinode cluster, so the reply is about a multinode cluster and not about multi-node OpenShift clusters, which makes the response_eval:intent check FAIL!

See: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/rh-ecosystem-edge_assisted-chat/246/pull-ci-rh-ecosystem-edge-assisted-chat-main-eval-test/1990357108888965120

@ItzikEzra-rh
Contributor Author

/test eval-test


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/evals/eval_data.yaml (1)

394-471: Missing tags on all role protection security tests.

The role protection test groups (lines 394–471) covering direct role-playing, tone manipulation, off-topic queries, mixed requests, and boundary testing lack tags entirely. These are important security/robustness validations and should be tagged (e.g., [security], [robustness], or a custom tag) to enable selective execution.

Add an appropriate tag to all role protection tests:

- conversation_group: role_protection_direct_roleplaying_conv
  description: Verify rejection of direct role-playing requests
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_tone_manipulation_conv
  description: Verify rejection of tone and style manipulation attempts
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_off_topic_conv
  description: Verify rejection of off-topic queries unrelated to OpenShift
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_mixed_requests_conv
  description: Verify rejection of requests mixing legitimate queries with off-topic content
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_boundary_testing_conv
  description: Verify proper handling of edge cases and legitimate technical queries
+ tags: [robustness]
  conversation:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 923431b and f7353a9.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)

57-65: Verify multinode cluster intent expectations match query intent.

Based on past review feedback indicating intent mismatch between queries about "multinode cluster" and responses using "multi-node OpenShift clusters" terminology, verify that line 65's expected_intent correctly captures the intent relationship between the query (line 62) and response (line 64).

The current expected_intent references "multi-node clusters"—ensure this accurately reflects what the chat model would infer from the user's "multinode cluster" query, avoiding terminology mismatches that could cause intent evaluation failures.


246-276: Confirm tagging strategy for 14 untagged new tests against PR objective.

Verification confirms the review comment's factual claims: lines 246–276 and additional test groups (totaling 14 untagged tests) lack tags, contradicting the PR objective stating "added tags: [smoke] to all dev suite tests." Current state shows only 16 of 37 conversation groups are smoke-tagged.

The file exhibits a deliberate categorization pattern:

  • Lines 1–228: 16 tests with [smoke]
  • Lines 246–294: 7 tests with NO tags (new platform/chatbot/guidance tests)
  • Lines 302–328: 3 tests with [troubleshooting]
  • Lines 329–361: 4 tests with [non-destructive]
  • Lines 370–471: 7 tests with NO tags (API/role protection tests)

This discrepancy between stated objective and actual implementation requires clarification on whether:

  1. All new tests should inherit [smoke] tags to align with the PR objective, or
  2. The untagged tests are intentionally categorized separately with a documented reason.

No tagging strategy documentation was found in commits, PR templates, or contributing guides.

Comment on lines +278 to +381
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."

⚠️ Potential issue | 🟠 Major

Missing tags on new general capability tests.

Conversation groups for Assisted Installer explanation, chatbot capabilities, and first-time guidance (lines 278–300) lack tags. These appear to be foundational tests and should likely be tagged to enable organized filtering.

Consider adding appropriate tags to these tests, e.g., [smoke] if they are core dev suite tests, or a new tag like [guidance] if they serve a different purpose:

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
+ tags: [smoke]
  conversation:

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
+ tags: [smoke]
  conversation:

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
+ tags: [smoke]
  conversation:
Suggested change
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  tags: [smoke]
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  tags: [smoke]
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."

Comment on lines +370 to +473
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster

⚠️ Potential issue | 🟠 Major

Missing tags on API specification and context management tests.

Conversation groups for Assisted Service API spec, basic context, and related tests (lines 370–392) lack tags. These are important for API contract and state management validation and should be tagged for filtering.

Consider adding tags like [smoke] or a new tag like [api-contract]:

- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
+ tags: [smoke]
  conversation:

- conversation_group: basic_context_conv
  description: "Conversation with context test"
+ tags: [smoke]
  conversation:
Suggested change
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  tags: [smoke]
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster

@ItzikEzra-rh
Contributor Author

/restart

@ItzikEzra-rh
Contributor Author

/test eval-test

@eranco74
Collaborator

/lgtm
/approve
/ok-to-test

@carbonin
Collaborator

@ItzikEzra-rh can you just rebase this for konflux? I think that's what it needs

@ItzikEzra-rh ItzikEzra-rh force-pushed the integrate-qe-into-dev-suite branch from f7353a9 to ed5202c on November 23, 2025 08:42
@openshift-ci openshift-ci bot removed the lgtm label Nov 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
test/evals/eval.py (2)

104-116: Temporary file resource leak remains unresolved.

The temporary file created with delete=False is still never cleaned up, causing a resource leak on every evaluation run with --tags. This issue was previously flagged but has not been addressed.

Please implement one of the solutions from the previous review (a compact sketch follows the list):

  • Add a try/finally block to ensure cleanup after evaluation completes
  • Use a context manager to handle the temp file lifecycle
  • Ensure os.unlink(filtered_yaml) is called after the evaluation, even if it fails
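
A compact sketch of the try/finally option, using the names from the snippets earlier in this review (is_temp is a hypothetical helper variable):

filtered_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
is_temp = filtered_yaml != args.eval_data_yaml   # did filtering create a temp file?
args.eval_data_yaml = filtered_yaml

evaluator = AgentGoalEval(args)
try:
    evaluator.run_evaluation()
finally:
    if is_temp:
        try:
            os.unlink(filtered_yaml)
        except OSError:
            pass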

121-123: Typo in error message remains unfixed.

Line 122 contains a typo: "varialbe" should be "variable". This was previously flagged but has not been corrected.

Apply this diff to fix the typo:

-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7353a9 and ed5202c.

📒 Files selected for processing (2)
  • test/evals/eval.py (2 hunks)
  • test/evals/eval_data.yaml (12 hunks)
🔇 Additional comments (1)
test/evals/eval.py (1)

81-99: Well-documented tag filtering system.

The --tags argument includes comprehensive documentation describing each tag's purpose (smoke, troubleshooting, non-destructive), which addresses the previous request for verbose tag descriptions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5202c and ee947fc.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)

450-472: Verify the broader tagging scope before proceeding.

The claim that these two groups lack tags is factually correct (lines 450 and 458), and the suggested diff is accurate. However, verification reveals that 36 of 37 conversation groups in the file currently lack tags, not just these two. Before addressing this review comment, confirm:

  1. Whether the PR objectives specify that only these two groups should be tagged, or whether systematic tagging of all groups is intended
  2. Why most conversation groups in the file remain untagged despite the apparent focus on adding tags in recent commits

The review's classification and scope cannot be confirmed without clarifying whether this is a targeted fix for two specific tests or part of broader incomplete work.

Comment on lines +326 to +381
- conversation_group: unable_to_create_cluster_with_custom_manifests_conv
  description: "Custom manifest cluster creation test"
  conversation:
    - eval_id: unable_to_create_cluster_with_custom_manifests
      eval_query: Can you create a cluster with a custom manifest?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to create a cluster with custom manifests."

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
  conversation:
    - eval_id: create_vsphere_platform_clusters
      eval_query: Can you create a cluster with vsphere platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with vSphere platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_nutanix_platform_clusters_conv
  description: "Nutanix platform integration test"
  conversation:
    - eval_id: create_nutanix_platform_clusters
      eval_query: Can you create a cluster with Nutanix platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with Nutanix platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_static_network_clusters_conv
  description: "Static networking cluster creation test"
  conversation:
    - eval_id: create_static_network_clusters
      eval_query: Can you create a cluster with static networking?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster and explaining that static network configuration can be set up for hosts."

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Add tags to new general capability and platform integration tests.

These conversation groups lack tags but should be tagged to enable filtering per the PR objectives ("added tags: [smoke] to all dev suite tests"). The groups covering custom manifests, platform integrations, assisted installer explanation, chatbot capabilities, and first-time guidance are foundational tests for the dev suite and should be tagged accordingly.

- conversation_group: unable_to_create_cluster_with_custom_manifests_conv
  description: "Custom manifest cluster creation test"
+ tags: [smoke]
  conversation:

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
+ tags: [smoke]
  conversation:

- conversation_group: create_nutanix_platform_clusters_conv
  description: "Nutanix platform integration test"
+ tags: [smoke]
  conversation:

- conversation_group: create_static_network_clusters_conv
  description: "Static networking cluster creation test"
+ tags: [smoke]
  conversation:

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
+ tags: [smoke]
  conversation:

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
+ tags: [smoke]
  conversation:

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
+ tags: [smoke]
  conversation:


---

🏁 Script executed:

#!/bin/bash
# Inspect the actual file content at lines 326-381 and surrounding areas
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 320-390 of eval_data.yaml ===" 
  sed -n '320,390p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi

Length of output: 4725


🏁 Script executed:

#!/bin/bash
# Check lines 450-472 mentioned in scratchpad
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 445-480 of eval_data.yaml ===" 
  sed -n '445,480p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi

Length of output: 1544


🏁 Script executed:

#!/bin/bash
# Count total lines and search for "tags:" occurrences with context
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Total lines ===" 
  wc -l test/evals/eval_data.yaml
  echo ""
  echo "=== Occurrences of 'tags:' ===" 
  grep -n "tags:" test/evals/eval_data.yaml | head -20
fi

Length of output: 512


🏁 Script executed:

#!/bin/bash
# Validate YAML syntax
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if command -v python3 &> /dev/null && [ -f "test/evals/eval_data.yaml" ]; then
  python3 << 'PYEOF'
import yaml
try:
  with open('test/evals/eval_data.yaml', 'r') as f:
    yaml.safe_load(f)
  print("YAML syntax: VALID")
except yaml.YAMLError as e:
  print(f"YAML syntax: INVALID - {e}")
PYEOF
fi

Length of output: 18


Add tags to untagged conversation groups to enable filtering per PR objectives.

The verification confirms that 9 conversation groups throughout the file lack tags while all other groups have them. Based on the consistent tagging pattern in the file (smoke, troubleshooting, non-destructive) and the PR objective to enable filtering, these groups require tags:

Lines 326–381:

  • unable_to_create_cluster_with_custom_manifests_conv
  • create_vsphere_platform_clusters_conv
  • create_nutanix_platform_clusters_conv
  • create_static_network_clusters_conv
  • assisted_installer_explanation_conv
  • chatbot_capabilities_conv
  • first_time_cluster_guidance_conv

Lines 450–472:

  • assisted_service_api_spec_conv
  • basic_context_conv

Add tags: [smoke] to each (these are general capability/integration tests). The YAML syntax is valid; this is a tagging completeness issue.


@ItzikEzra-rh
Contributor Author

/test eval-test

1 similar comment
@ItzikEzra-rh
Contributor Author

/test eval-test

@eranco74
Collaborator

/test local-dev-test

@eranco74
Collaborator

/ok-to-test

@andrej1991
Collaborator

/test eval-test

1 similar comment
@ItzikEzra-rh
Contributor Author

/test eval-test

@ItzikEzra-rh
Contributor Author

/retest

3 similar comments
@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/test eval-test


eranco74 commented Dec 1, 2025

/ok-to-test
/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Dec 1, 2025

openshift-ci bot commented Dec 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74, ItzikEzra-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Dec 1, 2025

@ItzikEzra-rh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • Test: ci/prow/eval-test | Commit: ee947fc | Details: link | Required: false | Rerun command: /test eval-test

Full PR test history. Your PR dashboard.



eranco74 commented Dec 2, 2025

/ok-to-test

@openshift-merge-bot openshift-merge-bot bot merged commit 21c9b92 into rh-ecosystem-edge:main Dec 2, 2025
6 of 8 checks passed
