
Conversation


@ItzikEzra-rh ItzikEzra-rh commented Nov 9, 2025

Summary

This PR integrates the QE evaluation tests into the dev suite and adds a tag-based filtering system to allow users to run specific subsets of tests (e.g., smoke tests on every PR) instead of running all tests.

test/evals/eval_data.yaml

  • Added tags: [smoke] to all dev suite tests

test/evals/eval.py

  • Added --tags command-line argument (optional) to filter tests by tags; a usage sketch follows this list
  • Implemented filter_by_tags() function that:
    • Filters YAML evaluation data based on provided tags
    • Creates a temporary filtered YAML file when tags are specified
    • Returns the original path when no tags are provided (runs all tests)
    • Exits with an error message if no tests match the specified tags
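
For orientation, a minimal usage sketch (the data-file flag --eval_data_yaml is inferred from args.eval_data_yaml mentioned later in this thread, and the UNIQUE_ID value is a placeholder):

# run only the smoke-tagged groups; eval.py requires UNIQUE_ID (see the review below)
export UNIQUE_ID=dev-run-1
python test/evals/eval.py --eval_data_yaml test/evals/eval_data.yaml --tags smoke

# omit --tags to run every test group
python test/evals/eval.py --eval_data_yaml test/evals/eval_data.yaml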

Summary by CodeRabbit

  • New Features

    • CLI tag-based filtering for targeted evaluation runs, with notices when no tests match and summarized test counts.
    • Improved evaluation reporting with a result summary and non-zero exit when failures occur.
  • Tests

    • Many new evaluation groups added (platforms, integrations, workflows, guided assistance, troubleshooting).
    • Tests annotated with tags, descriptions, and expected intents for intent-focused scenarios.
  • Bug Fixes

    • Test data contains unresolved merge conflict markers that need resolution.


@openshift-ci openshift-ci bot requested review from carbonin and eranco74 November 9, 2025 14:39

coderabbitai bot commented Nov 9, 2025

Walkthrough

Adds a --tags CLI option and a filter_by_tags(path, tags) helper that writes a temporary filtered YAML and reassigns args.eval_data_yaml. Extends evaluation flow to compute a result summary, count FAIL/ERROR results, and exit non‑zero on failures. Enriches eval_data.yaml with tags, descriptions, expected intents, and many new groups.

Changes

  • Evaluation script enhancements (test/evals/eval.py): Added --tags CLI option; imported yaml and tempfile; implemented filter_by_tags(path, tags) to load YAML, select conversation_group entries whose tags intersect the requested tags, error if none match, write filtered YAML to a temporary file and return its path; apply filtering to args.eval_data_yaml before evaluation; compute evaluator.get_result_summary(), count FAIL/ERROR, and exit non-zero when failures exist (a sketch of this flow follows).
  • Test data updates (test/evals/eval_data.yaml): Added tags: [smoke], description, and expected_intent fields to many conversation_group entries; expanded eval_types (e.g., response_eval:intent); appended numerous new conversation groups covering manifests, platform integrations, guided assistance, troubleshooting, and non-destructive actions. Merge conflict markers are present in the file.
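
A minimal sketch of that flow, assuming get_result_summary() returns a dict keyed by status name (the library's exact contract is questioned later in this review):

args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)

evaluator = AgentGoalEval(args)
evaluator.run_evaluation()

summary = evaluator.get_result_summary()  # e.g. {"PASS": 30, "FAIL": 2, "ERROR": 0} (assumed shape)
failures = summary.get("FAIL", 0) + summary.get("ERROR", 0)
sys.exit(1 if failures else 0)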

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script as eval.py
    participant YAML as eval_data.yaml
    participant Evaluator

    User->>Script: run eval.py --tags=smoke
    Script->>Script: parse_args()
    rect rgb(220,235,255)
        Note over Script,YAML: Tag filtering (new)
        Script->>YAML: filter_by_tags(path, tags)
        YAML-->>Script: filtered_yaml_path
        Script->>Script: args.eval_data_yaml = filtered_yaml_path
    end
    Script->>Evaluator: run evaluation with filtered YAML
    Evaluator->>Evaluator: evaluate tests
    rect rgb(235,225,245)
        Note over Evaluator,Script: Result summary (new)
        Evaluator-->>Script: get_result_summary()
        Script->>Script: count FAIL/ERROR
    end
    alt Failures present
        Script->>User: exit(1)
    else All pass
        Script->>User: exit(0)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Inspect filter_by_tags() YAML parsing, tag-intersection logic, and error handling when no matches exist.
  • Verify temporary file creation, write correctness, and any cleanup implications.
  • Confirm reassignment of args.eval_data_yaml integrates with evaluator paths.
  • Validate get_result_summary() usage and correct detection/counting of FAIL/ERROR statuses.
  • Review test/evals/eval_data.yaml for schema consistency and address merge conflict markers.

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • carbonin

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately captures the two main changes: integrating QE tests into the dev suite and implementing a tag-based filtering system for test execution.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


openshift-ci bot commented Nov 9, 2025

Hi @ItzikEzra-rh. Thanks for your PR.

I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
test/evals/eval.py (1)

91-103: Consider cleaning up the temporary file.

The temporary file created with delete=False is never explicitly removed, which will accumulate files in the temp directory over time.

Consider using atexit to register cleanup or refactoring to use a context manager:

+import atexit
+
 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
+    atexit.register(lambda: os.unlink(tmp.name) if os.path.exists(tmp.name) else None)
     return tmp.name

Alternatively, track created files in a module-level list and remove them all at exit:

import atexit
import os

_temp_files = []

def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️  No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    _temp_files.append(tmp.name)
    return tmp.name

def cleanup_temp_files():
    for f in _temp_files:
        try:
            os.unlink(f)
        except OSError:
            pass

atexit.register(cleanup_temp_files)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ed4149 and d762d46.

📒 Files selected for processing (2)
  • test/evals/eval.py (2 hunks)
  • test/evals/eval_data.yaml (14 hunks)
🔇 Additional comments (5)
test/evals/eval_data.yaml (2)

2-186: LGTM! Clean smoke test tagging.

The addition of tags: [smoke] to existing conversation groups is consistent and enables effective smoke test filtering via the new CLI option.


204-378: LGTM! Comprehensive intent test coverage.

The new intent-focused conversation groups effectively extend test coverage for high-level response validation. The use of response_eval:intent with descriptive expected outcomes is appropriate for LLM-based evaluation.

test/evals/eval.py (3)

7-8: LGTM! Required imports for filtering.

The yaml and tempfile imports support the new tag-based filtering functionality.


81-86: LGTM! Well-defined CLI argument.

The --tags argument is properly configured with clear help text and appropriate defaults.


112-126: LGTM! Clean integration and proper exit handling.

The filtering is seamlessly integrated, and the evaluation flow properly exits with appropriate status codes based on test results.

Comment on lines 108 to 123
if os.getenv('UNIQUE_ID') is None:
    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)

⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

Line 109 contains a typo: "varialbe" should be "variable".

 if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)
Suggested change
if os.getenv('UNIQUE_ID') is None:
    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)


@eranco74 eranco74 left a comment


Please go over the tests and remove duplicates

      eval_types: [response_eval:intent]
      expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."

- conversation_group: sno_hardware_requirements_intent_conv


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
test/evals/eval.py (1)

81-99: Consider externalizing tag documentation to prevent staleness.

While the verbose tag descriptions address the reviewer's request, hardcoding them in the argument help can become stale as tags evolve. If new tags are added to eval_data.yaml, this help text won't automatically reflect them.

Consider one of these approaches:

Option 1: Reference external documentation

     parser.add_argument(
         "--tags",
         nargs="+",
         default=None,
-        help=(
-            "Filter tests by tags. Optional - if not provided, all tests will be run. "
-            "Available tags: "
-            "'smoke' - Basic smoke tests that verify core functionality and should run quickly "
-            "to catch fundamental issues (e.g., cluster creation requests, version listing, "
-            "basic queries). "
-            "'troubleshooting' - Tests that verify the assistant's ability to help diagnose and "
-            "explain common issues users encounter (e.g., ignition download failures, degraded "
-            "cluster states, console access problems). "
-            "'non-destructive' - Tests that verify the assistant correctly refuses or handles "
-            "destructive operations without actually performing them (e.g., refusing to delete "
-            "clusters, declining to create deletion scripts). "
-            "Example: --tags smoke troubleshooting"
-        ),
+        help=(
+            "Filter tests by tags. Optional - if not provided, all tests will be run. "
+            "See eval_data.yaml or docs/test-tags.md for available tags and descriptions. "
+            "Example: --tags smoke troubleshooting"
+        ),
     )

Option 2: Dynamically extract tags from eval_data.yaml (more complex but stays in sync)

Add a function to read available tags:

def get_available_tags(yaml_path):
    """Extract unique tags from eval data YAML."""
    try:
        with open(yaml_path) as f:
            data = yaml.safe_load(f)
            tags = set()
            for group in data:
                tags.update(group.get('tags', []))
            return sorted(tags)
    except Exception:
        return []

Then update the help text:

available_tags = get_available_tags("eval_data.yaml")
tag_list = ", ".join(available_tags) if available_tags else "check eval_data.yaml"

parser.add_argument(
    "--tags",
    nargs="+",
    default=None,
    help=(
        f"Filter tests by tags. Available tags: {tag_list}. "
        "Example: --tags smoke troubleshooting"
    ),
)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0a6584 and 3905a74.

📒 Files selected for processing (1)
  • test/evals/eval.py (2 hunks)
🔇 Additional comments (3)
test/evals/eval.py (3)

7-8: LGTM! Imports support the new filtering functionality.

The yaml and tempfile imports are appropriately used for loading, filtering, and creating temporary YAML files.


125-125: LGTM! Clean integration of tag filtering.

The filter is appropriately applied before evaluation, and the path reassignment is straightforward.


130-139: Manually verify the behavior of AgentGoalEval.get_result_summary() before deciding on defensive checks.

The external lsc_agent_eval library is not publicly documented and cannot be inspected from the codebase. While the review comment's suggestion to add defensive checks is sound defensive programming practice, whether it's strictly necessary depends on the library's actual behavior:

  1. Check the library's source code or tests to confirm whether get_result_summary() always returns a non-None dict with "FAIL" and "ERROR" keys.
  2. If the library guarantees this contract, the current code is acceptable.
  3. If the library can return None or incomplete dicts, apply the defensive checks suggested in the review comment (sketched below).
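
For illustration, a defensive variant might look like this sketch (the summary shape here is an assumption, not a documented contract of lsc_agent_eval):

summary = evaluator.get_result_summary() or {}   # guard against a None return
fail_count = summary.get("FAIL", 0) or 0         # tolerate missing keys
error_count = summary.get("ERROR", 0) or 0
sys.exit(1 if (fail_count + error_count) else 0)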

Comment on lines +104 to +117
def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️  No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    return tmp.name


⚠️ Potential issue | 🔴 Critical

Critical: Temporary file is never cleaned up.

The temporary file created with delete=False is never removed, causing a resource leak. Each evaluation run with --tags will leave an orphaned temp file in the system's temp directory.

Consider using a context manager or ensuring cleanup. Here are two solutions:

Solution 1: Use context manager and clean up after evaluation

 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
     return tmp.name


 # Parse command line arguments
 args = parse_args()
 if os.getenv('UNIQUE_ID') is None:
     print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)
 
-args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+filtered_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+original_yaml = args.eval_data_yaml
+args.eval_data_yaml = filtered_yaml
 
 evaluator = AgentGoalEval(args)
-# Run Evaluation
-evaluator.run_evaluation()
+try:
+    # Run Evaluation
+    evaluator.run_evaluation()
+finally:
+    # Clean up temp file if we created one
+    if args.tags and filtered_yaml != original_yaml:
+        try:
+            os.unlink(filtered_yaml)
+        except Exception:
+            pass

Solution 2: Use delete=True and keep file open (simpler if supported by AgentGoalEval)

Check if AgentGoalEval can accept an already-open file object or reads the path only once. If so:

 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️  No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
-    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
+    # Note: On Windows, you may need delete=False with manual cleanup
+    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=True)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
-    tmp.close()
+    tmp.flush()  # Ensure data is written
     return tmp.name

However, Solution 2 may have issues on Windows or if the file needs to be reopened by AgentGoalEval.

Committable suggestion skipped: line range outside the PR's diff.


@ItzikEzra-rh
Contributor Author

@eranco74 @omertuc done


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
test/evals/eval_data.yaml (2)

246-300: Inconsistent tag coverage: new test groups lack tags while the filtering mechanism depends on them.

The new test groups (lines 246–300) have descriptions but no tags, yet the PR adds tag-based filtering to eval.py. Tests like unable_to_create_cluster_with_custom_manifests_conv, create_vsphere_platform_clusters_conv, and others are orphaned from the tagging scheme, making them inaccessible via tag-based filtering. Either assign appropriate tags (e.g., smoke, capabilities, or domain-specific tags) or clarify why these tests are excluded from the filtering system.

Consider organizing new tests with semantic tags that align with their purpose (an example follows the list):

  • Lines 246–300: Could use tags like smoke, capabilities, or integration depending on scope
  • Lines 302–321: Already tagged [troubleshooting]
  • Lines 329–363: Already tagged [non-destructive]
  • Lines 389+: Role protection tests also lack tags; determine appropriate classification
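
For illustration, one of the untagged groups with a hypothetical tag applied (the tag name is a placeholder; pick whichever fits the scheme above):

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
  tags: [integration]  # hypothetical tag; could equally be smoke or capabilities
  conversation:
    - eval_id: create_vsphere_platform_clusters
      eval_query: Can you create a cluster with vsphere platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with vSphere platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."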

389-466: Role protection test groups lack tags.

The role protection test groups (direct-roleplaying, tone-manipulation, off-topic, mixed-requests, boundary-testing) starting at line 389 have no tags assigned. These should be included in the tagging scheme for consistency with the filtering system introduced in this PR. Consider assigning tags like [smoke], [security], or [role-protection] to enable targeting these tests.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3905a74 and 923431b.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)

302-363: Past review suggestions successfully integrated.

The [troubleshooting] tag has been correctly applied to ignition, degraded cluster, and console access tests (lines 302–321), and [non-destructive] tags applied to deletion and script tests (lines 329–363), aligning with eranco74's prior feedback. ✓

@eranco74
Collaborator

/test ?


openshift-ci bot commented Nov 17, 2025

@eranco74: The following commands are available to trigger required jobs:

/test images

The following commands are available to trigger optional jobs:

/test eval-test
/test local-dev-test

Use /test all to run all jobs.

In response to this:

/test ?


@eranco74
Collaborator

/test eval-test
/test local-dev-test

@eranco74
Collaborator

pull secret issue
/test eval-test

@eranco74
Collaborator

/ok-to-test

      eval_types: [response_eval:accuracy]
      eval_types: [response_eval:accuracy, response_eval:intent]
      expected_response: "Control plane nodes in a multi-node cluster require 4 CPU cores 16 GB of RAM and 100 GB of storage, the compute nodes in this cluster require 2 CPU cores, 8 GB of RAM and 100GB of storage"
      expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."
Collaborator


The question is about a multinode cluster, so the reply is about a multinode cluster and not about multi-node OpenShift clusters, which makes the response_eval:intent check FAIL!

See: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/rh-ecosystem-edge_assisted-chat/246/pull-ci-rh-ecosystem-edge-assisted-chat-main-eval-test/1990357108888965120

@ItzikEzra-rh
Contributor Author

/test eval-test


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/evals/eval_data.yaml (1)

394-471: Missing tags on all role protection security tests.

The role protection test groups (lines 394–471) covering direct role-playing, tone manipulation, off-topic queries, mixed requests, and boundary testing lack tags entirely. These are important security/robustness validations and should be tagged (e.g., [security], [robustness], or a custom tag) to enable selective execution.

Add an appropriate tag to all role protection tests:

- conversation_group: role_protection_direct_roleplaying_conv
  description: Verify rejection of direct role-playing requests
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_tone_manipulation_conv
  description: Verify rejection of tone and style manipulation attempts
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_off_topic_conv
  description: Verify rejection of off-topic queries unrelated to OpenShift
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_mixed_requests_conv
  description: Verify rejection of requests mixing legitimate queries with off-topic content
+ tags: [robustness]
  conversation:

- conversation_group: role_protection_boundary_testing_conv
  description: Verify proper handling of edge cases and legitimate technical queries
+ tags: [robustness]
  conversation:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 923431b and f7353a9.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)

57-65: Verify multinode cluster intent expectations match query intent.

Based on past review feedback indicating intent mismatch between queries about "multinode cluster" and responses using "multi-node OpenShift clusters" terminology, verify that line 65's expected_intent correctly captures the intent relationship between the query (line 62) and response (line 64).

The current expected_intent references "multi-node clusters"—ensure this accurately reflects what the chat model would infer from the user's "multinode cluster" query, avoiding terminology mismatches that could cause intent evaluation failures.


246-276: Confirm tagging strategy for 14 untagged new tests against PR objective.

Verification confirms the review comment's factual claims: lines 246–276 and additional test groups (totaling 14 untagged tests) lack tags, contradicting the PR objective stating "added tags: [smoke] to all dev suite tests." Current state shows only 16 of 37 conversation groups are smoke-tagged.

The file exhibits a deliberate categorization pattern:

  • Lines 1–228: 16 tests with [smoke]
  • Lines 246–294: 7 tests with NO tags (new platform/chatbot/guidance tests)
  • Lines 302–328: 3 tests with [troubleshooting]
  • Lines 329–361: 4 tests with [non-destructive]
  • Lines 370–471: 7 tests with NO tags (API/role protection tests)

This discrepancy between stated objective and actual implementation requires clarification on whether:

  1. All new tests should inherit [smoke] tags to align with the PR objective, or
  2. The untagged tests are intentionally categorized separately with a documented reason.

No tagging strategy documentation was found in commits, PR templates, or contributing guides.

Comment on lines +278 to +381
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."

⚠️ Potential issue | 🟠 Major

Missing tags on new general capability tests.

Conversation groups for Assisted Installer explanation, chatbot capabilities, and first-time guidance (lines 278–300) lack tags. These appear to be foundational tests and should likely be tagged to enable organized filtering.

Consider adding appropriate tags to these tests, e.g., [smoke] if they are core dev suite tests, or a new tag like [guidance] if they serve a different purpose:

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
+ tags: [smoke]
  conversation:

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
+ tags: [smoke]
  conversation:

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
+ tags: [smoke]
  conversation:
Suggested change
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  tags: [smoke]
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  tags: [smoke]
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."

Comment on lines +370 to +473
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster

⚠️ Potential issue | 🟠 Major

Missing tags on API specification and context management tests.

Conversation groups for Assisted Service API spec, basic context, and related tests (lines 370–392) lack tags. These are important for API contract and state management validation and should be tagged for filtering.

Consider adding tags like [smoke] or a new tag like [api-contract]:

- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
+ tags: [smoke]
  conversation:

- conversation_group: basic_context_conv
  description: "Conversation with context test"
+ tags: [smoke]
  conversation:
Suggested change
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  tags: [smoke]
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster

@ItzikEzra-rh
Contributor Author

/restart

@ItzikEzra-rh
Contributor Author

/test eval-test

@eranco74
Collaborator

/lgtm
/approve
/ok-to-test

@carbonin
Collaborator

@ItzikEzra-rh can you just rebase this for konflux? I think that's what it needs

@ItzikEzra-rh ItzikEzra-rh force-pushed the integrate-qe-into-dev-suite branch from f7353a9 to ed5202c on November 23, 2025 08:42
@openshift-ci openshift-ci bot removed the lgtm label Nov 23, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (2)
test/evals/eval.py (2)

104-116: Temporary file resource leak remains unresolved.

The temporary file created with delete=False is still never cleaned up, causing a resource leak on every evaluation run with --tags. This issue was previously flagged but has not been addressed.

Please implement one of the solutions from the previous review (a compact sketch follows the list):

  • Add a try/finally block to ensure cleanup after evaluation completes
  • Use a context manager to handle the temp file lifecycle
  • Ensure os.unlink(filtered_yaml) is called after the evaluation, even if it fails
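
A compact sketch of the try/finally option, using the names from the snippets earlier in this review (is_temp is a hypothetical helper variable):

filtered_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
is_temp = filtered_yaml != args.eval_data_yaml   # did filtering create a temp file?
args.eval_data_yaml = filtered_yaml

evaluator = AgentGoalEval(args)
try:
    evaluator.run_evaluation()
finally:
    if is_temp:
        try:
            os.unlink(filtered_yaml)
        except OSError:
            pass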

121-123: Typo in error message remains unfixed.

Line 122 contains a typo: "varialbe" should be "variable". This was previously flagged but has not been corrected.

Apply this diff to fix the typo:

-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7353a9 and ed5202c.

📒 Files selected for processing (2)
  • test/evals/eval.py (2 hunks)
  • test/evals/eval_data.yaml (12 hunks)
🔇 Additional comments (1)
test/evals/eval.py (1)

81-99: Well-documented tag filtering system.

The --tags argument includes comprehensive documentation describing each tag's purpose (smoke, troubleshooting, non-destructive), which addresses the previous request for verbose tag descriptions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed5202c and ee947fc.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)

450-472: Verify the broader tagging scope before proceeding.

The claim that these two groups lack tags is factually correct (lines 450 and 458), and the suggested diff is accurate. However, verification reveals that 36 of 37 conversation groups in the file currently lack tags, not just these two. Before addressing this review comment, confirm:

  1. Whether the PR objectives specify that only these two groups should be tagged, or whether systematic tagging of all groups is intended
  2. Why most conversation groups in the file remain untagged despite the apparent focus on adding tags in recent commits

The review's classification and scope cannot be confirmed without clarifying whether this is a targeted fix for two specific tests or part of broader incomplete work.

Comment on lines +326 to +381
- conversation_group: unable_to_create_cluster_with_custom_manifests_conv
  description: "Custom manifest cluster creation test"
  conversation:
    - eval_id: unable_to_create_cluster_with_custom_manifests
      eval_query: Can you create a cluster with a custom manifest?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to create a cluster with custom manifests."

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
  conversation:
    - eval_id: create_vsphere_platform_clusters
      eval_query: Can you create a cluster with vsphere platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with vSphere platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_nutanix_platform_clusters_conv
  description: "Nutanix platform integration test"
  conversation:
    - eval_id: create_nutanix_platform_clusters
      eval_query: Can you create a cluster with Nutanix platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with Nutanix platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_static_network_clusters_conv
  description: "Static networking cluster creation test"
  conversation:
    - eval_id: create_static_network_clusters
      eval_query: Can you create a cluster with static networking?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster and explaining that static network configuration can be set up for hosts."

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Add tags to new general capability and platform integration tests.

These conversation groups lack tags but should be tagged to enable filtering per the PR objectives ("added tags: [smoke] to all dev suite tests"). The groups covering custom manifests, platform integrations, assisted installer explanation, chatbot capabilities, and first-time guidance are foundational tests for the dev suite and should be tagged accordingly.

- conversation_group: unable_to_create_cluster_with_custom_manifests_conv
  description: "Custom manifest cluster creation test"
+ tags: [smoke]
  conversation:

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
+ tags: [smoke]
  conversation:

- conversation_group: create_nutanix_platform_clusters_conv
  description: "Nutanix platform integration test"
+ tags: [smoke]
  conversation:

- conversation_group: create_static_network_clusters_conv
  description: "Static networking cluster creation test"
+ tags: [smoke]
  conversation:

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
+ tags: [smoke]
  conversation:

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
+ tags: [smoke]
  conversation:

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
+ tags: [smoke]
  conversation:


---

🏁 Script executed:

#!/bin/bash
# Inspect the actual file content at lines 326-381 and surrounding areas
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 320-390 of eval_data.yaml ===" 
  sed -n '320,390p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi

Length of output: 4725


🏁 Script executed:

#!/bin/bash
# Check lines 450-472 mentioned in scratchpad
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 445-480 of eval_data.yaml ===" 
  sed -n '445,480p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi

Length of output: 1544


🏁 Script executed:

#!/bin/bash
# Count total lines and search for "tags:" occurrences with context
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Total lines ===" 
  wc -l test/evals/eval_data.yaml
  echo ""
  echo "=== Occurrences of 'tags:' ===" 
  grep -n "tags:" test/evals/eval_data.yaml | head -20
fi

Length of output: 512


🏁 Script executed:

#!/bin/bash
# Validate YAML syntax
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if command -v python3 &> /dev/null && [ -f "test/evals/eval_data.yaml" ]; then
  python3 << 'PYEOF'
import yaml
try:
  with open('test/evals/eval_data.yaml', 'r') as f:
    yaml.safe_load(f)
  print("YAML syntax: VALID")
except yaml.YAMLError as e:
  print(f"YAML syntax: INVALID - {e}")
PYEOF
fi

Length of output: 18


Add tags to untagged conversation groups to enable filtering per PR objectives.

The verification confirms that 9 conversation groups throughout the file lack tags while all other groups have them. Based on the consistent tagging pattern in the file (smoke, troubleshooting, non-destructive) and the PR objective to enable filtering, these groups require tags:

Lines 326–381:

  • unable_to_create_cluster_with_custom_manifests_conv
  • create_vsphere_platform_clusters_conv
  • create_nutanix_platform_clusters_conv
  • create_static_network_clusters_conv
  • assisted_installer_explanation_conv
  • chatbot_capabilities_conv
  • first_time_cluster_guidance_conv

Lines 450–472:

  • assisted_service_api_spec_conv
  • basic_context_conv

Add tags: [smoke] to each (these are general capability/integration tests). The YAML syntax is valid; this is a tagging completeness issue.


@ItzikEzra-rh
Contributor Author

/test eval-test

1 similar comment
@ItzikEzra-rh
Contributor Author

/test eval-test

@eranco74
Collaborator

/test local-dev-test

@eranco74
Collaborator

/ok-to-test

@andrej1991
Collaborator

/test eval-test

1 similar comment
@ItzikEzra-rh
Contributor Author

/test eval-test

@ItzikEzra-rh
Contributor Author

/retest

3 similar comments
@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/retest

@ItzikEzra-rh
Contributor Author

/test eval-test


eranco74 commented Dec 1, 2025

/ok-to-test
/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Dec 1, 2025

openshift-ci bot commented Dec 1, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74, ItzikEzra-rh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Dec 1, 2025

@ItzikEzra-rh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

  • Test: ci/prow/eval-test | Commit: ee947fc | Details: link | Required: false | Rerun command: /test eval-test

Full PR test history. Your PR dashboard.



eranco74 commented Dec 2, 2025

/ok-to-test

@openshift-merge-bot openshift-merge-bot bot merged commit 21c9b92 into rh-ecosystem-edge:main Dec 2, 2025
6 of 8 checks passed
