Integrate QE into dev suite and add filtering system #246
Conversation
Walkthrough: Adds an optional `--tags` CLI argument and a `filter_by_tags()` helper to `test/evals/eval.py`, tags the dev-suite tests in `test/evals/eval_data.yaml`, and makes the script exit non-zero when any evaluation fails or errors.

Changes: `test/evals/eval.py` (tag filtering, result summary, exit codes) and `test/evals/eval_data.yaml` (tags on conversation groups, new intent-based tests).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Script as eval.py
    participant YAML as eval_data.yaml
    participant Evaluator

    User->>Script: run eval.py --tags=smoke
    Script->>Script: parse_args()
    rect rgb(220,235,255)
        Note over Script,YAML: Tag filtering (new)
        Script->>YAML: filter_by_tags(path, tags)
        YAML-->>Script: filtered_yaml_path
        Script->>Script: args.eval_data_yaml = filtered_yaml_path
    end
    Script->>Evaluator: run evaluation with filtered YAML
    Evaluator->>Evaluator: evaluate tests
    rect rgb(235,225,245)
        Note over Evaluator,Script: Result summary (new)
        Evaluator-->>Script: get_result_summary()
        Script->>Script: count FAIL/ERROR
    end
    alt Failures present
        Script->>User: exit(1)
    else All pass
        Script->>User: exit(0)
    end
```
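For orientation, the diagram's happy path compresses to roughly the following (a sketch assembled from the snippets quoted in this review; the exact result-summary handling is not shown in the diff, so the `FAIL`/`ERROR` key names are taken from the diagram and may differ from the real code):

```python
import sys

# Parse CLI arguments, including the new optional --tags filter
args = parse_args()

# Tag filtering (new): point the evaluator at a filtered copy of the eval data
args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)

evaluator = AgentGoalEval(args)
evaluator.run_evaluation()

# Result summary (new): fail the run if any test FAILed or ERRORed
summary = evaluator.get_result_summary()
if summary.get("FAIL", 0) or summary.get("ERROR", 0):
    sys.exit(1)
sys.exit(0)
```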
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Hi @ItzikEzra-rh. Thanks for your PR. I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/evals/eval.py (1)
91-103: Consider cleaning up the temporary file.

The temporary file created with `delete=False` is never explicitly removed, which will accumulate files in the temp directory over time. Consider using `atexit` to register cleanup or refactoring to use a context manager:

```diff
+import atexit
+
 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️ No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
+    atexit.register(lambda: os.unlink(tmp.name) if os.path.exists(tmp.name) else None)
     return tmp.name
```

Alternatively, track every temp file created and remove them all in a single `atexit` handler:

```python
import atexit
import os

_temp_files = []

def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️ No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    _temp_files.append(tmp.name)
    return tmp.name

def cleanup_temp_files():
    for f in _temp_files:
        try:
            os.unlink(f)
        except OSError:
            pass

atexit.register(cleanup_temp_files)
```
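For the context-manager route the comment mentions, a sketch might look like this (hypothetical `filtered_yaml_path` helper; it assumes the whole evaluation can run inside a `with` block, which may not match how `AgentGoalEval` consumes the path):

```python
import contextlib
import os
import tempfile

import yaml

@contextlib.contextmanager
def filtered_yaml_path(path, tags):
    """Yield a path to a tag-filtered copy of the eval YAML, removing it afterwards."""
    if not tags:
        yield path
        return
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    try:
        yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
        tmp.close()
        yield tmp.name
    finally:
        os.unlink(tmp.name)  # always removed, even if the evaluation raises
```

Usage would be `with filtered_yaml_path(args.eval_data_yaml, args.tags) as p: ...`, scoping the temp file's lifetime to the evaluation.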
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- test/evals/eval.py (2 hunks)
- test/evals/eval_data.yaml (14 hunks)
🔇 Additional comments (5)
test/evals/eval_data.yaml (2)
2-186: LGTM! Clean smoke test tagging. The addition of `tags: [smoke]` to existing conversation groups is consistent and enables effective smoke test filtering via the new CLI option.
204-378: LGTM! Comprehensive intent test coverage. The new intent-focused conversation groups effectively extend test coverage for high-level response validation. The use of `response_eval:intent` with descriptive expected outcomes is appropriate for LLM-based evaluation.

test/evals/eval.py (3)
7-8: LGTM! Required imports for filtering. The `yaml` and `tempfile` imports support the new tag-based filtering functionality.
81-86: LGTM! Well-defined CLI argument. The `--tags` argument is properly configured with clear help text and appropriate defaults.
112-126: LGTM! Clean integration and proper exit handling. The filtering is seamlessly integrated, and the evaluation flow properly exits with appropriate status codes based on test results.
```python
if os.getenv('UNIQUE_ID') is None:
    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)
```
Fix typo in error message.

Line 109 contains a typo: "varialbe" should be "variable".

```diff
 if os.getenv('UNIQUE_ID') is None:
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
if os.getenv('UNIQUE_ID') is None:
    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
    sys.exit(1)
```
🤖 Prompt for AI Agents
In test/evals/eval.py around lines 108 to 110, the error message printed when
UNIQUE_ID is not set contains a typo ("varialbe"); update the string to correct
the spelling to "variable" so the message reads: The environmental variable
'UNIQUE_ID' has to be set so the cluster creation and removal can happen
properly. Ensure no other text changes are made.
eranco74 left a comment
Please go over the tests and remove duplicates
test/evals/eval_data.yaml (Outdated)
```yaml
      eval_types: [response_eval:intent]
      expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."

- conversation_group: sno_hardware_requirements_intent_conv
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/evals/eval.py (1)
81-99: Consider externalizing tag documentation to prevent staleness.

While the verbose tag descriptions address the reviewer's request, hardcoding them in the argument help can become stale as tags evolve. If new tags are added to `eval_data.yaml`, this help text won't automatically reflect them. Consider one of these approaches:

Option 1: Reference external documentation

```diff
 parser.add_argument(
     "--tags",
     nargs="+",
     default=None,
-    help=(
-        "Filter tests by tags. Optional - if not provided, all tests will be run. "
-        "Available tags: "
-        "'smoke' - Basic smoke tests that verify core functionality and should run quickly "
-        "to catch fundamental issues (e.g., cluster creation requests, version listing, "
-        "basic queries). "
-        "'troubleshooting' - Tests that verify the assistant's ability to help diagnose and "
-        "explain common issues users encounter (e.g., ignition download failures, degraded "
-        "cluster states, console access problems). "
-        "'non-destructive' - Tests that verify the assistant correctly refuses or handles "
-        "destructive operations without actually performing them (e.g., refusing to delete "
-        "clusters, declining to create deletion scripts). "
-        "Example: --tags smoke troubleshooting"
-    ),
+    help=(
+        "Filter tests by tags. Optional - if not provided, all tests will be run. "
+        "See eval_data.yaml or docs/test-tags.md for available tags and descriptions. "
+        "Example: --tags smoke troubleshooting"
+    ),
 )
```

Option 2: Dynamically extract tags from eval_data.yaml (more complex but stays in sync)

Add a function to read available tags:

```python
def get_available_tags(yaml_path):
    """Extract unique tags from eval data YAML."""
    try:
        with open(yaml_path) as f:
            data = yaml.safe_load(f)
        tags = set()
        for group in data:
            tags.update(group.get('tags', []))
        return sorted(tags)
    except Exception:
        return []
```

Then update the help text:

```python
available_tags = get_available_tags("eval_data.yaml")
tag_list = ", ".join(available_tags) if available_tags else "check eval_data.yaml"

parser.add_argument(
    "--tags",
    nargs="+",
    default=None,
    help=(
        f"Filter tests by tags. Available tags: {tag_list}. "
        "Example: --tags smoke troubleshooting"
    ),
)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- test/evals/eval.py (2 hunks)
🔇 Additional comments (3)
test/evals/eval.py (3)
7-8: LGTM! Imports support the new filtering functionality. The `yaml` and `tempfile` imports are appropriately used for loading, filtering, and creating temporary YAML files.
125-125: LGTM! Clean integration of tag filtering. The filter is appropriately applied before evaluation, and the path reassignment is straightforward.
130-139: Manually verify the behavior of `AgentGoalEval.get_result_summary()` before deciding on defensive checks.

The external `lsc_agent_eval` library is not publicly documented and cannot be inspected from the codebase. While the review comment's suggestion to add defensive checks is sound defensive programming practice, whether it's strictly necessary depends on the library's actual behavior:

- Check the library's source code or tests to confirm whether `get_result_summary()` always returns a non-None dict with `"FAIL"` and `"ERROR"` keys.
- If the library guarantees this contract, the current code is acceptable.
- If the library can return None or incomplete dicts, apply the defensive checks suggested in the review comment.
```python
def filter_by_tags(path, tags):
    """Filter YAML data by tags, return filtered path."""
    if not tags:
        return path
    with open(path) as f:
        data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
    if not data:
        sys.exit(f"⚠️ No tests found with tags: {tags}")
    print(f"📋 Running {len(data)} test(s) with tags: {tags}")
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
    yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
    tmp.close()
    return tmp.name
```
Critical: Temporary file is never cleaned up.

The temporary file created with `delete=False` is never removed, causing a resource leak. Each evaluation run with `--tags` will leave an orphaned temp file in the system's temp directory.
Consider using a context manager or ensuring cleanup. Here are two solutions:
Solution 1: Use a try/finally block and clean up after evaluation

```diff
 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️ No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
     tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
     tmp.close()
     return tmp.name

 # Parse command line arguments
 args = parse_args()

 if os.getenv('UNIQUE_ID') is None:
     print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
     sys.exit(1)

-args.eval_data_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+filtered_yaml = filter_by_tags(args.eval_data_yaml, args.tags)
+original_yaml = args.eval_data_yaml
+args.eval_data_yaml = filtered_yaml

 evaluator = AgentGoalEval(args)

-# Run Evaluation
-evaluator.run_evaluation()
+try:
+    # Run Evaluation
+    evaluator.run_evaluation()
+finally:
+    # Clean up temp file if we created one
+    if args.tags and filtered_yaml != original_yaml:
+        try:
+            os.unlink(filtered_yaml)
+        except Exception:
+            pass
```

Solution 2: Use delete=True and keep the file open (simpler if supported by AgentGoalEval)
Check if AgentGoalEval can accept an already-open file object or reads the path only once. If so:
```diff
 def filter_by_tags(path, tags):
     """Filter YAML data by tags, return filtered path."""
     if not tags:
         return path
     with open(path) as f:
         data = [g for g in yaml.safe_load(f) if any(t in g.get('tags', []) for t in tags)]
     if not data:
         sys.exit(f"⚠️ No tests found with tags: {tags}")
     print(f"📋 Running {len(data)} test(s) with tags: {tags}")
-    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=False)
+    # Note: On Windows, you may need delete=False with manual cleanup
+    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.yaml', delete=True)
     yaml.dump(data, tmp, default_flow_style=False, sort_keys=False)
-    tmp.close()
+    tmp.flush()  # Ensure data is written
     return tmp.name
```

However, Solution 2 may have issues on Windows or if the file needs to be reopened by AgentGoalEval.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In test/evals/eval.py around lines 104-117, the NamedTemporaryFile is created
with delete=False and never removed, leaking temp files; fix by ensuring the
temp file is cleaned up after use — either create the temp file with delete=True
and keep the file handle open while passing it to AgentGoalEval (only if
AgentGoalEval can read from an open file), or keep delete=False but ensure you
remove the file (os.unlink or Path.unlink) in a finally block or teardown after
the evaluation completes so every code path deletes the temporary file.
Actionable comments posted: 0
🧹 Nitpick comments (2)
test/evals/eval_data.yaml (2)
246-300: Inconsistent tag coverage: New test groups lack tags while the filtering mechanism depends on them.

The new test groups (lines 246–300) have descriptions but no tags, yet the PR adds tag-based filtering to `eval.py`. Tests like `unable_to_create_cluster_with_custom_manifests_conv`, `create_vsphere_platform_clusters_conv`, and others are orphaned from the tagging scheme, making them inaccessible via tag-based filtering. Either assign appropriate tags (e.g., `smoke`, `capabilities`, or domain-specific tags) or clarify why these tests are excluded from the filtering system.

Consider organizing new tests with semantic tags that align with their purpose:

- Lines 246–300: Could use tags like `smoke`, `capabilities`, or `integration` depending on scope
- Lines 302–321: Already tagged `[troubleshooting]` ✓
- Lines 329–363: Already tagged `[non-destructive]` ✓
- Lines 389+: Role protection tests also lack tags; determine appropriate classification
389-466: Role protection test groups lack tags.

The role protection test groups (direct-roleplaying, tone-manipulation, off-topic, mixed-requests, boundary-testing) starting at line 389 have no tags assigned. These should be included in the tagging scheme for consistency with the filtering system introduced in this PR. Consider assigning tags like `[smoke]`, `[security]`, or `[role-protection]` to enable targeting these tests.
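To keep tagging complete as the suite grows, a small guard could run in CI (a hypothetical sketch; only the file path and the `conversation_group`/`tags` keys come from this PR, everything else is illustrative):

```python
import sys

import yaml

def check_all_groups_tagged(path="test/evals/eval_data.yaml"):
    """Exit non-zero if any conversation group is missing a non-empty tags list."""
    with open(path) as f:
        groups = yaml.safe_load(f)
    untagged = [g["conversation_group"] for g in groups if not g.get("tags")]
    if untagged:
        print(f"Untagged conversation groups: {', '.join(untagged)}")
        sys.exit(1)

if __name__ == "__main__":
    check_all_groups_tagged()
```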
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)
302-363: Past review suggestions successfully integrated. The `[troubleshooting]` tag has been correctly applied to ignition, degraded cluster, and console access tests (lines 302–321), and `[non-destructive]` tags applied to deletion and script tests (lines 329–363), aligning with eranco74's prior feedback. ✓
/test ?

@eranco74: The following commands are available to trigger required jobs: … The following commands are available to trigger optional jobs: … Use … In response to this: …

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

/test eval-test

pull secret issue

/ok-to-test
test/evals/eval_data.yaml (Outdated)

```diff
-        eval_types: [response_eval:accuracy]
+        eval_types: [response_eval:accuracy, response_eval:intent]
         expected_response: "Control plane nodes in a multi-node cluster require 4 CPU cores 16 GB of RAM and 100 GB of storage, the compute nodes in this cluster require 2 CPU cores, 8 GB of RAM and 100GB of storage"
+        expected_intent: "Providing hardware requirements for multi-node OpenShift clusters including CPU, RAM, and storage specifications for control plane and worker nodes."
```
The question is about a multinode cluster, so the reply is about a multinode cluster and not "multi-node OpenShift clusters", which makes the `response_eval:intent` check FAIL!

/test eval-test
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/evals/eval_data.yaml (1)
394-471: Missing tags on all role protection security tests.

The role protection test groups (lines 394–471) covering direct role-playing, tone manipulation, off-topic queries, mixed requests, and boundary testing lack tags entirely. These are important security/robustness validations and should be tagged (e.g., `[security]`, `[robustness]`, or a custom tag) to enable selective execution.

Add an appropriate tag to all role protection tests:
```diff
 - conversation_group: role_protection_direct_roleplaying_conv
   description: Verify rejection of direct role-playing requests
+  tags: [robustness]
   conversation:

 - conversation_group: role_protection_tone_manipulation_conv
   description: Verify rejection of tone and style manipulation attempts
+  tags: [robustness]
   conversation:

 - conversation_group: role_protection_off_topic_conv
   description: Verify rejection of off-topic queries unrelated to OpenShift
+  tags: [robustness]
   conversation:

 - conversation_group: role_protection_mixed_requests_conv
   description: Verify rejection of requests mixing legitimate queries with off-topic content
+  tags: [robustness]
   conversation:

 - conversation_group: role_protection_boundary_testing_conv
   description: Verify proper handling of edge cases and legitimate technical queries
+  tags: [robustness]
   conversation:
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (2)
test/evals/eval_data.yaml (2)
57-65: Verify multinode cluster intent expectations match query intent.

Based on past review feedback indicating an intent mismatch between queries about "multinode cluster" and responses using "multi-node OpenShift clusters" terminology, verify that line 65's `expected_intent` correctly captures the intent relationship between the query (line 62) and response (line 64).

The current `expected_intent` references "multi-node clusters"; ensure this accurately reflects what the chat model would infer from the user's "multinode cluster" query, avoiding terminology mismatches that could cause intent evaluation failures.
246-276: Confirm tagging strategy for 14 untagged new tests against the PR objective.

Verification confirms the review comment's factual claims: lines 246–276 and additional test groups (totaling 14 untagged tests) lack tags, contradicting the PR objective stating "added `tags: [smoke]` to all dev suite tests." The current state shows only 16 of 37 conversation groups are smoke-tagged.

The file exhibits a deliberate categorization pattern:

- Lines 1–228: 16 tests with `[smoke]`
- Lines 246–294: 7 tests with NO tags (new platform/chatbot/guidance tests)
- Lines 302–328: 3 tests with `[troubleshooting]`
- Lines 329–361: 4 tests with `[non-destructive]`
- Lines 370–471: 7 tests with NO tags (API/role protection tests)

This discrepancy between the stated objective and the actual implementation requires clarification on whether:

- All new tests should inherit `[smoke]` tags to align with the PR objective, or
- The untagged tests are intentionally categorized separately with a documented reason.

No tagging strategy documentation was found in commits, PR templates, or contributing guides.
```yaml
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."
```
Missing tags on new general capability tests.
Conversation groups for Assisted Installer explanation, chatbot capabilities, and first-time guidance (lines 278–300) lack tags. These appear to be foundational tests and should likely be tagged to enable organized filtering.
Consider adding appropriate tags to these tests, e.g., `[smoke]` if they are core dev suite tests, or a new tag like `[guidance]` if they serve a different purpose:

```diff
 - conversation_group: assisted_installer_explanation_conv
   description: "Assisted Installer explanation test"
+  tags: [smoke]
   conversation:

 - conversation_group: chatbot_capabilities_conv
   description: "Chatbot capabilities test"
+  tags: [smoke]
   conversation:

 - conversation_group: first_time_cluster_guidance_conv
   description: "First time cluster guidance test"
+  tags: [smoke]
   conversation:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  tags: [smoke]
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  tags: [smoke]
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."
```
🤖 Prompt for AI Agents
In test/evals/eval_data.yaml around lines 278 to 300, the three new
conversation_group entries lack tags for filtering; add an appropriate tags
field to each conversation block (e.g., tags: [smoke] or tags: [guidance])
consistent with existing test tagging conventions, placing the tags at the same
indentation level as description and conversation for each group so they are
recognized by the test harness.
```yaml
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster
```
Missing tags on API specification and context management tests.
Conversation groups for Assisted Service API spec, basic context, and related tests (lines 370–392) lack tags. These are important for API contract and state management validation and should be tagged for filtering.
Consider adding tags like `[smoke]` or a new tag like `[api-contract]`:

```diff
 - conversation_group: assisted_service_api_spec_conv
   description: "Assisted service API spec test"
+  tags: [smoke]
   conversation:

 - conversation_group: basic_context_conv
   description: "Conversation with context test"
+  tags: [smoke]
   conversation:
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- conversation_group: assisted_service_api_spec_conv
  description: "Assisted service API spec test"
  tags: [smoke]
  conversation:
    - eval_id: assisted_service_api_spec
      eval_query: Can you provide assisted service API spec?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to provide the API specification and explaining available capabilities."

- conversation_group: basic_context_conv
  description: "Conversation with context test"
  tags: [smoke]
  conversation:
    - eval_id: start_conversation
      eval_query: I want to create a cluster named test-cluster
      eval_types: [response_eval:intent]
      expected_intent: I can help with that.
    - eval_id: list_openshift_versions
      eval_query: List the available OpenShift versions
      eval_types: [response_eval:intent]
      expected_intent: A list of available versions
    - eval_id: ask_for_context
      eval_query: What is the name of the cluster that I want to create?
      eval_types: [response_eval:intent]
      expected_intent: test-cluster
```
🤖 Prompt for AI Agents
test/evals/eval_data.yaml around lines 370 to 392: the conversation_group
entries for assisted_service_api_spec_conv and basic_context_conv are missing
tags used for filtering and test categorization; add a tags field to each
conversation_group (e.g., tags: [smoke] or tags: [api-contract]) to indicate
purpose and enable filtering, and update any related conversation entries if
your framework expects per-conversation tags instead of per-group tags.
/restart

/test eval-test

/lgtm

@ItzikEzra-rh can you just rebase this for konflux? I think that's what it needs
…ected intent, and the test-delete all clusters became a multi-turn conversation with cluster creation first
f7353a9 to ed5202c (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (2)

test/evals/eval.py (2)

104-116: Temporary file resource leak remains unresolved.

The temporary file created with `delete=False` is still never cleaned up, causing a resource leak on every evaluation run with `--tags`. This issue was previously flagged but has not been addressed. Please implement one of the solutions from the previous review:

- Add a `try/finally` block to ensure cleanup after evaluation completes
- Use a context manager to handle the temp file lifecycle
- Ensure `os.unlink(filtered_yaml)` is called after the evaluation, even if it fails

121-123: Typo in error message remains unfixed.

Line 122 contains a typo: "varialbe" should be "variable". This was previously flagged but has not been corrected.

Apply this diff to fix the typo:

```diff
-    print("The environmental varialbe 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
+    print("The environmental variable 'UNIQUE_ID' has to be set so the cluster creation and removal can happen properly.")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- test/evals/eval.py (2 hunks)
- test/evals/eval_data.yaml (12 hunks)
🔇 Additional comments (1)
test/evals/eval.py (1)
81-99: Well-documented tag filtering system. The `--tags` argument includes comprehensive documentation describing each tag's purpose (smoke, troubleshooting, non-destructive), which addresses the previous request for verbose tag descriptions.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- test/evals/eval_data.yaml (11 hunks)
🔇 Additional comments (1)
test/evals/eval_data.yaml (1)
450-472: Verify the broader tagging scope before proceeding. The claim that these two groups lack tags is factually correct (lines 450 and 458), and the suggested diff is accurate. However, verification reveals that 36 of 37 conversation groups in the file currently lack tags, not just these two. Before addressing this review comment, confirm:
- Whether the PR objectives specify that only these two groups should be tagged, or whether systematic tagging of all groups is intended
- Why most conversation groups in the file remain untagged despite the apparent focus on adding tags in recent commits
The review's classification and scope cannot be confirmed without clarifying whether this is a targeted fix for two specific tests or part of broader incomplete work.
```yaml
- conversation_group: unable_to_create_cluster_with_custom_manifests_conv
  description: "Custom manifest cluster creation test"
  conversation:
    - eval_id: unable_to_create_cluster_with_custom_manifests
      eval_query: Can you create a cluster with a custom manifest?
      eval_types: [response_eval:intent]
      expected_intent: "Declining to create a cluster with custom manifests."

- conversation_group: create_vsphere_platform_clusters_conv
  description: "vSphere platform integration test"
  conversation:
    - eval_id: create_vsphere_platform_clusters
      eval_query: Can you create a cluster with vsphere platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with vSphere platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_nutanix_platform_clusters_conv
  description: "Nutanix platform integration test"
  conversation:
    - eval_id: create_nutanix_platform_clusters
      eval_query: Can you create a cluster with Nutanix platform integration?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster with Nutanix platform and requesting necessary information like cluster name, OpenShift version, base domain, and whether it's a single-node cluster."

- conversation_group: create_static_network_clusters_conv
  description: "Static networking cluster creation test"
  conversation:
    - eval_id: create_static_network_clusters
      eval_query: Can you create a cluster with static networking?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to help create a cluster and explaining that static network configuration can be set up for hosts."

- conversation_group: assisted_installer_explanation_conv
  description: "Assisted Installer explanation test"
  conversation:
    - eval_id: assisted_installer_explanation
      eval_query: What is assisted installer and how does it work?
      eval_types: [response_eval:intent]
      expected_intent: "Explaining what Assisted Installer is and providing an overview of the installation workflow including cluster definition, discovery ISO, host discovery, configuration, installation, and monitoring."

- conversation_group: chatbot_capabilities_conv
  description: "Chatbot capabilities test"
  conversation:
    - eval_id: chatbot_capabilities
      eval_query: What can you do for me?
      eval_types: [response_eval:intent]
      expected_intent: "Describing capabilities for helping with OpenShift installation using Assisted Installer, including cluster creation, host management, configuration, monitoring, and troubleshooting."

- conversation_group: first_time_cluster_guidance_conv
  description: "First time cluster guidance test"
  conversation:
    - eval_id: first_time_cluster_guidance
      eval_query: I want to install a cluster but its my first time, what should i start with?
      eval_types: [response_eval:intent]
      expected_intent: "Offering to guide through cluster creation and requesting necessary information like cluster name, OpenShift version, base domain, and cluster type."
```
🧩 Analysis chain
Add tags to new general capability and platform integration tests.
These conversation groups lack tags but should be tagged to enable filtering per the PR objectives ("added tags: [smoke] to all dev suite tests"). The groups covering custom manifests, platform integrations, assisted installer explanation, chatbot capabilities, and first-time guidance are foundational tests for the dev suite and should be tagged accordingly.
```diff
 - conversation_group: unable_to_create_cluster_with_custom_manifests_conv
   description: "Custom manifest cluster creation test"
+  tags: [smoke]
   conversation:

 - conversation_group: create_vsphere_platform_clusters_conv
   description: "vSphere platform integration test"
+  tags: [smoke]
   conversation:

 - conversation_group: create_nutanix_platform_clusters_conv
   description: "Nutanix platform integration test"
+  tags: [smoke]
   conversation:

 - conversation_group: create_static_network_clusters_conv
   description: "Static networking cluster creation test"
+  tags: [smoke]
   conversation:

 - conversation_group: assisted_installer_explanation_conv
   description: "Assisted Installer explanation test"
+  tags: [smoke]
   conversation:

 - conversation_group: chatbot_capabilities_conv
   description: "Chatbot capabilities test"
+  tags: [smoke]
   conversation:

 - conversation_group: first_time_cluster_guidance_conv
   description: "First time cluster guidance test"
+  tags: [smoke]
   conversation:
```
---
🏁 Script executed:
```shell
#!/bin/bash
# Inspect the actual file content at lines 326-381 and surrounding areas
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 320-390 of eval_data.yaml ==="
  sed -n '320,390p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi
```

Length of output: 4725
🏁 Script executed:

```shell
#!/bin/bash
# Check lines 450-472 mentioned in scratchpad
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Lines 445-480 of eval_data.yaml ==="
  sed -n '445,480p' test/evals/eval_data.yaml | cat -n
else
  echo "File not found"
fi
```

Length of output: 1544
🏁 Script executed:

```shell
#!/bin/bash
# Count total lines and search for "tags:" occurrences with context
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if [ -f "test/evals/eval_data.yaml" ]; then
  echo "=== Total lines ==="
  wc -l test/evals/eval_data.yaml
  echo ""
  echo "=== Occurrences of 'tags:' ==="
  grep -n "tags:" test/evals/eval_data.yaml | head -20
fi
```

Length of output: 512
🏁 Script executed:

```shell
#!/bin/bash
# Validate YAML syntax
cd "$(git rev-parse --show-toplevel)" 2>/dev/null || pwd
if command -v python3 &> /dev/null && [ -f "test/evals/eval_data.yaml" ]; then
  python3 << 'PYEOF'
import yaml
try:
    with open('test/evals/eval_data.yaml', 'r') as f:
        yaml.safe_load(f)
    print("YAML syntax: VALID")
except yaml.YAMLError as e:
    print(f"YAML syntax: INVALID - {e}")
PYEOF
fi
```

Length of output: 18
Add tags to untagged conversation groups to enable filtering per PR objectives.
The verification confirms that 9 conversation groups throughout the file lack tags while all other groups have them. Based on the consistent tagging pattern in the file (smoke, troubleshooting, non-destructive) and the PR objective to enable filtering, these groups require tags:
Lines 326–381:
- unable_to_create_cluster_with_custom_manifests_conv
- create_vsphere_platform_clusters_conv
- create_nutanix_platform_clusters_conv
- create_static_network_clusters_conv
- assisted_installer_explanation_conv
- chatbot_capabilities_conv
- first_time_cluster_guidance_conv
Lines 450–472:
- assisted_service_api_spec_conv
- basic_context_conv
Add `tags: [smoke]` to each (these are general capability/integration tests). The YAML syntax is valid; this is a tagging completeness issue.
🤖 Prompt for AI Agents
In test/evals/eval_data.yaml around lines 326–381 (and also add the two groups
at lines 450–472), several conversation_group entries are missing tags; add a
tags: [smoke] field to each of the listed groups
(unable_to_create_cluster_with_custom_manifests_conv,
create_vsphere_platform_clusters_conv, create_nutanix_platform_clusters_conv,
create_static_network_clusters_conv, assisted_installer_explanation_conv,
chatbot_capabilities_conv, first_time_cluster_guidance_conv,
assisted_service_api_spec_conv, basic_context_conv) ensuring proper YAML
indentation and syntax so each conversation_group block includes the new tags
entry.
/test eval-test

1 similar comment

/test eval-test

/test local-dev-test

/ok-to-test

/test eval-test

1 similar comment

/test eval-test

/retest

3 similar comments

/retest

/retest

/retest

/test eval-test

/ok-to-test
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74, ItzikEzra-rh. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
@ItzikEzra-rh: The following test failed, say `/retest` to rerun all failed tests:

Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/ok-to-test

Merged commit 21c9b92 into rh-ecosystem-edge:main
Summary
This PR integrates the QE evaluation tests into the dev suite and adds a tag-based filtering system that lets users run specific subsets of tests (e.g., smoke tests on every PR) instead of running all tests.
- `test/evals/eval_data.yaml`: added `tags: [smoke]` to all dev suite tests
- `test/evals/eval.py`: added a `--tags` command-line argument (optional) to filter tests by tags, plus a `filter_by_tags()` function that loads the eval data, keeps only the conversation groups matching the requested tags, and writes them to a temporary YAML file for the evaluator
New Features
Tests
Bug Fixes