MGMT-21690: chatbot installation asks again for host and cluster IDs#195
Conversation
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: eranco74 The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
template.yaml (1)
439-439: Template variable substitution bug: use ${PARAM}, not ${{PARAM}}, in OpenShift templates.OpenShift Template processing resolves ${PARAM}. The ${{...}} placeholders will not be substituted, risking invalid ports/replicas at deploy time.
Apply:
- replicas: ${{REPLICAS_COUNT}} + replicas: ${REPLICAS_COUNT} @@ - containerPort: ${{SERVICE_PORT}} + containerPort: ${SERVICE_PORT} @@ - port: ${{SERVICE_PORT}} + port: ${SERVICE_PORT} @@ - port: ${{SERVICE_PORT}} + port: ${SERVICE_PORT} @@ - port: ${{SERVICE_PORT}} - targetPort: ${{SERVICE_PORT}} + port: ${SERVICE_PORT} + targetPort: ${SERVICE_PORT}Also applies to: 455-456, 517-527, 607-608
🧹 Nitpick comments (4)
template.yaml (4)
233-237: Proactive retrieval: add scope/loop guards.Suggest clarifying to attempt each lookup once (per parameter) and fall back to asking the user if unavailable/permission-denied, to avoid repetitive hidden calls and latency.
Apply:
- **Proactive Information Retrieval:** - - Before asking the user for a parameter, such as a cluster ID or host ID, first attempt to find it using your available tools. + **Proactive Information Retrieval:** + - Before asking the user for a parameter (e.g., cluster ID or host ID), attempt a single lookup using available tools. If not found or access is denied, briefly ask the user for the value instead of retrying.
249-252: Don’t hard-stop the entire session for static networking; gracefully continue with supported paths.Recommend declining only the unsupported action, then proposing supported alternatives (e.g., DHCP, or continue with discovery/role assignment) while linking the wizard for static config.
Apply:
- * **Unsupported Network Configuration:** ... you must refuse the entire request. Do not proceed with cluster creation. ... + * **Unsupported Network Configuration:** ... you must decline the unsupported networking step and direct the user to the assisted-installer web-based wizard for static networking. Continue assisting with supported steps (e.g., discovery, role assignment, or DHCP-based configuration).
260-265: Host discovery flow: define cluster-type source and large-list behavior.
- Clarify how to determine SNO vs multi-node (cluster object field vs host count; ask user if ambiguous).
- For large host sets, avoid dumping excessively long lists; show first N with count + offer filters.
Apply:
* When a user indicates that hosts have been booted, first check for discovered hosts. - * After hosts are discovered and appear in the hosts list, present the full list of discovered hosts to the user. + * After hosts are discovered and appear in the hosts list, present the list to the user. If the list is large (e.g., >20), show the first 20 with the total count and offer filtering (by hostname, inventory state, or role). * Proactively offer the next steps based on the cluster type: - * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker). - * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation. + * Determine the cluster type from cluster metadata; if ambiguous, ask the user to confirm SNO vs multi-node. + * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer **manual role assignment** to a specific host (e.g., master/control-plane, worker). Ensure there are at least 3 control-plane nodes before proceeding with a HA install. + * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
131-133: Typo in parameter name: LIGHTSSPEED → LIGHTSPEED (consistency).Spelling inconsistency may confuse operators. Consider renaming, with a deprecation note, in a future breaking change.
Apply:
- - name: LIGHTSSPEED_STACK_POSTGRES_SSL_MODE + - name: LIGHTSPEED_STACK_POSTGRES_SSL_MODE @@ - ssl_mode: ${LIGHTSSPEED_STACK_POSTGRES_SSL_MODE} + ssl_mode: ${LIGHTSPEED_STACK_POSTGRES_SSL_MODE}Note: This is breaking; if you proceed now, update all references and communicate the change. Otherwise, keep as-is and schedule a follow-up.
Also applies to: 198-199
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
template.yaml(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (1)
template.yaml (1)
604-631: Headless Service behind a Route—verify intent.The Service is headless (clusterIP: None) while a Route targets it. Routes typically back a ClusterIP Service. Please confirm this is deliberate and functional in your environment.
| **Mandatory Pre-Flight Checks for Cluster Creation** | ||
| * **Unsupported Network Configuration:** Before attempting to create a cluster, you **must first check** if the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers). | ||
| * **If an unsupported configuration is detected, you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. | ||
| * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. | ||
| * **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled. | ||
|
|
There was a problem hiding this comment.
Fix grammar/markdown in “Unsupported Network Configuration” bullet.
There’s a duplicated word (“or or”) and an extra closing bold marker that will render incorrectly.
Apply:
- * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+ * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| **Mandatory Pre-Flight Checks for Cluster Creation** | |
| * **Unsupported Network Configuration:** Before attempting to create a cluster, you **must first check** if the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers). | |
| * **If an unsupported configuration is detected, you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. | |
| * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. | |
| * **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled. | |
| **Mandatory Pre-Flight Checks for Cluster Creation** | |
| * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. | |
| * **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled. |
🤖 Prompt for AI Agents
In template.yaml around lines 249 to 252, the "Unsupported Network
Configuration" bullet contains a duplicated word ("or or") and an extra closing
bold marker that breaks markdown; remove the duplicate "or", fix punctuation and
bold markup so the sentence reads cleanly (e.g., list unsupported items as
"static networking, static IP addresses, or node-level network settings (like
MAC addresses, gateways, or DNS servers)" and remove the stray "**" after
"request"), preserving the instruction to refuse static networking and to direct
users to the assisted-installer web-based wizard.
|
Adding eval tests. |
template.yaml
Outdated
| --- | ||
|
|
||
| **Proactive Information Retrieval:** | ||
| - Before asking the user for a parameter, such as a cluster ID or host ID, first attempt to find it using your available tools. |
There was a problem hiding this comment.
I was mostly able to get it to stop asking for cluster id in my other PR by changing this line: https://github.com/rh-ecosystem-edge/assisted-chat/pull/188/files#diff-1363ef5ce8886100842332c97163aad7934237e1fe49b5d40422b45fdc30f38eR303
The issue I was seeing was that it was forgetting what cluster it had been operating on in the prior turn so it wasn't as much a matter of using tools but using the conversation history.
There was a problem hiding this comment.
I see.
This change should solve a diffrent issue, where the agent doesn't know the ID. e.g. First prompt is: "Give me the ISO for cluster named test".
Instead of the agent asking the ID from the user, it can list the clusters and find the ID
There was a problem hiding this comment.
OK, I now notice the "* If the user specifies a cluster by name (not a UUID), map it to its cluster ID by internally listing the clusters..."
OK, I can remove this
|
/hold |
|
Hmm, while the resulting conversation flow seems fine, the actual tool calls are odd, the agent prefer to list all clusters instead of calling |
fa27998 to
eb2ce8a
Compare
|
/unhold |
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
template.yaml (1)
250-250: Fix duplicated word and stray bold marker in “Unsupported Network Configuration” bullet.Removes “or or” and the extra “**”.
Apply:
- * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. + * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (6)
test/prow/entrypoint.sh (1)
23-23: Make sed replacement robust to special characters and quote the path.If UNIQUE_ID contains sed metacharacters (/, &, |), the substitution may break. Also quote the YAML path.
Apply:
-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml +safe_id=${UNIQUE_ID//&/\\&} +safe_id=${safe_id//\//\\/} +sed -i "s/uniq-cluster-name/${safe_id}/g" "$TEST_DIR/eval_data.yaml"test/evals/eval_data.yaml (4)
121-129: Grammar fix in expected response (“hosts hasn’t” → “hosts haven’t”).Minor, but it can trip strict judges.
Apply:
- expected_response: "hosts hasn't been discovered yet." + expected_response: "hosts haven't been discovered yet."Optionally switch this check to sub-string for robustness.
112-113: Fix duplicated words in SSH key response.“The SSH public key is set for the cluster for cluster.”
Apply:
- expected_response: The SSH public key is set for the cluster for cluster + expected_response: The SSH public key has been set for the cluster.
186-186: Typo in eval_id.“create_single_node_cluser” → “create_single_node_cluster”.
Apply:
- - eval_id: create_single_node_cluser + - eval_id: create_single_node_cluster
191-200: Consistency: assert name→ID resolution by adding cluster_infotest/evals/eval_data.yaml — eval_id: cluster_name_tool_call (lines ~190–195) currently only asserts list_clusters; add a second expected_tool_calls entry for cluster_info with a UUID regex for cluster_id to verify name→ID resolution.
template.yaml (1)
233-235: Clarify exception for internal list lookups to avoid unwanted list dumps.“Direct Display of List Outputs” can conflict with name→ID resolution (agent ends up listing clusters instead of proceeding). Add an explicit exception for internal parameter resolution when an exact match exists; only display the list if multiple or zero exact matches.
Apply:
- When a tool provides a list of items (e.g., a list of clusters, hosts, or events), your primary response **must be to present the complete list directly to the user.** Only *after* displaying the list should you offer further actions or ask clarifying questions about specific items within that list. Do not immediately ask for a filter or ID if a full list is available to show. + When a tool provides a list of items (e.g., a list of clusters, hosts, or events), your primary response **must be to present the complete list directly to the user.** Exception: when you call a list internally to resolve a user‑provided exact name to its ID, proceed silently if there is a single exact match. Only display the list if there are zero or multiple exact matches, or if the user explicitly asks to see the list. Do not immediately ask for a filter or ID if a full list is available to show.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
template.yaml(2 hunks)test/evals/eval_data.yaml(5 hunks)test/prow/entrypoint.sh(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml
[error] 149-149: duplication of key "description" in mapping
(key-duplicates)
🔇 Additional comments (1)
template.yaml (1)
260-265: Host-booted flow additions look good.Clear sequencing and role guidance for multi-node vs SNO.
/retest |
eb2ce8a to
7fe6cc0
Compare
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (3)
test/evals/eval_data.yaml (2)
77-77: Fix base domain mismatch in expected_response (example.com vs test.local).The eval_query and expected_tool_calls use base_domain example.com, but expected_response says test.local, which will cause accuracy checks to fail.
- expected_response: I have created a cluster with name eval-test-singlenode-uniq-cluster-name, OpenShift version 4.19.7, base domain test.local. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL? + expected_response: I have created a cluster with name eval-test-singlenode-uniq-cluster-name, OpenShift version 4.19.7, base domain example.com. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
142-150: Remove duplicate “description” key to satisfy YAML spec and YAMLlint.Two description keys exist in the same mapping; keep one and convert the second to a comment (or split into a separate eval).
- eval_id: cluster_info_tool_call - description: Test handling for non-existent cluster name + description: Test handling for non-existent cluster name eval_query: Give me details about cluster named 'abc123' eval_types: [tool_eval, response_eval:accuracy] expected_response: Retrieval failed for cluster 'abc123' because the resource was not found. expected_tool_calls: - - tool_name: list_clusters arguments: {} - description: Test error handling for non-existent cluster ID + # Note: Test error handling for non-existent cluster IDRun this to catch duplicate keys and lint errors:
#!/bin/bash set -euo pipefail pip install --quiet yamllint >/dev/null 2>&1 || true yamllint -d "{extends: default, rules: {line-length: disable}}" test/evals/eval_data.yaml || truetemplate.yaml (1)
249-252: Fix duplicated word and stray bold marker in “Unsupported Network Configuration”.Clean up “or or” and the extra closing **.
- * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. + * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (6)
test/evals/eval_data.yaml (3)
108-113: Fix duplicate phrase in SSH key confirmation message.Message reads “for the cluster for cluster”.
- expected_response: The SSH public key is set for the cluster for cluster + expected_response: The SSH public key is set for the cluster.
121-129: Grammar and brittleness in “hosts not discovered” check.
- Grammar: “hosts hasn't been discovered yet.” → “Hosts haven't been discovered yet.”
- Suggest using sub-string evaluation to avoid exact-string fragility.
- eval_types: [tool_eval, response_eval:accuracy] + eval_types: [tool_eval, response_eval:sub-string] @@ - expected_response: "hosts hasn't been discovered yet." + expected_keywords: ["Hosts", "discovered"]If you prefer accuracy matching, update the agent text and use: “Hosts haven't been discovered yet.”
185-190: Correct eval_id typo and mismatch.eval_id says “single_node cluser” while the eval creates a multi-node cluster; also “cluser” is misspelled.
- - eval_id: create_single_node_cluser + - eval_id: create_multinode_clustertemplate.yaml (3)
284-286: Minor grammar fixes for VIP guidance.- * Cluster with user-managed networking enabled don't need to **set API and Ingress VIPs**. + * Clusters with user-managed networking enabled don't need to **set API and Ingress VIPs**.
208-210: Terminology nit: “on-premises” (not “on-premise”).- - Supported: On-premise OpenShift installs via Assisted Installer on baremetal hosts or VMs (e.g., vSphere, KVM, libvirt). + - Supported: On-premises OpenShift installs via Assisted Installer on bare-metal hosts or VMs (e.g., vSphere, KVM, libvirt).
312-313: Confirm intent: auto-mapping name→ID by listing clusters vs relying on conversation state.This instruction can cause unnecessary list operations when the agent already knows the ID. Consider:
- First attempt: use known/last-used cluster context from conversation.
- Fallback: list clusters and map exact name.
Do you want me to propose wording that prioritizes the conversation state first?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
template.yaml(2 hunks)test/evals/eval_data.yaml(5 hunks)test/prow/entrypoint.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/prow/entrypoint.sh
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml
[error] 149-149: duplication of key "description" in mapping
(key-duplicates)
🔇 Additional comments (3)
test/evals/eval_data.yaml (1)
91-103: Consistency check: multi-node creation flow looks coherent.Name, version, base_domain, single_node=false, and follow-up ISO step align with goals. No action needed.
Please confirm the system prompt also avoids exposing the ISO URL (matches tests’ omission of URLs).
template.yaml (2)
260-265: Host discovery guidance upgrade LGTM.Clear sequencing (check hosts and status → list hosts → role assignment by cluster type) aligns with PR goals.
131-136: Do not rename LIGHTSSPEED in-place — add an alias or update all callersLIGHTSSPEED_STACK_POSTGRES_SSL_MODE is referenced in:
- template.yaml (lines ~131 and 198)
- template-params.dev.env (line 8)
- scripts/deploy_template.sh (line 82)
Options:
- If breaking change accepted: rename to LIGHTSPEED_STACK_POSTGRES_SSL_MODE and update the files above and any external consumers.
- If backward compatibility required: introduce LIGHTSPEED_STACK_POSTGRES_SSL_MODE and keep LIGHTSSPEED_STACK_POSTGRES_SSL_MODE as a fallback/alias (map new→old or add a fallback) so consumers don’t break.
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/ci_test.sh (1)
40-45: Fix label mismatch and make pod selection robust (handle 0/≥2 pods).
- Error message mentions label app=assisted-chat-eval-test, but selection uses job-name=...; align the message.
- Indexing .items[0] is brittle when there are 0 or multiple pods (e.g., retries). Prefer sorting by creation time and taking the latest; also guard against errexit on empty results.
- POD_NAME=$(oc get pods -n "$NAMESPACE" -l job-name="${JOB_NAME}-${UNIQUE_ID}" -o jsonpath='{.items[0].metadata.name}') + POD_NAME=$(oc get pods -n "$NAMESPACE" -l job-name="${JOB_NAME}-${UNIQUE_ID}" \ + --sort-by=.metadata.creationTimestamp -o name 2>/dev/null | tail -n1 | cut -d/ -f2 || true) if [[ -z "${POD_NAME}" ]]; then - echo "No pod found with label app=assisted-chat-eval-test in namespace ${NAMESPACE}" + echo "No pod found with label job-name=${JOB_NAME}-${UNIQUE_ID} in namespace ${NAMESPACE}" oc get pods -n "$NAMESPACE" exit 1 fi
🧹 Nitpick comments (2)
scripts/ci_test.sh (2)
60-66: Terminology fix: it’s the Job that failed, not necessarily the Pod.Message currently says “Pod … is Failed”. Clarify to “Job … failed” to avoid confusion when pods restart or complete.
- echo "Pod ${POD_NAME} is Failed, exiting" + echo "Job ${JOB_NAME}-${UNIQUE_ID} failed, exiting"
14-19: Typos in user-facing logs.
- “proided” → “provided”
- “provieded” → “provided”
- “PLease” → “Please”
- “were successful” (grammar)
- echo "The variable ASSISTED_CHAT_TEST was proided with the value ${ASSISTED_CHAT_TEST}, using it to create the IMAGE and TAG variables for the template" + echo "The variable ASSISTED_CHAT_TEST was provided with the value ${ASSISTED_CHAT_TEST}, using it to create the IMAGE and TAG variables for the template" - echo "The variable ASSISTED_CHAT_TEST was not provieded, downloading the latest image from ${IMAGE}" + echo "The variable ASSISTED_CHAT_TEST was not provided, downloading the latest image from ${IMAGE}" -echo "Timeout reached. Pod $POD_NAME did not become ready in time. PLease check the logs of the pods under the directory artifacts/eval-test/gather-extra/artifacts/pods/." +echo "Timeout reached. Pod $POD_NAME did not become ready in time. Please check the pod logs under artifacts/eval-test/gather-extra/artifacts/pods/."Also applies to: 74-75
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/ci_test.sh(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (1)
scripts/ci_test.sh (1)
21-30: Secret bootstrap is fine; verify required env vars are always set in CI.NAMESPACE and SHARED_DIR are required under nounset; confirm CI guarantees both. If not, set sane defaults or fail fast with a clearer message.
Would you like me to add a short preflight to assert these are set and print them for debugging?
| while [ $ELAPSED -lt $TIMEOUT ]; do | ||
| # Check if the pod's status is "Running" | ||
| JOB_SUCCEEDED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null) | ||
| JOB_FAILED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null) | ||
| JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null) | ||
| JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null) | ||
|
|
||
| if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then | ||
| echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}." |
There was a problem hiding this comment.
Harden job status checks against empty jsonpath outputs; fix misleading comment.
- When .status.succeeded/failed are absent, jsonpath returns empty; numeric compare can misbehave. Default to 0 before comparisons. Also, the comment says “pod’s status is Running” but the code checks the Job status; update the comment.
while [ $ELAPSED -lt $TIMEOUT ]; do
- # Check if the pod's status is "Running"
- JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
- JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)
+ # Check Job status (Complete/Failed)
+ JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null || true)
+ JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null || true)
+ JOB_SUCCEEDED=${JOB_SUCCEEDED:-0}
+ JOB_FAILED=${JOB_FAILED:-0}Optional simplification (reduces polling): replace the loop with oc wait:
-# existing loop…
+if oc wait -n "$NAMESPACE" --for=condition=complete "job/${JOB_NAME}-${UNIQUE_ID}" --timeout=600s; then
+ echo "The evaluation tests were successful. Logs in artifacts/eval-test/gather-extra/artifacts/pods/ for pod ${POD_NAME}."
+ exit 0
+fi
+# If not complete in time, surface failure/events below as today.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| while [ $ELAPSED -lt $TIMEOUT ]; do | |
| # Check if the pod's status is "Running" | |
| JOB_SUCCEEDED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null) | |
| JOB_FAILED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null) | |
| JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null) | |
| JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null) | |
| if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then | |
| echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}." | |
| while [ $ELAPSED -lt $TIMEOUT ]; do | |
| # Check Job status (Complete/Failed) | |
| JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null || true) | |
| JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null || true) | |
| JOB_SUCCEEDED=${JOB_SUCCEEDED:-0} | |
| JOB_FAILED=${JOB_FAILED:-0} | |
| if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then | |
| echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}." |
🤖 Prompt for AI Agents
In scripts/ci_test.sh around lines 50 to 56, the loop comment incorrectly refers
to a pod status and the numeric comparisons can fail when oc jsonpath returns
empty strings for .status.succeeded/.status.failed; change the comment to
reference the Job status, normalize empty jsonpath outputs to 0 before numeric
comparison (e.g., fallback to 0 via parameter expansion or an explicit test) so
JOB_SUCCEEDED and JOB_FAILED are always numeric, and keep the existing branch
logic; optionally replace the entire polling loop with oc wait on the Job to
simplify and reduce polling.
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
test/evals/eval_data.yaml (2)
64-76: Base-domain mismatch fix looks good.example.com is now consistent across query, tool args, and expected response.
141-149: YAML parse error: duplicate “description” key in same mapping.This is a hard failure in YAML parsers; remove the second key or split into two evals.
- eval_id: cluster_info_tool_call - description: Test handling for non-existent cluster name + description: Test handling for non-existent cluster name eval_query: Give me details about cluster named 'abc123' eval_types: [tool_eval, response_eval:accuracy] expected_response: Retrieval failed for cluster 'abc123' because the resource was not found. expected_tool_calls: - - tool_name: list_clusters arguments: {} - description: Test error handling for non-existent cluster ID + # Note: Test error handling for non-existent cluster ID (moved to a separate eval or comment)Alternatively, create a second eval entry for the ID case.
🧹 Nitpick comments (3)
test/evals/eval_data.yaml (3)
33-33: Tightening eval to accuracy-only may be brittle.Relying on exact-string accuracy can flap on minor punctuation/spacing. Consider adding sub-string keywords for resilience or keep both accuracy + sub-string.
- eval_types: [response_eval:accuracy, tool_eval] + eval_types: [response_eval:accuracy, response_eval:sub-string, tool_eval] + expected_keywords: ["OpenShift AI", "OpenShift Virtualization"]
111-111: Fix duplicated wording in expected response."The SSH public key is set for the cluster for cluster" has a repeated phrase.
- expected_response: The SSH public key is set for the cluster for cluster + expected_response: The SSH public key is set for the cluster
184-189: Eval ID/name mismatch and typo.eval_id says “single_node cluser” while the query creates a multi-node cluster; also “cluser” is misspelled.
- - eval_id: create_single_node_cluser + - eval_id: create_multinode_cluster
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test/evals/eval_data.yaml(6 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml
[error] 148-148: duplication of key "description" in mapping
(key-duplicates)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (3)
test/evals/eval_data.yaml (3)
78-84: Good: ID reuse path for ISO retrieval.This encourages resolving the cluster ID once and reusing it.
91-102: Multinode flow updates LGTM.Names, version, and keywords align; keeps ISO mention without leaking the URL.
113-120: ISO retrieval step for multinode: OK.Tool call and keywords are appropriate; no URL disclosure.
| - eval_id: host_booted_but_not_discovered | ||
| eval_query: I booted the hosts | ||
| eval_types: [tool_eval, response_eval:accuracy] | ||
| expected_tool_calls: | ||
| - - tool_name: cluster_info | ||
| arguments: | ||
| cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" | ||
| expected_response: "hosts hasn't been discovered yet." |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Make the “hosts not discovered” assertion less brittle (and fix grammar).
Exact-phrase accuracy on a short sentence will be fragile; also grammar should be “haven’t.”
- eval_types: [tool_eval, response_eval:accuracy]
+ eval_types: [tool_eval, response_eval:sub-string]
expected_tool_calls:
- - tool_name: cluster_info
arguments:
cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
- expected_response: "hosts hasn't been discovered yet."
+ expected_keywords: ["no hosts discovered", "booted", "waiting"]If you prefer accuracy, then at least:
- expected_response: "hosts hasn't been discovered yet."
+ expected_response: "hosts haven't been discovered yet."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - eval_id: host_booted_but_not_discovered | |
| eval_query: I booted the hosts | |
| eval_types: [tool_eval, response_eval:accuracy] | |
| expected_tool_calls: | |
| - - tool_name: cluster_info | |
| arguments: | |
| cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" | |
| expected_response: "hosts hasn't been discovered yet." | |
| - eval_id: host_booted_but_not_discovered | |
| eval_query: I booted the hosts | |
| eval_types: [tool_eval, response_eval:sub-string] | |
| expected_tool_calls: | |
| - - tool_name: cluster_info | |
| arguments: | |
| cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}" | |
| expected_keywords: ["no hosts discovered", "booted", "waiting"] |
🤖 Prompt for AI Agents
In test/evals/eval_data.yaml around lines 120-127, the expected_response is
brittle and grammatically incorrect ("hosts hasn't been discovered yet.");
change it to a less fragile assertion and correct the grammar by either
replacing the exact string with a substring/regex match (e.g. a case-insensitive
pattern like "haven't been discovered" or "haven't been discovered yet") or
update the literal to the corrected sentence "hosts haven't been discovered
yet." Ensure the test harness uses substring or regex matching rather than
exact-phrase equality so short response variations don't break the eval.
78db8f1 to
65a0ddf
Compare
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
template.yaml (1)
250-251: Fix typo and stray bold marker in “Unsupported Network Configuration” bullet.Removes duplicated “or” and the extra trailing “**”.
- * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. + * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (1)
template.yaml (1)
260-265: Clarify SNO multi-host discovery edge case.Add guidance when >1 host is discovered for SNO; optional terminology tweak.
* When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status. * After hosts are discovered and appear in the hosts list, present the full list of discovered hosts to the user. * Proactively offer the next steps based on the cluster type: * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker). - * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation. + * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` (control-plane) role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation. + * If more than one host is discovered for a SNO cluster, explain that only a single host is supported and offer to either convert to a multi-node cluster or remove the extra hosts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
template.yaml(2 hunks)test/evals/eval_data.yaml(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/evals/eval_data.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions. Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps. Signed-off-by: Eran Cohen <eranco@redhat.com>
65a0ddf to
5059678
Compare
|
@eranco74: This pull request references MGMT-21690 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
template.yaml (1)
250-251: Fix duplicated word and stray bold marker in “Unsupported Network Configuration”.There’s a duplicated “or” and an extra closing
**that will break markdown rendering.- * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead. + * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (1)
template.yaml (1)
260-265: Clarify status examples and align role terminology; keep SNO guidance consistent.Minor copy edits to reduce ambiguity and align with inclusive/control‑plane terminology.
- * When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status. + * When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status (e.g., "pending-for-input", "insufficient", "ready"). @@ - * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker). - * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation. + * **For a multi-node cluster:** Inform the user that roles can be assigned automatically or manually. Offer to help with **manual role assignment** to a specific host (e.g., control plane (master), worker). + * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `control plane (master)` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
template.yaml(2 hunks)test/evals/eval_data.yaml(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/evals/eval_data.yaml
|
/lgtm |
c31f2ef
into
rh-ecosystem-edge:main
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.
Summary by CodeRabbit
New Features
Documentation
Tests