Skip to content

MGMT-21690: chatbot installation asks again for host and cluster IDs#195

Merged
openshift-merge-bot[bot] merged 1 commit intorh-ecosystem-edge:mainfrom
eranco74:MGMT-21690
Sep 15, 2025
Merged

MGMT-21690: chatbot installation asks again for host and cluster IDs#195
openshift-merge-bot[bot] merged 1 commit intorh-ecosystem-edge:mainfrom
eranco74:MGMT-21690

Conversation

@eranco74
Copy link
Collaborator

@eranco74 eranco74 commented Sep 10, 2025

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

    • Enforced pre-flight validation: requests with unsupported networking are rejected with guidance to use the wizard.
    • Multi-step host discovery flow with cluster status checks and discovered-hosts list.
    • Contextual role assignment: auto/manual for multi-node; automatic master for SNO.
    • Option to monitor host events.
  • Documentation

    • Expanded hardware requirements by cluster type (multi-node masters/workers, SNO) with note that operators may increase needs.
  • Tests

    • Added eval for “hosts booted but not discovered.”
    • Updated cluster retrieval test expectations.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 10, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Proactive Parameter Retrieval: The agent is now instructed to attempt finding necessary parameters (like cluster or host IDs) using its tools before asking the user, reducing conversational friction.
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link

openshift-ci bot commented Sep 10, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@eranco74 eranco74 marked this pull request as ready for review September 10, 2025 09:39
@coderabbitai
Copy link

coderabbitai bot commented Sep 10, 2025

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "MGMT-21690: chatbot installation asks again for host and cluster IDs" directly reflects the PR's primary objective to stop repeated prompts by proactively retrieving cluster/host IDs and aligns with the PR description; it is concise, specific, and focused on the main user-facing behavioral fix. The JIRA key is included but the wording clearly communicates the change without extraneous detail.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link

openshift-ci bot commented Sep 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eranco74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 10, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Proactive Parameter Retrieval: The agent is now instructed to attempt finding necessary parameters (like cluster or host IDs) using its tools before asking the user, reducing conversational friction.
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

  • Proactive data gathering: the assistant now attempts to locate required parameters before asking.

  • Enhanced host discovery: shows discovered hosts and tailors next steps by cluster type (multi-node role assignment options; SNO auto-assigns master and suggests next action).

  • Discovery ISO guidance offered after cluster creation.

  • Behavior Changes

  • Strict refusal for unsupported network configurations, directing users to the web-based wizard.

  • If a tool returns a Discovery ISO URL, it is omitted from responses.

  • Style

  • Minor text adjustments to network-configuration guidance.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
template.yaml (1)

439-439: Template variable substitution bug: use ${PARAM}, not ${{PARAM}}, in OpenShift templates.

OpenShift Template processing resolves ${PARAM}. The ${{...}} placeholders will not be substituted, risking invalid ports/replicas at deploy time.

Apply:

-    replicas: ${{REPLICAS_COUNT}}
+    replicas: ${REPLICAS_COUNT}
@@
-            containerPort: ${{SERVICE_PORT}}
+            containerPort: ${SERVICE_PORT}
@@
-              port: ${{SERVICE_PORT}}
+              port: ${SERVICE_PORT}
@@
-              port: ${{SERVICE_PORT}}
+              port: ${SERVICE_PORT}
@@
-      port: ${{SERVICE_PORT}}
-      targetPort: ${{SERVICE_PORT}}
+      port: ${SERVICE_PORT}
+      targetPort: ${SERVICE_PORT}

Also applies to: 455-456, 517-527, 607-608

🧹 Nitpick comments (4)
template.yaml (4)

233-237: Proactive retrieval: add scope/loop guards.

Suggest clarifying to attempt each lookup once (per parameter) and fall back to asking the user if unavailable/permission-denied, to avoid repetitive hidden calls and latency.

Apply:

-      **Proactive Information Retrieval:**
-      - Before asking the user for a parameter, such as a cluster ID or host ID, first attempt to find it using your available tools.
+      **Proactive Information Retrieval:**
+      - Before asking the user for a parameter (e.g., cluster ID or host ID), attempt a single lookup using available tools. If not found or access is denied, briefly ask the user for the value instead of retrying.

249-252: Don’t hard-stop the entire session for static networking; gracefully continue with supported paths.

Recommend declining only the unsupported action, then proposing supported alternatives (e.g., DHCP, or continue with discovery/role assignment) while linking the wizard for static config.

Apply:

-          * **Unsupported Network Configuration:** ... you must refuse the entire request. Do not proceed with cluster creation. ...
+          * **Unsupported Network Configuration:** ... you must decline the unsupported networking step and direct the user to the assisted-installer web-based wizard for static networking. Continue assisting with supported steps (e.g., discovery, role assignment, or DHCP-based configuration).

260-265: Host discovery flow: define cluster-type source and large-list behavior.

  • Clarify how to determine SNO vs multi-node (cluster object field vs host count; ask user if ambiguous).
  • For large host sets, avoid dumping excessively long lists; show first N with count + offer filters.

Apply:

           * When a user indicates that hosts have been booted, first check for discovered hosts.
-          * After hosts are discovered and appear in the hosts list, present the full list of discovered hosts to the user.
+          * After hosts are discovered and appear in the hosts list, present the list to the user. If the list is large (e.g., >20), show the first 20 with the total count and offer filtering (by hostname, inventory state, or role).
           * Proactively offer the next steps based on the cluster type:
-              * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker).
-              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
+              * Determine the cluster type from cluster metadata; if ambiguous, ask the user to confirm SNO vs multi-node.
+              * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer **manual role assignment** to a specific host (e.g., master/control-plane, worker). Ensure there are at least 3 control-plane nodes before proceeding with a HA install.
+              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.

131-133: Typo in parameter name: LIGHTSSPEED → LIGHTSPEED (consistency).

Spelling inconsistency may confuse operators. Consider renaming, with a deprecation note, in a future breaking change.

Apply:

- - name: LIGHTSSPEED_STACK_POSTGRES_SSL_MODE
+ - name: LIGHTSPEED_STACK_POSTGRES_SSL_MODE
@@
-          ssl_mode: ${LIGHTSSPEED_STACK_POSTGRES_SSL_MODE}
+          ssl_mode: ${LIGHTSPEED_STACK_POSTGRES_SSL_MODE}

Note: This is breaking; if you proceed now, update all references and communicate the change. Otherwise, keep as-is and schedule a follow-up.

Also applies to: 198-199

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81cc452 and fa27998.

📒 Files selected for processing (1)
  • template.yaml (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (1)
template.yaml (1)

604-631: Headless Service behind a Route—verify intent.

The Service is headless (clusterIP: None) while a Route targets it. Routes typically back a ClusterIP Service. Please confirm this is deliberate and functional in your environment.

Comment on lines 249 to 252
**Mandatory Pre-Flight Checks for Cluster Creation**
* **Unsupported Network Configuration:** Before attempting to create a cluster, you **must first check** if the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers).
* **If an unsupported configuration is detected, you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix grammar/markdown in “Unsupported Network Configuration” bullet.

There’s a duplicated word (“or or”) and an extra closing bold marker that will render incorrectly.

Apply:

-          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
**Mandatory Pre-Flight Checks for Cluster Creation**
* **Unsupported Network Configuration:** Before attempting to create a cluster, you **must first check** if the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers).
* **If an unsupported configuration is detected, you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled.
**Mandatory Pre-Flight Checks for Cluster Creation**
* **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled.
🤖 Prompt for AI Agents
In template.yaml around lines 249 to 252, the "Unsupported Network
Configuration" bullet contains a duplicated word ("or or") and an extra closing
bold marker that breaks markdown; remove the duplicate "or", fix punctuation and
bold markup so the sentence reads cleanly (e.g., list unsupported items as
"static networking, static IP addresses, or node-level network settings (like
MAC addresses, gateways, or DNS servers)" and remove the stray "**" after
"request"), preserving the instruction to refuse static networking and to direct
users to the assisted-installer web-based wizard.

@eranco74
Copy link
Collaborator Author

Adding eval tests.
Here are some improvements in the conversation flow following the changes in this PR:

Enter your query (or type 'exit' to quit): name: sno2
Our conversation ID: 2fa7223d-be6f-463e-a03b-32770c31de76
conversation_id: 2fa7223d-be6f-463e-a03b-32770c31de76
response: 'OK. I have the cluster name. I still need the OpenShift version and base
  domain. Also, let me know if you want to create a Single Node OpenShift (SNO) cluster.

  '
Enter your query (or type 'exit' to quit): use the latest stable one
Our conversation ID: 2fa7223d-be6f-463e-a03b-32770c31de76
conversation_id: 2fa7223d-be6f-463e-a03b-32770c31de76
response: 'OK, I see that the latest stable version is 4.19.10.


  I still need the base domain. Also, let me know if you want to create a Single Node
  OpenShift (SNO) cluster.

  '
----------

Enter your query (or type 'exit' to quit): I have booted the node
Our conversation ID: 2fa7223d-be6f-463e-a03b-32770c31de76
conversation_id: 2fa7223d-be6f-463e-a03b-32770c31de76
response: 'OK, I see the cluster "sno2" with ID "eedfff09-8182-4624-b558-9b4642ccd2cb"
  is in "pending-for-input" status.


  Since this is a Single Node OpenShift (SNO) cluster, the host will automatically
  be assigned the `master` role. No further manual role assignment is needed.


  I see that the cluster is waiting for input. This likely means that the host hasn''t
  been discovered yet or doesn''t meet the minimum hardware requirements.

template.yaml Outdated
---

**Proactive Information Retrieval:**
- Before asking the user for a parameter, such as a cluster ID or host ID, first attempt to find it using your available tools.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mostly able to get it to stop asking for cluster id in my other PR by changing this line: https://github.com/rh-ecosystem-edge/assisted-chat/pull/188/files#diff-1363ef5ce8886100842332c97163aad7934237e1fe49b5d40422b45fdc30f38eR303

The issue I was seeing was that it was forgetting what cluster it had been operating on in the prior turn so it wasn't as much a matter of using tools but using the conversation history.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.
This change should solve a diffrent issue, where the agent doesn't know the ID. e.g. First prompt is: "Give me the ISO for cluster named test".
Instead of the agent asking the ID from the user, it can list the clusters and find the ID

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I now notice the "* If the user specifies a cluster by name (not a UUID), map it to its cluster ID by internally listing the clusters..."
OK, I can remove this

@eranco74
Copy link
Collaborator Author

/hold
tests are WIP

@eranco74
Copy link
Collaborator Author

Hmm, while the resulting conversation flow seems fine, the actual tool calls are odd, the agent prefer to list all clusters instead of calling cluster_info with the specific cluster id.
Looking into it...

@eranco74
Copy link
Collaborator Author

/unhold

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Proactive Parameter Retrieval: The agent is now instructed to attempt finding necessary parameters (like cluster or host IDs) using its tools before asking the user, reducing conversational friction.
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

  • Proactive host discovery flow: shows discovered hosts and tailors next steps by cluster type (multi-node role options; SNO auto-assigns master and suggests next action).

  • Hardware requirement guidance added per cluster type.

  • Behavior Changes

  • Unconditional refusal for requests with unsupported network/node-level configurations; users directed to the web-based installer wizard.

  • Discovery ISO URLs omitted from responses.

  • Tests

  • Test data updated to use a unified cluster name, added host-booted-but-not-discovered and non-existent-cluster scenarios; placeholder replacement adjusted.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
template.yaml (1)

250-250: Fix duplicated word and stray bold marker in “Unsupported Network Configuration” bullet.

Removes “or or” and the extra “**”.

Apply:

-          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (6)
test/prow/entrypoint.sh (1)

23-23: Make sed replacement robust to special characters and quote the path.

If UNIQUE_ID contains sed metacharacters (/, &, |), the substitution may break. Also quote the YAML path.

Apply:

-sed -i "s/uniq-cluster-name/${UNIQUE_ID}/g" $TEST_DIR/eval_data.yaml
+safe_id=${UNIQUE_ID//&/\\&}
+safe_id=${safe_id//\//\\/}
+sed -i "s/uniq-cluster-name/${safe_id}/g" "$TEST_DIR/eval_data.yaml"
test/evals/eval_data.yaml (4)

121-129: Grammar fix in expected response (“hosts hasn’t” → “hosts haven’t”).

Minor, but it can trip strict judges.

Apply:

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."

Optionally switch this check to sub-string for robustness.


112-113: Fix duplicated words in SSH key response.

“The SSH public key is set for the cluster for cluster.”

Apply:

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key has been set for the cluster.

186-186: Typo in eval_id.

“create_single_node_cluser” → “create_single_node_cluster”.

Apply:

-    - eval_id: create_single_node_cluser
+    - eval_id: create_single_node_cluster

191-200: Consistency: assert name→ID resolution by adding cluster_info

test/evals/eval_data.yaml — eval_id: cluster_name_tool_call (lines ~190–195) currently only asserts list_clusters; add a second expected_tool_calls entry for cluster_info with a UUID regex for cluster_id to verify name→ID resolution.

template.yaml (1)

233-235: Clarify exception for internal list lookups to avoid unwanted list dumps.

“Direct Display of List Outputs” can conflict with name→ID resolution (agent ends up listing clusters instead of proceeding). Add an explicit exception for internal parameter resolution when an exact match exists; only display the list if multiple or zero exact matches.

Apply:

-      When a tool provides a list of items (e.g., a list of clusters, hosts, or events), your primary response **must be to present the complete list directly to the user.** Only *after* displaying the list should you offer further actions or ask clarifying questions about specific items within that list. Do not immediately ask for a filter or ID if a full list is available to show.
+      When a tool provides a list of items (e.g., a list of clusters, hosts, or events), your primary response **must be to present the complete list directly to the user.** Exception: when you call a list internally to resolve a user‑provided exact name to its ID, proceed silently if there is a single exact match. Only display the list if there are zero or multiple exact matches, or if the user explicitly asks to see the list. Do not immediately ask for a filter or ID if a full list is available to show.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa27998 and eb2ce8a.

📒 Files selected for processing (3)
  • template.yaml (2 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/entrypoint.sh (1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml

[error] 149-149: duplication of key "description" in mapping

(key-duplicates)

🔇 Additional comments (1)
template.yaml (1)

260-265: Host-booted flow additions look good.

Clear sequencing and role guidance for multi-node vs SNO.

@eranco74
Copy link
Collaborator Author

Error from server (InternalError): error when creating "STDIN": Internal error occurred: admission plugin "image.openshift.io/ImagePolicy" failed to complete mutation in 13s

/retest

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Proactive Parameter Retrieval: The agent is now instructed to attempt finding necessary parameters (like cluster or host IDs) using its tools before asking the user, reducing conversational friction.
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features
  • Proactive host discovery flow: list discovered hosts, offer automatic/manual role assignment; SNO auto-assigns master. Added host event monitoring.
  • Improvements
  • Stricter pre-flight validation: requests with unsupported static networking/IP/MAC/gateway/DNS are rejected and redirected to the assisted-installer wizard.
  • Clarified hardware requirements by cluster type (multi-node vs. SNO).
  • Tests
  • Standardized cluster name in scenarios; added “hosts booted but not discovered” and non-existent cluster cases.
  • Chores
  • Updated CI placeholder substitution for evaluation data.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (3)
test/evals/eval_data.yaml (2)

77-77: Fix base domain mismatch in expected_response (example.com vs test.local).

The eval_query and expected_tool_calls use base_domain example.com, but expected_response says test.local, which will cause accuracy checks to fail.

-      expected_response: I have created a cluster with name eval-test-singlenode-uniq-cluster-name, OpenShift version 4.19.7, base domain test.local. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?
+      expected_response: I have created a cluster with name eval-test-singlenode-uniq-cluster-name, OpenShift version 4.19.7, base domain example.com. Next, you'll need to download the Discovery ISO, then boot your hosts with it. Would you like me to get the Discovery ISO download URL?

142-150: Remove duplicate “description” key to satisfy YAML spec and YAMLlint.

Two description keys exist in the same mapping; keep one and convert the second to a comment (or split into a separate eval).

     - eval_id: cluster_info_tool_call
-      description: Test handling for non-existent cluster name
+      description: Test handling for non-existent cluster name
       eval_query: Give me details about cluster named 'abc123'
       eval_types: [tool_eval, response_eval:accuracy]
       expected_response: Retrieval failed for cluster 'abc123' because the resource was not found.
       expected_tool_calls:
         - - tool_name: list_clusters
             arguments: {}
-      description: Test error handling for non-existent cluster ID
+      # Note: Test error handling for non-existent cluster ID

Run this to catch duplicate keys and lint errors:

#!/bin/bash
set -euo pipefail
pip install --quiet yamllint >/dev/null 2>&1 || true
yamllint -d "{extends: default, rules: {line-length: disable}}" test/evals/eval_data.yaml || true
template.yaml (1)

249-252: Fix duplicated word and stray bold marker in “Unsupported Network Configuration”.

Clean up “or or” and the extra closing **.

-          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (6)
test/evals/eval_data.yaml (3)

108-113: Fix duplicate phrase in SSH key confirmation message.

Message reads “for the cluster for cluster”.

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key is set for the cluster.

121-129: Grammar and brittleness in “hosts not discovered” check.

  • Grammar: “hosts hasn't been discovered yet.” → “Hosts haven't been discovered yet.”
  • Suggest using sub-string evaluation to avoid exact-string fragility.
-      eval_types: [tool_eval, response_eval:accuracy]
+      eval_types: [tool_eval, response_eval:sub-string]
@@
-      expected_response: "hosts hasn't been discovered yet."
+      expected_keywords: ["Hosts", "discovered"]

If you prefer accuracy matching, update the agent text and use: “Hosts haven't been discovered yet.”


185-190: Correct eval_id typo and mismatch.

eval_id says “single_node cluser” while the eval creates a multi-node cluster; also “cluser” is misspelled.

-    - eval_id: create_single_node_cluser
+    - eval_id: create_multinode_cluster
template.yaml (3)

284-286: Minor grammar fixes for VIP guidance.

-          * Cluster with user-managed networking enabled don't need to **set API and Ingress VIPs**.
+          * Clusters with user-managed networking enabled don't need to **set API and Ingress VIPs**.

208-210: Terminology nit: “on-premises” (not “on-premise”).

-      - Supported: On-premise OpenShift installs via Assisted Installer on baremetal hosts or VMs (e.g., vSphere, KVM, libvirt).
+      - Supported: On-premises OpenShift installs via Assisted Installer on bare-metal hosts or VMs (e.g., vSphere, KVM, libvirt).

312-313: Confirm intent: auto-mapping name→ID by listing clusters vs relying on conversation state.

This instruction can cause unnecessary list operations when the agent already knows the ID. Consider:

  • First attempt: use known/last-used cluster context from conversation.
  • Fallback: list clusters and map exact name.

Do you want me to propose wording that prioritizes the conversation state first?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb2ce8a and 7fe6cc0.

📒 Files selected for processing (3)
  • template.yaml (2 hunks)
  • test/evals/eval_data.yaml (5 hunks)
  • test/prow/entrypoint.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/prow/entrypoint.sh
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml

[error] 149-149: duplication of key "description" in mapping

(key-duplicates)

🔇 Additional comments (3)
test/evals/eval_data.yaml (1)

91-103: Consistency check: multi-node creation flow looks coherent.

Name, version, base_domain, single_node=false, and follow-up ISO step align with goals. No action needed.

Please confirm the system prompt also avoids exposing the ISO URL (matches tests’ omission of URLs).

template.yaml (2)

260-265: Host discovery guidance upgrade LGTM.

Clear sequencing (check hosts and status → list hosts → role assignment by cluster type) aligns with PR goals.


131-136: Do not rename LIGHTSSPEED in-place — add an alias or update all callers

LIGHTSSPEED_STACK_POSTGRES_SSL_MODE is referenced in:

  • template.yaml (lines ~131 and 198)
  • template-params.dev.env (line 8)
  • scripts/deploy_template.sh (line 82)

Options:

  • If breaking change accepted: rename to LIGHTSPEED_STACK_POSTGRES_SSL_MODE and update the files above and any external consumers.
  • If backward compatibility required: introduce LIGHTSPEED_STACK_POSTGRES_SSL_MODE and keep LIGHTSSPEED_STACK_POSTGRES_SSL_MODE as a fallback/alias (map new→old or add a fallback) so consumers don’t break.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Proactive Parameter Retrieval: The agent is now instructed to attempt finding necessary parameters (like cluster or host IDs) using its tools before asking the user, reducing conversational friction.
Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features
  • Proactive, multi-step host discovery flow with a full hosts list and role assignment options.
  • Option to monitor host events.
  • Improvements
  • Pre-flight checks now block unsupported static networking/IP/MAC/gateway/DNS requests and direct users to the assisted-installer wizard.
  • Clarified hardware requirements by cluster type (multi-node and SNO), noting operators may increase needs.
  • Tests
  • Expanded evaluations for host-booted-not-discovered and non-existent clusters; standardized to uniq-cluster-name.
  • Chores
  • CI updates to use unique IDs for jobs/pods and data substitution.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/ci_test.sh (1)

40-45: Fix label mismatch and make pod selection robust (handle 0/≥2 pods).

  • Error message mentions label app=assisted-chat-eval-test, but selection uses job-name=...; align the message.
  • Indexing .items[0] is brittle when there are 0 or multiple pods (e.g., retries). Prefer sorting by creation time and taking the latest; also guard against errexit on empty results.
- POD_NAME=$(oc get pods -n "$NAMESPACE" -l job-name="${JOB_NAME}-${UNIQUE_ID}" -o jsonpath='{.items[0].metadata.name}')
+ POD_NAME=$(oc get pods -n "$NAMESPACE" -l job-name="${JOB_NAME}-${UNIQUE_ID}" \
+   --sort-by=.metadata.creationTimestamp -o name 2>/dev/null | tail -n1 | cut -d/ -f2 || true)
 if [[ -z "${POD_NAME}" ]]; then
-    echo "No pod found with label app=assisted-chat-eval-test in namespace ${NAMESPACE}"
+    echo "No pod found with label job-name=${JOB_NAME}-${UNIQUE_ID} in namespace ${NAMESPACE}"
     oc get pods -n "$NAMESPACE"
     exit 1
 fi
🧹 Nitpick comments (2)
scripts/ci_test.sh (2)

60-66: Terminology fix: it’s the Job that failed, not necessarily the Pod.

Message currently says “Pod … is Failed”. Clarify to “Job … failed” to avoid confusion when pods restart or complete.

-        echo "Pod ${POD_NAME} is Failed, exiting"
+        echo "Job ${JOB_NAME}-${UNIQUE_ID} failed, exiting"

14-19: Typos in user-facing logs.

  • “proided” → “provided”
  • “provieded” → “provided”
  • “PLease” → “Please”
  • “were successful” (grammar)
-    echo "The variable ASSISTED_CHAT_TEST was proided with the value ${ASSISTED_CHAT_TEST}, using it to create the IMAGE and TAG variables for the template"
+    echo "The variable ASSISTED_CHAT_TEST was provided with the value ${ASSISTED_CHAT_TEST}, using it to create the IMAGE and TAG variables for the template"
-    echo "The variable ASSISTED_CHAT_TEST was not provieded, downloading the latest image from ${IMAGE}"
+    echo "The variable ASSISTED_CHAT_TEST was not provided, downloading the latest image from ${IMAGE}"
-echo "Timeout reached. Pod $POD_NAME did not become ready in time. PLease check the logs of the pods under the directory artifacts/eval-test/gather-extra/artifacts/pods/."
+echo "Timeout reached. Pod $POD_NAME did not become ready in time. Please check the pod logs under artifacts/eval-test/gather-extra/artifacts/pods/."

Also applies to: 74-75

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe6cc0 and 044a747.

📒 Files selected for processing (1)
  • scripts/ci_test.sh (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (1)
scripts/ci_test.sh (1)

21-30: Secret bootstrap is fine; verify required env vars are always set in CI.

NAMESPACE and SHARED_DIR are required under nounset; confirm CI guarantees both. If not, set sane defaults or fail fast with a clearer message.

Would you like me to add a short preflight to assert these are set and print them for debugging?

Comment on lines 50 to 56
while [ $ELAPSED -lt $TIMEOUT ]; do
# Check if the pod's status is "Running"
JOB_SUCCEEDED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
JOB_FAILED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)
JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)

if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then
echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Harden job status checks against empty jsonpath outputs; fix misleading comment.

  • When .status.succeeded/failed are absent, jsonpath returns empty; numeric compare can misbehave. Default to 0 before comparisons. Also, the comment says “pod’s status is Running” but the code checks the Job status; update the comment.
 while [ $ELAPSED -lt $TIMEOUT ]; do
-    # Check if the pod's status is "Running"
-    JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
-    JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)
+    # Check Job status (Complete/Failed)
+    JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null || true)
+    JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null || true)
+    JOB_SUCCEEDED=${JOB_SUCCEEDED:-0}
+    JOB_FAILED=${JOB_FAILED:-0}

Optional simplification (reduces polling): replace the loop with oc wait:

-# existing loop…
+if oc wait -n "$NAMESPACE" --for=condition=complete "job/${JOB_NAME}-${UNIQUE_ID}" --timeout=600s; then
+  echo "The evaluation tests were successful. Logs in artifacts/eval-test/gather-extra/artifacts/pods/ for pod ${POD_NAME}."
+  exit 0
+fi
+# If not complete in time, surface failure/events below as today.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
while [ $ELAPSED -lt $TIMEOUT ]; do
# Check if the pod's status is "Running"
JOB_SUCCEEDED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
JOB_FAILED=$(oc get job "$JOB_NAME" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)
JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null)
JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null)
if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then
echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}."
while [ $ELAPSED -lt $TIMEOUT ]; do
# Check Job status (Complete/Failed)
JOB_SUCCEEDED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.succeeded}' 2>/dev/null || true)
JOB_FAILED=$(oc get job "${JOB_NAME}-${UNIQUE_ID}" -n "$NAMESPACE" -o=jsonpath='{.status.failed}' 2>/dev/null || true)
JOB_SUCCEEDED=${JOB_SUCCEEDED:-0}
JOB_FAILED=${JOB_FAILED:-0}
if [[ "$JOB_SUCCEEDED" -gt 0 ]]; then
echo "The evaluation test were successful. The logs of the tests are stored in the directory artifacts/eval-test/gather-extra/artifacts/pods/ in the logs of the pod ${POD_NAME}."
🤖 Prompt for AI Agents
In scripts/ci_test.sh around lines 50 to 56, the loop comment incorrectly refers
to a pod status and the numeric comparisons can fail when oc jsonpath returns
empty strings for .status.succeeded/.status.failed; change the comment to
reference the Job status, normalize empty jsonpath outputs to 0 before numeric
comparison (e.g., fallback to 0 via parameter expansion or an explicit test) so
JOB_SUCCEEDED and JOB_FAILED are always numeric, and keep the existing branch
logic; optionally replace the entire polling loop with oc wait on the Job to
simplify and reduce polling.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features
  • Proactive, multi-step host discovery flow with a full hosts list and role assignment options.
  • Option to monitor host events.
  • Improvements
  • Pre-flight checks now block unsupported static networking/IP/MAC/gateway/DNS requests and direct users to the assisted-installer wizard.
  • Clarified hardware requirements by cluster type (multi-node and SNO), noting operators may increase needs.
  • Tests
  • Expanded evaluations for host-booted-not-discovered and non-existent clusters; standardized to uniq-cluster-name.
  • Chores
  • CI updates to use unique IDs for jobs/pods and data substitution.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

  • Streamlined pre-flight checks: requests with unsupported static networking settings are now declined with guidance to the assisted installer.

  • Proactive host discovery flow with status checks, full host list display, and role assignment guidance for multi-node and Single Node OpenShift.

  • Option to monitor host events.

  • Updated hardware requirements by cluster type, with notes on operator impact.

  • Tests

  • Standardized cluster naming in evaluations and expanded scenarios, including host-booted-without-discovery and non-existent cluster handling.

  • Chores

  • CI scripts now use unique IDs to avoid job and pod collisions.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
test/evals/eval_data.yaml (2)

64-76: Base-domain mismatch fix looks good.

example.com is now consistent across query, tool args, and expected response.


141-149: YAML parse error: duplicate “description” key in same mapping.

This is a hard failure in YAML parsers; remove the second key or split into two evals.

     - eval_id: cluster_info_tool_call
-      description: Test handling for non-existent cluster name
+      description: Test handling for non-existent cluster name
       eval_query: Give me details about cluster named 'abc123'
       eval_types: [tool_eval, response_eval:accuracy]
       expected_response: Retrieval failed for cluster 'abc123' because the resource was not found.
       expected_tool_calls:
         - - tool_name: list_clusters
             arguments: {}
-      description: Test error handling for non-existent cluster ID
+      # Note: Test error handling for non-existent cluster ID (moved to a separate eval or comment)

Alternatively, create a second eval entry for the ID case.

🧹 Nitpick comments (3)
test/evals/eval_data.yaml (3)

33-33: Tightening eval to accuracy-only may be brittle.

Relying on exact-string accuracy can flap on minor punctuation/spacing. Consider adding sub-string keywords for resilience or keep both accuracy + sub-string.

-      eval_types: [response_eval:accuracy, tool_eval]
+      eval_types: [response_eval:accuracy, response_eval:sub-string, tool_eval]
+      expected_keywords: ["OpenShift AI", "OpenShift Virtualization"]

111-111: Fix duplicated wording in expected response.

"The SSH public key is set for the cluster for cluster" has a repeated phrase.

-      expected_response: The SSH public key is set for the cluster for cluster
+      expected_response: The SSH public key is set for the cluster

184-189: Eval ID/name mismatch and typo.

eval_id says “single_node cluser” while the query creates a multi-node cluster; also “cluser” is misspelled.

-    - eval_id: create_single_node_cluser
+    - eval_id: create_multinode_cluster
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 044a747 and 78db8f1.

📒 Files selected for processing (1)
  • test/evals/eval_data.yaml (6 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
test/evals/eval_data.yaml

[error] 148-148: duplication of key "description" in mapping

(key-duplicates)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (3)
test/evals/eval_data.yaml (3)

78-84: Good: ID reuse path for ISO retrieval.

This encourages resolving the cluster ID once and reusing it.


91-102: Multinode flow updates LGTM.

Names, version, and keywords align; keeps ISO mention without leaking the URL.


113-120: ISO retrieval step for multinode: OK.

Tool call and keywords are appropriate; no URL disclosure.

Comment on lines +120 to +127
- eval_id: host_booted_but_not_discovered
eval_query: I booted the hosts
eval_types: [tool_eval, response_eval:accuracy]
expected_tool_calls:
- - tool_name: cluster_info
arguments:
cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
expected_response: "hosts hasn't been discovered yet."
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Make the “hosts not discovered” assertion less brittle (and fix grammar).

Exact-phrase accuracy on a short sentence will be fragile; also grammar should be “haven’t.”

-      eval_types: [tool_eval, response_eval:accuracy]
+      eval_types: [tool_eval, response_eval:sub-string]
       expected_tool_calls:
         - - tool_name: cluster_info
             arguments:
               cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
-      expected_response: "hosts hasn't been discovered yet."
+      expected_keywords: ["no hosts discovered", "booted", "waiting"]

If you prefer accuracy, then at least:

-      expected_response: "hosts hasn't been discovered yet."
+      expected_response: "hosts haven't been discovered yet."
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- eval_id: host_booted_but_not_discovered
eval_query: I booted the hosts
eval_types: [tool_eval, response_eval:accuracy]
expected_tool_calls:
- - tool_name: cluster_info
arguments:
cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
expected_response: "hosts hasn't been discovered yet."
- eval_id: host_booted_but_not_discovered
eval_query: I booted the hosts
eval_types: [tool_eval, response_eval:sub-string]
expected_tool_calls:
- - tool_name: cluster_info
arguments:
cluster_id: "[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}"
expected_keywords: ["no hosts discovered", "booted", "waiting"]
🤖 Prompt for AI Agents
In test/evals/eval_data.yaml around lines 120-127, the expected_response is
brittle and grammatically incorrect ("hosts hasn't been discovered yet.");
change it to a less fragile assertion and correct the grammar by either
replacing the exact string with a substring/regex match (e.g. a case-insensitive
pattern like "haven't been discovered" or "haven't been discovered yet") or
update the literal to the corrected sentence "hosts haven't been discovered
yet." Ensure the test harness uses substring or regex matching rather than
exact-phrase equality so short response variations don't break the eval.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 14, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

  • Proactive host discovery flow with step-by-step guidance.

  • Role assignment options: automatic/manual for multi-node; automatic master for SNO.

  • Ability to monitor host events.

  • Strict pre-flight validation: unsupported static networking settings are rejected with guidance to the installer wizard.

  • Documentation

  • Expanded hardware requirements by cluster type (multi-node and SNO), noting potential increases with operators.

  • Tests

  • Added scenarios for hosts booted but not yet discovered.

  • Improved handling and messaging for non-existent cluster lookups.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
template.yaml (1)

250-251: Fix typo and stray bold marker in “Unsupported Network Configuration” bullet.

Removes duplicated “or” and the extra trailing “**”.

-          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (1)
template.yaml (1)

260-265: Clarify SNO multi-host discovery edge case.

Add guidance when >1 host is discovered for SNO; optional terminology tweak.

           * When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status.
           * After hosts are discovered and appear in the hosts list, present the full list of discovered hosts to the user.
           * Proactively offer the next steps based on the cluster type:
               * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker).
-              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
+              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` (control-plane) role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
+              * If more than one host is discovered for a SNO cluster, explain that only a single host is supported and offer to either convert to a multi-node cluster or remove the extra hosts.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 78db8f1 and 65a0ddf.

📒 Files selected for processing (2)
  • template.yaml (2 hunks)
  • test/evals/eval_data.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/evals/eval_data.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Signed-off-by: Eran Cohen <eranco@redhat.com>
@openshift-ci-robot
Copy link

openshift-ci-robot commented Sep 15, 2025

@eranco74: This pull request references MGMT-21690 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.21.0" version, but no target version was set.

Details

In response to this:

Reactive Network Configuration Checks: The "static networking" check is changed from a proactive query to a reactive one. The chatbot will now only act on unsupported network configurations if the user mentions them, preventing unnecessary questions.
Enhanced Host Discovery and Role Assignment: The chatbot will now present a list of discovered hosts and offer context-aware next steps.

Summary by CodeRabbit

  • New Features

  • Enforced pre-flight validation: requests with unsupported networking are rejected with guidance to use the wizard.

  • Multi-step host discovery flow with cluster status checks and discovered-hosts list.

  • Contextual role assignment: auto/manual for multi-node; automatic master for SNO.

  • Option to monitor host events.

  • Documentation

  • Expanded hardware requirements by cluster type (multi-node masters/workers, SNO) with note that operators may increase needs.

  • Tests

  • Added eval for “hosts booted but not discovered.”

  • Updated cluster retrieval test expectations.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
template.yaml (1)

250-251: Fix duplicated word and stray bold marker in “Unsupported Network Configuration”.

There’s a duplicated “or” and an extra closing ** that will break markdown rendering.

-          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or or DNS servers), you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
+          * **Unsupported Network Configuration:** If the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers), you must refuse the entire request. Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
🧹 Nitpick comments (1)
template.yaml (1)

260-265: Clarify status examples and align role terminology; keep SNO guidance consistent.

Minor copy edits to reduce ambiguity and align with inclusive/control‑plane terminology.

-          * When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status.
+          * When a user indicates that hosts have been booted, first check for discovered hosts for that cluster and the cluster status (e.g., "pending-for-input", "insufficient", "ready").
@@
-              * **For a multi-node cluster:** Inform the user that roles can be automatically assigned or they can manually assign them. Offer to help with **manual role assignment** to a specific host (e.g., master, worker).
-              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `master` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
+              * **For a multi-node cluster:** Inform the user that roles can be assigned automatically or manually. Offer to help with **manual role assignment** to a specific host (e.g., control plane (master), worker).
+              * **For a Single Node OpenShift (SNO) cluster:** Inform the user that the host is automatically assigned the `control plane (master)` role and no further manual role assignment is needed. Propose the next logical step, such as initiating the installation.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65a0ddf and 5059678.

📒 Files selected for processing (2)
  • template.yaml (2 hunks)
  • test/evals/eval_data.yaml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/evals/eval_data.yaml

@maorfr
Copy link
Collaborator

maorfr commented Sep 15, 2025

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants