Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions assisted-chat-pod.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,11 @@ kind: Pod
metadata:
name: assisted-chat-pod
spec:
dnsPolicy: "None"
dnsConfig:
nameservers:
- "8.8.8.8"
- "8.8.4.4"
Comment on lines +6 to +10
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Hardcoding external DNS breaks in-cluster name resolution.

Setting dnsPolicy: "None" with Google DNS disables cluster DNS, so Pod code cannot resolve in-cluster names (Services, pods). This can break calls like http://lightspeed-stack:8090/ and other cluster lookups, and may violate restricted-network policies.

Prefer default cluster DNS, or make external resolvers optional/parametrized. Minimal safe fix:

-  dnsPolicy: "None"
-  dnsConfig:
-    nameservers:
-      - "8.8.8.8"
-      - "8.8.4.4"
+  dnsPolicy: ClusterFirst

If you truly need public resolvers, consider parameterizing them and appending via dnsConfig.options while keeping cluster DNS (and document the trade-offs).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dnsPolicy: "None"
dnsConfig:
nameservers:
- "8.8.8.8"
- "8.8.4.4"
dnsPolicy: ClusterFirst
🤖 Prompt for AI Agents
In assisted-chat-pod.yaml around lines 6 to 10, the pod is forcing dnsPolicy:
"None" and hardcoding public nameservers which disables cluster DNS and breaks
in-cluster name resolution; change to use the cluster DNS (remove dnsPolicy or
set dnsPolicy: ClusterFirst) and make external resolvers optional/parameterized
instead of hardcoding them, or if public resolvers are required, append them via
dnsConfig.options while keeping ClusterFirst so cluster service/pod names still
resolve and document the trade-offs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point. Will this break dns resolution to the mcp server?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing for the UI which has:

        - name: AIUI_CHAT_API_URL
          value: http://lightspeed-stack:8090/

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to break anything:

Enter your query (or type 'exit' to quit): list my clusters
Our conversation ID: 1e5decc5-ad48-4448-87eb-a41f7c153099
conversation_id: 1e5decc5-ad48-4448-87eb-a41f7c153099
response: 'Here are your clusters:


  *   `test1` (ID: `7707ed0e-8a28-43a0-b0c6-a0e812cb055e`, OpenShift version: `4.20.0-ec.6`,
  status: `pending-for-input`)

  *   `eran` (ID: `3ec3d3f0-66e2-4277-8c8f-099fb5370d53`, OpenShift version: `4.19.9`,
  status: `pending-for-input`)

  *   `demo` (ID: `6e294cd4-ef3c-472a-8326-64c69b53e06d`, OpenShift version: `4.19.9`,
  status: `pending-for-input`)

  *   `itzik` (ID: `118be539-8302-466f-87ab-dd8731d6e9e1`, OpenShift version: `4.19.7`,
  status: `pending-for-input`)


  What would you like to do next? I can provide more information about a specific
  cluster, or we can proceed with the installation of one of these clusters.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good point though. I asked Gemini how it works:

How It Works
Podman's Internal DNS: When you run podman play kube, it creates a network namespace for the entire Pod. Within this namespace, Podman sets up a small DNS resolver. This resolver is specifically designed to handle the hostnames of containers within the same Pod. So, when the ui container tries to resolve lightspeed-stack, Podman's internal DNS immediately provides the correct internal IP address (typically 127.0.0.1 or the Pod's loopback address, as all containers in a Pod share the same network stack).

DNS Hierarchy: The dnsConfig you added doesn't completely replace Podman's internal DNS functionality; instead, it is appended to the list of resolvers that a container will try to use. The resolution process typically works like this:

A container tries to resolve a name, like lightspeed-stack.

The request first goes to the Pod's internal DNS resolver.

Since lightspeed-stack is a known hostname within the Pod, the internal resolver immediately returns the correct IP address (127.0.0.1), and the connection is successful.

Fallback to External DNS: If a container tries to resolve an external hostname, such as generativelanguage.googleapis.com:

The request goes to the Pod's internal DNS resolver first.

The internal resolver doesn't know about this external domain.

The request then falls back to the external DNS servers you specified in dnsConfig (8.8.8.8 and 8.8.4.4).

These external servers can successfully resolve generativelanguage.googleapis.com to its public IP address, allowing the container to make the external connection.

containers:
- name: lightspeed-stack
image: ${LIGHTSPEED_STACK_IMAGE_OVERRIDE}
Expand Down
4 changes: 4 additions & 0 deletions scripts/generate.sh
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@ if [[ ! -f "$PROJECT_ROOT/.env" ]]; then
echo "Exiting. You can copy .env.template to .env and fill it in manually."
exit 1
fi
else
echo "The .env file already exists. Skipping interactive configuration."
fi

source "$PROJECT_ROOT/.env"
Expand All @@ -83,11 +85,13 @@ if [[ -f $OVERRIDE_FILE ]]; then
OVERRIDE_PARAMS="--param-file=$OVERRIDE_FILE"
fi

echo "Generating $PROJECT_ROOT/config/lightspeed-stack.yaml"
oc process --local \
-f "$PROJECT_ROOT/template.yaml" \
"${OVERRIDE_PARAMS-}" \
--param-file="$PROJECT_ROOT/template-params.dev.env" |
yq '.items[] | select(.kind == "ConfigMap" and .metadata.name == "lightspeed-stack-config").data."lightspeed-stack.yaml"' -r \
>"$PROJECT_ROOT/config/lightspeed-stack.yaml"

echo "Generating $PROJECT_ROOT/config/systemprompt.txt"
yq -r '.objects[] | select(.metadata.name == "lightspeed-stack-config") | .data.system_prompt' "$PROJECT_ROOT/template.yaml" >"$PROJECT_ROOT/config/systemprompt.txt"
7 changes: 7 additions & 0 deletions template.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -244,6 +244,11 @@ objects:
* Prompt for necessary details like **cluster name**, **OpenShift version**, **base domain**, and whether it's a **single-node cluster**.
* Upon successful cluster creation, inform the user and provide the **cluster ID**.

**Mandatory Pre-Flight Checks for Cluster Creation**
* **Unsupported Network Configuration:** Before attempting to create a cluster, you **must first check** if the user's request includes instructions for unsupported network configurations, such as **static networking**, **static IP addresses**, or **node-level network settings** (like MAC addresses, gateways, or DNS servers).
* **If an unsupported configuration is detected, you must refuse the entire request.** Do not proceed with cluster creation. Your response must state that you don't support the requested static networking configuration and then instruct the user to use the assisted-installer web-based wizard instead.
* **Important Distinction:** Do not confuse unsupported static networking with setting API and Ingress VIPs. VIPs are a different concept and are supported for multi-node clusters with user-managed networking disabled.

2. **Infrastructure Setup / ISO Download:**
* After a cluster is created, the next step is typically to **download the Discovery ISO**.
* Proactively offer to provide the Discovery ISO.
Expand Down Expand Up @@ -302,6 +307,8 @@ objects:
* If the user deviates from the standard flow, adapt your suggestions to their current request while still being ready to guide them back to the installation path.
* After completing a step, confirm its success (if possible via tool output) and then immediately suggest the next logical action based on the workflow.
* In case of failure, clearly state the failure and provide actionable troubleshooting options.

---
- apiVersion: v1
kind: ConfigMap
metadata:
Expand Down
7 changes: 7 additions & 0 deletions test/evals/eval_data.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,13 @@
eval_types: [response_eval:accuracy]
expected_response: "The operators that can be installed onto clusters are OpenShift AI and OpenShift Virtualization."

- conversation_group: static_networking_support_conv
conversation:
- eval_id: static_networking_support
eval_query: Create a cluster with static networking
eval_types: [response_eval:sub-string]
expected_keywords: ["I do not support creating clusters with static networking", "Please use the assisted-installer web-based wizard"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Does it require that one of the entire quoted strings is present? That seems fragile.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, and it's not fragile because it's instructed to reply with these strings in the system prompt

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still it might be better to use accuracy test here, it's more flexible. sub-string eval looks for exact matches and is not really forgiving.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I started with an accuracy test, and switched to sub-string because:

  1. it's simple and doesn't require LLM.
  2. I want to know that we get this exact output.
    For instance, I don't want the chat response to say:
    "Would you like me to help you create a cluster using the assisted-installer wizard" which might pass the accuracy test.


- conversation_group: sno_requirements_conv
conversation:
- eval_id: sno_requirements
Expand Down