
Commit c82d603

Fix open webui inference fetch
1 parent 436a4a1 commit c82d603

3 files changed: +51 -37 lines changed

README.md

Lines changed: 46 additions & 33 deletions
@@ -1,16 +1,16 @@
-> NOTE: This repository is currently in a state of flux as I finalize details of my cluster and slowly both learn and also move to different architectural patterns. In particular, the Helm and Terraform files will likely be drastically updated later as I migrate files and (eventually) bring [Atlantis](https://www.runatlantis.io) online for applying Terraform changes.
+> NOTE: Terraform files will be drastically updated later as [Atlantis](https://www.runatlantis.io) is brought online.
 
-# GitOps
+# GitOps 🛠️
 
-## 🔎 About
+## About
 
 This repository contains ArgoCD, Helm, and Terraform files for declarative deployments with [Kubernetes](https://kubernetes.io/), specifically [k3s](https://k3s.io/).
 
 You can use these files to stand up your own on-prem Kubernetes cluster. While this repository was built to be run on Raspberry Pi devices, it should be equally valid anywhere Kubernetes can run.
 
 If you want to implement this for yourself, please follow the [setup document](./docs/SETUP.md) (which is actively being updated).
 
-## 🎖️ Features
+## Features
 
 - App-of-apps: A root Argo CD Application deployment schema which recursively manages child apps
 - Namespace deployments: `argocd`, `cert-manager`, `kube-system`, `logging`, `longhorn-system`, `monitoring`, and `applications-eng`
@@ -21,17 +21,18 @@ If you want to implement this for yourself, please follow the [setup document](.
 - n8n: Workflow automation platform with persistent storage
 - vLLM: Runtime for AI models on a GPU node
 - Dashboard UI for:
-  - Argo CD: For controlling deployments and rollbacks
-  - Grafana: For building dashboards against Prometheus data
-  - Longhorn: For controlling the distributed block storage setup
-  - n8n: For creating and managing automated workflows
-  - Prometheus: For querying against raw data from pods/nodes/deployment resources
+  - Argo CD: Controlling deployments and rollbacks
+  - Grafana: Building dashboards against Prometheus data
+  - Longhorn: Controlling the distributed block storage setup
+  - n8n: Creating and managing automated workflows
+  - Open WebUI: A ChatGPT-like interface paired with the vLLM deployment for inference
+  - Prometheus: Querying against raw data from pods/nodes/deployment resources
 
-## 🧱 Project Management
+## Project Management
 
 Work for this repository is housed in this [Trello board](https://trello.com/b/HOJMq7WP/gitops).
 
-## 📁 Project Structure
+## Project Structure
 
 ```bash
 ├── argocd/ # ArgoCD application definitions
@@ -52,6 +53,7 @@ Work for this repository is housed in this [Trello board](https://trello.com/b/H
 │ ├── longhorn/ #
 │ ├── n8n/ #
 │ ├── nvidia-device-plugin/ #
+│ ├── open-webui/ #
 │ ├── prometheus/ #
 │ ├── prometheus-operator/ #
 │ ├── prometheus-service-monitors/ #
@@ -62,46 +64,57 @@ Work for this repository is housed in this [Trello board](https://trello.com/b/H
 └── storage-classes.tf # Longhorn storage class definitions
 ```
 
-## 🛠️ Built With
+## Built With
 
 ### Hardware
 
 The cluster this repo's files runs on uses Raspberry Pi 5 devices, specifically the 16gb version.
 
 Here's the hardware list of what each of the control/worker nodes is using:
 
-1. [Raspberry Pi 5](https://www.amazon.com/dp/B0DSPYPKRG)
-2. [NVMe + POE+ Pi 5 Hat and Active Cooler](https://www.amazon.com/dp/B0D8JC3MXQ)
-3. [Samsung 2TB NVMe SSD](https://www.amazon.com/dp/B0DHLCRF91)
-4. [256gb Micro SD Card](https://www.amazon.com/dp/B08TJZDJ4D)
+1. [Raspberry Pi 5](https://amzn.to/4ps5tiR)
+2. [NVMe + POE+ Pi 5 Hat and Active Cooler](https://amzn.to/49HdXNT)
+3. [Samsung 2TB NVMe SSD](https://amzn.to/4onuB8Q)
+4. [256gb Micro SD Card](https://amzn.to/3MtUpCU)
 
-> It's worth noting that one of my nodes is a computer running Ubuntu with a nice GPU, but that's really outside the scope of any guides I'd give for deploying this repository. The only part of this that will impact you is any apps that have node affinity for that setup (like the `nvidia-device-plugin-app` and `vllm-app` deployments), but you can easily remove that from your own deployments.
->
-> The rest of the nodes are Raspberry Pi 5s as described above.
+The GPU node I am running for model inference is quite different and uses the following hardware:
+
+1. [Ncase M3 Case](https://ncased.com/products/m3-round?srsltid=AfmBOoqQs1S0VUqh8MdMqWYxuy4zGDsMXjRNd5H4PEKTIi_6S1WMy2WY)
+2. [MSI B650I Edge Wifi Motherboard](https://amzn.to/4rqMshN)
+3. [AMD 9800x3D CPU](https://amzn.to/3K04gQk)
+4. [128gb DDR5 Corsair RAM](https://amzn.to/489KtXV)
+5. [8TB Western Digital NVMe SSD](https://amzn.to/49OdUju)
+6. [Nvidia RTX 6000 Pro Workstation GPU](https://amzn.to/4amKZTJ)
+7. [Corsair SF1000 PSU](https://amzn.to/4okJn05)
+8. [NZXT Kraken Elite 280mm AIO](https://amzn.to/4okXBxZ)
+9. [Noctua 120mm Fans](https://amzn.to/4okXBxZ)
+
+I built it to be beefy enough to handle inference but also lightweight enough for me to unplug, take with me while traveling, and use as a personal computer.
 
 ### Software
 
-- [Argo CD](https://argo-cd.readthedocs.io/en/stable/)
-- [Cert Manager](https://cert-manager.io/)
-- [Grafana](https://grafana.com/)
-- [Grafana Loki](https://grafana.com/docs/loki/latest/)
-- [Grafana Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/)
-- [Helm](https://helm.sh/docs/)
-- [Kubernetes](https://kubernetes.io/), specifically [K3s](https://k3s.io/)
-- [Longhorn](https://longhorn.io/)
-- [Metal LB](https://metallb.io/)
-- [n8n](https://n8n.io/)
+- [Argo CD](https://argo-cd.readthedocs.io/en/stable)
+- [Cert Manager](https://cert-manager.io)
+- [Grafana](https://grafana.com)
+- [Grafana Loki](https://grafana.com/docs/loki/latest)
+- [Grafana Promtail](https://grafana.com/docs/loki/latest/send-data/promtail)
+- [Helm](https://helm.sh/docs)
+- [Kubernetes](https://kubernetes.io)/[K3s](https://k3s.io)
+- [Longhorn](https://longhorn.io)
+- [Metal LB](https://metallb.io)
+- [n8n](https://n8n.io)
 - [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html)
-- [OpenFaaS](https://www.openfaas.com/) (coming soon)
-- [Prometheus](https://prometheus.io/) and [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
+- [Open WebUI](https://openwebui.com)
+- [OpenFaaS](https://www.openfaas.com) (coming soon)
+- [Prometheus](https://prometheus.io)
+- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
 - [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets)
 - [Terraform](https://developer.hashicorp.com/terraform)
 - [Traefik](https://traefik.io/traefik)
 - [vLLM](https://docs.vllm.ai)
 
-## 🙇🏻‍♂️ Acknowledgements
+## Acknowledgements
 
 - [Edede Oiwoh](https://github.com/ededejr) for inspiring me to build a home cluster and for bouncing ideas around
 - [rpi4cluster.com](https://rpi4cluster.com/) for tips on GitOps with Raspberry Pi setups (even if the notes weren't current and Helm/Argo configurations weren't file-based)
 - [Twitter](https://x.com) (now X), [Loom](https://www.loom.com/), and [Tesla](https://www.tesla.com/) for teaching me proper GitOps processes and giving me a chance to move mountains with them
-- [gitops-patterns repository](https://github.com/cloudogu/gitops-patterns) for what will likely be ongoing sources of truth for modern architecture patterns

helm/open-webui/templates/patch-job.yaml

Lines changed: 2 additions & 2 deletions
@@ -43,9 +43,9 @@ spec:
   exit 1
 fi
 VLLM_URL="http://${VLLM_IP}:8000"
-echo "vLLM URL: $VLLM_IP"
+echo "vLLM URL: $VLLM_URL"
 
-# Patch the StatefulSet with OLLAMA_BASE_URLS environment variable (using set env to merge, not replace)
+# Patch the StatefulSet with OPENAI_API_BASE_URL environment variable (using set env to merge, not replace)
 kubectl set env statefulset/{{ include "open-webui.name" . }} -n {{ include "open-webui.namespace" . }} OPENAI_API_BASE_URL="$VLLM_URL"
 
 echo "StatefulSet patched with vLLM URL: $VLLM_URL"

helm/open-webui/values.yaml

Lines changed: 3 additions & 2 deletions
@@ -15,10 +15,11 @@ ollamaUrls: []
 # @section -- External Tools configuration
 ollamaUrlsFromExtraEnv: false
 
+# Set to true again if custom pipeline middleware is needed
 pipelines:
   # -- Automatically install Pipelines chart to extend Open WebUI functionality using Pipelines: https://github.com/open-webui/pipelines
   # @section -- External Tools configuration
-  enabled: true
+  enabled: false
   # -- This section can be used to pass required environment variables to your pipelines (e.g. Langfuse hostname)
   # @section -- External Tools configuration
   extraEnvVars: []
@@ -428,7 +429,7 @@ openaiBaseApiUrl: "https://api.openai.com/v1"
 # -- OpenAI base API URLs to use. Overwrites the value in openaiBaseApiUrl if set
 # This will be patched during deployment
 # @section -- OpenAI API configuration
-openaiBaseApiUrls: ["http://placeholder-will-be-patched:11434"]
+openaiBaseApiUrls: []
 
 # -- OpenAI API key to use. Default API key value for Pipelines if `openaiBaseApiUrl` is blank. Should be updated in a production deployment, or be changed to the required API key if not using Pipelines
 # @section -- OpenAI API configuration
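Taken together, these values changes mean the chart now ships with an empty `openaiBaseApiUrls` (instead of an Ollama-style placeholder on port 11434) and the patch job injects the real endpoint at deploy time. A sketch, with a hypothetical ClusterIP, of what the Job's `kubectl set env` call effectively merges into the StatefulSet pod spec:

```yaml
# Sketch only: the env entry merged into the open-webui StatefulSet by the
# patch job; the IP is a hypothetical ClusterIP resolved at deploy time
spec:
  template:
    spec:
      containers:
        - name: open-webui
          env:
            - name: OPENAI_API_BASE_URL
              value: "http://10.43.0.42:8000"
```

Because `kubectl set env` merges rather than replaces, any env vars already defined by the chart are left intact.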
