
Commit c82d603

Fix open webui inference fetch
1 parent 436a4a1 commit c82d603

3 files changed: +51 -37 lines changed

README.md

Lines changed: 46 additions & 33 deletions
@@ -1,16 +1,16 @@
-> NOTE: This repository is currently in a state of flux as I finalize details of my cluster and slowly both learn and also move to different architectural patterns. In particular, the Helm and Terraform files will likely be drastically updated later as I migrate files and (eventually) bring [Atlantis](https://www.runatlantis.io) online for applying Terraform changes.
+> NOTE: Terraform files will be drastically updated later as [Atlantis](https://www.runatlantis.io) is brought online.
 
-# GitOps
+# GitOps 🛠️
 
-## 🔎 About
+## About
 
 This repository contains ArgoCD, Helm, and Terraform files for declarative deployments with [Kubernetes](https://kubernetes.io/), specifically [k3s](https://k3s.io/).
 
 You can use these files to stand up your own on-prem Kubernetes cluster. While this repository was built to be run on Raspberry Pi devices, it should be equally valid anywhere Kubernetes can run.
 
 If you want to implement this for yourself, please follow the [setup document](./docs/SETUP.md) (which is actively being updated).
 
-## 🎖️ Features
+## Features
 
 - App-of-apps: A root Argo CD Application deployment schema which recursively manages child apps
 - Namespace deployments: `argocd`, `cert-manager`, `kube-system`, `logging`, `longhorn-system`, `monitoring`, and `applications-eng`
@@ -21,17 +21,18 @@ If you want to implement this for yourself, please follow the [setup document](.
 - n8n: Workflow automation platform with persistent storage
 - vLLM: Runtime for AI models on a GPU node
 - Dashboard UI for:
-  - Argo CD: For controlling deployments and rollbacks
-  - Grafana: For building dashboards against Prometheus data
-  - Longhorn: For controlling the distributed block storage setup
-  - n8n: For creating and managing automated workflows
-  - Prometheus: For querying against raw data from pods/nodes/deployment resources
+  - Argo CD: Controlling deployments and rollbacks
+  - Grafana: Building dashboards against Prometheus data
+  - Longhorn: Controlling the distributed block storage setup
+  - n8n: Creating and managing automated workflows
+  - Open WebUI: A ChatGPT-like interface paired with the vLLM deployment for inference
+  - Prometheus: Querying against raw data from pods/nodes/deployment resources
 
-## 🧱 Project Management
+## Project Management
 
 Work for this repository is housed in this [Trello board](https://trello.com/b/HOJMq7WP/gitops).
 
-## 📁 Project Structure
+## Project Structure
 
 ```bash
 ├── argocd/ # ArgoCD application definitions
@@ -52,6 +53,7 @@ Work for this repository is housed in this [Trello board](https://trello.com/b/H
 │ ├── longhorn/ #
 │ ├── n8n/ #
 │ ├── nvidia-device-plugin/ #
+│ ├── open-webui/ #
 │ ├── prometheus/ #
 │ ├── prometheus-operator/ #
 │ ├── prometheus-service-monitors/ #
@@ -62,46 +64,57 @@ Work for this repository is housed in this [Trello board](https://trello.com/b/H
 └── storage-classes.tf # Longhorn storage class definitions
 ```
 
-## 🛠️ Built With
+## Built With
 
 ### Hardware
 
 The cluster this repo's files runs on uses Raspberry Pi 5 devices, specifically the 16gb version.
 
 Here's the hardware list of what each of the control/worker nodes is using:
 
-1. [Raspberry Pi 5](https://www.amazon.com/dp/B0DSPYPKRG)
-2. [NVMe + POE+ Pi 5 Hat and Active Cooler](https://www.amazon.com/dp/B0D8JC3MXQ)
-3. [Samsung 2TB NVMe SSD](https://www.amazon.com/dp/B0DHLCRF91)
-4. [256gb Micro SD Card](https://www.amazon.com/dp/B08TJZDJ4D)
+1. [Raspberry Pi 5](https://amzn.to/4ps5tiR)
+2. [NVMe + POE+ Pi 5 Hat and Active Cooler](https://amzn.to/49HdXNT)
+3. [Samsung 2TB NVMe SSD](https://amzn.to/4onuB8Q)
+4. [256gb Micro SD Card](https://amzn.to/3MtUpCU)
 
-> It's worth noting that one of my nodes is a computer running Ubuntu with a nice GPU, but that's really outside the scope of any guides I'd give for deploying this repository. The only part of this that will impact you is any apps that have node affinity for that setup (like the `nvidia-device-plugin-app` and `vllm-app` deployments), but you can easily remove that from your own deployments.
->
-> The rest of the nodes are Raspberry Pi 5s as described above.
+The GPU node I am running for model inference is quite different and uses the following hardware:
+
+1. [Ncase M3 Case](https://ncased.com/products/m3-round?srsltid=AfmBOoqQs1S0VUqh8MdMqWYxuy4zGDsMXjRNd5H4PEKTIi_6S1WMy2WY)
+2. [MSI B650I Edge Wifi Motherboard](https://amzn.to/4rqMshN)
+3. [AMD 9800x3D CPU](https://amzn.to/3K04gQk)
+4. [128gb DDR5 Corsair RAM](https://amzn.to/489KtXV)
+5. [8TB Western Digital NVMe SSD](https://amzn.to/49OdUju)
+6. [Nvidia RTX 6000 Pro Workstation GPU](https://amzn.to/4amKZTJ)
+7. [Corsair SF1000 PSU](https://amzn.to/4okJn05)
+8. [NZXT Kraken Elite 280mm AIO](https://amzn.to/4okXBxZ)
+9. [Noctua 120mm Fans](https://amzn.to/4okXBxZ)
+
+I built it to be beefy enough to handle inference but also lightweight enough for me to unplug, take with me while traveling, and use as a personal computer.
 
 ### Software
 
-- [Argo CD](https://argo-cd.readthedocs.io/en/stable/)
-- [Cert Manager](https://cert-manager.io/)
-- [Grafana](https://grafana.com/)
-- [Grafana Loki](https://grafana.com/docs/loki/latest/)
-- [Grafana Promtail](https://grafana.com/docs/loki/latest/send-data/promtail/)
-- [Helm](https://helm.sh/docs/)
-- [Kubernetes](https://kubernetes.io/), specifically [K3s](https://k3s.io/)
-- [Longhorn](https://longhorn.io/)
-- [Metal LB](https://metallb.io/)
-- [n8n](https://n8n.io/)
+- [Argo CD](https://argo-cd.readthedocs.io/en/stable)
+- [Cert Manager](https://cert-manager.io)
+- [Grafana](https://grafana.com)
+- [Grafana Loki](https://grafana.com/docs/loki/latest)
+- [Grafana Promtail](https://grafana.com/docs/loki/latest/send-data/promtail)
+- [Helm](https://helm.sh/docs)
+- [Kubernetes](https://kubernetes.io)/[K3s](https://k3s.io)
+- [Longhorn](https://longhorn.io)
+- [Metal LB](https://metallb.io)
+- [n8n](https://n8n.io)
 - [Nvidia Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/index.html)
-- [OpenFaaS](https://www.openfaas.com/) (coming soon)
-- [Prometheus](https://prometheus.io/) and [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
+- [Open WebUI](https://openwebui.com)
+- [OpenFaaS](https://www.openfaas.com) (coming soon)
+- [Prometheus](https://prometheus.io)
+- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator)
 - [Sealed Secrets](https://github.com/bitnami-labs/sealed-secrets)
 - [Terraform](https://developer.hashicorp.com/terraform)
 - [Traefik](https://traefik.io/traefik)
 - [vLLM](https://docs.vllm.ai)
 
-## 🙇🏻‍♂️ Acknowledgements
+## Acknowledgements
 
 - [Edede Oiwoh](https://github.com/ededejr) for inspiring me to build a home cluster and for bouncing ideas around
 - [rpi4cluster.com](https://rpi4cluster.com/) for tips on GitOps with Raspberry Pi setups (even if the notes weren't current and Helm/Argo configurations weren't file-based)
 - [Twitter](https://x.com) (now X), [Loom](https://www.loom.com/), and [Tesla](https://www.tesla.com/) for teaching me proper GitOps processes and giving me a chance to move mountains with them
-- [gitops-patterns repository](https://github.com/cloudogu/gitops-patterns) for what will likely be ongoing sources of truth for modern architecture patterns

helm/open-webui/templates/patch-job.yaml

Lines changed: 2 additions & 2 deletions
@@ -43,9 +43,9 @@ spec:
   exit 1
 fi
 VLLM_URL="http://${VLLM_IP}:8000"
-echo "vLLM URL: $VLLM_IP"
+echo "vLLM URL: $VLLM_URL"
 
-# Patch the StatefulSet with OLLAMA_BASE_URLS environment variable (using set env to merge, not replace)
+# Patch the StatefulSet with OPENAI_API_BASE_URL environment variable (using set env to merge, not replace)
 kubectl set env statefulset/{{ include "open-webui.name" . }} -n {{ include "open-webui.namespace" . }} OPENAI_API_BASE_URL="$VLLM_URL"
 
 echo "StatefulSet patched with vLLM URL: $VLLM_URL"

helm/open-webui/values.yaml

Lines changed: 3 additions & 2 deletions
@@ -15,10 +15,11 @@ ollamaUrls: []
 # @section -- External Tools configuration
 ollamaUrlsFromExtraEnv: false
 
+# Set to true again if custom pipeline middleware is needed
 pipelines:
   # -- Automatically install Pipelines chart to extend Open WebUI functionality using Pipelines: https://github.com/open-webui/pipelines
   # @section -- External Tools configuration
-  enabled: true
+  enabled: false
   # -- This section can be used to pass required environment variables to your pipelines (e.g. Langfuse hostname)
   # @section -- External Tools configuration
   extraEnvVars: []
@@ -428,7 +429,7 @@ openaiBaseApiUrl: "https://api.openai.com/v1"
 # -- OpenAI base API URLs to use. Overwrites the value in openaiBaseApiUrl if set
 # This will be patched during deployment
 # @section -- OpenAI API configuration
-openaiBaseApiUrls: ["http://placeholder-will-be-patched:11434"]
+openaiBaseApiUrls: []
 
 # -- OpenAI API key to use. Default API key value for Pipelines if `openaiBaseApiUrl` is blank. Should be updated in a production deployment, or be changed to the required API key if not using Pipelines
 # @section -- OpenAI API configuration
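Taken together, these values changes mean the chart now ships with an empty `openaiBaseApiUrls` (instead of an Ollama-style placeholder on port 11434) and the patch job injects the real endpoint at deploy time. A sketch, with a hypothetical ClusterIP, of what the Job's `kubectl set env` call effectively merges into the StatefulSet pod spec:

```yaml
# Sketch only: the env entry merged into the open-webui StatefulSet by the
# patch job; the IP is a hypothetical ClusterIP resolved at deploy time
spec:
  template:
    spec:
      containers:
        - name: open-webui
          env:
            - name: OPENAI_API_BASE_URL
              value: "http://10.43.0.42:8000"
```

Because `kubectl set env` merges rather than replaces, any env vars already defined by the chart are left intact.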
