From c7406384516c5d447da459fad6ecb0d5d23c2907 Mon Sep 17 00:00:00 2001
From: David Breitgand <davidbr@il.ibm.com>
Date: Thu, 25 Dec 2025 20:09:17 +0200
Subject: [PATCH] Updates the architecture description with reference to BBR
 and support for multiple GenAI models and LoRAs to remove confusion about
 llm-d only supporting one model per cluster

---
 docs/architecture.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/architecture.md b/docs/architecture.md
index a6187939fb..0b08be6ff2 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -10,8 +10,7 @@
 
 The design enables:
 
-- Support for **multiple base models** within a shared cluster [Not supported in
-Phase1]
+- Support for **multiple base models** within a shared cluster (see [serving multiple GenAI models and LoRAs](https://gateway-api-inference-extension.sigs.k8s.io/guides/serve-multiple-genai-models/))
 - Efficient routing based on **KV cache locality**, **session affinity**, **load**, and
 **model metadata**
 - Disaggregated **Prefill/Decode (P/D)** execution
@@ -39,6 +38,7 @@ The inference scheduler is built on top of:
 
 - **Envoy** as a programmable data plane
 - **EPP (External Processing Plugin)** using **GIE**
+- **BBR (Body-Based Routing)** using **GIE**
 
 ---