diff --git a/candle-binding/README.md b/candle-binding/README.md index d717f69f98..44a8e3152a 100644 --- a/candle-binding/README.md +++ b/candle-binding/README.md @@ -48,4 +48,4 @@ go test -v ## Notes - The Go tests depend on the native library being present and correctly built. -- Some tests may download data from the internet (e.g., from norvig.com). +- Some tests may download data from the internet (e.g., from norvig.com). diff --git a/deploy/kubernetes/istio/README.md b/deploy/kubernetes/istio/README.md index 802d1284c6..de1a596405 100644 --- a/deploy/kubernetes/istio/README.md +++ b/deploy/kubernetes/istio/README.md @@ -2,7 +2,7 @@ This guide provides step-by-step instructions for deploying the vLLM Semantic Router (vSR) with Istio Gateway on Kubernetes. Istio Gateway uses Envoy under the covers so it is possible to use vSR with it. Istio is a common choice for the gateway when using Kubernetes Gateway API Inference Extension and in the LLM-D project as well as in common Kubernetes distributions such as Red Hat Openshift. In our experience, there are low level differences in how different Envoy based gateways process the ExtProc protocol to assist with LLM inference, hence this guide and some others cover the specific case of vSR working with an Istio based gateway. -There are multiple deployment guides in this repo related to vSR+Istio deployments. This current document describes deployment of vSR with Istio gateway and two local LLMs served using vLLM. Additional deployment guides in this repo build on this deployment to add support for integrating LLM-D and to illustrate support for routing to remote/ public cloud LLMs. Those topics are covered by other followup deployment guides in this repo ([llm-d guide](../llmd-base/README.md) and [public llm routing guide](../llmd-base/llmd+public-llm/README.md). +There are multiple deployment guides in this repo related to vSR+Istio deployments. 
This current document describes deployment of vSR with Istio gateway and two local LLMs served using vLLM. Additional deployment guides in this repo build on this deployment to add support for integrating LLM-D and to illustrate support for routing to remote/ public cloud LLMs. Those topics are covered by other followup deployment guides in this repo ([llm-d guide](../llmd-base/README.md) and [public llm routing guide](../llmd-base/llmd+public-llm/README.md). With that background context in mind, we now follow this guide to describe the vSR + Istio + locally hosted LLMs use case. After this guide, the reader may then optionally choose to follow up with the additional guides linked above to deploy the more advanced use cases. @@ -20,17 +20,17 @@ The deployment consists of: Before starting, ensure you have the following tools installed: - [Docker](https://docs.docker.com/get-docker/) - Container runtime -- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes +- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes - [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) - Kubernetes in Docker - [kubectl](https://kubernetes.io/docs/tasks/tools/) - Kubernetes CLI -Either minikube or kind works to deploy a local kubernetes cluster needed for this exercise so you only need one of these two. We use minikube in the description below but the same steps should work with a Kind cluster once the cluster is created in Step 1. +Either minikube or kind works to deploy a local kubernetes cluster needed for this exercise so you only need one of these two. We use minikube in the description below but the same steps should work with a Kind cluster once the cluster is created in Step 1. 
We will also deploy two different LLMs in this exercise to illustrate the semantic routing and model routing function more clearly so you ideally you should run this on a machine that has GPU support to run the two models used in this exercise and adequate memory and storage for these models. You can also use equivalent steps on a smaller server that runs smaller LLMs on a CPU based server without GPUs. ## Step 1: Create Minikube Cluster -Create a local Kubernetes cluster via minikube (or equivalently via Kind). +Create a local Kubernetes cluster via minikube (or equivalently via Kind). ```bash # Create cluster @@ -108,7 +108,7 @@ kubectl get pods -n istio-system ## Step 4: Update vsr config -The file deploy/kubernetes/istio/config.yaml will get used to configure vsr when it is installed in the next step. Ensure that the models in the config file match the models you are using and that the vllm_endpoints in the file match the ip/ port of the llm kubernetes services you are running. It is usually good to start with basic features of vsr such as prompt classification and model routing before experimenting with other features such as PromptGuard or ToolCalling. +The file deploy/kubernetes/istio/config.yaml will get used to configure vsr when it is installed in the next step. Ensure that the models in the config file match the models you are using and that the vllm_endpoints in the file match the ip/ port of the llm kubernetes services you are running. It is usually good to start with basic features of vsr such as prompt classification and model routing before experimenting with other features such as PromptGuard or ToolCalling. ## Step 5: Deploy vLLM Semantic Router @@ -134,7 +134,7 @@ kubectl apply -f deploy/kubernetes/istio/destinationrule.yaml kubectl apply -f deploy/kubernetes/istio/envoyfilter.yaml ``` -## Step 7: Install gateway routes +## Step 7: Install gateway routes Install HTTPRoutes in the Istio gateway. 
@@ -142,8 +142,9 @@ Install HTTPRoutes in the Istio gateway. kubectl apply -f deploy/kubernetes/istio/httproute-llama3-8b.yaml kubectl apply -f deploy/kubernetes/istio/httproute-phi4-mini.yaml ``` - + ## Step 8: Testing the Deployment + To expose the IP on which the Istio gateway listens to client requests from outside the cluster, you can choose any standard kubernetes option for external load balancing. We tested our feature by [deploying and configuring metallb](https://metallb.universe.tf/installation/) into the cluster to be the LoadBalancer provider. Please refer to metallb documentation for installation procedures if needed. Finally, for the minikube case, we get the external url as shown below. ```bash @@ -151,7 +152,7 @@ minikube service inference-gateway-istio --url http://192.168.49.2:30913 ``` -Now we can send LLM prompts via curl to http://192.168.49.2:30913 to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. +Now we can send LLM prompts via curl to <http://192.168.49.2:30913> to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. ### Send Test Requests diff --git a/deploy/kubernetes/llmd-base/README.md b/deploy/kubernetes/llmd-base/README.md index 0332034a9a..f4efcba4ee 100644 --- a/deploy/kubernetes/llmd-base/README.md +++ b/deploy/kubernetes/llmd-base/README.md @@ -2,9 +2,9 @@ This guide provides step-by-step instructions for deploying the vLLM Semantic Router (vSR) in combination with [LLM-D](https://github.com/llm-d/llm-d) and a single Inference gateway. This will also illustrate a key design pattern namely the use of the vSR as an automatic model picker in combination with the use of LLM-D as an endpoint picker. 
-A model picker provides the ability to route an LLM query to one of multiple LLM models that are entirely different from each other, whereas an endpoint picker selects one of multiple endpoints that each serve the same base model in a scale-out deployment for achieving higher performance. Hence this deployment shows how vSR (vLLM Semantic Router) in its role as a model picker based on semantic prompt analysis is perfectly complementary to endpoint picker solutions such as LLM-D. The combined solution enables optimized model serving with N separate base model types that have M endpoints each while relieving the end user/ LLM client of the burden of model selection or endpoint selection. +A model picker provides the ability to route an LLM query to one of multiple LLM models that are entirely different from each other, whereas an endpoint picker selects one of multiple endpoints that each serve the same base model in a scale-out deployment for achieving higher performance. Hence this deployment shows how vSR (vLLM Semantic Router) in its role as a model picker based on semantic prompt analysis is perfectly complementary to endpoint picker solutions such as LLM-D. The combined solution enables optimized model serving with N separate base model types that have M endpoints each while relieving the end user/ LLM client of the burden of model selection or endpoint selection. -Since LLM-D has a number of deployment configurations some of which require a larger hardware setup we will demonstrate a baseline version of LLM-D working in combination with vSR to introduce the core concepts. These same core concepts will also apply when using vSR with more complex LLM-D configurations and production grade well-lit paths as described in the LLM-D repo at [this link](https://github.com/llm-d/llm-d/tree/main/guides). 
+Since LLM-D has a number of deployment configurations some of which require a larger hardware setup we will demonstrate a baseline version of LLM-D working in combination with vSR to introduce the core concepts. These same core concepts will also apply when using vSR with more complex LLM-D configurations and production grade well-lit paths as described in the LLM-D repo at [this link](https://github.com/llm-d/llm-d/tree/main/guides). Also we will use LLM-D with Istio as the Inference Gateway in order to build on the steps and hardware setup from the [Istio deployment example](../istio/README.md) already documented in this repo. Istio is also commonly used as the default gateway for LLM-D with or without vSR. @@ -13,7 +13,7 @@ Also we will use LLM-D with Istio as the Inference Gateway in order to build on The deployment consists of: - **vLLM Semantic Router (vSR)**: Provides intelligent request routing and processing decisions to Envoy based Gateways -- **LLM-D**: Distributed Inference platform used for scaleout LLM inferencing with SOTA performance. +- **LLM-D**: Distributed Inference platform used for scaleout LLM inferencing with SOTA performance. 
- **Istio Gateway**: Istio's implementation of Kubernetes Gateway API that uses an Envoy proxy under the covers - **Gateway API Inference Extension**: Additional APIs to extend the Gateway API for Inference via ExtProc servers - **Two instances of vLLM serving 1 model each**: Example backend LLMs for illustrating semantic routing in this topology @@ -23,14 +23,14 @@ The deployment consists of: Before starting, ensure you have the following tools installed: - [Docker](https://docs.docker.com/get-docker/) - Container runtime -- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes +- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes - [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) - Kubernetes in Docker - [kubectl](https://kubernetes.io/docs/tasks/tools/) - Kubernetes CLI - [istioctl](https://istio.io/latest/docs/ops/diagnostic-tools/istioctl/) - Istio CLI We use minikube in the description below. As noted above, this guide builds upon the vsr + Istio [deployment guide]((../istio/README.md)) from this repo hence will point to that guide for the common portions of documentation and add the incremental additional steps here. -As was the case for the Istio guide, you will need a machine that has GPU support with at least 2 GPUs to run this exercise so that we can deploy and test the use of vsr to do model routing between two different LLM base models. +As was the case for the Istio guide, you will need a machine that has GPU support with at least 2 GPUs to run this exercise so that we can deploy and test the use of vsr to do model routing between two different LLM base models. ## Step 1: Common Steps from Istio Guide @@ -60,7 +60,7 @@ kubectl get pods -n istio-system ## Step 3: Deploy LLM models -Now deploy two LLM models similar to the [Istio guide](../istio/README.md) documentation. Note from the manifest file names that these example commands are to be executed from the top folder of the repo. 
The counterpart of this step from the LLM-D deployment documentation is the setup of the LLM-D Model Service. To keep things simple, we do not need the LLM-D Model service for this guide. +Now deploy two LLM models similar to the [Istio guide](../istio/README.md) documentation. Note from the manifest file names that these example commands are to be executed from the top folder of the repo. The counterpart of this step from the LLM-D deployment documentation is the setup of the LLM-D Model Service. To keep things simple, we do not need the LLM-D Model service for this guide. ```bash kubectl create secret generic hf-token-secret --from-literal=token=$HF_TOKEN @@ -131,7 +131,7 @@ kubectl apply -f deploy/kubernetes/llmd-base/dest-rule-epp-phi4.yaml ## Step 6: Update vSR config -Since this guide is based on using the same backend models as in the [Istio guide](../istio/README.md), we will reuse the same vSR config as from that guide and hence you do not need to update the file deploy/kubernetes/istio/config.yaml. If you were using different backend models as part of the LLM-D deployment, you would need to update this file. +Since this guide is based on using the same backend models as in the [Istio guide](../istio/README.md), we will reuse the same vSR config as from that guide and hence you do not need to update the file deploy/kubernetes/istio/config.yaml. If you were using different backend models as part of the LLM-D deployment, you would need to update this file. ## Step 7: Deploy vLLM Semantic Router @@ -165,8 +165,9 @@ Install HTTPRoutes in the Istio gateway. Note a difference here compared to the kubectl apply -f deploy/kubernetes/llmd-base/httproute-llama-pool.yaml kubectl apply -f deploy/kubernetes/llmd-base/httproute-phi4-pool.yaml ``` - + ## Step 10: Testing the Deployment + To expose the IP on which the Istio gateway listens to client requests from outside the cluster, you can choose any standard kubernetes option for external load balancing. 
We tested our feature by [deploying and configuring metallb](https://metallb.universe.tf/installation/) into the cluster to be the LoadBalancer provider. Please refer to metallb documentation for installation procedures if needed. Finally, for the minikube case, we get the external url as shown below. ```bash @@ -174,7 +175,7 @@ minikube service inference-gateway-istio --url http://192.168.49.2:32293 ``` -Now we can send LLM prompts via curl to http://192.168.49.2:32293 to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. Use the port number that you get as output from your "minikube service" command when you try the curl examples below. +Now we can send LLM prompts via curl to <http://192.168.49.2:32293> to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. Use the port number that you get as output from your "minikube service" command when you try the curl examples below. ### Send Test Requests @@ -250,7 +251,7 @@ $ kubectl get pods -n vllm-semantic-router-system NAME READY STATUS RESTARTS AGE semantic-router-bf6cdd5b9-t5hpg 1/1 Running 0 5d23h ``` - + ```bash $ kubectl get pods -n istio-system NAME READY STATUS RESTARTS AGE diff --git a/deploy/kubernetes/llmd-base/llmd+public-llm/README.md b/deploy/kubernetes/llmd-base/llmd+public-llm/README.md index 534eee9ca3..1c88ddc9c8 100644 --- a/deploy/kubernetes/llmd-base/llmd+public-llm/README.md +++ b/deploy/kubernetes/llmd-base/llmd+public-llm/README.md @@ -7,7 +7,7 @@ This guide showcases a deployment in which vSR can selectively route to some loc The deployment consists of: - **vLLM Semantic Router (vSR)**: Provides intelligent request routing and processing decisions to Envoy based Gateways -- **LLM-D**: Distributed Inference platform used for scaleout LLM inferencing with SOTA performance. 
+- **LLM-D**: Distributed Inference platform used for scaleout LLM inferencing with SOTA performance. - **Istio Gateway**: Istio's implementation of Kubernetes Gateway API that uses an Envoy proxy under the covers - **Gateway API Inference Extension**: Additional APIs to extend the Gateway API for Inference via ExtProc servers - **Two instances of vLLM serving the same local LLM**: Two replicas serving the same local LLM targeted by semantic routing in this topology @@ -18,7 +18,7 @@ The deployment consists of: Before starting, ensure you have the following tools installed: - [Docker](https://docs.docker.com/get-docker/) - Container runtime -- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes +- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes - [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) - Kubernetes in Docker - [kubectl](https://kubernetes.io/docs/tasks/tools/) - Kubernetes CLI - [istioctl](https://istio.io/latest/docs/ops/diagnostic-tools/istioctl/) - Istio CLI @@ -33,7 +33,7 @@ First, follow the steps documented in the [Istio guide](../istio/README.md), to ## Step 2: Install Istio Gateway, Gateway API, Inference Extension CRDs -Install CRDs for the Kubernetes Gateway API, Gateway API Inference Extension, Istio Control plane and an instance of the Istio Gateway exactly as described in the [Istio guide](../istio/README.md). You may also install istio using istioctl directly as described in the istio web site as long as the version is 1.28.0 or newer. +Install CRDs for the Kubernetes Gateway API, Gateway API Inference Extension, Istio Control plane and an instance of the Istio Gateway exactly as described in the [Istio guide](../istio/README.md). You may also install istio using istioctl directly as described in the istio web site as long as the version is 1.28.0 or newer. 
If installed correctly you should see the api CRDs for gateway api and inference extension as well as pods running for the Istio gateway and Istiod using the commands shown below. @@ -70,14 +70,14 @@ kubectl create secret generic hf-token-secret --from-literal=token=$HF_TOKEN kubectl apply -f deploy/kubernetes/istio/vLlama3.yaml ``` -This may take several (10+) minutes the first time this is run to download the model up until the vLLM pod running this model is in READY state. In this guide we will create two replicas of the same LLM instead of 1 replica each of two separate LLMs, hence scale this deployment to 2 and wait for both LLM pods to be in READY state. +This may take several (10+) minutes the first time this is run to download the model up until the vLLM pod running this model is in READY state. In this guide we will create two replicas of the same LLM instead of 1 replica each of two separate LLMs, hence scale this deployment to 2 and wait for both LLM pods to be in READY state. ```bash # Create a 2nd replica of the same local LLM kubectl scale deploy llama-8b --replicas=2 ``` -At the end of this you should be able to see both your vLLM pods are READY and serving these LLMs using the command below. +At the end of this you should be able to see both your vLLM pods are READY and serving these LLMs using the command below. ```bash # Verify that vLLM pods running the two LLMs are READY and serving @@ -111,7 +111,7 @@ kubectl apply -f deploy/kubernetes/llmd-base/dest-rule-epp-llama.yaml ## Step 6: Update vSR config -For this guide, we use an updated vSR config file which sets two endpoints, one for the local LLM service and a second for the openai backend model, specifically we use the "gpt-4o-mini" in the provided example config. 
Take a look at deploy/kubernetes/llmd-base/llmd+public/config.yaml, copy it over to the config.yaml in the istio folder so that we can reuse the other manifests and kustomize from that folder to deploy vSR with this config as shown below. +For this guide, we use an updated vSR config file which sets two endpoints, one for the local LLM service and a second for the openai backend model, specifically we use the "gpt-4o-mini" in the provided example config. Take a look at deploy/kubernetes/llmd-base/llmd+public/config.yaml, copy it over to the config.yaml in the istio folder so that we can reuse the other manifests and kustomize from that folder to deploy vSR with this config as shown below. ```bash cp deploy/kubernetes/llmd-base/llmd+public-llm/config.yaml.openai deploy/kubernetes/istio/config.yaml @@ -141,9 +141,9 @@ kubectl apply -f deploy/kubernetes/istio/destinationrule.yaml kubectl apply -f deploy/kubernetes/istio/envoyfilter.yaml ``` -## Step 9: Create a K8S Service and an Istio ServiceEntry to represent the OpenAI target +## Step 9: Create a K8S Service and an Istio ServiceEntry to represent the OpenAI target -vSR's HTTPRoute will need a Kubernetes service representation for the OpenAI connection and since this is an external service, also need an Istio ServiceEntry representation. Set these up using the provided anifests. +vSR's HTTPRoute will need a Kubernetes service representation for the OpenAI connection and since this is an external service, also need an Istio ServiceEntry representation. Set these up using the provided anifests. ```bash kubectl apply -f deploy/kubernetes/llmd-base/llmd+public-llm/svc-openai.yaml @@ -160,7 +160,7 @@ kubectl apply -f deploy/kubernetes/llmd-base/llmd+public-llm/dest-rule-openai.ya ## Step 11: Set up and check API account credentials for OpenAI api access -In order to use the OpenAI API programmatically over the internet, you will need an OpenAI developer account with credentials that allow you to make api calls. 
Once registered with OpenAI, store your api key into your local environment and perform a manual curl test to access the OpenAI api with an LLM query to the same model to confirm that your account and credentials are setup correctly and there are no access issues. Perform the manual access via an LLM query to the same model that we have setup in our vSR config earlier (the "gpt-40-mini" model in our case). A valid LLM response indicates all is well with the OpenAI account and path and it can be added to the main deployment in the following step. +In order to use the OpenAI API programmatically over the internet, you will need an OpenAI developer account with credentials that allow you to make api calls. Once registered with OpenAI, store your api key into your local environment and perform a manual curl test to access the OpenAI api with an LLM query to the same model to confirm that your account and credentials are setup correctly and there are no access issues. Perform the manual access via an LLM query to the same model that we have setup in our vSR config earlier (the "gpt-40-mini" model in our case). A valid LLM response indicates all is well with the OpenAI account and path and it can be added to the main deployment in the following step. ```bash ## Once registered, confirm that you have your OpenAI key in your env. @@ -177,7 +177,7 @@ curl https://api.openai.com/v1/chat/completions -H "Content-Type: application/ }' ``` -## Step 12: Move the OpenAI api key into Kubernetes and Istio Env +## Step 12: Move the OpenAI api key into Kubernetes and Istio Env First create a Kubernetes secret using the OpenAI api key from the environment and then move it into the Istio-proxy container environment as shown next. 
@@ -201,18 +201,18 @@ kubectl patch deployment inference-gateway-istio --type='json' -p='[ kubectl exec -it deploy/inference-gateway-istio -- printenv | grep OPENAI_API_KEY ``` -## Step 13: Patch the OPENAI_API_KEY into the HTTPRoute for OpenAI +## Step 13: Patch the OPENAI_API_KEY into the HTTPRoute for OpenAI -Patch the OPEN_AI_API_KEY from your environment into a template file to generate the manifest for the HTTPRoute representing the OpenAI target. Note that you can skip step 12 by doing this step but for now we also listed step 12 in case you have other automation options for generating the httproute manifest while templating in the value of the OPENAI_API_KEY. +Patch the OPEN_AI_API_KEY from your environment into a template file to generate the manifest for the HTTPRoute representing the OpenAI target. Note that you can skip step 12 by doing this step but for now we also listed step 12 in case you have other automation options for generating the httproute manifest while templating in the value of the OPENAI_API_KEY. ```bash ## Patch the OPENAI_API_KEY into the template to create the httproute manifest file sed "s/{{OPENAI_API_KEY}}/$OPENAI_API_KEY/g" deploy/kubernetes/llmd-base/llmd+public-llm/httproute-openai.template > deploy/kubernetes/llmd-base/llmd+public-llm/httproute-openai.yaml ``` -## Step 14: Create HTTPRoutes for Local LLM and for the OpenAI target +## Step 14: Create HTTPRoutes for Local LLM and for the OpenAI target -Now deploy the HTTPRoute manifest for the openai route destination. In the manifest note again that we match on the contents of the x-selected-model and also setup the injection of the OpenAI api key as a bearer token for enabling the access into OpenAI api for this route. 
For the local LLM we use a route similar to the llm-d guide since we want the prompt query to also get routed via the inferencepool and LLM-D scheduler for the Llama pool which will then pick one of the multiple endpoints in the pool serving the Llama LLM in this example. +Now deploy the HTTPRoute manifest for the openai route destination. In the manifest note again that we match on the contents of the x-selected-model and also setup the injection of the OpenAI api key as a bearer token for enabling the access into OpenAI api for this route. For the local LLM we use a route similar to the llm-d guide since we want the prompt query to also get routed via the inferencepool and LLM-D scheduler for the Llama pool which will then pick one of the multiple endpoints in the pool serving the Llama LLM in this example. ```bash ## HTTpRoute for OpenAI @@ -226,6 +226,7 @@ kubectl apply -f deploy/kubernetes/llmd-base/httproute-llama-pool.yaml ``` ## Step 15: Testing the Deployment + To expose the IP on which the Istio gateway listens to client requests from outside the cluster, you can choose any standard kubernetes option for external load balancing. We tested our feature by [deploying and configuring metallb](https://metallb.universe.tf/installation/) into the cluster to be the LoadBalancer provider. Please refer to metallb documentation for installation procedures if needed. Finally, for the minikube case, we get the external url as shown below. ```bash @@ -233,7 +234,7 @@ minikube service inference-gateway-istio --url http://192.168.49.2:31275 ``` -Now we can send LLM prompts via curl to http://192.168.49.2:31275 to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. Use the port number that you get as output from your "minikube service" command when you try the curl examples below. 
+Now we can send LLM prompts via curl to <http://192.168.49.2:31275> to access the Istio gateway which will then use information from vLLM semantic router to dynamically route to one of the two LLMs we are using as backends in this case. Use the port number that you get as output from your "minikube service" command when you try the curl examples below. ### Send Test Requests @@ -318,7 +319,7 @@ $ kubectl get pods -n vllm-semantic-router-system NAME READY STATUS RESTARTS AGE semantic-router-bf6cdd5b9-t5hpg 1/1 Running 0 5d23h ``` - + ```bash $ kubectl get pods -n istio-system NAME READY STATUS RESTARTS AGE @@ -350,7 +351,7 @@ $ kubectl get httproute vsr-openai-g4 -o yaml | grep -A 1 "reason: ResolvedRefs" status: "True" ``` -Also as noted previously in Step 11 verify your OpenAI account credentials and api access separately prior to accessing via the vSR + Istio setup. +Also as noted previously in Step 11 verify your OpenAI account credentials and api access separately prior to accessing via the vSR + Istio setup. 
### Common Issues @@ -404,7 +405,7 @@ minikube delete ## Next Steps - Test/ experiment with different features of vLLM Semantic Router -- Test/ experiment with different public hosted models and model providers +- Test/ experiment with different public hosted models and model providers - Test/ experiment with more complex LLM-D configurations and well-lit paths - Set up monitoring and observability - Implement authentication and authorization diff --git a/deploy/openshift/README-DYNAMIC-IPS.md b/deploy/openshift/README-DYNAMIC-IPS.md index f004ea1142..962dc3c27d 100644 --- a/deploy/openshift/README-DYNAMIC-IPS.md +++ b/deploy/openshift/README-DYNAMIC-IPS.md @@ -100,6 +100,7 @@ oc exec deployment/semantic-router -c semantic-router -- \ ## Configuration Files ### Template: config-split.yaml + Contains **placeholder IPs** that get replaced: ```yaml @@ -113,6 +114,7 @@ vllm_endpoints: ``` ### Generated: ConfigMap + Contains **actual ClusterIPs** discovered during deployment: ```yaml diff --git a/e2e-tests/hallucination-demo/README.md b/e2e-tests/hallucination-demo/README.md index 05232d4daa..3680213c51 100644 --- a/e2e-tests/hallucination-demo/README.md +++ b/e2e-tests/hallucination-demo/README.md @@ -45,12 +45,14 @@ make demo-hallucination ## Components ### mock_vllm_toolcall.py + Mock LLM server that: - Returns `tool_calls` on first request (to invoke web_search) - Returns hallucinated responses on follow-up (with tool results) ### mock_web_search.py + Mock search service that returns ground truth context: - Eiffel Tower facts @@ -58,6 +60,7 @@ Mock search service that returns ground truth context: - Apple Inc. 
founding info ### chat_client.py + Interactive CLI client that: - Sends questions through the semantic router diff --git a/examples/candle-binding/README.md b/examples/candle-binding/README.md index bed79a65a6..94e7739e26 100644 --- a/examples/candle-binding/README.md +++ b/examples/candle-binding/README.md @@ -540,7 +540,7 @@ ls ../../models/qwen3_generative_classifier_r16/ ``` 2. **Batch processing**: For large datasets, process in batches - + 3. **Adapter preloading**: Load all adapters once at startup 4. **Cache results**: Cache classifications for repeated queries diff --git a/examples/mcp-classifier-server/README.md b/examples/mcp-classifier-server/README.md index d4dccaefed..4e5530bb7f 100644 --- a/examples/mcp-classifier-server/README.md +++ b/examples/mcp-classifier-server/README.md @@ -132,7 +132,7 @@ github.com/vllm-project/semantic-router/src/semantic-router/pkg/mcp/api } } ``` - + The `category_system_prompts` and `category_descriptions` fields are optional but recommended. Per-category system prompts allow the MCP server to provide specialized instructions for each category that the router can inject when processing queries in that specific category. @@ -220,7 +220,7 @@ python3 server_embedding.py --http --port 8090 ### Features - **Qwen3-Embedding-0.6B** model with 1024-dimensional embeddings -- **Milvus vector database** for fast similarity search +- **Milvus vector database** for fast similarity search - **RAG-style classification** using 95 training examples - **Same MCP protocol** as regex server (drop-in replacement) - **Higher accuracy** - Understands semantic meaning, not just patterns diff --git a/perf/testdata/examples/README.md b/perf/testdata/examples/README.md index 00d294c54c..1c1b17fcb2 100644 --- a/perf/testdata/examples/README.md +++ b/perf/testdata/examples/README.md @@ -5,6 +5,7 @@ This directory contains example outputs showing what you'll see when running per ## 📁 Files in This Directory ### 1. 
**benchmark-output-example.txt** + Raw benchmark output from `make perf-bench-quick` **Shows:** @@ -24,6 +25,7 @@ BenchmarkClassifyBatch_Size1-8 100 10245678 ns/op 10.25 ms/op 2456 B/op 4 --- ### 2. **comparison-example.txt** + Baseline comparison output from `make perf-compare` **Shows:** @@ -45,6 +47,7 @@ Baseline comparison output from `make perf-compare` --- ### 3. **example-report.json** + Machine-readable JSON report **Use for:** @@ -68,6 +71,7 @@ Machine-readable JSON report --- ### 4. **example-report.md** + Human-readable Markdown report **Use for:** @@ -87,6 +91,7 @@ Human-readable Markdown report --- ### 5. **example-report.html** + Beautiful HTML report with styling **Features:** @@ -106,6 +111,7 @@ open perf/testdata/examples/example-report.html --- ### 6. **pr-comment-example.md** + GitHub PR comment format **Shows:** @@ -121,6 +127,7 @@ GitHub PR comment format --- ### 7. **pprof-example.txt** + CPU profiling output and interpretation **Shows:** diff --git a/src/training/dual_classifier/DUAL_CLASSIFIER_SYSTEM_TEST_SUMMARY.md b/src/training/dual_classifier/DUAL_CLASSIFIER_SYSTEM_TEST_SUMMARY.md index 3aafe64be1..93d6e54d60 100644 --- a/src/training/dual_classifier/DUAL_CLASSIFIER_SYSTEM_TEST_SUMMARY.md +++ b/src/training/dual_classifier/DUAL_CLASSIFIER_SYSTEM_TEST_SUMMARY.md @@ -1,6 +1,7 @@ # Task 2 Testing Summary: Dual-Head Architecture POC with Training ## Overview + Task 2 successfully implemented and tested a complete dual-purpose DistilBERT classifier with comprehensive training infrastructure for both category classification and PII detection using a shared model architecture. ## Test Coverage @@ -157,6 +158,7 @@ dual_classifier/ ✅ **Test Coverage**: Comprehensive test suite with 14 passing tests ## Next Steps + Task 2 is fully complete and validated. The implementation provides a solid foundation for: - Task 3: Data Pipeline Implementation (real dataset integration) @@ -168,4 +170,4 @@ Task 2 is fully complete and validated. 
The implementation provides a solid foun - Training completes in under 20 seconds for 50 samples - Model achieves 45% category accuracy and 91% PII F1-score on small synthetic dataset - Memory usage is efficient for laptop deployment -- No GPU required for development and testing +- No GPU required for development and testing diff --git a/website/README.md b/website/README.md index d0a11c0990..e5128516f4 100644 --- a/website/README.md +++ b/website/README.md @@ -6,7 +6,7 @@ This directory contains the Docusaurus-based documentation website for the vLLM ### Prerequisites -- Node.js 18+ +- Node.js 18+ - npm or yarn ### Development @@ -21,7 +21,7 @@ make docs-dev cd website && npm start ``` -The site will be available at http://localhost:3000 +The site will be available at <http://localhost:3000> ### Production Build @@ -88,6 +88,7 @@ website/ ## 🛠️ Customization ### Themes and Colors + Edit `src/css/custom.css` to modify: - Color scheme and gradients @@ -96,6 +97,7 @@ Edit `src/css/custom.css` to modify: - Animations and effects ### Navigation + Update `sidebars.js` to modify: - Documentation structure @@ -103,6 +105,7 @@ Update `sidebars.js` to modify: - Page ordering ### Site Configuration + Modify `docusaurus.config.js` for: - Site metadata @@ -121,6 +124,6 @@ Modify `docusaurus.config.js` for: ## 🔗 Links -- **Live Preview**: http://localhost:3000 (when running) -- **Docusaurus Docs**: https://docusaurus.io/docs +- **Live Preview**: <http://localhost:3000> (when running) +- **Docusaurus Docs**: <https://docusaurus.io/docs> - **Main Project**: ../README.md diff --git a/website/blog/2025-10-20-q4-roadmap-iris.md b/website/blog/2025-10-20-q4-roadmap-iris.md index 82cc1803bd..47d0480d05 100644 --- a/website/blog/2025-10-20-q4-roadmap-iris.md +++ b/website/blog/2025-10-20-q4-roadmap-iris.md @@ -260,6 +260,7 @@ While vLLM Semantic Router works well for experimental deployments, production a **The Deliverables** #### Helm Chart Support + Professional Kubernetes deployment with: - Templated manifests for all resources @@ -268,6 +269,7 @@
Professional Kubernetes deployment with: - Best practices for security, scaling, and resource management #### Modern Management Dashboard + A comprehensive web-based control plane featuring: - **Visual Route Builder**: Drag-and-drop interface for creating SemanticRoute configurations diff --git a/website/docs/api/classification.md b/website/docs/api/classification.md index 584010c989..5dc5b31346 100644 --- a/website/docs/api/classification.md +++ b/website/docs/api/classification.md @@ -105,6 +105,7 @@ curl -X GET http://localhost:8080/info/classifier Classify user queries into routing categories. ### Endpoint + `POST /classify/intent` ### Request Format @@ -166,6 +167,7 @@ The current model supports the following 14 categories: Detect personally identifiable information in text. ### Endpoint + `POST /classify/pii` ### Request Format @@ -216,6 +218,7 @@ Detect personally identifiable information in text. Detect potential jailbreak attempts and adversarial prompts. ### Endpoint + `POST /classify/security` ### Request Format @@ -254,6 +257,7 @@ Detect potential jailbreak attempts and adversarial prompts. Perform multiple classification tasks in a single request. ### Endpoint + `POST /classify/combined` ### Request Format @@ -312,6 +316,7 @@ Perform multiple classification tasks in a single request. Process multiple texts in a single request using **high-confidence LoRA models** for maximum accuracy and efficiency. The API automatically discovers and uses the best available models (BERT, RoBERTa, or ModernBERT) with LoRA fine-tuning, delivering confidence scores of 0.99+ for in-domain texts. ### Endpoint + `POST /classify/batch` ### Request Format @@ -495,6 +500,7 @@ The API automatically scans the `./models/` directory and selects the best avail Get information about loaded classification models. 
#### Endpoint + `GET /info/models` ### Response Format @@ -563,6 +569,7 @@ When models are not loaded, the API will return placeholder responses for testin Get detailed information about classifier capabilities and configuration. #### Generic Categories via MMLU-Pro Mapping + You can now use free-style, generic category names in your config and map them to the MMLU-Pro categories used by the classifier. The classifier will translate its MMLU predictions into your generic categories for routing and reasoning decisions. Example configuration: @@ -637,6 +644,7 @@ Notes: - When no mapping is found for a predicted MMLU category, the original MMLU name is used as-is. #### Endpoint + `GET /info/classifier` #### Response Format @@ -725,6 +733,7 @@ Notes: Get real-time classification performance metrics. ### Endpoint + `GET /metrics/classification` ### Response Format @@ -756,6 +765,7 @@ Get real-time classification performance metrics. ## Configuration Management ### Get Current Configuration + `GET /config/classification` ```json @@ -779,6 +789,7 @@ Get real-time classification performance metrics. 
``` ### Update Configuration + `PUT /config/classification` ```json @@ -1019,6 +1030,7 @@ const api = new ClassificationAPI(); Development and testing endpoints for model validation: #### Test Classification Accuracy + `POST /test/accuracy` ```json @@ -1032,6 +1044,7 @@ Development and testing endpoints for model validation: ``` #### Benchmark Performance + `POST /test/benchmark` ```json diff --git a/website/docs/installation/k8s/gateway-api-inference-extension.md b/website/docs/installation/k8s/gateway-api-inference-extension.md index f9523fe6fb..ba60540543 100644 --- a/website/docs/installation/k8s/gateway-api-inference-extension.md +++ b/website/docs/installation/k8s/gateway-api-inference-extension.md @@ -15,18 +15,23 @@ The deployment consists of three main components: Integrating vSR with Istio and GIE provides a robust, Kubernetes-native solution for serving LLMs with several key benefits: ### 1. **Kubernetes-Native LLM Management** + Manage your models, routing, and scaling policies directly through `kubectl` using familiar Custom Resource Definitions (CRDs). ### 2. **Intelligent Model and Replica Routing** + Combine vSR's prompt-based model routing with GIE's smart, load-aware replica selection. This ensures requests are sent not only to the right model but also to the healthiest replica, all in a single, efficient hop. ### 3. **Protect Your Models from Overload** + The built-in scheduler tracks GPU load and request queues, automatically shedding traffic to prevent your model servers from crashing under high demand. ### 4. **Deep Observability** + Gain insights from both high-level Gateway metrics and detailed vSR performance data (like token usage and classification accuracy) to monitor and troubleshoot your entire AI stack. ### 5. **Secure Multi-Tenancy** + Isolate tenant workloads using standard Kubernetes namespaces and `HTTPRoutes`. Apply rate limits and other policies while sharing a common, secure gateway infrastructure. 
## Supported Backend Models diff --git a/website/docs/overview/categories/overview.md b/website/docs/overview/categories/overview.md index 9976b1ca5e..e93330f427 100644 --- a/website/docs/overview/categories/overview.md +++ b/website/docs/overview/categories/overview.md @@ -103,15 +103,19 @@ Different categories can trigger specific processing filters: ## Benefits of Category-Based Routing ### 🎯 **Precision** + Route queries to models specifically optimized for their domain ### ⚡ **Performance** + Reduce latency by avoiding over-powered models for simple queries ### 💰 **Cost Optimization** + Use expensive reasoning models only when necessary ### 🔧 **Flexibility** + Easy configuration of domain-specific behaviors and rules ### 📊 **Observability** diff --git a/website/docs/overview/semantic-router-overview.md b/website/docs/overview/semantic-router-overview.md index 7cd920541e..c0e5548b21 100644 --- a/website/docs/overview/semantic-router-overview.md +++ b/website/docs/overview/semantic-router-overview.md @@ -117,7 +117,7 @@ graph TB **Key Innovations:** - **Human Preference Training**: Uses Chatbot Arena data where users compare model outputs -- **Multiple Router Architectures**: +- **Multiple Router Architectures**: - Similarity-weighted ranking - Matrix factorization - BERT classifiers @@ -161,7 +161,7 @@ import ZoomableMermaid from '@site/src/components/ZoomableMermaid'; participant Code as Code Specialist participant Creative as Creative Writer participant General as General Model - + User->>Router: "Solve this calculus problem..." 
Router->>Router: Analyze query intent Router->>Math: Route to math specialist diff --git a/website/docs/tutorials/intelligent-route/domain-routing.md b/website/docs/tutorials/intelligent-route/domain-routing.md index 612ab5d3e1..bb1d3ca454 100644 --- a/website/docs/tutorials/intelligent-route/domain-routing.md +++ b/website/docs/tutorials/intelligent-route/domain-routing.md @@ -217,26 +217,31 @@ curl -X POST http://localhost:8801/v1/chat/completions \ ## Real-World Use Cases ### 1. Multi-Task Classification with LoRA (Efficient) + **Problem**: Need domain classification + PII detection + jailbreak detection on every request **Solution**: LoRA adapters run all 3 tasks with one base model pass instead of 3 separate models **Impact**: 3x faster than running 3 full models, <1% parameter overhead per task ### 2. Long Document Analysis (Specialized - Qwen3) + **Problem**: Research papers and legal documents exceed 8K token limit of ModernBERT **Solution**: Qwen3-Embedding supports up to 32K tokens without truncation **Impact**: Accurate classification on full documents, no information loss from truncation ### 3. Multilingual Education Platform (Specialized - Qwen3) + **Problem**: Students ask questions in 100+ languages, ModernBERT limited to English **Solution**: Qwen3-Embedding trained on 100+ languages handles multilingual routing **Impact**: Single model serves global users, consistent quality across languages ### 4. Edge Deployment (Specialized - Gemma) + **Problem**: Mobile/IoT devices can't run large classification models **Solution**: EmbeddingGemma-300M with Matryoshka embeddings (128-768 dims) **Impact**: 5x smaller model, runs on edge devices with <100MB memory ### 5. 
STEM Tutoring Platform (Efficient Reasoning Control) + **Problem**: Math/physics need reasoning, but history/literature don't **Solution**: Domain classifier routes STEM → reasoning models, humanities → fast models **Impact**: 2x better STEM accuracy, 60% cost savings on non-STEM queries diff --git a/website/docs/tutorials/intelligent-route/embedding-routing.md b/website/docs/tutorials/intelligent-route/embedding-routing.md index 2c7d0ac981..4fe9922efb 100644 --- a/website/docs/tutorials/intelligent-route/embedding-routing.md +++ b/website/docs/tutorials/intelligent-route/embedding-routing.md @@ -132,26 +132,31 @@ curl -X POST http://localhost:8801/v1/chat/completions \ ## Real-World Use Cases ### 1. Customer Support (Scalable Categories) + **Problem**: Need to add new support categories weekly without retraining models **Solution**: Add new categories by updating keyword lists, embeddings handle semantic matching **Impact**: Deploy new categories in minutes vs weeks for model retraining ### 2. E-commerce Support (Fast Semantic Matching) + **Problem**: "Where's my order?" vs "track package" vs "shipping status" all mean the same **Solution**: Gemma embeddings (10-20ms) route all variations to order tracking category **Impact**: 95% accuracy with 10-20ms latency, handles 5K+ queries/sec ### 3. SaaS Product Inquiries (Flexible Routing) + **Problem**: Users ask about pricing in 100+ different ways **Solution**: Semantic similarity matches all variations to "pricing information" keywords **Impact**: Single category handles all pricing queries without explicit rules ### 4. Startup Iteration (Rapid Category Updates) + **Problem**: Product evolves rapidly, need to adjust categories daily **Solution**: Update embedding keywords in config, no model retraining required **Impact**: Category updates in seconds vs days for fine-tuning ### 5. 
Multilingual Platform (Semantic Understanding) + **Problem**: Same question in English, Spanish, Chinese needs same routing **Solution**: Embeddings capture cross-lingual semantics automatically **Impact**: Single category definition works across languages diff --git a/website/docs/tutorials/intelligent-route/keyword-routing.md b/website/docs/tutorials/intelligent-route/keyword-routing.md index 6ccc3aeb43..c0e02b9d05 100644 --- a/website/docs/tutorials/intelligent-route/keyword-routing.md +++ b/website/docs/tutorials/intelligent-route/keyword-routing.md @@ -108,26 +108,31 @@ curl -X POST http://localhost:8801/v1/chat/completions \ ## Real-World Use Cases ### 1. Financial Services (Transparent Compliance) + **Problem**: Regulators require explainable routing decisions for audit trails **Solution**: Keyword rules provide clear "why" for each routing decision (e.g., "SSN" keyword → secure handler) **Impact**: Passed SOC2 audit, complete decision transparency ### 2. Healthcare Platform (Compliant PII Detection) + **Problem**: HIPAA requires deterministic, auditable PII detection **Solution**: AND operator detects multiple PII indicators with documented rules **Impact**: 100% deterministic, full audit trail for compliance ### 3. High-Frequency Trading (Sub-millisecond Routing) + **Problem**: Need <1ms classification for real-time market data routing **Solution**: Keyword matching provides instant classification without ML overhead **Impact**: 0.1ms latency, handles 100K+ requests/sec ### 4. Government Services (Interpretable Rules) + **Problem**: Citizens need to understand why requests were routed/rejected **Solution**: Clear keyword rules can be explained in plain language **Impact**: Reduced complaints, transparent decision-making ### 5. 
Enterprise Security (Transparent Threat Detection) + **Problem**: Security team needs to understand why queries were flagged **Solution**: Explicit keyword/regex rules for threat patterns with clear documentation **Impact**: Security team can validate and update rules confidently diff --git a/website/docs/tutorials/intelligent-route/lora-routing.md b/website/docs/tutorials/intelligent-route/lora-routing.md index 285968f95c..86d6d98548 100644 --- a/website/docs/tutorials/intelligent-route/lora-routing.md +++ b/website/docs/tutorials/intelligent-route/lora-routing.md @@ -193,26 +193,31 @@ Check the router logs to confirm the correct LoRA adapter is selected for each q ## Real-World Use Cases ### 1. Healthcare Platform (Domain Routing + LoRA) + **Problem**: Medical queries need specialized adapters, but users don't know which to use **Solution**: Domain routing classifies into diagnosis/pharmacy/mental-health, routes to corresponding LoRA adapters **Impact**: Automatic adapter selection, 70GB memory vs 210GB for 3 full models ### 2. Legal Tech (Keyword Routing + LoRA for Compliance) + **Problem**: Compliance requires auditable routing to jurisdiction-specific legal adapters **Solution**: Keyword routing detects "US law"/"EU law"/"contract" keywords, routes to compliant LoRA adapters **Impact**: 100% auditable routing decisions, 95% citation accuracy with specialized adapters ### 3. Customer Support (Embedding Routing + LoRA) + **Problem**: Support queries span IT/HR/finance, users phrase questions in many ways **Solution**: Embedding routing matches semantic intent, routes to department-specific LoRA adapters **Impact**: Handles paraphrases, single endpoint serves all departments with <10ms adapter switching ### 4. 
EdTech Platform (Domain Routing + LoRA) + **Problem**: Students ask math/science/literature questions, need subject-specific tutors **Solution**: Domain routing classifies academic subject, routes to subject-specific LoRA adapters **Impact**: 4 specialized tutors for cost of 1.2 base models, 70% cost savings ### 5. Multi-Tenant SaaS (MCP Routing + LoRA) + **Problem**: Each tenant has custom LoRA adapters, need dynamic routing based on tenant ID **Solution**: MCP routing queries tenant database, returns tenant-specific LoRA adapter name **Impact**: 1000+ tenants with custom adapters, private routing logic, A/B testing support diff --git a/website/docs/tutorials/intelligent-route/mcp-routing.md b/website/docs/tutorials/intelligent-route/mcp-routing.md index b52a2aaebe..d78f45a996 100644 --- a/website/docs/tutorials/intelligent-route/mcp-routing.md +++ b/website/docs/tutorials/intelligent-route/mcp-routing.md @@ -161,31 +161,37 @@ curl -X POST http://localhost:8801/v1/chat/completions \ ## Real-World Use Cases ### 1. Complex Domain Classification (High Accuracy) + **Problem**: Nuanced legal/medical queries need better accuracy than BERT/embeddings **Solution**: MCP uses GPT-4 with in-context examples for classification **Impact**: 98% accuracy vs 85% with BERT, baseline for quality comparison ### 2. Proprietary Classification Logic (Private) + **Problem**: Classification logic contains trade secrets, can't use external services **Solution**: MCP server runs in private VPC, keeps all logic and data internal **Impact**: Full data privacy, no external API calls ### 3. Custom Business Rules (Extensible) + **Problem**: Need to route based on user tier, location, time, A/B tests **Solution**: MCP combines LLM classification with database queries and business logic **Impact**: Flexible routing without modifying router code ### 4. 
Rapid Experimentation (Extensible) + **Problem**: Data science team needs to test new classification approaches daily **Solution**: MCP server updated independently, router unchanged **Impact**: Deploy new classification logic in minutes vs days ### 5. Multi-Tenant Platform (Extensible + Private) + **Problem**: Each customer needs custom classification, data must stay isolated **Solution**: MCP loads tenant-specific models/rules, enforces data isolation **Impact**: 1000+ tenants with custom logic, full data privacy ### 6. Hybrid Approach (High Accuracy + Extensible) + **Problem**: Need LLM accuracy for edge cases, fast routing for common queries **Solution**: MCP uses cached responses for common patterns, LLM for novel queries **Impact**: 95% cache hit rate, LLM accuracy on long tail diff --git a/website/docs/tutorials/observability/overview.md b/website/docs/tutorials/observability/overview.md index 0ddf2412f6..bb08fe155d 100644 --- a/website/docs/tutorials/observability/overview.md +++ b/website/docs/tutorials/observability/overview.md @@ -15,6 +15,7 @@ Provides health endpoints for monitoring service and dependency status. ### Structured Logging Comprehensive logging for request tracing, security events, and performance analysis. + ## Key Features - **Prometheus Integration**: Exposes detailed metrics on port 9190 diff --git a/website/docs/tutorials/semantic-cache/in-memory-cache.md b/website/docs/tutorials/semantic-cache/in-memory-cache.md index d4d211cc9b..1c2fc770d4 100644 --- a/website/docs/tutorials/semantic-cache/in-memory-cache.md +++ b/website/docs/tutorials/semantic-cache/in-memory-cache.md @@ -33,6 +33,7 @@ graph TB ## How It Works ### Write Path + When caching a response: 1. Generate embedding for the query using the configured embedding model @@ -41,6 +42,7 @@ When caching a response: 4. Evict oldest/least-used entries if max_entries limit is reached ### Read Path + When searching for a cached response: 1. 
Generate embedding for the incoming query @@ -49,6 +51,7 @@ When searching for a cached response: 4. Otherwise, forward to LLM and cache the new response (cache miss) ### Search Methods + The cache supports two search methods: - **Linear Search**: Compares query embedding against all cached embeddings