@@ -131,7 +131,7 @@ To learn more about configuring and using the Envoy AI Gateway with these endpoi

- **[Supported Providers](./supported-providers.md)** - Complete list of supported AI providers and their configurations
- **[Usage-Based Rate Limiting](../traffic/usage-based-ratelimiting.md)** - Configure token-based rate limiting and cost controls
-- **[Provider Fallback](../traffic/fallback.md)** - Set up automatic failover between providers for high availability
+- **[Provider Fallback](../traffic/provider-fallback.md)** - Set up automatic failover between providers for high availability
- **[Metrics and Monitoring](../observability/metrics.md)** - Monitor usage, costs, and performance metrics

[issue#609]: https://github.com/envoyproxy/ai-gateway/issues/609
87 changes: 87 additions & 0 deletions site/docs/capabilities/traffic/model-virtualization.md
@@ -0,0 +1,87 @@
---
id: model-name-virtualization
title: Model Name Virtualization
sidebar_position: 7
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Envoy AI Gateway provides model name virtualization, which lets downstream applications use a single model name while the gateway routes requests to differently named models on each AI provider.
This guide covers the motivation for the feature and how to configure it.

## Motivation

It is common for multiple AI providers to offer similar or identical models, such as Llama-3-70b.
However, each provider tends to use its own naming convention for the same model.
For example, `Claude 3.5 Sonnet` is hosted on both GCP and AWS Bedrock, but under different model names:
* GCP: `claude-3-5-sonnet-v2@20241022`
* AWS Bedrock: `arn:aws:bedrock:us-west-2:123456789012:provisioned-model/abc123xyz`

From the perspective of downstream GenAI applications, it is beneficial to have a unified model name that abstracts away these differences.

## Virtualization with modelNameOverride API

In the top-level `AIGatewayRoute` configuration, you can specify `modelNameOverride` in the [AIGatewayRouteRuleBackendRef](/api/api.mdx#aigatewayrouterulebackendref) of each route rule to override the model name that is sent to the upstream AI provider.
This feature is primarily designed for scenarios where the model name must change depending on which AI provider the request is actually sent to.

The example configuration looks like this:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: test-route
spec:
  targetRefs: [...]
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: claude-3-5-sonnet-v2
      backendRefs:
        - name: aws-backend
          modelNameOverride: arn:aws:bedrock:us-west-2:123456789012:provisioned-model/abc123xyz
          weight: 50
        - name: gcp-backend
          modelNameOverride: claude-3-5-sonnet-v2@20241022
          weight: 50
```

This configuration allows downstream applications to use the unified model name `claude-3-5-sonnet-v2` while traffic is split between the AWS Bedrock and GCP backends according to their `weight`, with each backend receiving the model name given by its `modelNameOverride`.
This is what "virtualization" means in this context: abstracting away the differences in model names across AI providers and providing a unified interface for downstream applications.
It can also be thought of as "one-to-many" aliasing of model names: one unified model name maps to several provider-specific model names, depending on the routing path.
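
The one-to-many aliasing can be illustrated with a small Python sketch. It is not the gateway's implementation: `BackendRef`, `upstream_model_name`, and `pick_backend` are hypothetical names that mirror the `backendRefs` entries in the YAML above, and the weighted random choice only stands in for the gateway's traffic splitting.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BackendRef:
    """Hypothetical mirror of one backendRefs entry (illustration only)."""
    name: str
    weight: int = 1
    model_name_override: Optional[str] = None


def upstream_model_name(requested_model: str, backend: BackendRef) -> str:
    """Return the override if one is set, otherwise the name the client sent."""
    return backend.model_name_override or requested_model


def pick_backend(backends: Sequence[BackendRef], rng: random.Random) -> BackendRef:
    """Weighted random choice, standing in for weight-based traffic splitting."""
    return rng.choices(backends, weights=[b.weight for b in backends], k=1)[0]


BACKENDS = [
    BackendRef(
        "aws-backend",
        50,
        "arn:aws:bedrock:us-west-2:123456789012:provisioned-model/abc123xyz",
    ),
    BackendRef("gcp-backend", 50, "claude-3-5-sonnet-v2@20241022"),
]

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(4):
        backend = pick_backend(BACKENDS, rng)
        # The client always asks for the unified name; only the upstream name differs.
        print(backend.name, "->", upstream_model_name("claude-3-5-sonnet-v2", backend))
```

Whichever backend is chosen, the client-facing name stays `claude-3-5-sonnet-v2`; only the name sent upstream changes.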

## Virtualization for fallback scenarios

As described on the [Provider Fallback](./provider-fallback) page, Envoy AI Gateway allows you to fall back to a different AI provider if the primary one fails.
However, sometimes you want to fall back to a different model on the same provider.
For example, it is natural to configure Envoy AI Gateway so that if the primary, more expensive model fails (for example, due to rate limiting), Envoy retries the request against a less expensive model on the same provider.
More concretely, if the request to `gpt-4` fails, the request is retried with `gpt-3.5-turbo` on the same OpenAI provider.

`modelNameOverride` can also be used in this scenario to achieve the desired behavior. The configuration would look like this:

```yaml
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: test-route
spec:
  targetRefs: [...]
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4
      backendRefs:
        - name: openai-backend
          # This doesn't specify modelNameOverride, so it will use the default model name `gpt-4` in the request.
          priority: 0
        - name: openai-backend
          modelNameOverride: gpt-3.5-turbo
          priority: 1
```

With this configuration, and assuming retries are configured as described on the [Provider Fallback](./provider-fallback) page, a failed request to `gpt-4` is automatically retried against `gpt-3.5-turbo` on the same OpenAI provider, without requiring any changes to the downstream application.
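
The priority-based fallback can be sketched in Python as follows. This is a simplified model under stated assumptions, not the gateway's retry logic (which is driven by the retry configuration on the Provider Fallback page): `PrioritizedBackendRef`, `UpstreamError`, `send_with_fallback`, and `fake_send` are hypothetical names, and lower `priority` values are assumed to be tried first, as in the YAML above.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class PrioritizedBackendRef:
    """Hypothetical mirror of a backendRefs entry with a priority (illustration only)."""
    name: str
    priority: int = 0
    model_name_override: Optional[str] = None


class UpstreamError(Exception):
    """Stand-in for a retriable upstream failure such as a rate limit."""


def send_with_fallback(
    requested_model: str,
    backends: Sequence[PrioritizedBackendRef],
    send: Callable[[str, str], str],
) -> str:
    """Try backends in ascending priority order, applying each override."""
    last_error: Optional[UpstreamError] = None
    for backend in sorted(backends, key=lambda b: b.priority):
        model = backend.model_name_override or requested_model
        try:
            return send(backend.name, model)
        except UpstreamError as err:
            last_error = err  # remember the failure and move to the next priority
    if last_error is None:
        raise ValueError("no backends configured")
    raise last_error


FALLBACK_BACKENDS = [
    PrioritizedBackendRef("openai-backend", priority=0),
    PrioritizedBackendRef("openai-backend", priority=1, model_name_override="gpt-3.5-turbo"),
]


def fake_send(backend_name: str, model: str) -> str:
    """Pretend upstream call in which the expensive model is rate limited."""
    if model == "gpt-4":
        raise UpstreamError("rate limited")
    return f"{backend_name} answered with {model}"


if __name__ == "__main__":
    print(send_with_fallback("gpt-4", FALLBACK_BACKENDS, fake_send))
```

The caller only ever names `gpt-4`; the switch to `gpt-3.5-turbo` happens entirely inside the fallback loop.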
@@ -1,3 +1,9 @@
---
id: provider-fallback
title: Provider Fallback
sidebar_position: 6
---

# Provider Fallback

Envoy AI Gateway supports provider fallback to ensure high availability and reliability for AI/LLM workloads. With fallback, you can configure multiple upstream providers for a single route, so that if the primary provider fails (due to network errors, 5xx responses, or other health check failures), traffic is automatically routed to a healthy fallback provider.