mattklein123 (Member) approved these changes on Mar 20, 2017: +1
mathetake added a commit that referenced this pull request on Mar 3, 2026
**Commit Message**
This commit is a relatively large refactoring of internals to make Envoy AI Gateway's API more closely aligned with Envoy Gateway's BackendTrafficPolicy as well as HTTPRoute. Specifically, the main objective here is to allow failover and retries to work well across multiple AIServiceBackends.
One of the most notable changes in this commit is that we split the extproc's logic into two phases: one is executed at the normal router level and selects a route (as opposed to selecting a backend, as it did previously), and the other runs as an upstream filter that performs authentication and transformation. In other words, Envoy AI Gateway now configures two external processing filters.
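A minimal Go sketch of the two phases, assuming a simplified in-memory request instead of the real ext_proc gRPC stream. The functions `routerPhase` and `upstreamPhase` and the `backend` type are hypothetical; the model header name `x-ai-eg-model` and the bearer-token auth are illustrative assumptions, not a statement of how every provider authenticates.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routerPhase (hypothetical) models the router-level extproc: it only reads
// the model name from the request body and exposes it as a header so that
// Envoy's router can select a route.
func routerPhase(body []byte, headers map[string]string) error {
	var req struct {
		Model string `json:"model"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return fmt.Errorf("parse request body: %w", err)
	}
	headers["x-ai-eg-model"] = req.Model // header name is an assumption
	return nil
}

// backend (hypothetical) is what the upstream phase knows about the backend
// Envoy's load balancer picked for the current attempt.
type backend struct {
	Name   string
	Schema string // e.g. "OpenAI", "AWSBedrock"
	APIKey string
}

// upstreamPhase (hypothetical) models the upstream extproc: it runs after the
// backend is known and applies that backend's auth and schema. Auth is
// simplified to a bearer token here.
func upstreamPhase(b backend, headers map[string]string) {
	headers["authorization"] = "Bearer " + b.APIKey
	headers["x-sketch-target-schema"] = b.Schema
}

func main() {
	headers := map[string]string{}
	if err := routerPhase([]byte(`{"model":"gpt-4o"}`), headers); err != nil {
		panic(err)
	}
	// Envoy's router would now pick a route using the model header; the
	// upstream phase then runs for whichever backend gets selected.
	upstreamPhase(backend{Name: "primary", Schema: "OpenAI", APIKey: "sk-test"}, headers)
	fmt.Println(headers["x-ai-eg-model"], headers["authorization"])
}
```

The design point is that routing no longer needs to know about auth or schemas, so the second phase can be re-run by Envoy on its own.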
As a result, users can now configure failover as well as retries/fallback using Envoy Gateway's BackendTrafficPolicy attached to the HTTPRoute generated by Envoy AI Gateway. For example, this supports the case where the primary backend is Azure OpenAI and, when it is failing, the AI Gateway falls back to AWS Bedrock using standard Envoy Gateway configuration.
**Background**
At the Envoy configuration level, Envoy Gateway translates multiple backends in a single HTTPRoute rule into a single Envoy cluster whose endpoints consist of multiple endpoint sets (called `LocalityLbEndpoints` in the Envoy API [1]), where each set corresponds to one Backend and has a priority configured. For example, very roughly speaking, the following pseudo HTTPRoute
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: provider-fallback
spec:
  rules:
    - backendRefs:
        - group: gateway.envoyproxy.io
          kind: Backend
          name: primary-backend
        - group: gateway.envoyproxy.io
          kind: Backend
          name: secondary-backend
      matches:
        - path:
            type: PathPrefix
            value: /
```
is translated into the following when `secondary-backend` is marked with `fallback: true` in its Backend definition [2]:
```yaml
- cluster:
    '@type': type.googleapis.com/envoy.config.cluster.v3.Cluster
    loadAssignment:
      clusterName: httproute/default/provider-fallback/rule/0
      endpoints:
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: primary.com
                    portValue: 443
          priority: 0
        - lbEndpoints:
            - endpoint:
                address:
                  socketAddress:
                    address: secondary.com
                    portValue: 443
          priority: 1
```
where the priority is 0 for the primary backend and 1 for the secondary backend. When retries or passive health checks are configured, Envoy retries on, or falls back to, the secondary endpoint set within the same cluster.
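A simplified Go sketch of how priorities drive the fallback: traffic goes to the lowest-numbered priority that still has healthy hosts, so once passive health checking ejects the primary, requests land on the secondary. The `endpointSet` type and `pickEndpoint` function are hypothetical, and the model deliberately ignores Envoy's overprovisioning factor and partial spillover between priorities (each set is treated as either fully healthy or fully unhealthy).

```go
package main

import "fmt"

// endpointSet (hypothetical) stands in for one LocalityLbEndpoints entry.
type endpointSet struct {
	Address  string
	Priority uint32
	Healthy  bool
}

// pickEndpoint returns the healthy set with the lowest priority number.
// Simplification: real Envoy computes per-priority load from health
// percentages rather than this all-or-nothing rule.
func pickEndpoint(sets []endpointSet) (endpointSet, bool) {
	var best endpointSet
	found := false
	for _, s := range sets {
		if !s.Healthy {
			continue
		}
		if !found || s.Priority < best.Priority {
			best, found = s, true
		}
	}
	return best, found
}

func main() {
	sets := []endpointSet{
		{Address: "primary.com", Priority: 0, Healthy: true},
		{Address: "secondary.com", Priority: 1, Healthy: true},
	}
	first, _ := pickEndpoint(sets)
	fmt.Println(first.Address) // primary.com

	// Passive health checking ejects the primary after repeated failures.
	sets[0].Healthy = false
	next, _ := pickEndpoint(sets)
	fmt.Println(next.Address) // secondary.com
}
```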
In our API, transformation as well as upstream authentication must be performed per Backend, so this logic must run after the endpoint set (or, to be precise, the LocalityLbEndpoints) has been chosen by Envoy. For example, primary.com and secondary.com might have different API schemas, authentication, etc. Envoy has a dedicated HTTP filter chain that runs at this stage, called "upstream filters". By inserting the extproc that performs this logic there, we can correctly perform authn/z and transformation for each retry attempt that Envoy makes natively.
From the perspective of the external processor at the upstream filter level, it needs to know exactly which backend was chosen by Envoy's cluster load balancing logic. We add some additional metadata to each endpoint via EG's extension server so that the extproc can retrieve this information. We also use the extension server to insert the upstream extproc filter, since EG does not currently support that. This logic in our extension server can be removed once the corresponding functionality becomes available in EG ([3], [4]).
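A hedged Go sketch of the lookup the upstream extproc performs: read a backend name from the picked endpoint's metadata, then resolve that backend's configuration. The metadata shape mimics Envoy's `filter_metadata` (namespace to key/value pairs), but the namespace `example.aigateway`, the key `backend_name`, and the `resolveBackend` function are all hypothetical placeholders, not the names the extension server actually writes.

```go
package main

import "fmt"

// endpointMetadata mimics Envoy's filter_metadata: namespace -> key/value.
type endpointMetadata map[string]map[string]string

const (
	metadataNamespace = "example.aigateway" // hypothetical namespace
	backendNameKey    = "backend_name"      // hypothetical key
)

type backendConfig struct {
	Schema string
	Auth   string
}

// resolveBackend (hypothetical) finds which backend the load balancer picked
// and returns the per-backend config the upstream extproc needs.
func resolveBackend(md endpointMetadata, configs map[string]backendConfig) (string, backendConfig, error) {
	ns, ok := md[metadataNamespace]
	if !ok {
		return "", backendConfig{}, fmt.Errorf("endpoint has no %q metadata", metadataNamespace)
	}
	name, ok := ns[backendNameKey]
	if !ok {
		return "", backendConfig{}, fmt.Errorf("metadata has no %q key", backendNameKey)
	}
	cfg, ok := configs[name]
	if !ok {
		return "", backendConfig{}, fmt.Errorf("unknown backend %q", name)
	}
	return name, cfg, nil
}

func main() {
	configs := map[string]backendConfig{
		"primary-backend":   {Schema: "AzureOpenAI", Auth: "azure-api-key"},
		"secondary-backend": {Schema: "AWSBedrock", Auth: "aws-credentials"},
	}
	picked := endpointMetadata{metadataNamespace: {backendNameKey: "secondary-backend"}}
	name, cfg, err := resolveBackend(picked, configs)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, cfg.Schema)
}
```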
**Caveats**
* Due to a limitation of EG's extension server API, an AIServiceBackend that references a Kubernetes Service cannot be supported, so we have to drop support for it. Since a workaround exists, this should be acceptable; moreover, EG can be fixed easily, so the release after next should be able to restore support.
* `aigw run` is temporarily disabled until [5] is resolved.
* Inference Extension support is temporarily disabled but will be restored before the next release.
[1] https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto
[2] https://gateway.envoyproxy.io/latest/api/extension_types/#backendspec
[3] envoyproxy/gateway#5523
[4] envoyproxy/gateway#5351
[5] envoyproxy/gateway#5918
**Related Issues/PRs (if applicable)**
Partially resolves provider-level fallbacks for #34
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request on Mar 3, 2026
**Commit Message**
This deprecates the AIServiceBackend.Timeouts configuration, which no longer works well with the refactored use of HTTPRoute since #599. Instead, this adds `timeouts` to AIGatewayRouteRule to match the field of the same name in Gateway API's HTTPRoute.
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request on Mar 3, 2026
**Commit Message**
This fixes the `aigw run` command, which has been disabled since the refactoring in #599. This requires a couple of bug fixes on the Envoy Gateway side, so this commit also upgrades the EG dependency.
**Related Issues/PRs (if applicable)**
* Closes #607
* Includes envoyproxy/gateway/pull/5984
* Includes envoyproxy/gateway/pull/6020
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request on Mar 3, 2026
**Commit Message**
This commit refactors how the extproc is deployed internally. Specifically, it switches to inserting the extproc container as a sidecar container in the Envoy pods created by Envoy Gateway. This is another large refactoring that turned out to be necessary for #599. It uses a mutating webhook to insert the extproc container into Envoy pods.
Running the extproc as a sidecar means that we now have a one-to-one mapping between a Gateway and its extproc, which naturally resolves the previously known limitation #509; users can now attach multiple AIGatewayRoute(s) to one Gateway.
Implementation note: since a pod can only mount secrets from its own namespace, user-created secrets (like API keys) cannot be mounted by the extproc, which runs in the "envoy-gateway-system" namespace. To resolve this, the controller now reads the secret and embeds the credentials into the "extproc secret" (previously known as the "extproc configmap") together with routing, matching and backend information. That secret is written in the "envoy-gateway-system" namespace, so it can be mounted by the extproc container.
**Related Issues/PRs (if applicable)**
Resolves #509
Resolves #621
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
mathetake added a commit that referenced this pull request on Mar 3, 2026
**Description**
This commit removes the handwritten header matching code from the extproc and instead relies on Envoy's hardened native router.
Historically, we had one giant extproc filter that did all the work, including model name extraction, routing, body transformation and upstream authorization. Since #599, this has been split into two external processor filters: one sits at the normal HTTP router and the other is configured as a per-cluster upstream HTTP filter. In theory, the one at the HTTP router has only one job on the request path: extracting the model name from the request body. However, for historical reasons, the handwritten router logic remained, which comes with not only a maintenance cost (forcing complex extproc and control plane orchestration) but also a potential security vulnerability. Handwritten header matching logic can easily become an attack surface, so wherever possible we should avoid writing our own header matching (routing logic) and instead rely on Envoy's battle-tested, hardened native router.
With this commit, regex matching is now available, and there is no longer any difference between the matching implementations of HTTPRoute and AIGatewayRoute. This also opens up the possibility of supporting path matching in our rules.
**Related Issues/PRs (if applicable)**
Ref #612
Ref #73
---------
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
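For readers, the following Go sketch models the matching semantics that the native router now applies to the model header: exact or RE2 regex matches, where a regex must match the whole value, and routes evaluated in order with the first match winning. It is only a model of Envoy's behavior for illustration, not code the extproc should contain (removing such code is the point of this commit); `headerMatch`, `regexMatch` and `firstMatchingRoute` are hypothetical, and the header name `x-ai-eg-model` is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerMatch (hypothetical) is one header condition; Regex, when set, must
// match the whole header value, otherwise Exact is compared.
type headerMatch struct {
	Name  string
	Exact string
	Regex *regexp.Regexp
}

func exactMatch(name, value string) headerMatch {
	return headerMatch{Name: name, Exact: value}
}

// regexMatch anchors the RE2 pattern because Go's MatchString otherwise
// accepts partial matches.
func regexMatch(name, pattern string) (headerMatch, error) {
	re, err := regexp.Compile(`^(?:` + pattern + `)$`)
	if err != nil {
		return headerMatch{}, err
	}
	return headerMatch{Name: name, Regex: re}, nil
}

func (m headerMatch) matches(headers map[string]string) bool {
	v, ok := headers[m.Name]
	if !ok {
		return false
	}
	if m.Regex != nil {
		return m.Regex.MatchString(v)
	}
	return v == m.Exact
}

// firstMatchingRoute returns the first route whose conditions all hold, or -1.
func firstMatchingRoute(routes [][]headerMatch, headers map[string]string) int {
	for i, conds := range routes {
		ok := true
		for _, c := range conds {
			if !c.matches(headers) {
				ok = false
				break
			}
		}
		if ok {
			return i
		}
	}
	return -1
}

func main() {
	claude, err := regexMatch("x-ai-eg-model", `claude-.*`)
	if err != nil {
		panic(err)
	}
	routes := [][]headerMatch{
		{exactMatch("x-ai-eg-model", "gpt-4o")},
		{claude},
	}
	for _, model := range []string{"gpt-4o", "claude-3-haiku", "gpt-4o-mini"} {
		fmt.Println(model, firstMatchingRoute(routes, map[string]string{"x-ai-eg-model": model}))
	}
}
```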