feat: add support for endpoint picker by Xunzhuo · Pull Request #823 · envoyproxy/ai-gateway

Xunzhuo · 2025-07-04T04:24:12Z

Description

This PR adds support for InferencePool, which allows Envoy AI Gateway to integrate with any endpoint picker that supports InferencePool.

By integrating with an endpoint picker (EPP), such as the one from Gateway API Inference Extension (GIE) or a non-GIE EPP, Envoy AI Gateway can use advanced scheduling algorithms to optimize inference.

Related Issues/PRs (if applicable)

Fixes: #423
Fixes #604
Fixes: #648

Some follow-up: #911

internal/controller/gateway.go

mathetake · 2025-07-04T18:19:15Z

Instead of creating eep, how about adding the extproc to the specific route that routes to the inference pool LB policy cluster? That way, other normal routes won't need to talk to eep unnecessarily. This could be done in the extension server, I guess?

yuzisun · 2025-07-04T22:46:26Z

That’s a very good point. We need to make sure normal routes do not go through epp.
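
To make the suggestion concrete, here is a minimal sketch in Go of the selection logic being discussed: only routes whose backend is an InferencePool get the EPP extproc, and every other route is left alone. The `Route` type, its `BackendKind` field, and the `needsEPP` and `routesNeedingEPP` helpers are hypothetical names used for illustration; they are not taken from this PR or from Envoy Gateway.

```go
package main

import "fmt"

// Route is a hypothetical, simplified stand-in for a translated route.
// It is not a type from this PR or from Envoy Gateway.
type Route struct {
	Name        string
	BackendKind string // assumed values: "InferencePool", "Service", ...
}

// needsEPP reports whether the EPP extproc should be enabled for a route.
func needsEPP(r Route) bool {
	return r.BackendKind == "InferencePool"
}

// routesNeedingEPP returns the names of the routes that should talk to
// the EPP, leaving every other route untouched.
func routesNeedingEPP(routes []Route) []string {
	var names []string
	for _, r := range routes {
		if needsEPP(r) {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	routes := []Route{
		{Name: "llm-route", BackendKind: "InferencePool"},
		{Name: "plain-route", BackendKind: "Service"},
	}
	fmt.Println(routesNeedingEPP(routes)) // prints [llm-route]
}
```

In a real extension server the decision would presumably be applied as per-route filter configuration rather than a list of names; the sketch only shows the filtering step.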

Xunzhuo · 2025-07-05T00:52:26Z

Yep, that is reasonable :)

mathetake · 2025-07-06T19:05:43Z

Having said that, one concern with the per-route approach is that it might not work well with ClearRouteCache: true, which is set by our AI Gateway extproc. The EPP extproc must come after the AI Gateway extproc, since until then Envoy doesn't know the destination. However, per-route filter config might not work well with the deferred route calculation. Maybe that's not the case, but it is something I am worried about now...

internal/extensionserver/post_cluster_modify.go

Xunzhuo · 2025-07-18T09:48:53Z

e2e passed

mathetake

Thank you for your hard work and the multiple iterations here. This looks really good, especially since it supports both a direct HTTPRoute and AIGatewayRoute in a way that doesn't affect the other normal routes.

api/v1alpha1/ai_gateway_route_helper.go

internal/internalapi/internalapi.go

internal/extproc/server_test.go

mathetake · 2025-07-18T16:35:15Z

internal/extproc/chatcompletion_processor.go

+	if c.modelNameOverride == "" && isEndpointPicker {
+		c.modelNameOverride = c.requestHeaders[c.config.modelNameHeaderKey]
+	}


I think you don't need this one. modelNameOverride is optional.

This is the only review comment left where I haven't made a change. The reason is that when I tested InferencePool with no model override, I got a 400 (no Content-Length header, using request body length) from the testupstream. I think the cause is that the route-level EPP extproc removed the content-length header, so setting modelNameOverride explicitly when it is empty triggers the content-length header addition in the upstream-level filter's header processing, which prevents the 400 from the testupstream.

Why does the EPP extproc remove the content-length header if it is set? That does not sound right.

Yeah, that doesn't seem right, and having this solely as a content-length header workaround is not the right way to do it at least. The content-length header is missing because the upstream AI Gateway filter uses the CONTINUE_AND_REPLACE option at the header mutation phase. That's why we have a header mutation filter after the AI Gateway upstream filter (see #818). So could you remove this line and add the header mutation filter like this:

ai-gateway/internal/extensionserver/extensionserver.go

Lines 276 to 297 in 45c18d3

headerMutFilter := &httpconnectionmanagerv3.HttpFilter{
	Name: "envoy.filters.http.header_mutation",
	ConfigType: &httpconnectionmanagerv3.HttpFilter_TypedConfig{
		TypedConfig: mustToAny(&header_mutationv3.HeaderMutation{
			Mutations: &header_mutationv3.Mutations{
				RequestMutations: []*mutation_rulesv3.HeaderMutation{
					{
						Action: &mutation_rulesv3.HeaderMutation_Append{
							Append: &corev3.HeaderValueOption{
								AppendAction: corev3.HeaderValueOption_ADD_IF_ABSENT,
								Header: &corev3.HeaderValue{
									Key:   "content-length",
									Value: `%DYNAMIC_METADATA(` + aigv1a1.AIGatewayFilterMetadataNamespace + `:content_length)%`,
								},
							},
						},
					},
				},
			},
		}),
	},
}
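
For readers unfamiliar with ADD_IF_ABSENT, the following is a minimal, standard-library-only sketch in Go of the effect the filter above is meant to have: content-length is restored from a previously recorded value only when the request no longer carries it. The `addIfAbsent` and `restoreContentLength` helpers and the plain `map[string]string` header model are assumptions made for illustration; this is not Envoy's implementation, and the recorded length merely stands in for the `content_length` dynamic metadata entry.

```go
package main

import (
	"fmt"
	"strconv"
)

// addIfAbsent mimics the effect of an ADD_IF_ABSENT header mutation:
// the header is written only when the request does not already carry it.
// Keys are assumed to be lower-cased already.
func addIfAbsent(headers map[string]string, key, value string) {
	if _, ok := headers[key]; !ok {
		headers[key] = value
	}
}

// restoreContentLength re-adds content-length from a value recorded
// earlier (standing in for the dynamic metadata entry) if it was dropped.
func restoreContentLength(headers map[string]string, recordedLen int) {
	addIfAbsent(headers, "content-length", strconv.Itoa(recordedLen))
}

func main() {
	// Header was dropped upstream of this step: it gets restored.
	dropped := map[string]string{"x-example": "value"}
	restoreContentLength(dropped, 42)
	fmt.Println(dropped["content-length"]) // prints 42

	// Header is still present: the existing value is left untouched.
	kept := map[string]string{"content-length": "7"}
	restoreContentLength(kept, 42)
	fmt.Println(kept["content-length"]) // prints 7
}
```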

done, rebased with main

internal/extensionserver/post_cluster_modify.go

internal/extensionserver/post_route_modify.go

internal/extensionserver/post_translate_modify.go

mathetake

OK, now I understand 100% what's going on, and it matches my suggested implementation. This is all good. I left lots of comments, but none of them are that important and none require large code changes. After you address them, I think we are good to go!

internal/controller/ai_gateway_route.go

internal/controller/gateway.go

internal/extensionserver/post_cluster_modify.go

internal/extensionserver/post_route_modify.go

internal/extensionserver/post_translate_modify.go

api/v1alpha1/ai_gateway_route.go

internal/extensionserver/post_cluster_modify.go

yuzisun · 2025-07-21T13:31:31Z

@Xunzhuo @mathetake could you hold off on merging? We will run our integration testing suite today to make sure nothing is breaking.

Xunzhuo · 2025-07-21T14:45:17Z

@yuzisun sure, how long will it take? Can you send a ping when it's done : )

internal/extensionserver/post_cluster_modify.go

internal/extproc/chatcompletion_processor.go

Signed-off-by: bitliu <bitliu@tencent.com>

mathetake · 2025-07-21T22:18:39Z

LGTM and let's wait for the extensive tests by Dan's team before landing... exciting!!

examples/inference-pool/base.yaml

Signed-off-by: bitliu <bitliu@tencent.com>

Xunzhuo changed the title ~~feat: add support for InferencePool based endpoint picker~~ feat: add support for InferencePool Jul 4, 2025

Xunzhuo force-pushed the feat-epp-integration branch 12 times, most recently from 8889ce9 to b24d435 Compare July 4, 2025 13:23

yuzisun reviewed Jul 4, 2025

internal/controller/gateway.go Outdated

internal/controller/gateway.go Outdated

internal/controller/gateway.go Outdated

Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from c8dc839 to afed30f Compare July 4, 2025 14:34

Xunzhuo force-pushed the feat-epp-integration branch from 4943f6e to b676e5e Compare July 7, 2025 07:31

github-advanced-security bot found potential problems Jul 7, 2025

internal/extensionserver/post_cluster_modify.go Fixed

Xunzhuo force-pushed the feat-epp-integration branch 7 times, most recently from d9db2fe to 662b2be Compare July 7, 2025 12:53

Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from 65fd367 to 41ff073 Compare July 18, 2025 09:45

Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from 493fa0f to 1140e60 Compare July 18, 2025 14:28

Xunzhuo marked this pull request as ready for review July 18, 2025 14:51

Xunzhuo requested a review from a team as a code owner July 18, 2025 14:51

mathetake self-assigned this Jul 18, 2025

mathetake reviewed Jul 18, 2025

Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from 39e39a1 to d851d49 Compare July 19, 2025 09:10

yuzisun reviewed Jul 20, 2025

api/v1alpha1/ai_gateway_route.go

yuzisun reviewed Jul 20, 2025

internal/extensionserver/post_cluster_modify.go Outdated

mathetake reviewed Jul 21, 2025

internal/extensionserver/post_cluster_modify.go Outdated

mathetake reviewed Jul 21, 2025

internal/extproc/chatcompletion_processor.go Outdated

Xunzhuo added 5 commits July 22, 2025 06:14

feat: add support for endpoint picker

3cfa662

Signed-off-by: bitliu <bitliu@tencent.com>

resolve feedbacks

db4e17a

Signed-off-by: bitliu <bitliu@tencent.com>

adjust connection timeout

7566fbc

Signed-off-by: bitliu <bitliu@tencent.com>

update: resolve feedbacks

6a87a48

Signed-off-by: bitliu <bitliu@tencent.com>

update: resolve feedbacks

1745e68

Signed-off-by: bitliu <bitliu@tencent.com>

yuzisun reviewed Jul 22, 2025

examples/inference-pool/base.yaml Outdated

examples/inference-pool/base.yaml Outdated

examples/inference-pool/base.yaml Outdated

update example

05fe38d

Signed-off-by: bitliu <bitliu@tencent.com>

yuzisun approved these changes Jul 22, 2025
