Skip to content

feat: add support for endpoint picker#823

Merged
Xunzhuo merged 6 commits intoenvoyproxy:mainfrom
Xunzhuo:feat-epp-integration
Jul 22, 2025
Merged

feat: add support for endpoint picker#823
Xunzhuo merged 6 commits intoenvoyproxy:mainfrom
Xunzhuo:feat-epp-integration

Conversation

@Xunzhuo
Copy link
Member

@Xunzhuo Xunzhuo commented Jul 4, 2025

Description

This PR addes support for inferencePool, which allows Envoy AI Gateway to integrate with ANY endpoint picker who is supported the inferencePool.

By integrating with the Endpoint Picker like Gateway API Inference Extenstion or the non-GIE EPP, it can expand Envoy AI Gateway`s abilities to advanced scheduleing algorithm to optimize inference.

Related Issues/PRs (if applicable)

Fixes: #423
Fixes #604
Fixes: #648

Some follow-up: #911

@Xunzhuo Xunzhuo changed the title feat: add support for InferencePool based endpoint picker feat: add support for InferencePool Jul 4, 2025
@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 12 times, most recently from 8889ce9 to b24d435 Compare July 4, 2025 13:23
@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from c8dc839 to afed30f Compare July 4, 2025 14:34
@mathetake
Copy link
Member

mathetake commented Jul 4, 2025

instead of creating eep, how about adding extproc into the specific route which routes to the inference pool lb policy cluster? that way other normal routes won't need to talk to eep unnecessarily. This could be done in extension server i guess?

@yuzisun
Copy link
Contributor

yuzisun commented Jul 4, 2025

instead of creating eep, how about adding extproc into the specific route which routes to the inference pool lb policy cluster? that way other normal routes won't need to talk to eep unnecessarily. This could be done in extension server i guess?

That’s a very good point. We need to make sure normal routes do not go through epp.

@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 5, 2025

Yep, that is reasonable :)

@mathetake
Copy link
Member

having said that, one concern about the per route is that it might not work well with ClearRouteCache: true which is set by our AI Gateway extproc. The EPP extproc must come after the ai gateway extproc since until then envoy doesn't know the destination. However, per route filter config might not work well with the deferred route calculation. Maybe it's not the case but something i am worried about it now...

@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch from 4943f6e to b676e5e Compare July 7, 2025 07:31
@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 7 times, most recently from d9db2fe to 662b2be Compare July 7, 2025 12:53
@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 3 times, most recently from 65fd367 to 41ff073 Compare July 18, 2025 09:45
@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 18, 2025

Clipboard_Screenshot_1752832117

e2e passed

@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from 493fa0f to 1140e60 Compare July 18, 2025 14:28
@Xunzhuo Xunzhuo marked this pull request as ready for review July 18, 2025 14:51
@Xunzhuo Xunzhuo requested a review from a team as a code owner July 18, 2025 14:51
@mathetake mathetake self-assigned this Jul 18, 2025
Copy link
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for your hard work and multiple iterations here. This looks really good especially it can both support the direct HTTPRoute as well as AIGateawyRoute in a way that it doesn't affect the other normal routes.

Comment on lines +348 to +350
if c.modelNameOverride == "" && isEndpointPicker {
c.modelNameOverride = c.requestHeaders[c.config.modelNameHeaderKey]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think you don't need this one. modelNameOverride is optional

Copy link
Member Author

@Xunzhuo Xunzhuo Jul 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only left one review i dont make a change. the reason is when i have tested inferencepool with no model override, i got a 400 no Content-Length header, using request body length from the testupstream. the reason i think is the route level epp extproc removed the content-length header, so set modelNameOverride explicitly when it is empty can trigger the content-length header addition in upstream level filter processing header. which prevents the 400 from the testupstream

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does epp extproc remove the content length header if it is set? that does not sound right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that doesn't seem right, and having this solely for content-length header workaround is not the right way to at least. The reason it's missing content-length header is that the upstream AI Gateawy Filter uses REPLACE_AND_CONTINUE option at header mutation phase. That's why we are having header mutation filter after the ai gateway upstream filter (see #818). so could you remove this line and add the header mutating filter like this

headerMutFilter := &httpconnectionmanagerv3.HttpFilter{
Name: "envoy.filters.http.header_mutation",
ConfigType: &httpconnectionmanagerv3.HttpFilter_TypedConfig{
TypedConfig: mustToAny(&header_mutationv3.HeaderMutation{
Mutations: &header_mutationv3.Mutations{
RequestMutations: []*mutation_rulesv3.HeaderMutation{
{
Action: &mutation_rulesv3.HeaderMutation_Append{
Append: &corev3.HeaderValueOption{
AppendAction: corev3.HeaderValueOption_ADD_IF_ABSENT,
Header: &corev3.HeaderValue{
Key: "content-length",
Value: `%DYNAMIC_METADATA(` + aigv1a1.AIGatewayFilterMetadataNamespace + `:content_length)%`,
},
},
},
},
},
},
}),
},
}

Copy link
Member Author

@Xunzhuo Xunzhuo Jul 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, rebased with main

Copy link
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK now i got to understand 100% what's going on, and matches my suggested implementation. this is all good. I left lost of comments but all of them are not that important and do not require large code change. After you address them, i think we are good to go!

@Xunzhuo Xunzhuo force-pushed the feat-epp-integration branch 2 times, most recently from 39e39a1 to d851d49 Compare July 19, 2025 09:10
@yuzisun
Copy link
Contributor

yuzisun commented Jul 21, 2025

@Xunzhuo @mathetake could you hold off the merging, we will run our integration testing suite today to make sure nothing is breaking.

@Xunzhuo
Copy link
Member Author

Xunzhuo commented Jul 21, 2025

@yuzisun sure, how long will it take? Can you send a ping when it's done : )

Xunzhuo added 5 commits July 22, 2025 06:14
Signed-off-by: bitliu <bitliu@tencent.com>
Signed-off-by: bitliu <bitliu@tencent.com>
Signed-off-by: bitliu <bitliu@tencent.com>
Signed-off-by: bitliu <bitliu@tencent.com>
Signed-off-by: bitliu <bitliu@tencent.com>
@mathetake
Copy link
Member

LGTM and let's wait for the extensive tests by Dan's team before landing... exciting!!

Signed-off-by: bitliu <bitliu@tencent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants