feat: add support for endpoint picker #823
Instead of creating EPP, how about adding the extproc to the specific route that routes to the inference pool LB policy cluster? That way, other normal routes won't need to talk to EPP unnecessarily. This could be done in the extension server, I guess?
That's a very good point. We need to make sure normal routes do not go through EPP.
Yep, that is reasonable :)
Having said that, one concern about the per-route approach is that it might not work well with ClearRouteCache: true, which is set by our AI Gateway extproc. The EPP extproc must come after the AI Gateway extproc, since until then Envoy doesn't know the destination. However, per-route filter config might not work well with the deferred route calculation. Maybe that's not the case, but it's something I am worried about now...
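The per-route idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's implementation: `Route`, `eppFilterName`, and `perRouteFilters` are hypothetical names, and the sketch models nothing more than which routes would carry an EPP extproc override. It does not address the ClearRouteCache ordering concern.

```go
package main

import "fmt"

// Route is a hypothetical, simplified stand-in for a route entry.
type Route struct {
	Name          string
	Cluster       string
	InferencePool bool // true when the route's backend is an InferencePool
}

// eppFilterName is an illustrative filter name, not one taken from the PR.
const eppFilterName = "epp-ext-proc"

// perRouteFilters returns the filters enabled per route name.
// Only routes that target an InferencePool get the EPP extproc,
// so normal routes never have to talk to the endpoint picker.
func perRouteFilters(routes []Route) map[string][]string {
	out := make(map[string][]string)
	for _, r := range routes {
		if r.InferencePool {
			out[r.Name] = []string{eppFilterName}
		}
	}
	return out
}

func main() {
	routes := []Route{
		{Name: "chat", Cluster: "inference-pool", InferencePool: true},
		{Name: "static", Cluster: "web", InferencePool: false},
	}
	for name, filters := range perRouteFilters(routes) {
		fmt.Println(name, filters)
	}
}
```

Routes absent from the returned map carry no EPP override, which is the property the comments above ask for.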
mathetake
left a comment
Thank you for your hard work and multiple iterations here. This looks really good, especially since it supports both the direct HTTPRoute and AIGatewayRoute in a way that doesn't affect the other normal routes.
if c.modelNameOverride == "" && isEndpointPicker {
	c.modelNameOverride = c.requestHeaders[c.config.modelNameHeaderKey]
}
I think you don't need this one. modelNameOverride is optional.
This is the only review comment left where I haven't made a change. The reason is that when I tested InferencePool with no model override, I got a 400, no Content-Length header, using request body length, from the testupstream. I think the reason is that the route-level EPP extproc removed the Content-Length header, so setting modelNameOverride explicitly when it is empty triggers the Content-Length header addition in the upstream-level filter's header processing, which prevents the 400 from the testupstream.
Why does the EPP extproc remove the Content-Length header if it is set? That does not sound right.
Yeah, that doesn't seem right, and having this solely as a Content-Length header workaround is not the right way, at least. The reason the Content-Length header is missing is that the upstream AI Gateway filter uses the CONTINUE_AND_REPLACE option at the header mutation phase. That's why we have a header mutation filter after the AI Gateway upstream filter (see #818). So could you remove this line and add the header mutating filter like this:
ai-gateway/internal/extensionserver/extensionserver.go
Lines 276 to 297 in 45c18d3
done, rebased with main
mathetake
left a comment
OK, now I understand 100% what's going on, and it matches my suggested implementation. This is all good. I left lots of comments, but none of them are that important and they do not require large code changes. After you address them, I think we are good to go!
@Xunzhuo @mathetake could you hold off on merging? We will run our integration testing suite today to make sure nothing is breaking.
@yuzisun sure, how long will it take? Can you send a ping when it's done : )
Signed-off-by: bitliu <bitliu@tencent.com>
LGTM and let's wait for the extensive tests by Dan's team before landing... exciting!!

Description
This PR adds support for InferencePool, which allows Envoy AI Gateway to integrate with any endpoint picker that supports InferencePool.
Integrating with an endpoint picker (EPP), such as the Gateway API Inference Extension or a non-GIE EPP, extends Envoy AI Gateway with advanced scheduling algorithms to optimize inference.
Related Issues/PRs (if applicable)
Fixes: #423
Fixes: #604
Fixes: #648
Some follow-up: #911