feat: implement GCP Gemini request and response translation#819
feat: implement GCP Gemini request and response translation#819mathetake merged 20 commits intoenvoyproxy:mainfrom
Conversation
…I messages Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
mathetake
left a comment
There was a problem hiding this comment.
Can you all add e2e test cases here: https://github.com/envoyproxy/ai-gateway/blob/main/tests/extproc/testupstream_test.go#L140
- Add period in comments. - fix unnecessary variable exports. - remove role from systemInstruction. Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
| devMsg := systemMsgToDeveloperMsg(msg) | ||
| inst, err := fromDeveloperMsg(devMsg) |
There was a problem hiding this comment.
Understand you are trying to use fromDeveloperMsg, but looks a bit unnecessary that it goes from system message -> developer message -> system instruction.
There was a problem hiding this comment.
true, the alternative is to have two functions fromDeveloperMsg and fromSystemMsg which have pretty much identical function body.
prefer that approach?
There was a problem hiding this comment.
yes, can you go ahead and have two different functions instead of obfuscate the actual logic by going through unnecessary code path
- Remove unnecessary comment updates - avoid extra copy when json-parsing req-body Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
# Conflicts: # internal/controller/rotators/gcp_oidc_token_rotator.go # internal/extproc/translator/gemini_helper.go # internal/extproc/translator/openai_gcpvertexai.go
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
|
#752 @sukumargaonkar could you fix the remaining comments in this PR as well before this PR? it shouldn't take much cycles |
yes, working on addressing #819 (review) was having issues setting it up locally |
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
|
I am not going through the detail but one generic question: how are you going to translate "reasoning_effort" portion ? It is either low, medium or high (https://platform.openai.com/docs/api-reference/responses-streaming/response/incomplete) and I think it should be translated to the corresponding FWIW, Gemini on AI Studio's openai compatible endpoint translates
according to their documentation here (https://ai.google.dev/gemini-api/docs/openai). Can we do exactly the same thing or do you have different idea? We would love to see the parameter supported |
This PR only handles basic text and image input broke it down to make reviewing easier But good point, will keep your comment in mind for future PRs |
| var openAIRespBytes []byte | ||
| if len(gcpResp.Candidates) > 0 { |
There was a problem hiding this comment.
I guess if we have the if block here, openAIRespBytes has the zero-length, hence the resulting the body mutation is also nil, which results in the raw GCP response will be returned to the downstream client who is likely to be using OpenAI SDK? I think that seems like problematic. So the question would be like
- When
len(gcpResp.Candidates)== 0 happens? - If the case can happen or we cannot be certain, should be make sure that the empty openai response will be constructed in the else block here so that the downstream OpenAI SDK client won't receive the GCP raw response?
There was a problem hiding this comment.
good point.
removed the if condition
updated the corresponding test-case
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
| // buildGCPRequestMutations creates header and body mutations for GCP requests | ||
| // It sets the ":path" header, the "content-length" header and the request body. | ||
| func buildGCPRequestMutations(path string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) { | ||
| func buildGCPRequestMutations(path *string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) { |
There was a problem hiding this comment.
people usually do not use a pointer to the string. This unnecessarily results in escaping the string header (a pair of the length of and pointer to the buffer) to heap. You can just use len(path) != 0 to check if the string empty or not since I believe the empty path is invalid anyways.
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
| // TODO: Parse GCP error response and convert to OpenAI error format. | ||
| // For now, just return error response as-is. |
There was a problem hiding this comment.
Can we prioritize this TODO, i think it is important translation to deliver user error response.
There was a problem hiding this comment.
plan to do this in the next PR
| // 1. Obtaining an OIDC token from the configured provider. | ||
| // 2. Exchanging the OIDC token for a GCP STS token. | ||
| // 3. Using the STS token to impersonate a GCP service account. | ||
| // 4. Storing the resulting access token in a Kubernetes secret. |
|
almost there! |
There was a problem hiding this comment.
Pull Request Overview
Adds support for translating OpenAI ChatCompletion requests and responses to and from GCP Gemini (Vertex AI) models.
- Introduces tests for GCP Vertex AI backend in
tests/extproc - Implements translation logic in
internal/extproc/translator - Updates Envoy test configuration to route to the new GCP Vertex AI upstream
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/extproc/testupstream_test.go | Adds GCP Vertex AI test cases and expected headers/host handling |
| tests/extproc/extproc_test.go | Defines fakeGCPAuthToken and GCP Vertex AI schema/backend |
| tests/extproc/envoy.yaml | Configures new testupstream-gcp-vertexai cluster and routes |
| internal/extproc/translator/util.go | Extracted parseDataURI helper for image handling |
| internal/extproc/translator/openai_gcpvertexai.go | Implements request/response translation for GCP Gemini |
| internal/extproc/translator/gemini_helper.go | Helper functions for building and parsing Gemini messages |
Comments suppressed due to low confidence (2)
tests/extproc/testupstream_test.go:17
- The test uses fmt.Sprintf but 'fmt' is not imported. Please add
"fmt"to the import block.
"strconv"
internal/extproc/translator/gemini_helper.go:487
- The package alias
ext_procv3is not imported; the import should use the same alias (extprocv3) as elsewhere. Update the import toextprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3"and adjust references accordingly.
func buildGCPRequestMutations(path string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) {
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
|
on second thought it seems like there would be a conflict with @alexagriffith's PR #838, so I am going ahead and merging to unlock you guys to work on subsequent stuff |
…xy#819) **Description** This PR request and response translation for gcp-gemini models. **Related Issues/PRs (if applicable)** Issue: envoyproxy#609 --------- Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net> Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
Description
This PR request and response translation for gcp-gemini models
Related Issues/PRs (if applicable)
Issue: #609
Special notes for reviewers (if applicable)
This PR only support basic requests with text and images
Future PRs will add support for tools, streaming-requests etc.