
feat: implement GCP Gemini request and response translation#819

Merged
mathetake merged 20 commits intoenvoyproxy:mainfrom
sukumargaonkar:gcp-basic-requests
Jul 9, 2025

Conversation

@sukumargaonkar
Contributor

@sukumargaonkar sukumargaonkar commented Jul 2, 2025

Description

This PR implements request and response translation for GCP Gemini models.

Related Issues/PRs (if applicable)

Issue: #609

Special notes for reviewers (if applicable)

This PR only supports basic requests with text and images.
Future PRs will add support for tools, streaming requests, etc.

…I messages

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
@sukumargaonkar sukumargaonkar requested a review from a team as a code owner July 2, 2025 21:12
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Member

@mathetake mathetake left a comment


@sukumargaonkar sukumargaonkar mentioned this pull request Jul 2, 2025
- Add period in comments.
- fix unnecessary variable exports.
- remove role from systemInstruction.

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Comment on lines +58 to +59
devMsg := systemMsgToDeveloperMsg(msg)
inst, err := fromDeveloperMsg(devMsg)
Contributor

@yuzisun yuzisun Jul 3, 2025


I understand you are trying to reuse fromDeveloperMsg, but it looks a bit unnecessary to go from system message -> developer message -> system instruction.

Contributor Author


True. The alternative is to have two functions, fromDeveloperMsg and fromSystemMsg, with pretty much identical function bodies.

Do you prefer that approach?

Member


Yes, can you go ahead and have two different functions instead of obfuscating the actual logic by going through an unnecessary code path?
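
For illustration, the two-function approach agreed on in this thread could look roughly like the sketch below. It is not the code from this PR: the message and content types are simplified, hypothetical stand-ins, the shared helper textToInstruction is invented for the example, and rejecting empty text is an assumption. The role field is left unset, in line with the "remove role from systemInstruction" change noted earlier.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for the OpenAI message types.
type systemMessage struct{ Content string }
type developerMessage struct{ Content string }

// Hypothetical, simplified stand-ins for the Gemini content types.
type part struct{ Text string }
type content struct {
	Role  string
	Parts []part
}

// fromSystemMsg converts a system message directly into a system
// instruction, without first rewriting it as a developer message.
func fromSystemMsg(msg systemMessage) (*content, error) {
	return textToInstruction(msg.Content)
}

// fromDeveloperMsg converts a developer message into a system instruction.
func fromDeveloperMsg(msg developerMessage) (*content, error) {
	return textToInstruction(msg.Content)
}

// textToInstruction holds the shared logic so the two entry points stay
// small. Returning an error for empty text is an assumption of this sketch.
func textToInstruction(text string) (*content, error) {
	if text == "" {
		return nil, errors.New("empty instruction text")
	}
	// Role is intentionally left empty for system instructions.
	return &content{Parts: []part{{Text: text}}}, nil
}

func main() {
	inst, err := fromSystemMsg(systemMessage{Content: "You are a helpful assistant."})
	if err != nil {
		panic(err)
	}
	fmt.Println(inst.Parts[0].Text)
}
```

Keeping the shared body in one small helper avoids both the duplicated function bodies and the system -> developer -> instruction detour.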

- Remove unnecessary comment updates
- avoid extra copy when json-parsing req-body

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
# Conflicts:
#	internal/controller/rotators/gcp_oidc_token_rotator.go
#	internal/extproc/translator/gemini_helper.go
#	internal/extproc/translator/openai_gcpvertexai.go
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
@mathetake
Member

#752 @sukumargaonkar could you fix the remaining comments in this PR as well before this PR? it shouldn't take much cycles

@sukumargaonkar
Contributor Author

#752 @sukumargaonkar could you fix the remaining comments in this PR as well before this PR? it shouldn't take much cycles

yes, working on addressing #819 (review)

was having issues setting it up locally

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
@mathetake
Member

I am not going through the detail but one generic question: how are you going to translate "reasoning_effort" portion ? It is either low, medium or high (https://platform.openai.com/docs/api-reference/responses-streaming/response/incomplete) and I think it should be translated to the corresponding thinking_budget parameter in Vertex AI Gemini.

FWIW, Gemini on AI Studio's openai compatible endpoint translates

"low", "medium", and "high", which map to 1,024, 8,192, and 24,576 tokens, respectively.

according to their documentation here (https://ai.google.dev/gemini-api/docs/openai).

Can we do exactly the same thing or do you have different idea? We would love to see the parameter supported
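
As a rough illustration of the mapping quoted above, a translation helper might look like the following. This is only a sketch under stated assumptions: the function name thinkingBudgetForEffort and the choice to return an error for unknown values are hypothetical, and only the three token values come from the documentation cited in this comment.

```go
package main

import "fmt"

// thinkingBudgetForEffort maps an OpenAI-style reasoning_effort value to a
// thinking budget in tokens, using the low/medium/high values quoted from
// the AI Studio OpenAI-compatibility documentation. The function name and
// the error behavior are assumptions made for this sketch.
func thinkingBudgetForEffort(effort string) (int32, error) {
	switch effort {
	case "low":
		return 1024, nil
	case "medium":
		return 8192, nil
	case "high":
		return 24576, nil
	default:
		return 0, fmt.Errorf("unsupported reasoning_effort %q", effort)
	}
}

func main() {
	for _, effort := range []string{"low", "medium", "high"} {
		budget, err := thinkingBudgetForEffort(effort)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %d tokens\n", effort, budget)
	}
}
```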

@sukumargaonkar
Contributor Author

I am not going through the detail but one generic question: how are you going to translate "reasoning_effort" portion ? It is either low, medium or high (https://platform.openai.com/docs/api-reference/responses-streaming/response/incomplete) and I think it should be translated to the corresponding thinking_budget parameter in Vertex AI Gemini.

FWIW, Gemini on AI Studio's openai compatible endpoint translates

"low", "medium", and "high", which map to 1,024, 8,192, and 24,576 tokens, respectively.

according to their documentation here (https://ai.google.dev/gemini-api/docs/openai).

Can we do exactly the same thing or do you have different idea? We would love to see the parameter supported

This PR only handles basic text and image input
future PRs will address tools and thinking/reasoning requests

broke it down to make reviewing easier

But good point, will keep your comment in mind for future PRs

Comment on lines +80 to +81
var openAIRespBytes []byte
if len(gcpResp.Candidates) > 0 {
Member

@mathetake mathetake Jul 8, 2025


I guess that with the if block here, openAIRespBytes is zero-length when there are no candidates, so the resulting body mutation is also nil, and the raw GCP response would be returned to the downstream client, which is likely using the OpenAI SDK? That seems problematic. So the questions would be:

  • When does len(gcpResp.Candidates) == 0 happen?
  • If that case can happen, or we cannot be certain, should we make sure that an empty OpenAI response is constructed in an else block here so that the downstream OpenAI SDK client won't receive the raw GCP response?

Contributor Author


good point.
removed the if condition
updated the corresponding test-case
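
The resolution discussed here, always building an OpenAI-shaped body even when there are no candidates, can be sketched as follows. The types below are simplified, hypothetical stand-ins rather than the real Gemini or OpenAI schema types used in the PR, and toOpenAIResponse is an illustrative name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical, simplified stand-ins for the Gemini response types.
type geminiCandidate struct {
	Text string
}

type geminiResponse struct {
	Candidates []geminiCandidate
}

// Hypothetical, simplified stand-ins for the OpenAI response types.
type openAIMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type openAIChoice struct {
	Index   int           `json:"index"`
	Message openAIMessage `json:"message"`
}

type openAIResponse struct {
	Object  string         `json:"object"`
	Choices []openAIChoice `json:"choices"`
}

// toOpenAIResponse always returns an OpenAI-shaped body. With zero
// candidates it yields an empty "choices" array rather than an empty byte
// slice, so the raw upstream body is never passed through unchanged.
func toOpenAIResponse(resp geminiResponse) ([]byte, error) {
	out := openAIResponse{
		Object:  "chat.completion",
		Choices: make([]openAIChoice, 0, len(resp.Candidates)),
	}
	for i, c := range resp.Candidates {
		out.Choices = append(out.Choices, openAIChoice{
			Index:   i,
			Message: openAIMessage{Role: "assistant", Content: c.Text},
		})
	}
	return json.Marshal(out)
}

func main() {
	body, err := toOpenAIResponse(geminiResponse{})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Because the conversion loop simply runs zero times for an empty candidate list, no separate else branch is needed.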

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
// buildGCPRequestMutations creates header and body mutations for GCP requests
// It sets the ":path" header, the "content-length" header and the request body.
-func buildGCPRequestMutations(path string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) {
+func buildGCPRequestMutations(path *string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) {
Member


People usually do not use a pointer to a string. This unnecessarily causes the string header (a pair of the length and a pointer to the buffer) to escape to the heap. You can just use len(path) != 0 to check whether the string is empty, since I believe an empty path is invalid anyway.
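
A minimal sketch of this suggestion, passing the path by value and treating the empty string as "not set", might look like the code below. To stay self-contained it uses a hypothetical header struct and a hypothetical buildRequestHeaders function instead of the real ext_proc mutation types, so it only illustrates the length check, not the actual function in this PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// header is a hypothetical stand-in for an ext_proc header value option.
type header struct {
	Key   string
	Value string
}

// buildRequestHeaders takes path by value. An empty path is treated as
// "do not set :path", so no pointer is needed to express absence.
func buildRequestHeaders(path string, reqBody []byte) []header {
	var headers []header
	if len(path) != 0 {
		headers = append(headers, header{Key: ":path", Value: path})
	}
	if len(reqBody) != 0 {
		headers = append(headers, header{
			Key:   "content-length",
			Value: strconv.Itoa(len(reqBody)),
		})
	}
	return headers
}

func main() {
	for _, h := range buildRequestHeaders("/v1/example", []byte(`{"a":1}`)) {
		fmt.Printf("%s: %s\n", h.Key, h.Value)
	}
}
```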

@mathetake mathetake requested review from wengyao04 and yuzisun July 8, 2025 21:25
@mathetake mathetake requested a review from aabchoo July 8, 2025 21:25
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Comment on lines +67 to +68
// TODO: Parse GCP error response and convert to OpenAI error format.
// For now, just return error response as-is.
Contributor


Can we prioritize this TODO? I think it is an important translation for delivering error responses to users.

Contributor Author


plan to do this in the next PR
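
To make the TODO concrete, one possible shape for such an error translation is sketched below. It is not taken from this PR or its follow-up. It assumes the upstream body follows the common Google API error envelope (error.code, error.message, error.status), and the function name translateGCPError, the fallback type string "gcp_backend_error", and the choice to reuse the GCP status as the OpenAI error type are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// gcpError mirrors the common Google API error envelope. Treating the
// Vertex AI error body as this shape is an assumption of the sketch.
type gcpError struct {
	Error struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
		Status  string `json:"status"`
	} `json:"error"`
}

// openAIError is a simplified OpenAI-style error body.
type openAIErrorBody struct {
	Message string `json:"message"`
	Type    string `json:"type"`
	Code    string `json:"code"`
}

type openAIError struct {
	Error openAIErrorBody `json:"error"`
}

// translateGCPError converts an upstream error body into an OpenAI-style
// error. If the body is not the expected JSON, the raw text is used as the
// message so that no information is dropped.
func translateGCPError(statusCode int, body []byte) ([]byte, error) {
	out := openAIError{Error: openAIErrorBody{
		Type: "gcp_backend_error", // hypothetical fallback type
		Code: strconv.Itoa(statusCode),
	}}
	var in gcpError
	if err := json.Unmarshal(body, &in); err == nil && in.Error.Message != "" {
		out.Error.Message = in.Error.Message
		if in.Error.Status != "" {
			out.Error.Type = in.Error.Status
		}
	} else {
		out.Error.Message = string(body)
	}
	return json.Marshal(out)
}

func main() {
	raw := []byte(`{"error":{"code":400,"message":"bad request","status":"INVALID_ARGUMENT"}}`)
	translated, err := translateGCPError(400, raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(translated))
}
```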

Comment on lines +68 to +71
// 1. Obtaining an OIDC token from the configured provider.
// 2. Exchanging the OIDC token for a GCP STS token.
// 3. Using the STS token to impersonate a GCP service account.
// 4. Storing the resulting access token in a Kubernetes secret.
Member


can you revert this ?

@mathetake mathetake requested a review from Copilot July 9, 2025 15:52
@mathetake
Member

almost there!

Contributor

Copilot AI left a comment


Pull Request Overview

Adds support for translating OpenAI ChatCompletion requests and responses to and from GCP Gemini (Vertex AI) models.

  • Introduces tests for GCP Vertex AI backend in tests/extproc
  • Implements translation logic in internal/extproc/translator
  • Updates Envoy test configuration to route to the new GCP Vertex AI upstream

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/extproc/testupstream_test.go | Adds GCP Vertex AI test cases and expected headers/host handling |
| tests/extproc/extproc_test.go | Defines fakeGCPAuthToken and GCP Vertex AI schema/backend |
| tests/extproc/envoy.yaml | Configures new testupstream-gcp-vertexai cluster and routes |
| internal/extproc/translator/util.go | Extracted parseDataURI helper for image handling |
| internal/extproc/translator/openai_gcpvertexai.go | Implements request/response translation for GCP Gemini |
| internal/extproc/translator/gemini_helper.go | Helper functions for building and parsing Gemini messages |

Comments suppressed due to low confidence (2)

tests/extproc/testupstream_test.go:17

  • The test uses fmt.Sprintf but 'fmt' is not imported. Please add "fmt" to the import block.
	"strconv"

internal/extproc/translator/gemini_helper.go:487

  • The package alias ext_procv3 is not imported; the import should use the same alias (extprocv3) as elsewhere. Update the import to extprocv3 "github.com/envoyproxy/go-control-plane/envoy/service/ext_proc/v3" and adjust references accordingly.
func buildGCPRequestMutations(path string, reqBody []byte) (*ext_procv3.HeaderMutation, *ext_procv3.BodyMutation) {

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Member

@mathetake mathetake left a comment


i am not going through the actual translation logic but LGTM on the other parts generally! Thanks! I will defer to @yuzisun for the final stamping

@mathetake
Member

on second thought it seems like there would be a conflict with @alexagriffith's PR #838, so I am going ahead and merging to unlock you guys to work on subsequent stuff

@mathetake mathetake merged commit 7d293fa into envoyproxy:main Jul 9, 2025
24 checks passed
alexagriffith added a commit to sukumargaonkar/ai-gateway that referenced this pull request Jul 11, 2025
…xy#819)

**Description**

This PR implements request and response translation for GCP Gemini models.

**Related Issues/PRs (if applicable)**

Issue: envoyproxy#609

---------

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Signed-off-by: Alexa Griffith <agriffith50@bloomberg.net>
4 participants