Fix: Vertex AI Batch Output File Download Fails with 500 (#23718)

Sameerlite merged 1 commit into main
Conversation
Greptile Summary

This PR fixes a 500 error when downloading Vertex AI batch output files through the LiteLLM proxy. Two root causes are addressed: (1) the unified output file ID was built with a blank target model name because `model_name` was missing from `_hidden_params`, and (2) the Vertex AI batch transformation pre-emptively appended `/predictions.jsonl` before checking for an empty GCS directory.

Key changes:

Remaining concern: the `else` branch for `litellm.afile_retrieve` still uses the original `model_name` variable when `_llm_router` is unavailable, leaving the fix partial in that path.

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| enterprise/litellm_enterprise/proxy/hooks/managed_files.py | Adds model-name recovery fallback from managed input file ID when model_name is absent in _hidden_params. The fix correctly passes resolved_model_name to get_unified_output_file_id, but the parallel else branch for litellm.afile_retrieve still uses the original model_name variable, leaving a partial fix for the Vertex AI case when _llm_router is unavailable. |
| litellm/llms/vertex_ai/batches/transformation.py | Fixes _get_output_file_id_from_vertex_ai_batch_response to avoid pre-emptively appending /predictions.jsonl before checking for an empty GCS directory, and consistently appends the suffix in both the primary and fallback code paths. Logic is sound; the residual != "/predictions.jsonl" guard is technically correct but slightly confusing after the refactor. |
| tests/enterprise/litellm_enterprise/proxy/hooks/test_managed_files.py | Adds a regression test that verifies target_model_names are preserved in the unified output file ID when model_name is absent from _hidden_params. Uses mocks correctly; no real network calls. |
| tests/test_litellm/llms/vertex_ai/test_vertex_ai_batch_transformation.py | New unit tests covering the primary (outputInfo.gcsOutputDirectory) and fallback (outputConfig.gcsDestination.outputUriPrefix) paths for output file ID generation. Missing a test for the outputUriPrefix that already ends with /predictions.jsonl. |
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Proxy as LiteLLM Proxy
    participant Hook as _PROXY_LiteLLMManagedFiles
    participant Vertex as Vertex AI
    participant Cache as Unified File Cache
    Client->>Proxy: POST /v1/batches (with managed file ID)
    Proxy->>Vertex: Submit batch job
    Vertex-->>Proxy: batch response (no model_name in hidden_params)
    Proxy->>Hook: async_post_call_success_hook(response)
    Note over Hook: model_name is None from _hidden_params
    Hook->>Hook: Decode unified_file_id (base64)
    Hook->>Hook: get_models_from_unified_file_id()<br/>→ ["gemini-2.5-pro"]
    Hook->>Hook: resolved_model_name = "gemini-2.5-pro"
    Hook->>Hook: get_unified_output_file_id(<br/>output_file_id, model_id,<br/>resolved_model_name)
    Note over Hook: Unified ID now contains<br/>target_model_names,gemini-2.5-pro
    Hook->>Vertex: afile_retrieve(original_file_id, **creds)
    Vertex-->>Hook: file object
    Hook->>Cache: store_unified_file_id(unified_file_id, file_object)
    Proxy-->>Client: batch response with correct unified output_file_id
    Client->>Proxy: GET /v1/files/{unified_output_file_id}/content
    Proxy->>Cache: lookup unified_output_file_id
    Cache-->>Proxy: file object with routing metadata
    Proxy-->>Client: 200 OK (file content)
```
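The recovery step shown in the diagram, decoding the base64 unified file ID and pulling the target model names out of it, can be sketched roughly as follows. The `;`-separated field layout here is a hypothetical stand-in for litellm's actual encoding, and the helper name is illustrative:

```python
import base64

def get_models_from_unified_file_id_sketch(unified_file_id: str) -> list[str]:
    # Decode the base64 unified file ID and extract the target_model_names
    # field. The ";"-separated layout is an assumption for illustration only.
    decoded = base64.b64decode(unified_file_id).decode()
    for part in decoded.split(";"):
        if part.startswith("target_model_names,"):
            names = part[len("target_model_names,"):]
            return [n for n in names.split(",") if n]
    return []

# Build a sample ID in the assumed layout and recover the model name from it
encoded = base64.b64encode(
    b"litellm_managed_file;file_id,abc;target_model_names,gemini-2.5-pro"
).decode()
print(get_models_from_unified_file_id_sketch(encoded))  # ['gemini-2.5-pro']
```

This is the value the hook falls back to when `_hidden_params` carries no `model_name`.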
Comments Outside Diff (1)
- enterprise/litellm_enterprise/proxy/hooks/managed_files.py, line 956-959 (link)

  Incomplete fix: `model_name` still used in fallback branch

  This `else` branch (reached when `_llm_router` is `None`) still references `model_name`, the original, potentially-`None` variable, instead of `resolved_model_name`, which was introduced by this PR to recover the provider from the managed input file ID. When `model_name` is `None` (exactly the Vertex AI case this PR fixes) and `_llm_router` happens to be `None`, `custom_llm_provider` will silently fall back to `"openai"`, causing a failed file-retrieve attempt on a GCS-backed file.
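A minimal sketch of the branch structure being flagged. The function and its provider logic are hypothetical simplifications, not the actual litellm code:

```python
def resolve_custom_llm_provider(model_name, resolved_model_name, llm_router):
    # Hypothetical simplification of the two branches described in the review.
    if llm_router is not None:
        # fixed path: uses resolved_model_name recovered from the input file ID
        return "vertex_ai" if "gemini" in resolved_model_name else "openai"
    # flagged fallback: still reads the original, possibly-None model_name
    return "vertex_ai" if model_name and "gemini" in model_name else "openai"

# The Vertex AI case this PR targets, with the router unavailable:
print(resolve_custom_llm_provider(None, "gemini-2.5-pro", llm_router=None))  # openai (the wrong, silent default)
```

Swapping `model_name` for `resolved_model_name` in the fallback branch would close this gap.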
Last reviewed commit: 22b333c
```python
if output_file_id:
    output_file_id = output_file_id.rstrip("/") + "/predictions.jsonl"
if output_file_id and output_file_id != "/predictions.jsonl":
    return output_file_id
```
Redundant guard after rstrip logic

After the `if output_file_id:` block runs, `output_file_id` is always `stripped_dir + "/predictions.jsonl"`, where `stripped_dir` is the non-empty `gcsOutputDirectory` with trailing slashes removed. It can only equal `"/predictions.jsonl"` if the original directory value was literally `"/"` or a sequence of slashes, an essentially invalid GCS path.

The condition is technically harmless (and does handle that theoretical edge case), but it reads confusingly after the refactor. Consider a brief comment to clarify its purpose:
```python
if output_file_id:
    output_file_id = output_file_id.rstrip("/") + "/predictions.jsonl"
# Guard against a bare "/" directory producing "/predictions.jsonl"
if output_file_id and output_file_id != "/predictions.jsonl":
    return output_file_id
```
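To make the guard's behavior concrete, here is the same logic wrapped in a standalone function (the wrapper and its `None` return are illustrative, not the actual transformation method):

```python
def resolve_output_file_id(output_file_id):
    # Mirror of the reviewed snippet, wrapped in a function for illustration
    if output_file_id:
        output_file_id = output_file_id.rstrip("/") + "/predictions.jsonl"
    # Guard against a bare "/" directory producing "/predictions.jsonl"
    if output_file_id and output_file_id != "/predictions.jsonl":
        return output_file_id
    return None

print(resolve_output_file_id("gs://bucket/outdir/"))  # gs://bucket/outdir/predictions.jsonl
print(resolve_output_file_id("/"))                    # None: the edge case the guard catches
print(resolve_output_file_id(""))                     # None: empty directory
```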
```python
from litellm.llms.vertex_ai.batches.transformation import VertexAIBatchTransformation


def test_output_file_id_uses_predictions_jsonl_with_output_info():
    response = {
        "outputInfo": {
            "gcsOutputDirectory": "gs://test-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-pro/prediction-model-123"
        }
    }

    output_file_id = VertexAIBatchTransformation._get_output_file_id_from_vertex_ai_batch_response(
        response
    )

    assert (
        output_file_id
        == "gs://test-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-pro/prediction-model-123/predictions.jsonl"
    )


def test_output_file_id_falls_back_to_output_uri_prefix_with_predictions_jsonl():
    response = {
        "outputInfo": {},
        "outputConfig": {
            "gcsDestination": {
                "outputUriPrefix": "gs://test-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-pro/prediction-model-456"
            }
        },
    }

    output_file_id = VertexAIBatchTransformation._get_output_file_id_from_vertex_ai_batch_response(
        response
    )

    assert (
        output_file_id
        == "gs://test-bucket/litellm-vertex-files/publishers/google/models/gemini-2.5-pro/prediction-model-456/predictions.jsonl"
    )
```
Missing test for `outputUriPrefix` that already ends with `/predictions.jsonl`

The new guard `if output_uri_prefix.endswith("/predictions.jsonl"): return output_uri_prefix` is not exercised by any test. Without one, a future refactor could accidentally double-append `/predictions.jsonl` and go undetected.
Consider adding:
```python
def test_output_file_id_does_not_double_append_predictions_jsonl():
    response = {
        "outputInfo": {},
        "outputConfig": {
            "gcsDestination": {
                "outputUriPrefix": "gs://test-bucket/litellm-vertex-files/prediction-model-789/predictions.jsonl"
            }
        },
    }

    output_file_id = VertexAIBatchTransformation._get_output_file_id_from_vertex_ai_batch_response(
        response
    )

    assert output_file_id == "gs://test-bucket/litellm-vertex-files/prediction-model-789/predictions.jsonl"
```
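For reference, the guard that test would exercise behaves like the following sketch, simplified from the transformation; the standalone helper name is illustrative:

```python
def output_file_id_from_uri_prefix(output_uri_prefix):
    # If the prefix already names the predictions file, return it unchanged
    # so the suffix is never appended twice.
    if output_uri_prefix.endswith("/predictions.jsonl"):
        return output_uri_prefix
    return output_uri_prefix.rstrip("/") + "/predictions.jsonl"

print(output_file_id_from_uri_prefix("gs://bucket/dir"))
# gs://bucket/dir/predictions.jsonl
print(output_file_id_from_uri_prefix("gs://bucket/dir/predictions.jsonl"))
# gs://bucket/dir/predictions.jsonl (unchanged)
```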
Relevant issues
When using LiteLLM managed batches with a Vertex AI Gemini model, the batch job completes successfully and the proxy correctly updates the batch status to `completed` and populates `output_file_id`. However, calling `client.files.content(output_file_id)` to download the results returns a 500 error.

Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added tests in the `tests/test_litellm/` directory (adding at least 1 test is a hard requirement - see details)
- I have run `make test-unit`
- My PR was reviewed by `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
🧹 Refactoring
Changes
Primary root cause (empty target_model_names)
In the managed-files post-processing hook (`_PROXY_LiteLLMManagedFiles.async_post_call_success_hook`), LiteLLM created unified output file IDs using `model_name`.
For the Vertex AI batch retrieve path, the response often had `model_id` but no `model_name` in `_hidden_params`.
The unified ID builder (`get_unified_output_file_id`) used `model_name or ""`, so it wrote `target_model_names,` with an empty value.
That is why the decoded `output_file_id` had blank target model names.
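The failure mode above can be sketched with a hypothetical encoding format; the real `get_unified_output_file_id` layout may differ, but the `model_name or ""` behavior is what produces the blank field:

```python
import base64

def get_unified_output_file_id_sketch(output_file_id, model_id, model_name):
    # Hypothetical layout; the real litellm encoding may differ.
    # Using `model_name or ""` reproduces the blank target_model_names field.
    raw = f"litellm_output_file;{output_file_id};model_id,{model_id};target_model_names,{model_name or ''}"
    return base64.b64encode(raw.encode()).decode()

# Before the fix: model_name is None, so the decoded ID ends in "target_model_names,"
broken = base64.b64decode(
    get_unified_output_file_id_sketch("gs://bucket/out", "abc123", None)
).decode()
print(broken.endswith("target_model_names,"))  # True

# After the fix: the model name recovered from the input file ID is passed through
fixed = base64.b64decode(
    get_unified_output_file_id_sketch("gs://bucket/out", "abc123", "gemini-2.5-pro")
).decode()
print(fixed.endswith("target_model_names,gemini-2.5-pro"))  # True
```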