fix: merge annotations from all streaming chunks in stream_chunk_builder by bbarwik · Pull Request #23663 · BerriAI/litellm

bbarwik · 2026-03-14T20:56:05Z

Relevant issues

Fixes stream_chunk_builder losing annotations from all but the first annotation-bearing streaming chunk.

Pre-Submission checklist

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

Bug Fix

Changes

Problem

In litellm/main.py, stream_chunk_builder() collects all streaming chunks that contain annotations, but then only uses annotations from the first chunk:

annotations = annotation_chunks[0]["choices"][0]["delta"]["annotations"]

This means if annotations arrive across multiple streaming chunks (e.g. Gemini sends grounding metadata in the final chunk while other providers may include citations in intermediate chunks), all but the first set of annotations are silently lost.

Fix

Merge annotations from ALL annotation-bearing chunks by extending a list:

all_annotations: list = []
for ac in annotation_chunks:
    all_annotations.extend(ac["choices"][0]["delta"]["annotations"])
response["choices"][0]["message"]["annotations"] = all_annotations
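To make the difference concrete, here is a minimal sketch that uses plain dicts in place of litellm's streaming chunk objects. Only the `["choices"][0]["delta"]["annotations"]` access path comes from the PR; the helpers `make_chunk`, `annotation_bearing`, `first_only`, and `merge_all`, and the sample citation payloads, are hypothetical stand-ins.

```python
from typing import Any, Dict, List, Optional

Chunk = Dict[str, Any]


def make_chunk(annotations: Optional[List[dict]] = None) -> Chunk:
    """Build a dict shaped like the access path used in the PR (not a litellm type)."""
    delta: Dict[str, Any] = {"content": "x"}
    if annotations is not None:
        delta["annotations"] = annotations
    return {"choices": [{"delta": delta}]}


def annotation_bearing(chunks: List[Chunk]) -> List[Chunk]:
    """Keep only chunks whose delta carries a non-None annotations value."""
    return [c for c in chunks if c["choices"][0]["delta"].get("annotations") is not None]


def first_only(chunks: List[Chunk]) -> List[dict]:
    """Old behavior: read annotations from the first annotation-bearing chunk only."""
    found = annotation_bearing(chunks)
    return found[0]["choices"][0]["delta"]["annotations"] if found else []


def merge_all(chunks: List[Chunk]) -> List[dict]:
    """Fixed behavior: extend one list with annotations from every such chunk."""
    merged: List[dict] = []
    for ac in annotation_bearing(chunks):
        merged.extend(ac["choices"][0]["delta"]["annotations"])
    return merged


chunks = [
    make_chunk([{"type": "url_citation", "url": "https://example.com/a"}]),
    make_chunk(),  # plain content chunk, no annotations
    make_chunk([{"type": "url_citation", "url": "https://example.com/b"}]),
]
print(len(first_only(chunks)))  # 1: the second citation is dropped
print(len(merge_all(chunks)))   # 2: both citations survive
```

Because `extend` appends in iteration order, the merged list keeps annotations in the order their chunks arrived.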

Tests added

tests/test_litellm/test_stream_chunk_builder_annotations.py:

test_stream_chunk_builder_merges_annotations_from_multiple_chunks — verifies annotations from chunk 1 and chunk 3 are both present in the assembled response
test_stream_chunk_builder_single_annotation_chunk_still_works — verifies existing single-chunk behavior (no regression)
test_stream_chunk_builder_no_annotations — verifies no annotations attribute when none are present (no regression)

) Fixes BerriAI#23502 The huggingface_embed.embedding() call was not receiving the headers parameter, causing extra_headers (e.g., X-HF-Bill-To) to be silently dropped. Other providers (openrouter, vercel_ai_gateway, bedrock) already pass headers correctly. This fix adds headers=headers to match the behavior of other providers. Co-authored-by: Jah-yee <sparklab@outlook.com>

…fix z-index issue (BerriAI#23516) The model dropdown menus in the Add Fallbacks modal were rendering behind the modal overlay because Ant Design portals Select dropdowns to document.body by default. By setting getPopupContainer to attach the dropdown to its parent element, the dropdown inherits the modal's stacking context and renders above the modal. Fixes BerriAI#17895

…nd Azur… (BerriAI#23183)

* PR BerriAI#22867 added _remove_scope_from_cache_control for Bedrock and Azure AI but omitted Vertex AI. This applies the same pattern to VertexAIPartnerModelsAnthropicMessagesConfig.
* PR BerriAI#22867 added _remove_scope_from_cache_control to AzureAnthropicMessagesConfig but missed VertexAIPartnerModelsAnthropicMessagesConfi Rather than duplicating the method again, moved it up to the base AnthropicMessagesConfig so all providers inherit it, and removed the now-redundant copy from the Azure AI subclass.

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

…n multi-turn tool calling (BerriAI#23580)

…erriAI#23492)

* Handle response.failed, response.incomplete, and response.cancelled terminal events in background streaming. Previously the background streaming task only handled response.completed and hardcoded the final status to "completed". This missed three other terminal event types from the OpenAI streaming spec, causing failed/incomplete/cancelled responses to be incorrectly marked as completed.
* Remove unused terminal_response_data variable.
* Address code review: derive fallback status from event type, rewrite tests as integration tests.
  1. Replace hardcoded "completed" fallback in response_data.get("status") with _event_to_status lookup so that response.incomplete and response.cancelled events get the correct fallback if the response body ever omits the status field.
  2. Replace duplicated-logic unit tests with integration tests that exercise background_streaming_task directly using mocked streaming responses and assert on the final update_state call arguments.
* Remove dead mock_processor and unused mock_response parameter from test helper.
* Remove FastAPI and UserAPIKeyAuth imports from test file. These types were only used as Mock(spec=...) arguments. Drop the spec constraints and remove the top-level imports to avoid pulling FastAPI into test files outside litellm/proxy/.
* Log warning when streaming response has no body_iterator. If base_process_llm_request returns a non-streaming response (no body_iterator), log a warning since this likely indicates a misconfiguration or provider error rather than a successful completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude
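The fallback described in this commit message can be sketched roughly as follows. This is an illustration rather than the proxy's actual code: the table name `_EVENT_TO_STATUS`, the helper `final_status`, and the status strings for the three non-completed events are assumptions modeled on the event names quoted above.

```python
from typing import Any, Dict, Optional

# Hypothetical lookup in the spirit of the `_event_to_status` mentioned above.
# The status strings are assumed to mirror the event-name suffixes.
_EVENT_TO_STATUS: Dict[str, str] = {
    "response.completed": "completed",
    "response.failed": "failed",
    "response.incomplete": "incomplete",
    "response.cancelled": "cancelled",
}


def final_status(event_type: str, response_data: Dict[str, Any]) -> Optional[str]:
    """Return the terminal status for a streaming event, or None if not terminal.

    The status in the response body wins; the event type only supplies the
    fallback, instead of a hardcoded "completed".
    """
    if event_type not in _EVENT_TO_STATUS:
        return None
    return response_data.get("status", _EVENT_TO_STATUS[event_type])


print(final_status("response.incomplete", {}))                  # incomplete
print(final_status("response.failed", {"status": "failed"}))    # failed
print(final_status("response.output_text.delta", {}))           # None
```

Deriving the fallback from the event type means a body that omits `status` can no longer turn a failed or cancelled response into a completed one.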

* fix(security): bump tar to 7.5.11 and tornado to 6.5.5
  - tar >=7.5.11: fixes CVE-2026-31802 (HIGH) in node-pkg
  - tornado >=6.5.5: fixes CVE-2026-31958 (HIGH) and GHSA-78cv-mqj4-43f7 (MEDIUM) in python-pkg
  Addresses vulnerabilities found in ghcr.io/berriai/litellm:main-v1.82.0-stable Trivy scan.
* fix: document tar override is enforced via Dockerfile, not npm
* fix: revert invalid JSON comment in package.json tar override

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(bedrock): respect s3_region_name for batch file uploads (BerriAI#23569)
* fix(bedrock): respect s3_region_name for batch file uploads (GovCloud fix)
* fix: s3_region_name always wins over aws_region_name for S3 signing (Greptile feedback)
* fix: _filter_headers_for_aws_signature - Bedrock KB (BerriAI#23571)
* fix: _filter_headers_for_aws_signature
* fix: filter None header values in all post-signing re-merge paths
  Addresses Greptile feedback: None-valued headers were being filtered during SigV4 signing but re-merged back into the final headers dict afterward, which would cause downstream HTTP client failures.
  Made-with: Cursor
* feat(router): tag_regex routing: route by User-Agent regex without per-developer tag config (BerriAI#23594)
* feat(router): add tag_regex support for header-based routing
  Adds a new `tag_regex` field to litellm_params that lets operators route requests based on regex patterns matched against request headers, primarily User-Agent, without requiring per-developer tag configuration.
  Use case: route all Claude Code traffic (User-Agent: claude-code/x.y.z) to a dedicated deployment by setting:
    tag_regex:
      - "^User-Agent: claude-code\\/"
  in the deployment's litellm_params. Works alongside existing `tags` routing; exact tag match takes precedence over regex match. Unmatched requests fall through to deployments tagged `default`.
  The matched deployment, pattern, and user_agent are recorded in `metadata["tag_routing"]` so they flow through to SpendLogs automatically.
* fix(tag_regex): address backwards-compat, metadata overwrite, and warning noise
  Three issues from code review:
  1. Backwards-compat: `has_tag_filter` was widened to activate on any non-empty User-Agent, which would raise ValueError for existing deployments using plain tags without a `default` fallback. Fix: only activate header-based regex filtering when at least one candidate deployment has `tag_regex` configured.
  2. Metadata overwrite: `metadata["tag_routing"]` was overwritten for every matching deployment in the loop, leaving inaccurate provenance when multiple deployments match. Fix: write only for the first match.
  3. Warning noise: an invalid regex pattern logged one warning per header string rather than once per pattern. Fix: compile first (catching re.error once), then iterate over header strings.
  Also adds two new tests covering these cases, and adds docs page for tag_regex routing with a Claude Code walk-through.
* refactor(tag_regex): remove unnecessary _healthy_list copy
* docs: merge tag_regex section into tag_routing.md, remove standalone page
  - Add ## Regex-based tag routing (tag_regex) section to existing tag_routing.md instead of a separate page
  - Remove tag_regex_routing.md standalone doc (odd UX to have a separate page for a sub-feature)
  - Remove proxy/tag_regex_routing from sidebars.js
  - Add match_any=False debug warning in tag_based_routing.py when regex routing fires under strict mode (regex always uses OR semantics)
* fix(tag_regex): address greptile review - security docs, strict-mode enforcement, validation order
  - Strengthen security note in tag_routing.md: explicitly state User-Agent is client-supplied and can be set to any value; frame tag_regex as a traffic classification hint, not an access-control mechanism
  - Move tag_regex startup validation before _add_deployment() so an invalid pattern never leaves partial router state
  - Enforce match_any=False strict-tag policy: when a deployment has both tags and tag_regex and the strict tag check fails, skip the regex fallback rather than silently bypassing the operator's intent
  - Extract per-deployment match logic into _match_deployment() helper to keep get_deployments_for_tag() readable
  - Add two new tests: strict-mode blocks regex fallback, regex-only deployment still matches under match_any=False
* fix(ci): apply Black formatting to 14 files and stabilize flaky caplog tests
  - Run Black formatter on 14 files that were failing the lint check
  - Replace caplog-based assertions in TestAliasConflicts with unittest.mock.patch on verbose_logger.warning for xdist compatibility
  - The caplog fixture can produce empty text in pytest-xdist workers in certain CI environments, causing flaky test failures

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
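The "compile first, then iterate over header strings" fix for the warning noise can be illustrated with a small standalone sketch. It does not reproduce litellm's router code: `compile_patterns` and `first_matching_pattern` are hypothetical helpers, the `"User-Agent: <value>"` header-string format is inferred from the example pattern in the commit message, and using `search` rather than `match` is an assumption.

```python
import re
from typing import List, Optional


def compile_patterns(patterns: List[str]) -> List[re.Pattern]:
    """Compile each pattern once; an invalid one is reported once and skipped."""
    compiled: List[re.Pattern] = []
    for raw in patterns:
        try:
            compiled.append(re.compile(raw))
        except re.error as exc:
            # One message per bad pattern, no matter how many headers are checked.
            print(f"skipping invalid tag_regex {raw!r}: {exc}")
    return compiled


def first_matching_pattern(patterns: List[str], header_strings: List[str]) -> Optional[str]:
    """Return the first pattern that matches any header string, else None."""
    for rx in compile_patterns(patterns):
        for header in header_strings:
            if rx.search(header):
                return rx.pattern
    return None


headers = ["User-Agent: claude-code/1.2.3"]
print(first_matching_pattern([r"^User-Agent: claude-code\/"], headers))
print(first_matching_pattern(["("], headers))  # invalid pattern: warned once, no match
```

As the commit message stresses, User-Agent is client-supplied, so a match like this is only a traffic classification hint and not an access-control check.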

* fix: restore offline tiktoken cache for non-root envs
* chore: mkdir for custom tiktoken cache dir
* test: patch tiktoken.get_encoding in custom-dir test to avoid network
* test: clear CUSTOM_TIKTOKEN_CACHE_DIR in helper for test isolation
* test: restore default_encoding module state after custom-dir test

Made-with: Cursor

Map provider finish_reason "content_filtered" to the OpenAI-compatible "content_filter" and extend core_helpers tests to cover this case. Made-with: Cursor
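As a rough illustration of this kind of normalization, an alias table with a pass-through default is enough. The names `_FINISH_REASON_ALIASES` and `normalize_finish_reason` are hypothetical and do not reproduce the helper in litellm's core_helpers; only the `"content_filtered"` to `"content_filter"` mapping comes from the commit message.

```python
from typing import Dict

# Hypothetical alias table; only this one mapping is taken from the commit message.
_FINISH_REASON_ALIASES: Dict[str, str] = {
    "content_filtered": "content_filter",
}


def normalize_finish_reason(reason: str) -> str:
    """Map a provider-specific finish_reason to its OpenAI-compatible name.

    Values without an alias are returned unchanged.
    """
    return _FINISH_REASON_ALIASES.get(reason, reason)


print(normalize_finish_reason("content_filtered"))  # content_filter
print(normalize_finish_reason("stop"))              # stop
```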

Previously, stream_chunk_builder only took annotations from the first chunk that contained them, losing any annotations from later chunks. This is a problem because providers like Gemini/Vertex AI send grounding metadata (converted to annotations) in the final streaming chunk, while other providers may spread annotations across multiple chunks.

Changes:
- Collect and merge annotations from ALL annotation-bearing chunks instead of only using the first one

vercel · 2026-03-14T20:56:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Mar 14, 2026 8:57pm

CLAassistant · 2026-03-14T20:56:12Z

All committers have signed the CLA.

greptile-apps · 2026-03-14T20:57:31Z

Greptile Summary

This PR fixes a bug in stream_chunk_builder where annotations from all but the first annotation-bearing streaming chunk were silently dropped. The fix replaces a single direct index access (annotation_chunks[0]) with a loop that extends a merged list across all annotation chunks, then assigns the full list to the assembled response message.

Key changes:

litellm/main.py: Replaces annotations = annotation_chunks[0][...]["annotations"] with a loop that calls all_annotations.extend(...) for every annotation chunk, ensuring no annotations are lost regardless of which chunk they arrive in.
tests/test_litellm/test_stream_chunk_builder_annotations.py: Adds three focused mock unit tests — multi-chunk merge correctness, single-chunk regression guard, and no-annotation regression guard — all using locally constructed ModelResponseStream objects with no network calls.

Confidence Score: 5/5

This PR is safe to merge — it is a minimal, well-tested bug fix with no backwards-incompatible changes.
The change is a straightforward one-to-many extension of an existing list-merge pattern already used elsewhere in the same function. The logic is correct, the three new tests cover the fixed case plus two regression scenarios, tests make no real network calls, and no existing behavior for callers without annotations is affected.
No files require special attention.

Important Files Changed

Filename	Overview
litellm/main.py	Fixes stream_chunk_builder to merge annotations from all streaming chunks instead of only the first — minimal, targeted, and correct change.
tests/test_litellm/test_stream_chunk_builder_annotations.py	New test file with three pure unit tests (no network calls): multi-chunk merge, single-chunk regression, and no-annotation regression — all compliant with the mock-only rule for this directory.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[stream_chunk_builder called with chunks] --> B[Filter annotation_chunks\nchunks where delta.annotations is not None]
    B --> C{len annotation_chunks > 0?}
    C -- No --> D[Skip annotation assignment]
    C -- Yes --> E["all_annotations = []"]
    E --> F[For each ac in annotation_chunks]
    F --> G["all_annotations.extend(\n  ac[choices][0][delta][annotations]\n)"]
    G --> F
    F -- done --> H["response[choices][0][message][annotations]\n= all_annotations"]
    H --> I[Continue with audio/image chunks...]

Last reviewed commit: e38bbec

Jah-yee and others added 11 commits March 13, 2026 21:54

fix: auto-fill reasoning_content for moonshot kimi reasoning models i…

a94b961

…n multi-turn tool calling (BerriAI#23580)

fix: normalize content_filtered finish_reason (BerriAI#23564)

d29287c

Map provider finish_reason "content_filtered" to the OpenAI-compatible "content_filter" and extend core_helpers tests to cover this case. Made-with: Cursor

fix: Fixes BerriAI#23185 (BerriAI#23647)

8abf2d8

vercel bot deployed to Preview March 14, 2026 20:57 View deployment

RheagalFire approved these changes Mar 15, 2026

View reviewed changes

RheagalFire changed the base branch from main to litellm_oss_staging_03_14_2026 March 15, 2026 08:36

RheagalFire merged commit 8e5c3ed into BerriAI:litellm_oss_staging_03_14_2026 Mar 15, 2026
35 of 37 checks passed

RheagalFire merged 11 commits into BerriAI:litellm_oss_staging_03_14_2026 from bbarwik:fix/stream-chunk-builder-merge-annotations


12 participants
