
fix(bedrock): normalise streaming choice index=0 for extended-thinkin… #23248

Closed
awais786 wants to merge 1 commit into BerriAI:main from awais786:fix/bedrock-extended-thinking-stream-choice-index

Conversation

@awais786
Contributor

…g blocks (issue #23178)

When Claude extended thinking is enabled on Bedrock, the Converse API emits two content-block types in the same response:
contentBlockIndex=0 → reasoning / thinking block
contentBlockIndex=1 → text block

The existing converse_chunk_parser hardcodes StreamingChoices(index=0) for every event (the tool-calls fix from #22867), so normalisation is already in place for the converse path. The AmazonAnthropicClaudeStreamDecoder (invoke/anthropic path) likewise always sets index=0 via AnthropicModelResponseIterator.chunk_parser.

This commit adds explicit regression tests for both paths covering the full thinking-block event sequence (start, delta, signature, stop) and the subsequent text-block events that arrive on contentBlockIndex=1, ensuring choices[0].index is always 0 and OpenAI-compatible clients do not crash.
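To make the normalisation concrete, here is a minimal, self-contained sketch. This is not LiteLLM's actual implementation; normalise_chunk and its event keys are hypothetical stand-ins showing why a single-choice stream must pin the choice index to 0 even though Bedrock tags the blocks with contentBlockIndex 0 and 1:

```python
# Illustrative sketch (hypothetical names, not LiteLLM's real code):
# Bedrock tags thinking and text blocks with contentBlockIndex 0 and 1,
# but OpenAI-style clients accumulate deltas per choice index, so a
# single-choice stream must always report choices[0].index == 0.

def normalise_chunk(event: dict) -> dict:
    """Map a simplified Bedrock stream event to an OpenAI-style chunk."""
    delta = {}
    if "reasoning_text" in event:   # contentBlockIndex=0 (thinking block)
        delta["reasoning_content"] = event["reasoning_text"]
    if "text" in event:             # contentBlockIndex=1 (text block)
        delta["content"] = event["text"]
    # Bedrock's contentBlockIndex is deliberately NOT forwarded:
    # a single-choice stream always uses choice index 0.
    return {"choices": [{"index": 0, "delta": delta}]}


chunks = [
    {"contentBlockIndex": 0, "reasoning_text": "I should multiply."},
    {"contentBlockIndex": 1, "text": "27 x 453 = 12231"},
]
assert all(normalise_chunk(c)["choices"][0]["index"] == 0 for c in chunks)
```

Forwarding contentBlockIndex=1 as the choice index would make OpenAI-compatible clients treat the answer text as a second choice, which is the breakage these tests guard against.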

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests on make test-unit.
  • My PR's scope is as isolated as possible; it solves only 1 specific problem.
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review.

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

…g blocks (issue BerriAI#23178)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 10, 2026

The latest updates on your projects.

Project: litellm
Deployment: Error
Actions: Error
Updated (UTC): Mar 10, 2026 9:47am


@greptile-apps
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR adds regression tests to verify that Bedrock streaming responses always normalise choices[0].index to 0 when Claude extended-thinking is enabled — covering both the converse path (AWSEventStreamDecoder.converse_chunk_parser) and the invoke/anthropic path (AmazonAnthropicClaudeStreamDecoder._chunk_parser). No production code is modified; the existing normalisation (index=0 is hardcoded in both paths) was already in place from prior fixes.

Key observations:

  • The three new tests (test_thinking_block_chunks_use_choice_index_zero, test_thinking_signature_chunk_uses_choice_index_zero, test_anthropic_invoke_decoder_thinking_uses_choice_index_zero) are correct unit tests with no real network calls, satisfying the mock-only test requirement for this folder.
  • All tested assertions should pass against the current production code, confirming these are pure regression/documentation tests.
  • The PR title and description use "fix" language, but no production code is changed. The actual normalisation fix was shipped in prior PRs (#22867, "fix(bedrock,azure_ai): strip scope from cache_control for Anthropic messages", for the converse path; the Anthropic iterator hardcodes index = 0 on line 688 of handler.py). This PR's value is in formalising regression coverage.
  • The "Relevant Issues" section in the PR description is left empty. Issue #23178 ("Bedrock Claude extended thinking streams mixed choice indices (0 and 1), breaking OpenAI compatibility") is referenced only in code comments and docstrings, not as a formal Fixes #23178 link. Per project guidelines, PRs claiming to fix an issue should include traceable evidence in the description body.
  • The pre-submission checklist is entirely unchecked (all [ ]), including the testing item, despite tests being present.
  • test_anthropic_invoke_decoder_thinking_uses_choice_index_zero only validates index == 0 per chunk and omits the content-value assertions that the converse-path test includes for comparable chunks. This makes it a weaker guard against value-propagation regressions in AmazonAnthropicClaudeStreamDecoder.

Confidence Score: 4/5

  • Safe to merge — only adds regression tests, no production code is modified.
  • All new tests are pure unit tests with no network calls, the logic is correct, and no existing behaviour is altered. Score is 4 rather than 5 primarily because the PR description lacks a formal "Fixes #23178" issue link and the pre-submission checklist is entirely unchecked, which are process gaps the team typically requires before merge. The test_anthropic_invoke_decoder_thinking_uses_choice_index_zero test would also benefit from content-value assertions to serve as a stronger regression guard.
  • No production files were changed. tests/test_litellm/llms/bedrock/chat/test_streaming_choice_index.py could be strengthened with content-value assertions in the invoke-path test.

Important Files Changed

Filename: tests/test_litellm/llms/bedrock/chat/test_streaming_choice_index.py
Overview: Adds three regression tests for choices[0].index == 0 normalisation during extended-thinking (reasoning) streaming, covering both the converse path (AWSEventStreamDecoder.converse_chunk_parser) and the invoke/anthropic path (AmazonAnthropicClaudeStreamDecoder._chunk_parser). No production code is changed; all tested behaviour was already in place. Tests are purely unit tests with no real network calls.
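The test pattern the file follows can be sketched as below. Here parse_chunk is a hypothetical stand-in for the real decoder methods, and the event dicts are simplified, so this is illustrative only:

```python
# Hedged sketch of the regression-test pattern: feed the full
# thinking-block event sequence (index 0) and the subsequent
# text-block events (index 1), asserting the choice index stays 0.

def parse_chunk(event: dict) -> dict:
    # Stand-in for the real chunk parsers, which hardcode index=0.
    return {"choices": [{"index": 0, "delta": {"content": event.get("text", "")}}]}

EVENTS = [
    {"contentBlockIndex": 0, "type": "start"},                      # thinking start
    {"contentBlockIndex": 0, "type": "delta", "text": "thinking"},  # thinking delta
    {"contentBlockIndex": 0, "type": "stop"},                       # thinking stop
    {"contentBlockIndex": 1, "type": "start"},                      # text start
    {"contentBlockIndex": 1, "type": "delta", "text": "answer"},    # text delta
    {"contentBlockIndex": 1, "type": "stop"},                       # text stop
]

def test_choice_index_is_always_zero():
    for event in EVENTS:
        result = parse_chunk(event)
        assert result["choices"][0]["index"] == 0, (
            f"contentBlockIndex={event['contentBlockIndex']} "
            f"leaked into choices[0].index"
        )

test_choice_index_is_always_zero()
```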

Sequence Diagram

sequenceDiagram
    participant Client as OpenAI-compat Client
    participant LiteLLM as LiteLLM Streaming Layer
    participant Bedrock as Bedrock API

    Note over Bedrock,LiteLLM: Extended-thinking response stream

    Bedrock->>LiteLLM: contentBlockIndex=0, start {reasoningContent}
    LiteLLM->>Client: choices[0].index=0 (thinking start)

    Bedrock->>LiteLLM: contentBlockIndex=0, delta {reasoningContent.text}
    LiteLLM->>Client: choices[0].index=0 (thinking delta → reasoning_content)

    Bedrock->>LiteLLM: contentBlockIndex=0, delta {reasoningContent.signature}
    LiteLLM->>Client: choices[0].index=0 (signature → thinking_blocks)

    Bedrock->>LiteLLM: contentBlockIndex=0 (stop)
    LiteLLM->>Client: choices[0].index=0 (thinking stop)

    Note over Bedrock,LiteLLM: ⚠️ Previously broken: index=1 was forwarded

    Bedrock->>LiteLLM: contentBlockIndex=1, start {} (text block)
    LiteLLM->>Client: choices[0].index=0 (normalised ✓)

    Bedrock->>LiteLLM: contentBlockIndex=1, delta {text}
    LiteLLM->>Client: choices[0].index=0 (normalised ✓)

    Bedrock->>LiteLLM: contentBlockIndex=1 (stop)
    LiteLLM->>Client: choices[0].index=0 (normalised ✓)

    Bedrock->>LiteLLM: stopReason=end_turn
    LiteLLM->>Client: choices[0].index=0 (finish)

Last reviewed commit: 5521f95

Comment on lines +244 to +248
result = handler._chunk_parser(chunk)
assert result.choices[0].index == 0, (
    f"chunk type={chunk.get('type')} index={chunk.get('index')} "
    f"produced choices[0].index={result.choices[0].index}, expected 0"
)

Missing assertion on thinking/content delta values

The test_anthropic_invoke_decoder_thinking_uses_choice_index_zero test only validates choices[0].index == 0 for all chunks, including the content_block_delta chunks that carry actual thinking text ("I should multiply.") and text content ("27 x 453 = 12231"). The existing test_thinking_block_chunks_use_choice_index_zero counterpart already asserts delta.reasoning_content and delta.content on comparable chunks.

For completeness and to catch regressions in value propagation through AmazonAnthropicClaudeStreamDecoder, consider adding content assertions for the delta chunks, e.g.:

for chunk in anthropic_chunks:
    result = handler._chunk_parser(chunk)
    assert result.choices[0].index == 0, (
        f"chunk type={chunk.get('type')} index={chunk.get('index')} "
        f"produced choices[0].index={result.choices[0].index}, expected 0"
    )
    # Validate that thinking delta propagates reasoning_content
    if chunk.get("type") == "content_block_delta" and chunk.get("index") == 0:
        assert result.choices[0].delta.reasoning_content is not None
    # Validate that text delta propagates content
    if chunk.get("type") == "content_block_delta" and chunk.get("index") == 1:
        assert result.choices[0].delta.content == "27 x 453 = 12231"
