fix(bedrock): normalise streaming choice index=0 for extended-thinking blocks #23248
awais786 wants to merge 1 commit into BerriAI:main from
Conversation
fix(bedrock): normalise streaming choice index=0 for extended-thinking blocks (issue BerriAI#23178)

When Claude extended-thinking is enabled on Bedrock, the converse API emits two content-block types in the same response:

contentBlockIndex=0 → reasoning / thinking block
contentBlockIndex=1 → text block

The existing converse_chunk_parser already hardcodes StreamingChoices(index=0) for every event (tool-calls fix from BerriAI#22867), so the normalisation is already in place for the converse path. The AmazonAnthropicClaudeStreamDecoder (invoke/anthropic path) likewise always sets index=0 via AnthropicModelResponseIterator.chunk_parser.

This commit adds explicit regression tests for both paths, covering the full thinking-block event sequence (start, delta, signature, stop) and the subsequent text-block events that arrive on contentBlockIndex=1, ensuring choices[0].index is always 0 and OpenAI-compatible clients do not crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Greptile Summary

This PR adds regression tests to verify that Bedrock streaming responses always normalise choices[0].index to 0 during extended-thinking streams.
Confidence Score: 4/5
| Filename | Overview |
|---|---|
| tests/test_litellm/llms/bedrock/chat/test_streaming_choice_index.py | Adds three regression tests for choices[0].index == 0 normalisation during extended-thinking (reasoning) streaming for both the converse path (AWSEventStreamDecoder.converse_chunk_parser) and the invoke/anthropic path (AmazonAnthropicClaudeStreamDecoder._chunk_parser). No production code is changed — all tested behaviour was already in place. Tests are purely unit tests with no real network calls. |
Sequence Diagram
sequenceDiagram
participant Client as OpenAI-compat Client
participant LiteLLM as LiteLLM Streaming Layer
participant Bedrock as Bedrock API
Note over Bedrock,LiteLLM: Extended-thinking response stream
Bedrock->>LiteLLM: contentBlockIndex=0, start {reasoningContent}
LiteLLM->>Client: choices[0].index=0 (thinking start)
Bedrock->>LiteLLM: contentBlockIndex=0, delta {reasoningContent.text}
LiteLLM->>Client: choices[0].index=0 (thinking delta → reasoning_content)
Bedrock->>LiteLLM: contentBlockIndex=0, delta {reasoningContent.signature}
LiteLLM->>Client: choices[0].index=0 (signature → thinking_blocks)
Bedrock->>LiteLLM: contentBlockIndex=0 (stop)
LiteLLM->>Client: choices[0].index=0 (thinking stop)
Note over Bedrock,LiteLLM: ⚠️ Previously broken: index=1 was forwarded
Bedrock->>LiteLLM: contentBlockIndex=1, start {} (text block)
LiteLLM->>Client: choices[0].index=0 (normalised ✓)
Bedrock->>LiteLLM: contentBlockIndex=1, delta {text}
LiteLLM->>Client: choices[0].index=0 (normalised ✓)
Bedrock->>LiteLLM: contentBlockIndex=1 (stop)
LiteLLM->>Client: choices[0].index=0 (normalised ✓)
Bedrock->>LiteLLM: stopReason=end_turn
LiteLLM->>Client: choices[0].index=0 (finish)
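The event sequence in the diagram above can be sketched as test data. The following is a hedged sketch: the event dicts mimic the Bedrock converse stream shape (contentBlockIndex, reasoningContent, text), and parse_event is only a stand-in for LiteLLM's converse_chunk_parser, illustrating the index pinning rather than reproducing the real parser:

```python
# Converse-path event stream for an extended-thinking response, as raw
# dicts a test fixture might feed into the parser. Field names follow
# the Bedrock converse stream shape; exact payloads are assumed.
converse_events = [
    {"contentBlockIndex": 0, "start": {"reasoningContent": {}}},
    {"contentBlockIndex": 0, "delta": {"reasoningContent": {"text": "I should multiply."}}},
    {"contentBlockIndex": 0, "delta": {"reasoningContent": {"signature": "abc123"}}},
    {"contentBlockIndex": 0, "stop": {}},
    {"contentBlockIndex": 1, "start": {}},
    {"contentBlockIndex": 1, "delta": {"text": "27 x 453 = 12231"}},
    {"contentBlockIndex": 1, "stop": {}},
]


def parse_event(event: dict) -> dict:
    """Stand-in for converse_chunk_parser: a single-choice stream always
    reports choice index 0, regardless of the source contentBlockIndex."""
    delta = event.get("delta", {})
    return {
        "index": 0,  # normalised: never forwards contentBlockIndex
        "reasoning_content": delta.get("reasoningContent", {}).get("text"),
        "content": delta.get("text"),
    }


chunks = [parse_event(e) for e in converse_events]
assert all(c["index"] == 0 for c in chunks)
assert "".join(c["content"] or "" for c in chunks) == "27 x 453 = 12231"
```

The assertions mirror what the regression tests check: every emitted chunk reports index 0, while reasoning and text deltas stay routed to separate fields.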
Last reviewed commit: 5521f95
result = handler._chunk_parser(chunk)
assert result.choices[0].index == 0, (
    f"chunk type={chunk.get('type')} index={chunk.get('index')} "
    f"produced choices[0].index={result.choices[0].index}, expected 0"
)
Missing assertion on thinking/content delta values
The test_anthropic_invoke_decoder_thinking_uses_choice_index_zero test only validates choices[0].index == 0 for all chunks, including the content_block_delta chunks that carry actual thinking text ("I should multiply.") and text content ("27 x 453 = 12231"). The existing test_thinking_block_chunks_use_choice_index_zero counterpart already asserts delta.reasoning_content and delta.content on comparable chunks.
For completeness and to catch regressions in value propagation through AmazonAnthropicClaudeStreamDecoder, consider adding content assertions for the delta chunks, e.g.:
for chunk in anthropic_chunks:
    result = handler._chunk_parser(chunk)
    assert result.choices[0].index == 0, (
        f"chunk type={chunk.get('type')} index={chunk.get('index')} "
        f"produced choices[0].index={result.choices[0].index}, expected 0"
    )
    # Validate that thinking delta propagates reasoning_content
    if chunk.get("type") == "content_block_delta" and chunk.get("index") == 0:
        assert result.choices[0].delta.reasoning_content is not None
    # Validate that text delta propagates content
    if chunk.get("type") == "content_block_delta" and chunk.get("index") == 1:
        assert result.choices[0].delta.content == "27 x 453 = 12231"
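For context, the anthropic_chunks fixture referenced by the suggestion above would cover an event sequence along these lines. This is a hedged sketch: the event shapes follow the Anthropic streaming format (content_block_start, thinking_delta, signature_delta, text_delta), and the stand-in parser only mimics the index pinning of AnthropicModelResponseIterator.chunk_parser:

```python
# Invoke/anthropic-path event stream for the same extended-thinking
# response: a thinking block on index 0, then a text block on index 1.
anthropic_chunks = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "thinking", "thinking": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "I should multiply."}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "signature_delta", "signature": "abc123"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "27 x 453 = 12231"}},
    {"type": "content_block_stop", "index": 1},
]


def chunk_parser(chunk: dict) -> dict:
    """Stand-in for the invoke-path parser: always emit choice index 0,
    routing thinking deltas and text deltas to separate fields."""
    delta = chunk.get("delta", {})
    return {
        "index": 0,  # normalised regardless of the event's own index
        "reasoning_content": delta.get("thinking"),
        "content": delta.get("text"),
    }


parsed = [chunk_parser(c) for c in anthropic_chunks]
assert all(p["index"] == 0 for p in parsed)
```

The two content_block_delta events are exactly the chunks the reviewer suggests asserting values on: index 0 carries the thinking text, index 1 the final answer.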
fix(bedrock): normalise streaming choice index=0 for extended-thinking blocks (issue #23178)
When Claude extended-thinking is enabled on Bedrock, the converse API emits two content-block types in the same response:
contentBlockIndex=0 → reasoning / thinking block
contentBlockIndex=1 → text block
The existing converse_chunk_parser already hardcodes StreamingChoices(index=0) for every event (tool-calls fix from #22867), so the normalisation is already in place for the converse path. The AmazonAnthropicClaudeStreamDecoder (invoke/anthropic path) likewise always sets index=0 via AnthropicModelResponseIterator.chunk_parser.
This commit adds explicit regression tests for both paths covering the full thinking-block event sequence (start, delta, signature, stop) and the subsequent text-block events that arrive on contentBlockIndex=1, ensuring choices[0].index is always 0 and OpenAI-compatible clients do not crash.
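The crash being guarded against can be seen from a typical OpenAI-compatible consumer. The accumulator below is a hypothetical minimal client (real SDKs are more elaborate, but index their per-choice state the same way), showing why a leaked index=1 on a single-choice stream raises immediately:

```python
def accumulate(stream_chunks: list) -> str:
    """Minimal OpenAI-style delta accumulator: one slot per reported
    choice index. A single-choice stream must therefore always report
    index 0, or the indexed lookup below raises IndexError."""
    choices = [""]  # n=1 request -> exactly one accumulator slot
    for chunk in stream_chunks:
        for choice in chunk["choices"]:
            choices[choice["index"]] += choice["delta"].get("content") or ""
    return choices[0]


# Normalised stream: both content blocks arrive with index 0.
good = [
    {"choices": [{"index": 0, "delta": {"content": "27 x "}}]},
    {"choices": [{"index": 0, "delta": {"content": "453 = 12231"}}]},
]
assert accumulate(good) == "27 x 453 = 12231"

# Un-normalised stream: the text block leaks contentBlockIndex=1.
bad = [{"choices": [{"index": 1, "delta": {"content": "12231"}}]}]
try:
    accumulate(bad)
except IndexError:
    pass  # this is the crash the regression tests guard against
```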
Relevant issues
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
Added testing in the tests/test_litellm/ directory (Adding at least 1 test is a hard requirement - see details)
Passes make test-unit
Reviewed by @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes