
fix(openai): handle server error chunks in streaming responses #7663

Closed
fresh3nough wants to merge 1 commit into aaif-goose:main from fresh3nough:fix/streaming-error-chunks-7645


Conversation

@fresh3nough
Contributor

Problem

When an OpenAI-compatible server (e.g. llama.cpp) returns an error during streaming, it sends a JSON chunk with an `error` field instead of the expected `choices` field. Deserializing it into `StreamingChunk` then fails with `missing field choices`, producing a confusing error:

`Stream decode error: Failed to parse streaming chunk: missing field choices at line 1 column 99:`

This is particularly triggered by subagents/summon, which create concurrent streaming sessions that can overwhelm local LLM servers (regression since the summon extension was introduced).
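
For illustration, an error-only chunk has roughly this shape (the exact payload varies by server; this example is an assumption modeled on the OpenAI error format, not a captured llama.cpp response):

```json
{
  "error": {
    "code": 500,
    "message": "server overloaded",
    "type": "server_error"
  }
}
```

Note that there is no `choices` field at all, which is what trips the deserializer.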

Fix

  • Add `#[serde(default)]` to `StreamingChunk.choices` so error-only chunks can be deserialized without crashing
  • Add an optional `error` field to `StreamingChunk` to capture server error responses
  • Add a `check_streaming_error()` helper that propagates server errors with clear messages including the original error code, type, and message
  • Check for errors in both the main streaming loop and the inner tool-call accumulation loop
  • Add 3 new tests covering error-only chunks, errors during tool-call streaming, and rate-limit errors

This follows the pattern already used by the Google provider (`formats/google.rs` lines 469-479), which handles streaming errors correctly. A rough sketch of the change is below.
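
For reference, a minimal sketch of the change (the field layout, error shape, and `Result<(), String>` signature are assumptions for illustration, not the exact goose definitions):

```rust
use serde::Deserialize;

/// One chunk of an OpenAI-compatible streaming response.
#[derive(Debug, Deserialize)]
struct StreamingChunk {
    /// `#[serde(default)]` lets an error-only chunk (no `choices` key)
    /// deserialize to an empty Vec instead of failing.
    #[serde(default)]
    choices: Vec<serde_json::Value>, // the real code uses a typed choice struct
    /// Populated only when the server reports an error instead of content.
    error: Option<StreamingError>,
}

#[derive(Debug, Deserialize)]
struct StreamingError {
    code: Option<serde_json::Value>,
    #[serde(rename = "type")]
    error_type: Option<String>,
    message: String,
}

/// If the chunk carries a server error, surface it with the original
/// code, type, and message instead of a confusing decode failure.
fn check_streaming_error(chunk: &StreamingChunk) -> Result<(), String> {
    if let Some(err) = &chunk.error {
        return Err(format!(
            "Server error during streaming (code: {:?}, type: {:?}): {}",
            err.code, err.error_type, err.message
        ));
    }
    Ok(())
}
```

Both streaming loops call `check_streaming_error()` on each decoded chunk before touching `choices`, so a server error short-circuits with a readable message.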

Testing

  • All 31 existing tests in `formats::openai::tests` pass (no regressions)
  • 3 new tests verify error-chunk handling (one is sketched below)
  • `cargo clippy` and `cargo fmt` are clean
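
As a rough illustration of the error-only-chunk test (built on the sketch above; names and assertions are assumptions, not the exact tests in the PR):

```rust
#[test]
fn error_only_chunk_is_surfaced() {
    // No `choices` key at all; before the fix this failed to deserialize.
    let json = r#"{"error":{"code":500,"message":"server overloaded","type":"server_error"}}"#;

    let chunk: StreamingChunk =
        serde_json::from_str(json).expect("error-only chunk should deserialize");
    assert!(chunk.choices.is_empty());

    // The helper propagates the server's message instead of a decode error.
    let err = check_streaming_error(&chunk).unwrap_err();
    assert!(err.contains("server overloaded"));
}
```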

Fixes #7645
Related: #7364, #7570

@fresh3nough fresh3nough force-pushed the fix/streaming-error-chunks-7645 branch from 0089069 to d67a313 on March 4, 2026 at 22:00
When an OpenAI-compatible server (e.g. llama.cpp) returns an error during
streaming, it sends a JSON chunk with an 'error' field instead of the
expected 'choices' field. Previously, the StreamingChunk deserialization
would fail with 'missing field choices', producing a confusing error.

This is particularly triggered by subagents/summon, which create
concurrent streaming sessions that can overwhelm local LLM servers.

Changes:
- Add #[serde(default)] to StreamingChunk.choices so error-only chunks
  can be deserialized
- Add optional 'error' field to StreamingChunk to capture server errors
- Add check_streaming_error() that propagates server errors with clear
  messages including the original error code, type, and message
- Check for errors in both the main streaming loop and the inner
  tool-call accumulation loop

Fixes aaif-goose#7645

Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-131.us-east-2.compute.internal>
Signed-off-by: fre <anonwurcod@proton.me>
@fresh3nough fresh3nough force-pushed the fix/streaming-error-chunks-7645 branch from d67a313 to 0583ef7 on March 4, 2026 at 22:02
@DOsinga
Collaborator

DOsinga commented Mar 20, 2026

Thanks for the fix @fresh3nough! Unfortunately this has been superseded by #8031 which landed the same fix (with a slightly different approach — using ProviderError::ServerError so the error is retryable, and handling both the OpenAI and vLLM error formats). Closing this one out, but the contribution is appreciated — it clearly highlighted a real bug that needed fixing!

@DOsinga DOsinga closed this Mar 20, 2026


Development

Successfully merging this pull request may close these issues.

Stream decode error when using subagents/summon extension (regression in v1.25.0+)
