Fix #732 by de-duplicating duplicated JSON responses from AWS Bedrock in structured data extraction #733
Problem
AWS Bedrock occasionally returns duplicate, identical JSON objects during structured data extraction, causing extract_data() to fail with an error. This in turn causes parallel_chat_structured() to fail completely when processing batches of prompts.
Evidence
See also the reprex based on my own data, in #732 (comment).
Captured failure case from real-world usage (Spanish political manifesto analysis), before I simplified the type_object by flattening it, as in the reprex linked above:
Scale: I captured 11 failure cases versus 3 success cases during batch processing, which indicates a significant intermittent issue with Bedrock's tool-calling mechanism.
Solution
Updated extract_data() in R/chat-structured.R to gracefully handle multiple JSON responses.
Tests
Added comprehensive tests in test-chat-structured.R:
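A minimal sketch of the behaviour such tests would exercise, assuming responses arrive as already-parsed R lists. The helper names dedupe_json_responses() and extract_single() are hypothetical and are not taken from the ellmer source; raising an error when the responses genuinely differ is likewise an assumption about the intended behaviour, not a description of the actual change.

```r
# Hypothetical helper (not the ellmer implementation): keep only the first
# occurrence of each distinct parsed JSON response.
dedupe_json_responses <- function(responses) {
  kept <- list()
  for (resp in responses) {
    # Compare against everything kept so far using exact structural equality.
    already_seen <- any(vapply(kept, identical, logical(1), resp))
    if (!already_seen) {
      kept[[length(kept) + 1]] <- resp
    }
  }
  kept
}

# Hypothetical wrapper: succeed when all responses are identical copies,
# and fail when there are zero or several distinct responses (assumption).
extract_single <- function(responses) {
  unique_responses <- dedupe_json_responses(responses)
  if (length(unique_responses) != 1) {
    stop(
      "Expected one distinct JSON response, found ",
      length(unique_responses)
    )
  }
  unique_responses[[1]]
}

# Two identical copies collapse to a single result.
duplicated_responses <- list(
  list(party = "A", score = 1),
  list(party = "A", score = 1)
)
result <- extract_single(duplicated_responses)
stopifnot(identical(result, list(party = "A", score = 1)))

# Genuinely different responses are still treated as an error.
conflicting_responses <- list(list(party = "A"), list(party = "B"))
outcome <- tryCatch(
  extract_single(conflicting_responses),
  error = function(e) "error"
)
stopifnot(identical(outcome, "error"))
```

Using identical() rather than comparing serialized strings keeps the check independent of key ordering in the raw JSON text only if the parser preserves order, so a real implementation may prefer a different equality check.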
Impact
Before:
After:
This ensures robust structured data extraction for production use, even when providers exhibit intermittent duplication behaviour.
Interactions with other (current) PRs
This works best when combined with #725. #705 does not fix this issue, but if the other two PRs are accepted, I will redo #705 and integrate them into a single, more robust parallel_chat_structured().