Fix #732 by de-duplicating duplicated JSON responses from AWS Bedrock in structured data extraction #733

kbenoit · 2025-08-26T03:53:58Z

Problem

AWS Bedrock occasionally returns duplicate identical JSON objects during structured data extraction, causing
extract_data() to fail with the error:

  Data extraction failed: 2 data results received.

This causes parallel_chat_structured() to fail completely when processing batches of prompts.

Evidence

See also the reprex based on my own data, in #732 (comment).

The failure case below was captured from real-world usage (Spanish political manifesto analysis), before I simplified the type_object by flattening it as in the reprex linked above:

# The Turn object contains two identical ContentJson objects:
turn@contents:
[[1]] <ellmer::ContentJson>
  @value: List of 9
   $ Country_LLM         : chr "Spain"
   $ Party_LLM           : chr "Partido Popular"
   $ is_populist         : logi FALSE
   $ is_populist_evidence: chr "The party does not consistently frame politics in populist terms..."
   $ antielitism         :List of 2
    ..$ evidence: chr "The party criticizes the current government's policy of 'cesiones humillantes'..."
    ..$ score   : int 4
   $ generalwill         :List of 2
    ..$ evidence: chr "The party emphasizes the importance of unity and the general will of the people..."
    ..$ score   : int 5
   $ indivisible         :List of 2
    ..$ evidence: chr "The party portrays Spain as a single nation with a common identity..."
    ..$ score   : int 6
   $ manichean           :List of 2
    ..$ evidence: chr "The party criticizes the current government's policies and actions..."
    ..$ score   : int 3
   $ peoplecentrism      :List of 2
    ..$ evidence: chr "The party emphasizes the importance of listening to the people..."
    ..$ score   : int 5

[[2]] <ellmer::ContentJson>
  @value: List of 9

# IDENTICAL structure and values as [[1]]
# Verification: identical(json1, json2) == TRUE

Scale: I captured 11 failure cases against 3 success cases during batch processing, which suggests that the duplication
is intermittent but frequent, and that it originates in Bedrock's tool-calling mechanism.
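
The identical() verification noted in the captured output above is a strict comparison: names, types, and nested values must all match. A minimal sketch of that check follows, using shortened placeholder values rather than the captured data (json1, json2, and json3 here are illustrative objects, not the ones from the failing run):

```r
# Stand-ins for the @value lists of the two ContentJson objects.
# The values are shortened placeholders, not the captured data.
json1 <- list(
  Country_LLM = "Spain",
  is_populist = FALSE,
  antielitism = list(evidence = "...", score = 4L)
)
json2 <- json1

# identical() compares names, types, and nested values exactly.
is_duplicate <- identical(json1, json2)

# Changing only the type (integer 4L to double 4) is enough to make
# the two objects differ under identical().
json3 <- json1
json3$antielitism$score <- 4
differs_by_type <- !identical(json1, json3)

print(is_duplicate)
print(differs_by_type)
```

Because the comparison is exact, two responses that differ only in numeric representation would be treated as different responses rather than as duplicates.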

Solution

Updated extract_data() in R/chat-structured.R to handle the case of exactly two JSON responses gracefully instead of erroring:

  if (n == 2) {
    # Check if the two JSON objects are identical (duplicate case)
    if (identical(val1, val2)) {
      warning("Found duplicate JSON responses, using the first one", call. = FALSE)
      out <- val1
    } else {
      # Different JSON objects - use the last one (likely the final response)  
      warning("Found multiple different JSON responses, using the last one", call. = FALSE)
      out <- val2
    }
  }
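
The excerpt above relies on n, val1, and val2 being defined earlier in extract_data(). As a self-contained sketch of the same decision rule, the hypothetical helper below takes a plain list of parsed values. resolve_json_values() is an illustrative name and is not part of ellmer, and the handling of zero results is an assumption:

```r
# Hypothetical helper, not ellmer's API. `values` is a plain list holding
# the parsed value of each JSON result found in a single turn.
resolve_json_values <- function(values) {
  n <- length(values)
  if (n == 1) {
    return(values[[1]])
  }
  if (n == 2) {
    if (identical(values[[1]], values[[2]])) {
      # Duplicate case: both results carry the same data.
      warning("Found duplicate JSON responses, using the first one", call. = FALSE)
      return(values[[1]])
    }
    # Different results: prefer the last one, as in the excerpt above.
    warning("Found multiple different JSON responses, using the last one", call. = FALSE)
    return(values[[2]])
  }
  # More than two results stays an error. Treating zero results the same
  # way is an assumption of this sketch.
  stop("Data extraction failed: ", n, " data results received.", call. = FALSE)
}

a <- list(Party_LLM = "Partido Popular", is_populist = FALSE)
b <- list(Party_LLM = "Partido Popular", is_populist = TRUE)

one  <- resolve_json_values(list(a))
dup  <- suppressWarnings(resolve_json_values(list(a, a)))
last <- suppressWarnings(resolve_json_values(list(a, b)))
```

The warnings are suppressed here only to keep the usage lines quiet; in real use they are the signal that a provider returned more than one result.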

Tests

Added comprehensive tests in test-chat-structured.R:

- Handle duplicate identical JSON responses with warning
- Handle different JSON responses (use last one)
- Include prompt index in warning messages for parallel operations
- Error appropriately on >2 JSON responses

Impact

Before:

# This would fail completely
parallel_chat_structured(chat, prompts, type = my_type)
#> Error: Data extraction failed: 2 data results received.

After:

# This now succeeds with warning
result <- parallel_chat_structured(chat, prompts, type = my_type)
#> Warning: Found duplicate JSON responses, using the first one (prompt 43).
#> ... successful extraction continues

This ensures robust structured data extraction for production use, even when providers exhibit intermittent duplication behaviour.
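
To illustrate the "After" behaviour, here is a rough base-R sketch of a batch that keeps going when one prompt returns a duplicated result, while recording a warning that names the prompt. extract_one() and the simulated batch are hypothetical stand-ins, not ellmer code, and only the duplicate case is handled for brevity:

```r
# Hypothetical stand-in for per-prompt extraction, not ellmer's implementation.
# `values` is the list of parsed JSON results for one prompt.
extract_one <- function(values, index) {
  n <- length(values)
  if (n == 2 && identical(values[[1]], values[[2]])) {
    warning(
      sprintf("Found duplicate JSON responses, using the first one (prompt %d).", index),
      call. = FALSE
    )
  } else if (n != 1) {
    # The "two different results" branch is omitted from this sketch.
    stop(sprintf("Data extraction failed: %d data results received.", n), call. = FALSE)
  }
  values[[1]]
}

# Simulated batch: prompt 2 comes back with a duplicated result.
batch <- list(
  list(list(score = 4L)),
  list(list(score = 5L), list(score = 5L)),
  list(list(score = 3L))
)

warnings_seen <- character()
results <- withCallingHandlers(
  lapply(seq_along(batch), function(i) extract_one(batch[[i]], i)),
  warning = function(w) {
    # Record the message, then resume so the remaining prompts are processed.
    warnings_seen <<- c(warnings_seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(length(results))
print(warnings_seen)
```

Collecting the messages is only one option; leaving the warnings unmuffled would surface them to the caller in the usual way.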

Interactions with other (current) PRs

This works best when combined with #725. #705 does not fix this issue, but if this PR and #725 are accepted, I will redo #705 and integrate all three into a single, more robust parallel_chat_structured().

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

kbenoit and others added 5 commits August 26, 2025 11:44

Fix tidyverse#732 by de-duplicating duplicated JSON

8057dbc

Apply suggestions from code review

66d719f

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Use direct tests instead of snapshot

df83971

Remove a blank line to avoid reviewdog pedantry

6f6dbc2

Accept formatting suggestion, under duress

21572bc

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
