@kbenoit kbenoit commented Aug 26, 2025

Problem

AWS Bedrock occasionally returns the same JSON object twice during structured data extraction, causing
extract_data() to fail with the error:

  Data extraction failed: 2 data results received.

This causes parallel_chat_structured() to fail completely when processing batches of prompts.

Evidence

See also the reprex based on my own data, in #732 (comment).

Captured failure case from real-world usage (Spanish political manifesto analysis), before I simplified the type_object by flattening it, as in the reprex linked above:

# The Turn object contains two identical ContentJson objects:
turn@contents:
  [[1]] <ellmer::ContentJson>
    @value: List of 9
     $ Country_LLM         : chr "Spain"
     $ Party_LLM           : chr "Partido Popular"
     $ is_populist         : logi FALSE
     $ is_populist_evidence: chr "The party does not consistently frame politics in populist terms..."
     $ antielitism         :List of 2
      ..$ evidence: chr "The party criticizes the current government's policy of 'cesiones humillantes'..."
      ..$ score   : int 4
     $ generalwill         :List of 2
      ..$ evidence: chr "The party emphasizes the importance of unity and the general will of the people..."
      ..$ score   : int 5
     $ indivisible         :List of 2
      ..$ evidence: chr "The party portrays Spain as a single nation with a common identity..."
      ..$ score   : int 6
     $ manichean           :List of 2
      ..$ evidence: chr "The party criticizes the current government's policies and actions..."
      ..$ score   : int 3
     $ peoplecentrism      :List of 2
      ..$ evidence: chr "The party emphasizes the importance of listening to the people..."
      ..$ score   : int 5

  [[2]] <ellmer::ContentJson>
    @value: List of 9

# IDENTICAL structure and values as [[1]]
# Verification: identical(json1, json2) == TRUE
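The identical() check mentioned above can be reproduced on plain lists. This is a minimal sketch with hypothetical stand-in data: the field names follow the captured output, the values are shortened, and json3 is an extra illustration that is not part of the captured case.

```r
# Hypothetical stand-ins for the two ContentJson values (shortened).
json1 <- list(
  Country_LLM = "Spain",
  Party_LLM = "Partido Popular",
  is_populist = FALSE,
  antielitism = list(evidence = "The party criticizes...", score = 4L)
)
json2 <- json1

# identical() compares nested lists recursively, including names and types.
is_duplicate <- identical(json1, json2)

# A type difference alone (integer 4L vs double 4) is enough to break identity,
# so two results that merely print alike are not treated as duplicates.
json3 <- json1
json3$antielitism$score <- 4
same_after_type_change <- identical(json1, json3)
```

Because identical() is strict about types and attributes, the duplicate branch is only taken when the two results are exactly equal.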

Scale: I captured 11 failure cases versus 3 success cases during batch processing, which suggests this is a
frequent, intermittent issue with Bedrock's tool-calling mechanism.

Solution

Updated extract_data() in R/chat-structured.R to gracefully handle multiple JSON responses:

  if (n == 2) {
    # Check if the two JSON objects are identical (duplicate case)
    if (identical(val1, val2)) {
      warning("Found duplicate JSON responses, using the first one", call. = FALSE)
      out <- val1
    } else {
      # Different JSON objects - use the last one (likely the final response)  
      warning("Found multiple different JSON responses, using the last one", call. = FALSE)
      out <- val2
    }
  }
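For readers who want to try the logic in isolation, here is a self-contained sketch of the same decision rule. resolve_json_results() and its values argument are hypothetical names for illustration only; this is not the actual body of extract_data(), which works on the turn's contents rather than a plain list.

```r
# Hypothetical helper, not the ellmer implementation: `values` is a plain
# list holding the parsed JSON results found in a single turn.
resolve_json_results <- function(values) {
  n <- length(values)

  # The normal case: exactly one result.
  if (n == 1) {
    return(values[[1]])
  }

  # Zero or more than two results is still treated as an error.
  if (n != 2) {
    stop("Data extraction failed: ", n, " data results received.", call. = FALSE)
  }

  if (identical(values[[1]], values[[2]])) {
    # Duplicate case: both results are exactly equal, keep the first.
    warning("Found duplicate JSON responses, using the first one", call. = FALSE)
    values[[1]]
  } else {
    # Differing results: keep the last one (likely the final response).
    warning("Found multiple different JSON responses, using the last one", call. = FALSE)
    values[[2]]
  }
}

# Usage with two identical results; the warning is suppressed for brevity.
dup <- list(list(score = 4L), list(score = 4L))
res <- suppressWarnings(resolve_json_results(dup))
```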

Tests

Added comprehensive tests in test-chat-structured.R:

  • Handle duplicate identical JSON responses with warning
  • Handle different JSON responses (use last one)
  • Include prompt index in warning messages for parallel operations
  • Error appropriately on >2 JSON responses
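The four cases listed above could be exercised roughly as follows. This sketch assumes testthat is installed and uses a hypothetical pick_json() stand-in with an assumed index argument; the real tests call the package's own functions, and the "(prompt N)" suffix format is inferred from the warning shown elsewhere in this PR.

```r
library(testthat)

# Hypothetical stand-in for the patched logic; `index` is an assumed argument
# used only to label warnings with the prompt number.
pick_json <- function(values, index = NULL) {
  n <- length(values)
  if (n == 1) return(values[[1]])
  if (n != 2) {
    stop("Data extraction failed: ", n, " data results received.", call. = FALSE)
  }
  suffix <- if (is.null(index)) "" else paste0(" (prompt ", index, ")")
  if (identical(values[[1]], values[[2]])) {
    warning("Found duplicate JSON responses, using the first one", suffix, call. = FALSE)
    values[[1]]
  } else {
    warning("Found multiple different JSON responses, using the last one", suffix, call. = FALSE)
    values[[2]]
  }
}

a <- list(score = 4L)
b <- list(score = 5L)

test_that("duplicate identical results warn and return the first", {
  expect_warning(out <- pick_json(list(a, a)), "duplicate JSON")
  expect_identical(out, a)
})

test_that("different results warn and return the last", {
  expect_warning(out <- pick_json(list(a, b)), "different JSON")
  expect_identical(out, b)
})

test_that("warning includes the prompt index when given", {
  expect_warning(pick_json(list(a, a), index = 43), "(prompt 43)", fixed = TRUE)
})

test_that("more than two results is an error", {
  expect_error(pick_json(list(a, b, a)), "3 data results received")
})
```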

Impact

Before:

# This would fail completely
parallel_chat_structured(chat, prompts, type = my_type)
#> Error: Data extraction failed: 2 data results received.

After:

# This now succeeds with warning
result <- parallel_chat_structured(chat, prompts, type = my_type)
#> Warning: Found duplicate JSON responses, using the first one (prompt 43).
#> ... successful extraction continues

This keeps structured data extraction usable in production, even when a provider intermittently duplicates its responses.
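Since the duplicates now surface as warnings rather than errors, a caller may want to record which prompts were affected. A minimal base R sketch follows; run_batch() is a hypothetical stand-in that only mimics the warning text, and a real call to parallel_chat_structured() would take its place.

```r
# Hypothetical stand-in for a batch call that emits a duplicate-response
# warning for every second prompt.
run_batch <- function(n) {
  lapply(seq_len(n), function(i) {
    if (i %% 2 == 0) {
      warning(
        "Found duplicate JSON responses, using the first one (prompt ", i, ").",
        call. = FALSE
      )
    }
    list(prompt = i)
  })
}

seen <- character()
result <- withCallingHandlers(
  run_batch(4),
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))  # record the warning message
    invokeRestart("muffleWarning")         # resume the batch without printing
  }
)
```

After the call, result holds all four entries and seen holds one message per affected prompt, which can be logged or inspected later.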

Interactions with other (current) PRs

This works best when combined with #725. #705 does not fix this issue, but if this PR and #725 are accepted, I will redo #705 and integrate all three into a single, more robust parallel_chat_structured().

kbenoit and others added 5 commits August 26, 2025 11:44
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>