fix: add server_tool_use content blocks and tests for OpenAI native web search#26246
Conversation
cdc330a
into
noanflaherty/managed-openai-native-web-search
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f428c06d06
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for (const toolUseId of webSearchCallIds) { | ||
| content.push({ | ||
| type: "server_tool_use", | ||
| id: toolUseId, | ||
| name: "web_search", | ||
| input: {}, | ||
| }); |
There was a problem hiding this comment.
Emit paired web-search result with server tool use
This now persists server_tool_use blocks for OpenAI web-search calls but never emits a matching web_search_tool_result block, which means every such message is treated as orphaned server-tool output by repairHistory and gets a synthetic web_search_tool_result_error injected on subsequent turns (via the pre-run repair path in conversation-agent-loop.ts). In practice, successful native web searches are rewritten as synthetic errors in runtime history, adding noisy repairs and polluting follow-up context; either emit a paired result block or avoid adding server_tool_use until a pair can be represented.
Useful? React with 👍 / 👎.
| for (const toolUseId of webSearchCallIds) { | ||
| content.push({ | ||
| type: "server_tool_use", | ||
| id: toolUseId, | ||
| name: "web_search", | ||
| input: {}, | ||
| }); | ||
| } |
There was a problem hiding this comment.
🔴 Missing web_search_tool_result blocks causes history repair to inject synthetic error results on every subsequent turn
The new code injects server_tool_use content blocks into the response (lines 305-311) but never produces the paired web_search_tool_result content blocks that the rest of the codebase expects. The repairHistory function (assistant/src/daemon/history-repair.ts:71-113) runs before every provider API call (assistant/src/daemon/conversation-agent-loop.ts:1023) and explicitly checks that every server_tool_use in an assistant message has a matching web_search_tool_result in the same message. When the pairing is missing, it injects a synthetic error result with {type: "web_search_tool_result_error", error_code: "unavailable"}, logs a warning, and increments stats.missingToolResultsInserted. This means every successful native web search through OpenAI will be retroactively marked as failed in the conversation history on the very next turn. The Anthropic provider produces both block types (see assistant/src/providers/anthropic/client.ts:1386-1393), and the comment on line 302 even says the goal is to match the Anthropic shape — but the pairing is incomplete.
Expected fix pattern
After each server_tool_use block, also push a web_search_tool_result block:
for (const toolUseId of webSearchCallIds) {
content.push({
type: "server_tool_use",
id: toolUseId,
name: "web_search",
input: {},
});
content.push({
type: "web_search_tool_result",
tool_use_id: toolUseId,
content: [],
});
}| for (const toolUseId of webSearchCallIds) { | |
| content.push({ | |
| type: "server_tool_use", | |
| id: toolUseId, | |
| name: "web_search", | |
| input: {}, | |
| }); | |
| } | |
| for (const toolUseId of webSearchCallIds) { | |
| content.push({ | |
| type: "server_tool_use", | |
| id: toolUseId, | |
| name: "web_search", | |
| input: {}, | |
| }); | |
| content.push({ | |
| type: "web_search_tool_result", | |
| tool_use_id: toolUseId, | |
| content: [], | |
| }); | |
| } |
Was this helpful? React with 👍 or 👎 to provide feedback.
* [managed] Enable OpenAI managed-proxy fallback routing in provider bootstrap (#26211) * [openai] Route inference-provider-native web search to Responses native web_search tool (#26212) * [macOS] Allow managed inference provider selection beyond Anthropic (#26218) * [macOS] Allow managed inference provider selection beyond Anthropic * fix: capture draftProvider before async Task to prevent race condition Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [macOS] Allow provider-native web search when managed inference provider supports it (#26230) * [macOS] Allow provider-native web search when managed inference provider supports it * fix: gate mode-specific web search invalidation on modeChanging to prevent false-positive alerts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: generalize alert message to cover both mode and provider changes * fix: scope provider-native invalidation to your-own web search mode --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: emit server_tool_start/complete events for OpenAI native web search (#26240) * fix: render web_search_call output items in LLM context diagnostics (#26243) * fix: add server_tool_use content blocks and tests for OpenAI native web search (#26246) * fix: diagnostics polish — web_search_call tests, tool count fix, loop consistency (#26250) * fix: emit paired web_search_tool_result blocks to prevent repairHistory corruption The OpenAI Responses provider emitted server_tool_use content blocks for web_search_call items but did not emit matching web_search_tool_result blocks. repairHistory() treats any unpaired server_tool_use as an interrupted search and injects a synthetic web_search_tool_result_error, which corrupts conversation history by making successful searches appear as failures. After each server_tool_use block, also emit a paired web_search_tool_result with empty content (since OpenAI weaves search results into the text output). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Fixes two gaps identified during plan review for managed-openai-native-web-search.md.
Gap 1: Missing test coverage for web_search_call event handling
Gap 2: No server_tool_use content blocks in ProviderResponse.content for conversation history parity with Anthropic