fix(acp): avoid duplicate thought chunks around tool calls #8215
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e11715321c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```rust
MessageContent::Thinking(_) | MessageContent::RedactedThinking(_)
    if has_tool_requests => {}
```
Preserve thought content for non-streaming tool responses
This match arm unconditionally drops Thinking/RedactedThinking whenever the response contains any tool request, but some providers emit a single combined assistant message (no prior incremental thought chunks) when streaming is disabled (for example, the non-streaming OpenAI paths return stream_from_single_message). In that case, this change removes the only reasoning content the client would receive for that turn, so ACP/CLI users lose thought output entirely around tool calls instead of just deduplicating replayed chunks.
Good point. One proposed alternative is:
keep a per-turn flag in the agent stream loop around crates/goose/src/agents/agent.rs:1238, something like "have we already surfaced thinking to the client for this assistant turn?". Then pass that into categorize_tool_requests and only suppress Thinking/RedactedThinking on tool-call messages when that flag is already true. After yielding a filtered message, if it still contains thinking, set the flag. That handles both cases:
- Streaming replay bug: an earlier thought chunk was already emitted, so the later tool-call replay gets suppressed.
- Non-streaming or no-thought-stream providers: no earlier thought was emitted, so the combined tool-call message keeps its thinking.
Also looked at supports_streaming as a flag, but that doesn't seem reliable. Codex points out that the OpenAI chat-completions streaming path does emit incremental reasoning and then can replay the accumulated reasoning on the tool-call chunk in crates/goose/src/providers/formats/openai.rs:741 and crates/goose/src/providers/formats/openai.rs:806. But the Responses API streaming path does not emit thought chunks incrementally; it only materializes final reasoning from output items in crates/goose/src/providers/formats/openai_responses.rs:35 and crates/goose/src/providers/formats/openai_responses.rs:790. So a provider can be "streaming" and still need the final thinking preserved. In other words, a provider may stream thought chunks in one code path and not in the other even if the overall provider supports streaming, so the only reliable signal is to explicitly track whether thought chunks were actually streamed for this turn.
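The per-turn flag described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: `MessageContent` here is a stand-in enum with only the relevant variants, and `filter_for_client` stands in for the real filtering inside `categorize_tool_requests`.

```rust
// Simplified stand-in for goose's MessageContent; only the variants
// relevant to this discussion are modeled.
#[derive(Clone, Debug, PartialEq)]
enum MessageContent {
    Thinking(String),
    ToolRequest(String),
}

// Suppress Thinking content on tool-call messages only when thinking was
// already surfaced to the client earlier in this assistant turn.
fn filter_for_client(
    contents: Vec<MessageContent>,
    thinking_already_surfaced: &mut bool,
) -> Vec<MessageContent> {
    let has_tool_requests = contents
        .iter()
        .any(|c| matches!(c, MessageContent::ToolRequest(_)));
    let filtered: Vec<MessageContent> = contents
        .into_iter()
        .filter(|c| {
            let is_thinking = matches!(c, MessageContent::Thinking(_));
            // Drop thinking on a tool-call message only if the client
            // has already seen thinking this turn (i.e. it is a replay).
            !(is_thinking && has_tool_requests && *thinking_already_surfaced)
        })
        .collect();
    // If the emitted message still carries thinking, mark it as surfaced.
    if filtered
        .iter()
        .any(|c| matches!(c, MessageContent::Thinking(_)))
    {
        *thinking_already_surfaced = true;
    }
    filtered
}
```

In the streaming case, the first thought chunk sets the flag and the replay on the later tool-call message is dropped; in the non-streaming case the flag is still false when the combined message arrives, so its thinking survives.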
|
Hey @kevklam — the Codex P2 about unconditionally stripping thinking content is a real concern. Your proposed per-turn flag approach (track whether thinking was already surfaced to the client, and only suppress the replay when it was) sounds like the right fix. Could you push an update implementing that? We'll check back tomorrow. Thanks!
Signed-off-by: Kevin Lam <kevklam@users.noreply.github.com>
Force-pushed e117153 to 12d342f
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12d342f1a0
```rust
MessageContent::Thinking(_) | MessageContent::RedactedThinking(_)
    if should_suppress_replayed_thinking => {}
```
Preserve new thought deltas in tool-call responses
This branch drops all Thinking/RedactedThinking content whenever should_suppress_replayed_thinking is true, but streamed providers can append fresh reasoning while tool-call chunks are still being assembled (e.g., OpenAI streaming accumulates delta.reasoning_content in response_to_streaming_message before emitting the final tool-call message). In that case, the final tool-call message contains both replayed and newly-added reasoning, and this unconditional drop removes the new suffix entirely from client-visible output instead of only deduplicating previously surfaced content.
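One way to handle this mixed replayed-plus-fresh case (a sketch under the assumption, noted in the review, that providers replay previously streamed reasoning as a prefix of the final accumulated string; this is not the PR's actual implementation) is to deduplicate by prefix instead of dropping the content wholesale:

```rust
// Return only the portion of `final_thinking` the client has not yet seen,
// assuming any replayed reasoning appears as a prefix of the final string.
fn new_thinking_suffix<'a>(
    final_thinking: &'a str,
    already_streamed: &str,
) -> Option<&'a str> {
    match final_thinking.strip_prefix(already_streamed) {
        Some("") => None,             // pure replay: nothing new to emit
        Some(suffix) => Some(suffix), // replay plus fresh deltas: emit the tail
        None => Some(final_thinking), // no overlap detected: surface everything
    }
}
```

This keeps the deduplication behavior for pure replays while still surfacing any reasoning appended after the last streamed delta.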
DOsinga left a comment
Approving — this is a clean, well-scoped bug fix for duplicate thought chunks in ACP streaming. The per-turn tracking flag correctly handles both streaming providers (where reasoning is replayed on tool-call messages) and non-streaming providers (where the tool-call message is the only source of reasoning). Tests are meaningful and cover both cases. Nice work!
Summary
Avoid replaying already-streamed thinking content in the user-visible filtered message when an assistant response also contains tool calls.
Root Cause
Some providers stream reasoning incrementally and then include the full accumulated thinking again on the assistant message that carries tool calls. Goose was surfacing both:
Over ACP this showed up as duplicate `agent_thought_chunk` notifications.
What Changed
When a response contains tool requests, `categorize_tool_requests` now keeps thinking content on the original response for provider/state compatibility, but omits it from the user-visible filtered message that gets emitted to the client.
Impact
ACP clients should stop seeing duplicate thought chunks around tool-call boundaries while Goose still preserves reasoning content where providers need it internally.
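The split described under "What Changed" can be illustrated with a minimal sketch (simplified stand-in types; `split_for_client` is a hypothetical name, not the real `categorize_tool_requests` signature): the original message retains thinking for provider/state replay, while the client-visible copy drops it when tool requests are present.

```rust
#[derive(Clone, Debug, PartialEq)]
enum MessageContent {
    Thinking(String),
    ToolRequest(String),
}

// Split a response into (original, client_visible): the original keeps
// thinking for provider/state compatibility, while the client-visible
// copy omits it whenever the message also carries tool requests.
fn split_for_client(
    contents: Vec<MessageContent>,
) -> (Vec<MessageContent>, Vec<MessageContent>) {
    let has_tool_requests = contents
        .iter()
        .any(|c| matches!(c, MessageContent::ToolRequest(_)));
    let visible = contents
        .iter()
        .filter(|c| !(has_tool_requests && matches!(c, MessageContent::Thinking(_))))
        .cloned()
        .collect();
    (contents, visible)
}
```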
Validation
Built the CLI target only:
```shell
cargo build -p goose-cli --bin goose -j 1
```