fix: prevent tool-pair summarization from being silently undone#7414
fix: prevent tool-pair summarization from being silently undone#7414bzqzheng wants to merge 2 commits intoaaif-goose:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a client/server conversation history desync where server-side tool-pair summarization (hidden summary messages + metadata mutations) could be overwritten by the UI sending stale conversation_so_far, causing summarization work to be silently undone on the next turn.
Changes:
- Server: emits
AgentEvent::HistoryReplacedafter certain in-loop conversation mutations to keep clients synced. - UI: stops sending
conversation_so_faron normal submit turns (keeps it for edit-in-place flow).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
crates/goose/src/agents/agent.rs |
Adds a HistoryReplaced emission after extending/persisting new messages so the UI can resync state after server-side mutations (e.g., tool-pair summarization). |
ui/desktop/src/hooks/useChatStream.ts |
Removes conversation_so_far from the normal /reply request body to avoid overwriting server-authoritative history with stale UI state. |
| let conversation_mutated = !messages_to_add.is_empty(); | ||
| for msg in &messages_to_add { | ||
| session_manager.add_message(&session_config.id, msg).await?; | ||
| } | ||
| conversation.extend(messages_to_add); |
There was a problem hiding this comment.
conversation_mutated is true for most iterations (e.g., normal assistant responses push into messages_to_add), so this will emit HistoryReplaced/UpdateConversation almost every loop, cloning and streaming the full conversation repeatedly and potentially replacing the UI’s incremental stream state (e.g., dropping the just-yielded filtered_response when the persisted history differs), which is likely to cause flicker and high bandwidth/CPU. Consider narrowing the emission to cases where history was changed in a way the client can’t learn from AgentEvent::Message (metadata mutations, tool-pair summarization summary insertion, or other non-append replacements), rather than !messages_to_add.is_empty().
| if conversation_mutated && !did_recovery_compact_this_iteration { | ||
| yield AgentEvent::HistoryReplaced(conversation.clone()); | ||
| } |
There was a problem hiding this comment.
There’s no regression test covering the new expectation that tool-pair summarization results in a HistoryReplaced event; adding a focused test (similar to existing compaction tests) would help prevent reintroducing the client/server history desync.
I don't think that is true. that is how it used to work a long time agao, but now reply(...) just takes the user message and fetches the conversation from the session manager - whatever the UI has doesn't enter into it. If we have this bug though, we should fix it! Can we come up with a test that reproduces this? Since this compaction stuff doesn't actually require us to call the LLM that sounds doable, does it not? |
|
Thanks for the review! The I reproduced this deterministically — full steps in #7413. With Context on why I looked into this: I kept seeing an outdated "Jira extension failed to load" error resurface across turns during personal usage, even after the extension was working. Digging into the code, I found tool-pair summarization was compressing that old failure into a hidden summary that persisted as stale agent context. The desync bug amplifies it (summaries get wiped and recreated each turn from truncated history), but the stale-context problem itself is a separate concern worth its own issue. On tests: Fully agree — very testable without LLM calls. Happy to add a test that mocks |
…-goose#7413) Two changes that fix a conversation state desync between server and UI: 1. Emit HistoryReplaced at end of agent loop iterations where conversation was mutated, so the UI stays in sync with server-side metadata changes (e.g. tool-pair summarization marking messages agent-invisible). 2. Stop sending conversation_so_far on normal chat turns. The server already has the authoritative conversation in its DB. Sending it back from the UI created a window where stale client state could overwrite server-side changes. Edit-in-place flow (onMessageUpdate) is unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two tests that validate the fix from aaif-goose#7413: - test_history_replaced_emitted_after_tool_pair_summarization: proves HistoryReplaced is emitted with hidden summary after summarization - test_stale_conversation_overwrites_hidden_summary: demonstrates the desync mechanism where stale UI state wipes server-side summaries Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
26c1349 to
b0c750f
Compare
|
thanks again for looking into this. I think this is just an unfortunate regression and we should just not call the server with conversation_so_far - I have a small PR to make it so |
|
here's my take: #7438 |
sorry for the delayed review. your PR looks good to me. closing this PR. |
What
Fix a conversation state desync where tool-pair summarization creates server-side hidden messages but the UI never learns about these changes. On the next user turn, the UI sends its stale conversation state, and the server's
replace_conversation(DELETE ALL + INSERT) permanently deletes the hidden summaries — making the summarization silently undo itself every time.Changes
AgentEvent::HistoryReplacedat the end of agent loop iterations where conversation was mutated (tool-pair summarization added hidden messages)conversation_so_farfrom normalhandleSubmitturns (server already has authoritative state in DB)crates/goose/tests/tool_pair_summarization.rs:test_history_replaced_emitted_after_tool_pair_summarization— proves the fix works: HistoryReplaced is emitted with hidden summary and original tool pair marked agent-invisibletest_stale_conversation_overwrites_hidden_summary— proves why the fix matters: demonstrates that without HistoryReplaced, stale UI state would silently wipe server-side summaries via replace_conversationWhy both changes
!did_recovery_compact_this_iterationprevents duplicate HistoryReplaced when recovery compaction already emitted oneVerification
cargo test --package goose --test tool_pair_summarization#[serial]to avoid env var races withGOOSE_TOOL_CALL_CUTOFFReferences