Skip to content

fix: prevent tool-pair summarization from being silently undone#7414

Closed
bzqzheng wants to merge 2 commits intoaaif-goose:mainfrom
bzqzheng:fix/7413-tool-pair-summarization-silent-delete
Closed

fix: prevent tool-pair summarization from being silently undone#7414
bzqzheng wants to merge 2 commits intoaaif-goose:mainfrom
bzqzheng:fix/7413-tool-pair-summarization-silent-delete

Conversation

@bzqzheng
Copy link
Copy Markdown
Contributor

@bzqzheng bzqzheng commented Feb 22, 2026

What

Fix a conversation state desync where tool-pair summarization creates server-side hidden messages but the UI never learns about these changes. On the next user turn, the UI sends its stale conversation state, and the server's replace_conversation (DELETE ALL + INSERT) permanently deletes the hidden summaries — making the summarization silently undo itself every time.

Changes

  • Server: Emit AgentEvent::HistoryReplaced at the end of agent loop iterations where conversation was mutated (tool-pair summarization added hidden messages)
  • UI: Remove conversation_so_far from normal handleSubmit turns (server already has authoritative state in DB)
  • Tests: Add two integration tests in crates/goose/tests/tool_pair_summarization.rs:
    • test_history_replaced_emitted_after_tool_pair_summarization — proves the fix works: HistoryReplaced is emitted with hidden summary and original tool pair marked agent-invisible
    • test_stale_conversation_overwrites_hidden_summary — proves why the fix matters: demonstrates that without HistoryReplaced, stale UI state would silently wipe server-side summaries via replace_conversation

Why both changes

  • Server-side emit keeps UI state accurate after tool-pair summarization
  • UI stops sending stale conversation_so_far so server's work can't be overwritten
  • Guard !did_recovery_compact_this_iteration prevents duplicate HistoryReplaced when recovery compaction already emitted one

Verification

  • Both new integration tests pass: cargo test --package goose --test tool_pair_summarization
  • Tests use #[serial] to avoid env var races with GOOSE_TOOL_CALL_CUTOFF
  • No mock LLM calls needed — tests construct Agent with MockSummarizationProvider

References

@bzqzheng bzqzheng marked this pull request as ready for review February 22, 2026 01:43
Copilot AI review requested due to automatic review settings February 22, 2026 01:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a client/server conversation history desync where server-side tool-pair summarization (hidden summary messages + metadata mutations) could be overwritten by the UI sending stale conversation_so_far, causing summarization work to be silently undone on the next turn.

Changes:

  • Server: emits AgentEvent::HistoryReplaced after certain in-loop conversation mutations to keep clients synced.
  • UI: stops sending conversation_so_far on normal submit turns (keeps it for edit-in-place flow).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
crates/goose/src/agents/agent.rs Adds a HistoryReplaced emission after extending/persisting new messages so the UI can resync state after server-side mutations (e.g., tool-pair summarization).
ui/desktop/src/hooks/useChatStream.ts Removes conversation_so_far from the normal /reply request body to avoid overwriting server-authoritative history with stale UI state.

Comment on lines +1610 to 1614
let conversation_mutated = !messages_to_add.is_empty();
for msg in &messages_to_add {
session_manager.add_message(&session_config.id, msg).await?;
}
conversation.extend(messages_to_add);
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conversation_mutated is true for most iterations (e.g., normal assistant responses push into messages_to_add), so this will emit HistoryReplaced/UpdateConversation almost every loop, cloning and streaming the full conversation repeatedly and potentially replacing the UI’s incremental stream state (e.g., dropping the just-yielded filtered_response when the persisted history differs), which is likely to cause flicker and high bandwidth/CPU. Consider narrowing the emission to cases where history was changed in a way the client can’t learn from AgentEvent::Message (metadata mutations, tool-pair summarization summary insertion, or other non-append replacements), rather than !messages_to_add.is_empty().

Copilot uses AI. Check for mistakes.
Comment on lines +1620 to +1622
if conversation_mutated && !did_recovery_compact_this_iteration {
yield AgentEvent::HistoryReplaced(conversation.clone());
}
Copy link

Copilot AI Feb 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s no regression test covering the new expectation that tool-pair summarization results in a HistoryReplaced event; adding a focused test (similar to existing compaction tests) would help prevent reintroducing the client/server history desync.

Copilot uses AI. Check for mistakes.
@DOsinga
Copy link
Copy Markdown
Collaborator

DOsinga commented Feb 22, 2026

Fixes a conversation state desync where tool-pair summarization creates server-side hidden messages (userVisible: false, agentVisible: true) and mutates prior message metadata, but the UI never learns about these changes. On the next user turn, the UI sends its stale conversation_so_far, and the server blindly calls replace_conversation — permanently deleting the hidden summaries and reverting metadata mutations.

I don't think that is true. that is how it used to work a long time agao, but now reply(...) just takes the user message and fetches the conversation from the session manager - whatever the UI has doesn't enter into it.

If we have this bug though, we should fix it! Can we come up with a test that reproduces this? Since this compaction stuff doesn't actually require us to call the LLM that sounds doable, does it not?

@bzqzheng
Copy link
Copy Markdown
Contributor Author

Thanks for the review! The None path does fetch from the session manager — but currently on main, the UI sends conversation_so_far on every turn with existing messages, and the server calls replace_conversation() when that field is present, which does a full DELETE + INSERT before processing the new message. So the UI inadvertently overwrites the server's state each turn.

I reproduced this deterministically — full steps in #7413. With GOOSE_TOOL_CALL_CUTOFF: 3: after turn 1, the server creates a hidden summary (userVisible:false, agentVisible:true). On the next turn, that summary is gone — the UI's conversation_so_far never included it, and replace_conversation wiped it.

Context on why I looked into this: I kept seeing an outdated "Jira extension failed to load" error resurface across turns during personal usage, even after the extension was working. Digging into the code, I found tool-pair summarization was compressing that old failure into a hidden summary that persisted as stale agent context. The desync bug amplifies it (summaries get wiped and recreated each turn from truncated history), but the stale-context problem itself is a separate concern worth its own issue.

On tests: Fully agree — very testable without LLM calls. Happy to add a test that mocks complete_fast(), creates tool pairs past the cutoff, runs summarization, then asserts hidden summaries survive a round-trip.

bzqzheng and others added 2 commits February 23, 2026 12:29
…-goose#7413)

Two changes that fix a conversation state desync between server and UI:

1. Emit HistoryReplaced at end of agent loop iterations where conversation
   was mutated, so the UI stays in sync with server-side metadata changes
   (e.g. tool-pair summarization marking messages agent-invisible).

2. Stop sending conversation_so_far on normal chat turns. The server
   already has the authoritative conversation in its DB. Sending it back
   from the UI created a window where stale client state could overwrite
   server-side changes. Edit-in-place flow (onMessageUpdate) is unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add two tests that validate the fix from aaif-goose#7413:
- test_history_replaced_emitted_after_tool_pair_summarization: proves
  HistoryReplaced is emitted with hidden summary after summarization
- test_stale_conversation_overwrites_hidden_summary: demonstrates the
  desync mechanism where stale UI state wipes server-side summaries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 23, 2026 17:31
@bzqzheng bzqzheng force-pushed the fix/7413-tool-pair-summarization-silent-delete branch from 26c1349 to b0c750f Compare February 23, 2026 17:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@DOsinga
Copy link
Copy Markdown
Collaborator

DOsinga commented Feb 23, 2026

thanks again for looking into this. I think this is just an unfortunate regression and we should just not call the server with conversation_so_far - I have a small PR to make it so

@DOsinga
Copy link
Copy Markdown
Collaborator

DOsinga commented Feb 23, 2026

here's my take: #7438

@bzqzheng
Copy link
Copy Markdown
Contributor Author

here's my take: #7438

sorry for the delayed review. your PR looks good to me. closing this PR.

@bzqzheng bzqzheng closed this Feb 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tool-pair summarization creates hidden messages that are silently deleted on next client round-trip

3 participants