Skip to content

Conversation

@DOsinga
Copy link
Collaborator

@DOsinga DOsinga commented Jul 21, 2025

I couldn't actually reproduce the session corruption easily, but my theory is that it can happen because of two reasons:

  • when we interrupt the agent, we don't actually cancel the agent. we just hang up. the agent will continue running until at some point writing to the stream will fail.
  • we don't write the messages our atomically, but inline

the two combined can lead to us writing a message to the agent while the user has already canceled. If the user fires up a new command before say an MCP server has finished, this can create havoc.

this PR introduces a cancellation token and makes the message appending atomic-ish. it also deletes the askAi handler since that is not used.

@DOsinga DOsinga requested review from baxen and jamadeo July 21, 2025 11:04
@DOsinga DOsinga self-assigned this Jul 21, 2025
Some("background") => "chat".to_string(),
_ => config
.get_param("GOOSE_MODE")
.unwrap_or_else(|_| "auto".to_string()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to your PR, but this mapping looks quite questionable. Why chat mode for background jobs? This looks like the root of a recent bug report

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kvadratni thoughts?

}
if let JsonRpcMessage::Notification(notif) = &notification {
if let Some(data) = notif.params.as_ref().and_then(|p| p.get("data")) {
if let (Some(subagent_id), Some(_message)) = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust 2024 lets you chain these! #3464

Comment on lines 719 to 720
if let Some(token) = &cancel_token {
if token.is_cancelled() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be

cancel_token.is_some_and(|token| token.is_cancelled())

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the AIs say you need an as_ref() in there still

pub fn routes(state: Arc<AppState>) -> Router {
Router::new()
.route("/reply", post(handler))
.route("/ask", post(ask_handler))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ask is old - I thought /reply is still used though?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is - I just renamed handler to reply_handler here. the code still exists!

@michaelneale
Copy link
Collaborator

seems sensible, and could prevent some other problems of processes hanging around.

I also still encounter hanging, and then unkillable sessions, wonder if this will help with that (vs trying to hang up)

@DOsinga DOsinga merged commit 9f356e7 into main Jul 22, 2025
8 checks passed
@DOsinga DOsinga deleted the agent-loop-defensive branch July 22, 2025 16:21
katzdave added a commit that referenced this pull request Jul 22, 2025
* 'main' of github.com:block/goose: (23 commits)
  fix: add fallback id to messages if none provided (#3584)
  feat: migrate ErrorData from internal mcp crates to rmcp version (#3586)
  fix: adjust subrecipe description to allow running tests (#3585)
  Scenario tests (#3430)
  feat: migrate JsonRpcMessage/Request/Response/Error/Notification from internal mcp crates to rmcp versions (#3564)
  Restore recipe parameters functionality (#3530)
  feat: Enhanced loading states with thinking icons and flying bird animation (#3478)
  Agent loop defensive (#3554)
  chore: remove needless clone() in goose/providers (#2528)
  Add recipe install warning (#3537)
  Replace mcp_core::resource::* with rmcp types (#3563)
  Add YouTube video embed to using-goosehints.md (#3560)
  fix: ensure retry-config and success-criteria are populated in openapi spec (#3575)
  fix: use sequential when sub recipe task is 1. (#3573)
  fix: track message id to keep like with like (#3572)
  Replace mcp_core::prompt with rmcp::model types (#3561)
  feat (ui): close recipe modals with esc key (#3568)
  feat: recipes can retry with success criteria (#3474)
  Env var to set Ollama request timeout (#3516)
  Updating docs to match new UI (#3552)
  ...
michaelneale added a commit that referenced this pull request Jul 23, 2025
* main:
  docs: desktop recipe format (#3594)
  Fix model display name not being updated immediately after leaving settings (#3587)
  Added option to summarize the chat when an error is triggered (#3598)
  Remove mcp_macros and unused types (#3581)
  fix: add fallback id to messages if none provided (#3584)
  feat: migrate ErrorData from internal mcp crates to rmcp version (#3586)
  fix: adjust subrecipe description to allow running tests (#3585)
  Scenario tests (#3430)
  feat: migrate JsonRpcMessage/Request/Response/Error/Notification from internal mcp crates to rmcp versions (#3564)
  Restore recipe parameters functionality (#3530)
  feat: Enhanced loading states with thinking icons and flying bird animation (#3478)
  Agent loop defensive (#3554)
  chore: remove needless clone() in goose/providers (#2528)
  Add recipe install warning (#3537)
  Replace mcp_core::resource::* with rmcp types (#3563)
  Add YouTube video embed to using-goosehints.md (#3560)
  fix: ensure retry-config and success-criteria are populated in openapi spec (#3575)
atarantino pushed a commit to atarantino/goose that referenced this pull request Aug 5, 2025
Co-authored-by: Douwe Osinga <[email protected]>
Signed-off-by: Adam Tarantino <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants