@GaryZhous
Contributor

@GaryZhous GaryZhous commented Jul 24, 2025

When users start subprocesses through Goose (e.g., python3 server.py or npm run dev) and then interrupt them with Ctrl + C, the Goose session would "freeze." The subprocess kept running in the background and blocked the chat session from continuing, so users had to find and kill these processes manually with commands like lsof and kill before they could resume the chat.

This pull request improves process management and cleanup in the Goose CLI and MCP modules, particularly for shell commands and subprocesses, so that lingering processes are cleaned up and no longer cause the "freezing." The key changes are: adding a process cleanup function and integrating it with the existing token cancellation method, improving shell command execution, and making platform-specific adjustments for better compatibility.

Process tracking and cleanup improvements:

  • Added the new FilePidTracker utility (crates/goose-mcp/src/file_pid_tracker.rs) to persistently track shell subprocess PIDs, enabling reliable cleanup of orphaned processes and registration/unregistration of process information.
  • Integrated process registration and cleanup into shell command execution in DeveloperRouter::bash, storing PIDs on process start and removing them on completion or cancellation. Also, shell commands are now wrapped with setsid bash -c on Unix for better isolation.
  • Implemented a cleanup routine (cleanup_shell_processes) in the CLI session module to terminate any leftover shell processes when a shell tool is interrupted.
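
To make the register/unregister idea above concrete, here is a minimal std-only sketch of a file-backed PID list. The `PidFile` type and its methods are hypothetical names chosen for illustration; they are not the actual `FilePidTracker` API from this PR, and the sketch ignores concurrent writers and file locking.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical stand-in for a file-backed PID tracker (illustration only).
pub struct PidFile {
    path: PathBuf,
}

impl PidFile {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into() }
    }

    /// Read all tracked PIDs, one per line. A missing file means nothing is tracked.
    pub fn load(&self) -> io::Result<BTreeSet<u32>> {
        match fs::read_to_string(&self.path) {
            Ok(text) => Ok(text
                .lines()
                .filter_map(|line| line.trim().parse().ok())
                .collect()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(BTreeSet::new()),
            Err(e) => Err(e),
        }
    }

    /// Overwrite the file with the given set of PIDs.
    fn store(&self, pids: &BTreeSet<u32>) -> io::Result<()> {
        let body: String = pids.iter().map(|pid| format!("{pid}\n")).collect();
        fs::write(&self.path, body)
    }

    /// Record a PID when a shell command starts.
    pub fn register(&self, pid: u32) -> io::Result<()> {
        let mut pids = self.load()?;
        pids.insert(pid);
        self.store(&pids)
    }

    /// Drop a PID when the command completes or is cancelled.
    pub fn unregister(&self, pid: u32) -> io::Result<()> {
        let mut pids = self.load()?;
        pids.remove(&pid);
        self.store(&pids)
    }
}
```

A cleanup routine would call `load()` and signal whatever is still listed. Because a single file like this is shared, such a routine has to be careful about which entries it actually owns.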

Cancellation support:

  • Added cancellation token handling throughout the tool execution pipeline, including new trait methods (call_tool_with_cancellation) in the router, and propagation of cancellation tokens from the MCP server to tool execution. This ensures that cancelled requests terminate subprocesses and unregister PIDs.
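
As a rough illustration of "cancelled requests terminate subprocesses", the sketch below runs a child process while polling a cancellation flag, then kills and reaps the child once the flag is set. It is std-only: the `AtomicBool` is a simplified stand-in for a real cancellation token such as the one in tokio-util, and `run_cancellable` and `Outcome` are hypothetical names, not code from this PR.

```rust
use std::io;
use std::process::Command;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// How a command run under a cancellation flag ended.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The process exited on its own; the bool is `ExitStatus::success()`.
    Finished(bool),
    /// The flag was set, so the process was killed.
    Cancelled,
}

/// Spawn `cmd` and wait for it, checking `cancel` between polls.
/// On cancellation the direct child is killed and then reaped.
pub fn run_cancellable(mut cmd: Command, cancel: &AtomicBool) -> io::Result<Outcome> {
    let mut child = cmd.spawn()?;
    loop {
        // Non-blocking check: has the child already exited?
        if let Some(status) = child.try_wait()? {
            return Ok(Outcome::Finished(status.success()));
        }
        if cancel.load(Ordering::SeqCst) {
            child.kill()?; // forcibly terminates the child (SIGKILL on Unix)
            child.wait()?; // reap it so no zombie entry is left behind
            return Ok(Outcome::Cancelled);
        }
        thread::sleep(Duration::from_millis(20));
    }
}
```

Note that `Child::kill` only reaches the direct child. Processes that the child itself spawned (for example a server started by a shell) would need to be signalled as a group, which is presumably what running commands in their own session is meant to enable.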

Dependency and codebase updates:

  • Added dependencies for process tracking and cancellation (uuid, tokio-util) to relevant crates.
  • Refactored imports and module structure to expose the new tracker and cancellation utilities.

These changes collectively make shell command execution safer and more manageable, especially in cases of interruption or cancellation, and lay the groundwork for robust process lifecycle management across the application.

Before image

After image

@DOsinga
Collaborator

DOsinga commented Jul 25, 2025

hey, thanks for jumping on this. our thinking was to solve this along these lines:

#3554

this introduces a cancellation token and should take care of the freezing. it does not kill the process that was started though, but it does feel like the right way forward.

@DOsinga DOsinga self-assigned this Jul 25, 2025
@GaryZhous
Contributor Author

GaryZhous commented Jul 25, 2025

> hey, thanks for jumping on this. our thinking was to solve this along these lines:
>
> #3554
>
> this introduces a cancellation token and should take care of the freezing. it does not kill the process that was started though, but it does feel like the right way forward.

Thanks for the context! I will refine this PR.

Collaborator

@DOsinga DOsinga left a comment

I think this is the right idea, but I don't think this is where it should be implemented.

I think we should add code to dispatch_toolcall and forward the cancelation token to the extension manager; how we go from there seems tricky.

@GaryZhous GaryZhous requested a review from DOsinga July 31, 2025 17:52
@GaryZhous
Contributor Author

> I think this is the right idea, but I don't think this is where it should be implemented.
>
> I think we should add code to dispatch_toolcall and forward the cancelation token to the extension manager; how we go from there seems tricky.

@DOsinga Hi, do you mind elaborating on this? Sorry that I don't have the full context of the stuff you guys are working on.

@DOsinga
Collaborator

DOsinga commented Aug 5, 2025

this is the way: #3782

it forwards the cancelation token to the MCP servers. once we have that merged, we can modify the developer tools to respect that

@GaryZhous
Contributor Author

> this is the way: #3782
>
> it forwards the cancelation token to the MCP servers. once we have that merged, we can modify the developer tools to respect that

@DOsinga I found that cancellation token forwarding to MCP servers is already implemented. My PR (#3638) adds PID-based process management for spawned processes.

Should I:

  1. Merge the PID management PR as-is (it complements existing cancellation tokens)
  2. Modify the cancellation token implementation in some way
  3. Integrate PID tracking with cancellation tokens

Which approach would you prefer?

@michaelneale
Collaborator

thanks @GaryZhous, we did merge some changes that look like they cause conflicts here and need some cleaning up. I think this is still worth looking at (the changes were to do with how some streams were handled, and simplifying IO a bit)

@GaryZhous
Contributor Author

GaryZhous commented Aug 11, 2025

> thanks @GaryZhous, we did merge some changes that look like they cause conflicts here and need some cleaning up. I think this is still worth looking at (the changes were to do with how some streams were handled, and simplifying IO a bit)

@michaelneale any suggestions for moving this PR forward? Thanks in advance! Also, my PR effectively addresses issues like #3983.

@michaelneale
Collaborator

I think this would help it not freeze up when spawning, but it won't stop goose itself from freezing if there is a long-running spawned process? Am I misreading it?

@GaryZhous
Contributor Author

GaryZhous commented Aug 12, 2025

> I think this would help it not freeze up when spawning, but it won't stop goose itself from freezing if there is a long-running spawned process? Am I misreading it?

@michaelneale Not really. Previously, if we let goose run a long-running process like python3 server.py, which starts a server on, say, port 8000, goose waits until the user sends SIGINT with Ctrl + C. After that it looks like we can continue chatting with goose, but the interrupt only stops goose from hanging; it doesn't terminate the server we just started. Goose then "freezes" because of the orphaned process, continuously showing messages like "navigating knowledge graph...". The user has to locate the orphaned process themselves and end it with a command like kill <PID>; only then does goose stop freezing and respond to the user again. The cancellation token is a very powerful approach: it doesn't only end the orphaned process, it also ends the entire goose session...

@zanesq
Collaborator

zanesq commented Aug 22, 2025

Still in progress?

@DOsinga
Collaborator

DOsinga commented Aug 27, 2025

I can't find where we discussed this, but I don't think file-based tracking of PIDs can work. It would just lead to us killing all the processes goose is currently running. The way to do this is to have the extensions react to the cancelation token.

@DOsinga DOsinga closed this Aug 27, 2025
@GaryZhous GaryZhous deleted the GaryZ/Refine-CLI branch September 2, 2025 15:12