fix: clean up MCP subprocesses after abrupt parent exit #8242

Merged: DOsinga merged 3 commits into aaif-goose:main from fresh3nough:fix/mcp-extension-process-cleanup-8229 on Apr 2, 2026

@fresh3nough (Contributor)

Summary

  • set a Linux parent-death signal in the shared subprocess helper used by MCP stdio children
  • keep subprocesses isolated from terminal Ctrl+C while ensuring abrupt goose exits no longer leave MCP children running
  • add an integration test that simulates an abrupt parent exit and asserts the child process is reaped

Testing

  • cargo test --manifest-path /home/ubuntu/github/goose-8229/Cargo.toml -p goose --test subprocess_cleanup
  • cargo clippy --manifest-path /home/ubuntu/github/goose-8229/Cargo.toml -p goose --all-targets -- -D warnings

Closes #8229
Conversation: https://app.warp.dev/conversation/8d563306-2cf0-45bf-830e-de26fed5e691
Co-Authored-By: Oz <oz-agent@warp.dev>

Set a Linux parent-death signal on shared subprocesses so MCP stdio servers receive SIGTERM if goose is terminated unexpectedly. Add an integration test that simulates an abrupt parent exit and verifies the child process is reaped.
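
A minimal sketch of the pattern this commit describes, assuming Linux: set `PR_SET_PDEATHSIG` in a `pre_exec` hook, then re-check the parent PID to cover the case where the parent died before the signal was armed. This is not the PR's diff. The helper name `with_parent_death_signal` is hypothetical, and the constants and the `prctl` declaration are written out by hand only to keep the example dependency-free; real code would likely take them from the `libc` crate.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::{parent_id, CommandExt};
use std::process::Command;

// Values hand-copied from the Linux headers for this sketch (assumption:
// real code would use libc::PR_SET_PDEATHSIG, libc::SIGTERM, libc::ESRCH).
const PR_SET_PDEATHSIG: c_int = 1;
const SIGTERM: c_ulong = 15;
const ESRCH: i32 = 3;

unsafe extern "C" {
    // prctl(2) is declared as a variadic function by the C library.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: ask the kernel to send SIGTERM to the child when
/// its parent goes away.
fn with_parent_death_signal(cmd: &mut Command) -> &mut Command {
    // Captured before the fork so the child can detect an early parent exit.
    let expected_parent = std::process::id();
    // SAFETY: the closure only calls thin syscall wrappers and builds
    // io::Error values from integer codes.
    unsafe {
        cmd.pre_exec(move || {
            if prctl(PR_SET_PDEATHSIG, SIGTERM) != 0 {
                return Err(io::Error::last_os_error());
            }
            // If the parent exited before prctl ran, the child has been
            // reparented and the signal will never arrive, so refuse to exec.
            if parent_id() != expected_parent {
                return Err(io::Error::from_raw_os_error(ESRCH));
            }
            Ok(())
        })
    }
}
```

A caller would apply the helper to a `Command` before spawning it, for example `with_parent_death_signal(&mut cmd).spawn()`.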

Signed-off-by: fre <anonwurcod@proton.me>
Co-Authored-By: Oz <oz-agent@warp.dev>

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12be615185

Comment thread crates/goose/src/subprocess.rs Outdated
Comment on lines +17 to +19
    return Err(std::io::Error::other(
        "parent process exited before subprocess exec",
    ));

[P2] Use async-signal-safe operations in pre_exec closure

The Linux pre_exec hook now returns std::io::Error::other(...) on the parent-PID mismatch path, but this closure executes after fork() where only async-signal-safe work is allowed; constructing this Rust error can invoke non-signal-safe runtime paths (e.g., allocation/locks) and may deadlock child startup in the exact race this change is trying to handle. In this closure, stick to libc/syscall-level behavior (or errno-based io::Error::from_raw_os_error) instead of creating a custom other error.
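
To illustrate the distinction the comment draws, here is a small sketch using only the standard library. The function names are hypothetical, and the errno value 3 (`ESRCH`, "No such process") is an assumption chosen for illustration. `from_raw_os_error` wraps an integer code and does not need a heap allocation, while `Error::other` boxes its payload.

```rust
use std::io;

// errno for "No such process" on Linux, written out for this sketch
// (assumption: real code would use libc::ESRCH).
const ESRCH: i32 = 3;

/// Builds the error from a bare errno value, which needs no allocation.
fn parent_gone_error() -> io::Error {
    io::Error::from_raw_os_error(ESRCH)
}

/// For contrast: `Error::other` boxes its payload, so it allocates.
/// This is the kind of call the review suggests avoiding after fork().
fn parent_gone_error_allocating() -> io::Error {
    io::Error::other("parent process exited before subprocess exec")
}
```

The errno-based variant still carries a code the caller can inspect through `raw_os_error()`, whereas the custom error reports `None` there.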

Douwe Osinga added 2 commits April 1, 2026 20:29
… deps

- Replace std::io::Error::other() with from_raw_os_error(ESRCH) in the
  pre_exec closure to avoid non-signal-safe allocations after fork()
- Move libc dependency to [target.'cfg(target_os = "linux")'.dependencies]
  since it is only used behind #[cfg(target_os = "linux")]

Signed-off-by: Douwe Osinga <douwe@squareup.com>

@DOsinga (Collaborator) left a comment

Solid fix — the PR_SET_PDEATHSIG + getppid race check is the right pattern, and the test is well-designed.

I pushed two small fixes in 2cc0f93:

  1. Async-signal-safety in pre_exec: replaced std::io::Error::other(...) with std::io::Error::from_raw_os_error(libc::ESRCH) — the pre_exec closure runs after fork() where only async-signal-safe operations are allowed, and Error::other can allocate. (Addresses the Codex review comment.)

  2. libc dependency placement: moved libc from unconditional dependencies to [target.'cfg(target_os = "linux")'.dependencies] since it's only used behind #[cfg(target_os = "linux")].

Also merged origin/main — no conflicts.

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cc0f93b05

Comment on lines +12 to +14
    if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM) != 0 {
        return Err(std::io::Error::last_os_error());
    }

[P1] Degrade gracefully when PR_SET_PDEATHSIG is unavailable

Returning an error from pre_exec when prctl(PR_SET_PDEATHSIG, ...) fails makes every Linux subprocess launch fail in environments where this prctl option is blocked or unsupported (for example, restricted seccomp/container profiles). That turns an optional cleanup hardening into a hard runtime regression for MCP/provider startup. This path should treat unsupported/denied prctl as best-effort (continue without PDEATHSIG) instead of aborting spawn.
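
One way to read this suggestion, sketched for Linux only: ignore a failing `prctl` and let the spawn continue without the parent-death signal. The helper name `with_best_effort_pdeathsig` is hypothetical, the constants and the `prctl` declaration are hand-written stand-ins for the `libc` crate, and this is not code from the PR.

```rust
use std::io;
use std::os::raw::{c_int, c_ulong};
use std::os::unix::process::{parent_id, CommandExt};
use std::process::Command;

// Values hand-copied from the Linux headers for this sketch (assumption:
// real code would use the constants from the libc crate).
const PR_SET_PDEATHSIG: c_int = 1;
const SIGTERM: c_ulong = 15;
const ESRCH: i32 = 3;

unsafe extern "C" {
    // prctl(2) is declared as a variadic function by the C library.
    fn prctl(option: c_int, ...) -> c_int;
}

/// Hypothetical helper: try to arm the parent-death signal, but never fail
/// the spawn just because prctl is blocked or unsupported.
fn with_best_effort_pdeathsig(cmd: &mut Command) -> &mut Command {
    let expected_parent = std::process::id();
    // SAFETY: the closure only calls thin syscall wrappers and builds an
    // io::Error from an integer code.
    unsafe {
        cmd.pre_exec(move || {
            // A non-zero return (for example under a restrictive seccomp
            // profile) is treated as "hardening unavailable", not an error.
            let armed = prctl(PR_SET_PDEATHSIG, SIGTERM) == 0;
            // The reparenting check only matters when the signal was armed.
            if armed && parent_id() != expected_parent {
                return Err(io::Error::from_raw_os_error(ESRCH));
            }
            Ok(())
        })
    }
}
```

The trade-off is that in restricted environments children can still outlive an abruptly terminated parent, but subprocess launches keep working.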

@fresh3nough (Contributor, Author)

Awesome

@DOsinga added this pull request to the merge queue on Apr 2, 2026
Merged via the queue into aaif-goose:main with commit b39762a on Apr 2, 2026
22 checks passed
lifeizhou-ap added a commit that referenced this pull request Apr 7, 2026
* origin/main: (32 commits)
  docs: rework homepage and add aaif migration blog post (#8356)
  chore(aaif): rename a bunch of repository references (#8152)
  fix: use OPENAI_API_KEY secret for recipe security scanner (#8358)
  feat: configurable extension timeouts via ACP _meta and global default (#8295)
  fix: hide hidden extensions in UI (#8346)
  refactor: skills as its own platform ext (#8244)
  fix baseUrl (#8347)
  Fix desktop slash commands (#8341)
  fix(cli): display platform-correct secrets path in keyring config dialog (#8328)
  feat(acp): add reusable ACP provider controls (#8314)
  fix: resolve MDX compilation error in using-goosehints.md (#8332)
  fix: use v1beta1 API version for Google/MaaS models on GCP Vertex AI (#8278)
  docs: add MCP Roots guide (#8252)
  rust acp client for extension methods (#8227)
  fix: reconsolidate split tool-call messages to follow OpenAI format (#7921)
  fix: clean up MCP subprocesses after abrupt parent exit (#8242)
  build: raise default stack reserve to 8 MB (#8234)
  fix(config): honour GOOSE_DISABLE_KEYRING from config.yaml at startup (#8219)
  feat: add configurable fast_model for declarative providers (#8194)
  fix(authentication): Allow connecting to Oauth servers that use protected-resource fallback instead of the WWW-authenticate header (#8148)
  ...

Linked issue (#8229): MCP extension processes are not cleaned up when Goose sessions end, causing memory leak
