fix: clean up MCP subprocesses after abrupt parent exit#8242
Conversation
Set a Linux parent-death signal on shared subprocesses so MCP stdio servers receive SIGTERM if goose is terminated unexpectedly. Add an integration test that simulates an abrupt parent exit and verifies the child process is reaped. Signed-off-by: fre <anonwurcod@proton.me> Co-Authored-By: Oz <oz-agent@warp.dev>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12be615185
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return Err(std::io::Error::other( | ||
| "parent process exited before subprocess exec", | ||
| )); |
There was a problem hiding this comment.
Use async-signal-safe operations in pre_exec closure
The Linux pre_exec hook now returns std::io::Error::other(...) on the parent-PID mismatch path, but this closure executes after fork() where only async-signal-safe work is allowed; constructing this Rust error can invoke non-signal-safe runtime paths (e.g., allocation/locks) and may deadlock child startup in the exact race this change is trying to handle. In this closure, stick to libc/syscall-level behavior (or errno-based io::Error::from_raw_os_error) instead of creating a custom other error.
Useful? React with 👍 / 👎.
…cess-cleanup-8229
… deps - Replace std::io::Error::other() with from_raw_os_error(ESRCH) in the pre_exec closure to avoid non-signal-safe allocations after fork() - Move libc dependency to [target.'cfg(target_os = "linux")'.dependencies] since it is only used behind #[cfg(target_os = "linux")] Signed-off-by: Douwe Osinga <douwe@squareup.com>
DOsinga
left a comment
There was a problem hiding this comment.
Solid fix — the PR_SET_PDEATHSIG + getppid race check is the right pattern, and the test is well-designed.
I pushed two small fixes in 2cc0f93:
-
Async-signal-safety in
pre_exec: replacedstd::io::Error::other(...)withstd::io::Error::from_raw_os_error(libc::ESRCH)— thepre_execclosure runs afterfork()where only async-signal-safe operations are allowed, andError::othercan allocate. (Addresses the Codex review comment.) -
libcdependency placement: movedlibcfrom unconditional dependencies to[target.'cfg(target_os = "linux")'.dependencies]since it's only used behind#[cfg(target_os = "linux")].
Also merged origin/main — no conflicts.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cc0f93b05
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM) != 0 { | ||
| return Err(std::io::Error::last_os_error()); | ||
| } |
There was a problem hiding this comment.
Degrade gracefully when PR_SET_PDEATHSIG is unavailable
Returning an error from pre_exec when prctl(PR_SET_PDEATHSIG, ...) fails makes every Linux subprocess launch fail in environments where this prctl option is blocked or unsupported (for example, restricted seccomp/container profiles). That turns an optional cleanup hardening into a hard runtime regression for MCP/provider startup. This path should treat unsupported/denied prctl as best-effort (continue without PDEATHSIG) instead of aborting spawn.
Useful? React with 👍 / 👎.
|
Awesome |
* origin/main: (32 commits) docs: rework homepage and add aaif migration blog post (#8356) chore(aaif): rename a bunch of repository references (#8152) fix: use OPENAI_API_KEY secret for recipe security scanner (#8358) feat: configurable extension timeouts via ACP _meta and global default (#8295) fix: hide hidden extensions in UI (#8346) refactor: skills as its own platform ext (#8244) fix baseUrl (#8347) Fix desktop slash commands (#8341) fix(cli): display platform-correct secrets path in keyring config dialog (#8328) feat(acp): add reusable ACP provider controls (#8314) fix: resolve MDX compilation error in using-goosehints.md (#8332) fix: use v1beta1 API version for Google/MaaS models on GCP Vertex AI (#8278) docs: add MCP Roots guide (#8252) rust acp client for extension methods (#8227) fix: reconsolidate split tool-call messages to follow OpenAI format (#7921) fix: clean up MCP subprocesses after abrupt parent exit (#8242) build: raise default stack reserve to 8 MB (#8234) fix(config): honour GOOSE_DISABLE_KEYRING from config.yaml at startup (#8219) feat: add configurable fast_model for declarative providers (#8194) fix(authentication): Allow connecting to Oauth servers that use protected-resource fallback instead of the WWW-authenticate header (#8148) ...
Summary
Testing
Closes #8229
Conversation: https://app.warp.dev/conversation/8d563306-2cf0-45bf-830e-de26fed5e691
Co-Authored-By: Oz oz-agent@warp.dev