Skip to content

fix(opencode): robust process exit detection for child processes#15757

Open
NachoFLizaur wants to merge 3 commits intoanomalyco:devfrom
NachoFLizaur:fix/bash-process-exit-detection
Open

fix(opencode): robust process exit detection for child processes#15757
NachoFLizaur wants to merge 3 commits intoanomalyco:devfrom
NachoFLizaur:fix/bash-process-exit-detection

Conversation

@NachoFLizaur
Copy link
Contributor

@NachoFLizaur NachoFLizaur commented Mar 2, 2026

Issue for this PR

Closes #14769
Possible related issue that might get fixed by this: #15675 (it seems related based on the description)

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

The bash tool and Process.spawn rely solely on Bun's proc.once("exit") to detect when a child process finishes. In containerized environments, this event is not reliably delivered (due to Bun), causing the completion promise to hang forever, tool calls get permanently stuck at status=running.

The fix removes the runtime dependency on Bun for process exit detection, replacing it with a multi-layer strategy that uses OS-level primitives:

  1. close event: independent signal path from exit in Bun's internals
  2. stdio end events (bash tool only): uses the pipe I/O subsystem, independent of process lifecycle monitoring
  3. Polling watchdog (1s interval): checks proc.exitCode and process.kill(pid, 0) to detect dead processes via OS-level signals, completely bypassing Bun's event system
  4. Server-level watchdog (5s interval in prompt.ts): external safety net that force-completes processes stuck past their timeout

All detection paths funnel through a single done() function guarded by a resolved boolean, ensuring exactly-once Promise resolution.

Also improves Shell.killTree() with an alive() check before escalating from SIGTERM to SIGKILL.

How did you verify your code works?

Polling watchdog isolation tests directly prove exit detection works WITHOUT exit/close events firing (simulates the exact Bun failure mode).

Regression tests verify normal bash tool execution is unaffected by the additional detection layers. Shell.killTree tests verify SIGTERM->alive check->SIGKILL escalation.

I also tested it in containerized opencode serve mode, bash tool calls now complete reliably instead of hanging.

Screenshots / recordings

No UI changes.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions github-actions bot added contributor needs:compliance This means the issue will auto-close after 2 hours. labels Mar 2, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

The following comment was made by an LLM, it may be inaccurate:

Based on my searches, I found one potentially related PR:

Related PR Found:

All other search results only returned the current PR (#15757) itself. The searches did not find any open PRs that are direct duplicates of this process exit detection fix. PR #13186 is adjacent in addressing process-related issues but focuses on memory leaks and cleanup handlers rather than the specific exit detection problem.

@NachoFLizaur
Copy link
Contributor Author

fixed description according to template

@github-actions github-actions bot removed the needs:compliance This means the issue will auto-close after 2 hours. label Mar 2, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 2, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

…t-detection

# Conflicts:
#	packages/opencode/src/shell/shell.ts
@NachoFLizaur NachoFLizaur force-pushed the fix/bash-process-exit-detection branch from 43f75bc to e4b57cf Compare March 6, 2026 11:37
binarydoubling added a commit to binarydoubling/opencode that referenced this pull request Mar 9, 2026
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck
child processes in containers where Bun fails to deliver exit events.

- Add polling watchdog to bash tool and Process.spawn that detects
  process exit via kill(pid, 0) when event-loop events are missed
- Add process registry (active map) with stale/reap exports for
  server-level watchdog to detect and clean up stuck bash processes
- Improve Shell.killTree with alive() helper and proper SIGKILL
  escalation after SIGTERM timeout
- Add session-level watchdog interval in prompt loop to periodically
  reap stale bash processes

Based on the work in anomalyco#15757.

Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
binarydoubling added a commit to binarydoubling/opencode that referenced this pull request Mar 9, 2026
Complete the port of PR anomalyco#15757 with remaining pieces:

- Add stdio end event redundancy as third fallback for exit detection
  (fires when pipe file descriptors close, independent of exit events)
- Add diagnostic log.info calls at spawn, abort, timeout, and each
  exit detection path for debugging container issues
- Add comprehensive tests: defensive patterns, polling watchdog
  isolation, Shell.killTree, server-level watchdog (stale/reap),
  stdio end events, and Process.spawn defensive patterns
- Skip truncation tests on Windows (matching upstream)

Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Intermittent hang: session stays running forever until manual interrupt

1 participant