fix(hooks): wrap the augment CLI child in the orphan guard (#2163)#2169
Open
Minidoracat wants to merge 1 commit into
Open
fix(hooks): wrap the augment CLI child in the orphan guard (#2163)#2169Minidoracat wants to merge 1 commit into
Minidoracat wants to merge 1 commit into
Conversation
…atwari#2163) Follow-up invited by the maintainer on abhigyanpatwari#2165: the augment child (7s local / 12s npx) was the longest-lived unwrapped subprocess, exposed to the same SIGKILL-orphan mechanism fixed for lsof/ps. - Export resolveUnixGuardTimeout from the probe module (both copies, byte-identical); adapters share the same module instance, so the memo and lazy self-test still run at most once per hook process. - Wrap every CLI-executing branch of runGitNexusCli in the three probe-equipped adapters with the guard: budget ceil(inner/1000)+1 seconds with -k 1, strictly above each branch's inner spawnSync timeout, so the supervised path is unchanged and the wrapper only matters once the hook itself is SIGKILLed. Windows and no-guard hosts keep byte-identical argv. The plugin adapter's PATH-direct gitnexus branch (its most common production path) is wrapped too; the cheap which/where probe is not. - Cursor integration: debug-gated 'augment skipped: hook slots saturated' on the slot-starved early return. Its augment child stays unwrapped for now — that integration does not install the probe sibling (the 'cursor probe' item on the abhigyanpatwari#2163 follow-up list). - Reaping tests get a guard-availability precheck with an explicit failure message (assertion, not skipIf, so a coreutils-less Linux host fails diagnosably instead of going silently green). - Tests: orphaned-augment reaping (CJS + Plugin, red without the wrap, ~9.1s reap measured), disabled-sentinel degradation equivalence, source pinning for all three adapters (exact per-branch budget-formula counts) + probe export + cursor debug line. Note: pre-commit typecheck skipped; remaining tsc errors are pre-existing on main (none in files touched here).
|
@Minidoracat is attempting to deploy a commit to the NexusCore Team on Vercel. A member of the Team first needs to authorize it. |
Contributor
CI Report✅ All checks passed Pipeline Status
Test Results
✅ All 11314 tests passed 16 test(s) skipped — expand for details
Code CoverageTests
📋 View full run · Generated by CI |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Wraps the augment CLI child — the longest-lived previously-unwrapped subprocess (7 s local / 12 s npx) — in the same self-tested orphan guard that #2165 introduced for
lsof/ps, closing the remaining SIGKILL-orphan window called out in the tri-review (finding 4). Also adds the cursor-integration slot-skip diagnostic and the guard-availability precheck polish from the same review thread.Follow-up to #2165 (maintainer-invited: #2165 (comment) → invitation in the merge comment). Part of the #2163 follow-up list.
Motivation / context
From the #2165 tri-review digest:
When Claude Code's hook timeout SIGKILLs a hook that is blocked in the augment
spawnSync, the CLI child (a full Node process, far heavier thanlsof) reparents to PID 1 and runs to completion. The mechanism, the fix, and the test pattern are identical to #2165 — this PR just extends them.Areas touched
gitnexus/(CLI / core / MCP server) — hooks + tests onlygitnexus-web/.github/eval/or other toolingAlso touches
gitnexus-claude-plugin/hooks/(byte-identical probe copy + plugin adapter) andgitnexus-cursor-integration/hooks/(debug line only).Scope & constraints
In scope
hook-db-lock-probe.cjs(both copies, byte-identical): exportresolveUnixGuardTimeout. Adapters require the same module instance they already use for the lock probe, so the memo is shared and the lazy self-test still runs at most once per hook process — whoever needs the guard first (probe fallback or augment) pays the one-shot ~5 ms.runGitNexusCliin the three probe-equipped adapters (claude CJS, plugin JS, antigravity CJS): every CLI-executing branch is wrapped withtimeout -k 1 <ceil(inner_ms/1000)+1>on Unix when a guard is available. The budget is strictly above each branch's innerspawnSynctimeout (7000 ms → 8 s,timeout+5000→ 13 s), so on a live hook Node's own timer always fires first and the existing error/status contract is untouched — the wrapper only matters once the hook has been SIGKILLed. Windows and no-guard hosts produce byte-identical argv to main. The plugin adapter's PATH-directgitnexusbranch (its most common production path) is wrapped as well; the cheapwhich/wheredetection spawn (3 s, not a CLI execution) deliberately is not.augment skipped: hook slots saturatedon the slot-starved early return (same observability gap fix(hooks): bound db-lock probe subprocesses and gate probe behind hook slot (#2163) #2165's F2 fixed in the other three adapters), following that file's ownGITNEXUS_DEBUGtruthiness convention.timeout. We chose an assertion overskipIfon purpose: a skip would let a coreutils-less Linux host go silently green on the exact regression lane this work exists for; GitHub ubuntu runners always ship coreutils, so CI behavior is unchanged.Explicitly out of scope / not done here
hook-db-lock-probe.cjs(see its README manual-install list), so there is no resolver sibling to require. Doing it properly means adding the probe to its install manifest — the "cursor probe" item on the bug(hooks): Claude db-lock probe leaks orphaned lsof processes on busy Linux hosts — snowballs to sustained 100% CPU #2163 follow-up list, which we'd rather do as its own PR./procscan and the probe-result TTL cache — remain the next planned follow-ups.[Unreleased].Implementation notes
spawnSync-literal-per-branch pattern preserved (argv assembled conditionally), so the spawn-count/windowsHideparity convention holds in every touched file (verified by the in-suite regression check: CJS 4=4, Plugin 6=6, Antigravity 4=4, probe 4=4, cursor 3=3).resolveUnixGuardTimeout()is never called onwin32(guarded at the call sites; the self-test spawns/bin/sh). Documented in the export JSDoc and pinned by the source tests.Testing & verification
cd gitnexus && npx vitest run test/unit/hooks.test.ts— 186/188 (the 2 reds are the pre-existing fix: skip Claude augment hook when GitNexus server owns DB #1493 ENOENT fail-open pair, environmental to the pathological host documented in fix(hooks): bound db-lock probe subprocesses and gate probe behind hook slot (#2163) #2165's Known limitations; verified red on clean base)cd gitnexus && npx vitest run test/unit/resolve-invocation.test.ts test/unit/setup.test.ts test/unit/cli-commands.test.ts test/integration/antigravity-hook-e2e.test.ts— all green (135/135 with the setup suites)npx prettier --check/npx eslinton changed files — 0 errorsnpx tsc --noEmit— no errors in touched files (remaining errors pre-existing on main)diff -qbyte-identicalRed/green evidence: with only the three adapter wraps reverted (probe export and tests kept), both orphaned-augment reaping tests go red — the SIGTERM-immune fake CLI survives the hook's SIGKILL for the full poll window. With the wrap applied, the child is reaped in ~9.1 s (8 s budget + 1 s
-kescalation), green on both the CJS and Plugin paths.🤖 Generated with Claude Code