Skip to content

t1482: fix pulse PID self-clash blocks dispatch when prefetch hangs#4575

Merged
alex-solovyev merged 3 commits intomainfrom
bugfix/pulse-pid-clash
Mar 14, 2026
Merged

t1482: fix pulse PID self-clash blocks dispatch when prefetch hangs#4575
alex-solovyev merged 3 commits intomainfrom
bugfix/pulse-pid-clash

Conversation

@marcusquinn
Copy link
Owner

@marcusquinn marcusquinn commented Mar 14, 2026

Summary

  • Fix PID self-clash: wrapper wrote its own PID to the PID file during setup, causing check_dedup to block all future invocations when prefetch_state hung
  • Fix prefetch sub-helper hangs: add per-helper timeouts (90-120s) so a single hanging gh API call can't block the entire pulse indefinitely
  • Add 120s hard timeout on parallel gh fetch wait loop

Root Cause

The pulse stopped dispatching workers ~12.5 hours ago because:

  1. main() wrote $$ (wrapper PID) to the PID file at line 2610
  2. prefetch_state() hung on pr-salvage-helper.sh doing sequential gh api calls across 8 repos
  3. launchd fired a new wrapper every 120s, but check_dedup() saw the old wrapper PID as alive and blocked
  4. The old wrapper was stuck in prefetch, never reaching run_pulse()
  5. Result: infinite loop of lock-acquire → dedup-block → exit, zero workers dispatched

Changes

check_dedup() (PID self-clash fix):

  • Recognizes SETUP:PID sentinel format — treats as non-blocking (instance lock prevents true concurrency)
  • Adds self-PID detection ($$ == old_pid) as safety net
  • Kills zombie SETUP wrappers from previous cycles

main() (PID file format):

  • Writes SETUP:$$ during pre-flight stages instead of plain $$
  • run_pulse() still overwrites with the plain opencode PID for watchdog tracking

prefetch_state() (sub-helper timeouts):

  • prefetch_hygiene (includes pr-salvage): 120s timeout
  • prefetch_ci_failures: 90s timeout
  • prefetch_gh_failure_notifications: 90s timeout
  • Parallel gh fetch wait loop: 120s hard timeout
  • All timeouts are non-fatal — pulse proceeds with degraded state

run_cmd_with_timeout() (new utility):

  • Lightweight per-call timeout for shell functions
  • Uses _kill_tree for clean process group cleanup

Self-healing

These fixes are systemic — they activate on the next aidevops update for all users. No manual intervention needed. The wrapper self-heals by:

  • Recognizing and clearing stale SETUP sentinels from dead wrappers
  • Timing out hung sub-helpers instead of blocking forever
  • Proceeding with partial state rather than failing completely

Closes #4576

Testing

  • ShellCheck clean (zero violations)
  • Verified PID file format change is backward-compatible (existing IDLE:* and numeric PIDs still handled)

Summary by CodeRabbit

  • New Features

    • Status now shows a distinct "setup" pre-flight state separate from running/idle.
  • Bug Fixes

    • Improved detection and cleanup for stalled, dead, or zombie setup processes.
    • Clearer separation between setup and main runtime states to avoid false-positive conflicts.
    • Added per-operation timeouts to prevent indefinite hangs and ensure hung calls are terminated.
    • Enhanced recovery so partial failures degrade gracefully without blocking startup/prefetch.

Two systemic bugs caused the pulse to stop dispatching workers:

1. PID self-clash: main() wrote the wrapper's own PID ($$) to the PID
   file during setup. When prefetch_state hung and the wrapper was killed
   by launchd, the next invocation saw the PID file with a live wrapper
   PID and blocked via check_dedup. Fix: use SETUP:$$ sentinel during
   setup phase; check_dedup treats SETUP: PIDs as non-blocking since the
   instance lock already prevents true concurrency. Also adds self-PID
   detection as a safety net.

2. Prefetch sub-helper hangs: sequential helpers called after the parallel
   gh fetches (pr-salvage-helper, gh-failure-miner-helper) could hang
   indefinitely on gh API calls, blocking the entire prefetch_state stage
   and preventing run_pulse() from ever being reached. Fix: wrap slow
   sub-helpers with run_cmd_with_timeout (90-120s per helper). Timeouts
   are non-fatal — the pulse proceeds with degraded state rather than
   blocking forever. Also adds a 120s hard timeout on the parallel gh
   fetch wait loop.
@gemini-code-assist
Copy link

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a SETUP:PID sentinel during wrapper startup, implements run_cmd_with_timeout() for per-call timeouts, integrates timeouts into prefetch sub-steps, and updates check_dedup / status logic to handle SETUP/IDLE sentinels, self-detection, and zombie/invalid PID cases.

Changes

Cohort / File(s) Summary
Pulse wrapper (core)
.agents/scripts/pulse-wrapper.sh
Reworked startup PID semantics to write SETUP:<PID> during preflight; expanded check_dedup to parse SETUP/IDLE sentinels, detect self-PID, validate numeric PIDs, handle dead/zombie setup wrappers, and reset to IDLE when appropriate.
Timeout helper & integrations
.agents/scripts/pulse-wrapper.sh
Added run_cmd_with_timeout() (kills process group, returns 124 on timeout) and replaced several direct prefetch/gh calls with timeout-wrapped invocations (120s/90s), using temp-file ephemera and non-fatal degraded paths on timeouts.
Session/status helper
.agents/scripts/pulse-session-helper.sh
Updated is_pulse_running and cmd_status to interpret PIDFILE sentinels (IDLE:*, SETUP:*), validate numeric parts, and present setup vs running state consistently.

Sequence Diagram(s)

sequenceDiagram
    participant Wrapper as "pulse-wrapper.sh"
    participant PIDFILE as "PID file"
    participant Prefetch as "prefetch helper(s) / gh"
    participant OS as "OS (processes)"

    Wrapper->>PIDFILE: write "SETUP:<PID>"
    Wrapper->>Prefetch: run_cmd_with_timeout(prefetch_hygiene, 120s)
    alt prefetch completes
        Prefetch-->>Wrapper: results
    else timeout (returns 124)
        Wrapper->>OS: kill process group of Prefetch (TERM/kill)
        OS-->>Wrapper: processes terminated
        Wrapper-->>PIDFILE: write "IDLE:*" or reset sentinel
    end
    Wrapper->>PIDFILE: write "<PID>" (normal runtime sentinel)
    Wrapper->>OS: spawn main runtime processes
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🕐 A small SETUP sentinel stakes its claim,
Timeouts hush calls that otherwise would hang.
Zombies are noticed, then kindly sent away,
The wrapper hums on to start the day.
Pulses now move forward — steady, not astray. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: fixing a PID self-clash issue where the wrapper blocks dispatch when prefetch hangs.
Linked Issues check ✅ Passed All PR objectives directly address linked issue #4576 requirements: SETUP sentinel implementation, per-helper timeouts (90-120s), self-healing behavior, and non-fatal degradation are all implemented.
Out of Scope Changes check ✅ Passed All changes are directly scoped to issue #4576: SETUP sentinel logic in check_dedup(), timeout utilities, prefetch helper timeouts, and status display updates align with stated objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/pulse-pid-clash
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 414 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Mar 14 02:55:33 UTC 2026: Code review monitoring started
Sat Mar 14 02:55:33 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 414

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 414
  • VULNERABILITIES: 0

Generated on: Sat Mar 14 02:55:36 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
.agents/scripts/pulse-wrapper.sh (1)

1604-1607: Remove orphaned duplicate doc block

Line 1604-Line 1607 appears to be a leftover header fragment duplicated before run_stage_with_timeout(). Safe to delete for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 1604 - 1607, Remove the
orphaned duplicate doc block fragment that precedes the run_stage_with_timeout()
function; locate the stray header comments describing exit codes (the duplicated
0/124/else lines) right before the run_stage_with_timeout() definition and
delete that duplicate block so only the single, correct documentation remains
adjacent to run_stage_with_timeout().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 2730-2734: Update the PIDFILE reader in pulse-session-helper.sh to
recognize the "SETUP:" and "IDLE:" sentinels written by pulse-wrapper.sh before
treating the file content as a numeric PID: read PIDFILE into a variable, check
for prefixes "SETUP:" and "IDLE:" and handle those cases (render setup/idle
status) instead of immediately calling ps -p on the raw content; alternatively,
if you prefer backward compatibility, expose the numeric PID in parallel (e.g.,
write a separate numeric PID file or append the raw PID after the sentinel) so
existing ps -p usage still works—adjust the code paths that currently call ps -p
"$pid" to first normalize or extract the numeric PID when present.
- Around line 525-541: The loop currently uses a blocking wait "$pid" which can
hang and prevent re-checking the 120s limit; replace the blocking wait with a
non-blocking polling loop for each PID (use the existing pids array and computed
wait_start/wait_elapsed) that repeatedly checks whether the PID is still alive
(e.g., kill -0 or wait -n semantics), updates wait_elapsed, and if elapsed > 120
logs to LOGFILE and kills remaining PIDs; ensure you preserve the existing log
message and the kill/cleanup behavior when timing out.
- Around line 312-331: The code currently kills whatever process holds setup_pid
without verifying it belongs to pulse-wrapper; change check_dedup so before
calling _kill_tree/_force_kill_tree you validate the PID refers to the expected
prior pulse-wrapper process by checking its commandline or executable and owner:
read /proc/"$setup_pid"/cmdline (or use ps -p "$setup_pid" -o args=) and ensure
it contains the pulse-wrapper identifier (e.g. "pulse-wrapper.sh" or the exact
argv used), and confirm the UID matches the expected user (compare
/proc/"$setup_pid"/status Uid or ps -o uid=); only if both match, proceed to log
and invoke _kill_tree/_force_kill_tree, otherwise log a warning and do not kill,
but still write the IDLE sentinel to "$PIDFILE" as appropriate.

---

Nitpick comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1604-1607: Remove the orphaned duplicate doc block fragment that
precedes the run_stage_with_timeout() function; locate the stray header comments
describing exit codes (the duplicated 0/124/else lines) right before the
run_stage_with_timeout() definition and delete that duplicate block so only the
single, correct documentation remains adjacent to run_stage_with_timeout().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c066a1fe-f131-42b0-b758-fa364d2368a3

📥 Commits

Reviewing files that changed from the base of the PR and between ee79b69 and 9d2f69f.

📒 Files selected for processing (1)
  • .agents/scripts/pulse-wrapper.sh

…tate

The parallel gh fetch wait loop used 'wait $pid' which blocks until the
process exits. The elapsed-time check between waits never fires when a
single wait hangs for minutes. Replace with kill -0 polling (sleep 2
interval) that reliably enforces the 120s timeout, matching the pattern
already used by run_cmd_with_timeout(). Survivors are killed via
_kill_tree/_force_kill_tree before reaping.
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 414 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Mar 14 03:06:53 UTC 2026: Code review monitoring started
Sat Mar 14 03:06:53 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 414

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 414
  • VULNERABILITIES: 0

Generated on: Sat Mar 14 03:06:55 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.agents/scripts/pulse-wrapper.sh (1)

312-331: ⚠️ Potential issue | 🟠 Major

PID reuse safety concern remains unaddressed

The logic at Lines 325-330 kills setup_pid without verifying it belongs to a pulse-wrapper.sh process. Since the instance lock is already held by the current process, a live process at setup_pid after passing acquire_instance_lock() indicates PID reuse occurred.

The suggested hardening from the prior review—validating the process command line contains pulse-wrapper.sh before killing—should be applied to prevent killing unrelated processes.

🛡️ Suggested hardening
 	if [[ "$pid_content" == SETUP:* ]]; then
 		local setup_pid="${pid_content#SETUP:}"
+		if ! [[ "$setup_pid" =~ ^[0-9]+$ ]]; then
+			echo "[pulse-wrapper] check_dedup: invalid SETUP sentinel '${pid_content}' — resetting to IDLE" >>"$LOGFILE"
+			echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE"
+			return 0
+		fi
+
 		if [[ "$setup_pid" == "$$" ]]; then
 			# We wrote this ourselves — proceed
 			return 0
 		fi
-		if ! ps -p "$setup_pid" >/dev/null 2>&1; then
+		local setup_cmd=""
+		setup_cmd=$(ps -p "$setup_pid" -o command= 2>/dev/null || echo "")
+		if [[ -z "$setup_cmd" ]]; then
 			echo "[pulse-wrapper] check_dedup: SETUP wrapper $setup_pid is dead — proceeding" >>"$LOGFILE"
 			echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE"
 			return 0
 		fi
+		if [[ "$setup_cmd" != *"pulse-wrapper.sh"* ]]; then
+			echo "[pulse-wrapper] check_dedup: SETUP PID $setup_pid belongs to non-wrapper process ('${setup_cmd:0:60}...'); resetting sentinel" >>"$LOGFILE"
+			echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE"
+			return 0
+		fi
 		# SETUP wrapper is alive but we hold the instance lock — it's a zombie

,

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 312 - 331, The current
check_dedup flow kills the process at setup_pid without confirming it's an
instance of pulse-wrapper.sh, risking killing an unrelated reused PID; before
invoking _kill_tree or _force_kill_tree on setup_pid, probe the process cmdline
(e.g., via ps -p "$setup_pid" -o args=) and verify it contains
"pulse-wrapper.sh" (or another unique marker you use) and only then proceed with
killing and logging; if the cmdline does not match, log a warning, treat it as
PID reuse by an unrelated process, write the IDLE timestamp to PIDFILE and
return without killing. Ensure the checks are applied in the same branch that
currently calls _kill_tree and _force_kill_tree so you avoid terminating
non-wrapper processes.
🧹 Nitpick comments (1)
.agents/scripts/pulse-wrapper.sh (1)

1627-1631: Orphaned documentation block should be removed

Lines 1627-1630 contain exit code documentation that appears between run_cmd_with_timeout() (ends at 1625) and run_stage_with_timeout() (starts at 1631). This duplicates the documentation that properly appears at Lines 1639-1642. Likely a copy-paste remnant.

🧹 Proposed cleanup
 	return $?
 }

-#   0   - stage completed successfully
-#   124 - stage timed out and was killed
-#   else- stage exited with command exit code
-#######################################
 #######################################
 # Run a stage with a wall-clock timeout
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 1627 - 1631, Remove the
orphaned exit-code documentation block that lives between run_cmd_with_timeout()
and run_stage_with_timeout() (the duplicate comment lines "0 - stage completed
successfully", "124 - stage timed out and was killed", "else- stage exited with
command exit code"); it is a copy-paste remnant and should be deleted so only
the proper documentation block (the one at the later location) remains.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 312-331: The current check_dedup flow kills the process at
setup_pid without confirming it's an instance of pulse-wrapper.sh, risking
killing an unrelated reused PID; before invoking _kill_tree or _force_kill_tree
on setup_pid, probe the process cmdline (e.g., via ps -p "$setup_pid" -o args=)
and verify it contains "pulse-wrapper.sh" (or another unique marker you use) and
only then proceed with killing and logging; if the cmdline does not match, log a
warning, treat it as PID reuse by an unrelated process, write the IDLE timestamp
to PIDFILE and return without killing. Ensure the checks are applied in the same
branch that currently calls _kill_tree and _force_kill_tree so you avoid
terminating non-wrapper processes.

---

Nitpick comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1627-1631: Remove the orphaned exit-code documentation block that
lives between run_cmd_with_timeout() and run_stage_with_timeout() (the duplicate
comment lines "0 - stage completed successfully", "124 - stage timed out and was
killed", "else- stage exited with command exit code"); it is a copy-paste
remnant and should be deleted so only the proper documentation block (the one at
the later location) remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4b09ce8f-9fad-4e9d-a38b-1c1ae33653c9

📥 Commits

Reviewing files that changed from the base of the PR and between 9d2f69f and 4abe559.

📒 Files selected for processing (1)
  • .agents/scripts/pulse-wrapper.sh

… sentinel-aware status

check_dedup (GH#4575):
- Add numeric validation of SETUP sentinel PID before any process checks
- Validate cmdline contains 'pulse-wrapper.sh' via ps -p before killing;
  if PID was reused by an unrelated process, log warning and reset to IDLE
  instead of killing an innocent process

pulse-wrapper.sh:
- Remove orphaned duplicate doc block between run_cmd_with_timeout and
  run_stage_with_timeout (exit codes fragment from copy-paste)

pulse-session-helper.sh:
- is_pulse_running() now parses SETUP:/IDLE: sentinels before treating
  PIDFILE content as numeric PID — prevents ps -p failures on non-numeric
  content and correctly reports IDLE as not-running
- Status display distinguishes SETUP phase (yellow 'setup') from active
  pulse (green 'running')
@github-actions
Copy link

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 414 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Sat Mar 14 03:15:52 UTC 2026: Code review monitoring started
Sat Mar 14 03:15:52 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 414

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 414
  • VULNERABILITIES: 0

Generated on: Sat Mar 14 03:15:55 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.agents/scripts/pulse-wrapper.sh (1)

624-626: Differentiate timeout vs generic failure in helper logs.

Each || { ... timed out ... } branch currently logs a timeout for any non-zero return. That can mask real helper errors as timeouts and make incident triage noisy.

♻️ Suggested refactor
-	run_cmd_with_timeout 120 prefetch_hygiene >"$hygiene_tmp" 2>/dev/null || {
-		echo "[pulse-wrapper] prefetch_hygiene timed out after 120s (non-fatal)" >>"$LOGFILE"
-	}
+	local hygiene_rc=0
+	run_cmd_with_timeout 120 prefetch_hygiene >"$hygiene_tmp" 2>/dev/null || hygiene_rc=$?
+	if [[ "$hygiene_rc" -eq 124 ]]; then
+		echo "[pulse-wrapper] prefetch_hygiene timed out after 120s (non-fatal)" >>"$LOGFILE"
+	elif [[ "$hygiene_rc" -ne 0 ]]; then
+		echo "[pulse-wrapper] prefetch_hygiene failed with exit ${hygiene_rc} (non-fatal)" >>"$LOGFILE"
+	fi

-	run_cmd_with_timeout 90 prefetch_ci_failures >"$ci_tmp" 2>/dev/null || {
-		echo "[pulse-wrapper] prefetch_ci_failures timed out after 90s (non-fatal)" >>"$LOGFILE"
-	}
+	local ci_rc=0
+	run_cmd_with_timeout 90 prefetch_ci_failures >"$ci_tmp" 2>/dev/null || ci_rc=$?
+	if [[ "$ci_rc" -eq 124 ]]; then
+		echo "[pulse-wrapper] prefetch_ci_failures timed out after 90s (non-fatal)" >>"$LOGFILE"
+	elif [[ "$ci_rc" -ne 0 ]]; then
+		echo "[pulse-wrapper] prefetch_ci_failures failed with exit ${ci_rc} (non-fatal)" >>"$LOGFILE"
+	fi

-	run_cmd_with_timeout 90 prefetch_gh_failure_notifications >"$ghfail_tmp" 2>/dev/null || {
-		echo "[pulse-wrapper] prefetch_gh_failure_notifications timed out after 90s (non-fatal)" >>"$LOGFILE"
-	}
+	local ghfail_rc=0
+	run_cmd_with_timeout 90 prefetch_gh_failure_notifications >"$ghfail_tmp" 2>/dev/null || ghfail_rc=$?
+	if [[ "$ghfail_rc" -eq 124 ]]; then
+		echo "[pulse-wrapper] prefetch_gh_failure_notifications timed out after 90s (non-fatal)" >>"$LOGFILE"
+	elif [[ "$ghfail_rc" -ne 0 ]]; then
+		echo "[pulse-wrapper] prefetch_gh_failure_notifications failed with exit ${ghfail_rc} (non-fatal)" >>"$LOGFILE"
+	fi

As per coding guidelines "Automation scripts - focus on: Reliability and robustness" and "Clear logging and feedback".

Also applies to: 633-635, 651-653

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 624 - 626, The log currently
treats any non-zero exit from run_cmd_with_timeout as a timeout; change the
pattern used around the prefetch_hygiene invocation (and the similar blocks for
lines referenced) to capture the command exit status into a variable (e.g.,
rc=$?), then discriminate timeout vs other failures by checking for the timeout
exit code your timeout helper uses (commonly 124 for GNU timeout or the
wrapper-defined code) and log either "[pulse-wrapper] prefetch_hygiene timed out
after 120s (non-fatal)" when rc matches the timeout code, or "[pulse-wrapper]
prefetch_hygiene failed with exit $rc" (and include any available stderr or
context) for other non-zero rc values; apply the same change to the other
occurrences (the blocks around prefetch_hygiene at the other noted locations)
and ensure you capture rc immediately after run_cmd_with_timeout before any
other command runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 624-626: The log currently treats any non-zero exit from
run_cmd_with_timeout as a timeout; change the pattern used around the
prefetch_hygiene invocation (and the similar blocks for lines referenced) to
capture the command exit status into a variable (e.g., rc=$?), then discriminate
timeout vs other failures by checking for the timeout exit code your timeout
helper uses (commonly 124 for GNU timeout or the wrapper-defined code) and log
either "[pulse-wrapper] prefetch_hygiene timed out after 120s (non-fatal)" when
rc matches the timeout code, or "[pulse-wrapper] prefetch_hygiene failed with
exit $rc" (and include any available stderr or context) for other non-zero rc
values; apply the same change to the other occurrences (the blocks around
prefetch_hygiene at the other noted locations) and ensure you capture rc
immediately after run_cmd_with_timeout before any other command runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2322812d-f03b-4274-8358-478a4939da34

📥 Commits

Reviewing files that changed from the base of the PR and between 4abe559 and 19cba81.

📒 Files selected for processing (2)
  • .agents/scripts/pulse-session-helper.sh
  • .agents/scripts/pulse-wrapper.sh

@alex-solovyev alex-solovyev merged commit 12829be into main Mar 14, 2026
21 checks passed
@alex-solovyev alex-solovyev deleted the bugfix/pulse-pid-clash branch March 14, 2026 03:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Auto-created from TODO.md tag

Projects

None yet

Development

Successfully merging this pull request may close these issues.

t1482: fix: pulse PID self-clash blocks dispatch when prefetch_state hangs

2 participants