t1482: poll-based fetch timeout and reduced sub-helper timeouts by marcusquinn · Pull Request #4580 · marcusquinn/aidevops

marcusquinn · 2026-03-14T03:27:10Z

Summary

Follow-up to PR #4575 (merged). Three additional fixes for the pulse dispatch hang:

- Replace the blocking `wait $pid` loop with a `kill -0` poll-based timeout in the parallel gh fetches of prefetch_state(). `wait` blocks until the process exits, so the elapsed-time check between waits never fires.
- Reduce the parallel fetch timeout from 120s to 60s. Normal completion is <30s, and the wrapper spends ~20s on cleanup before reaching prefetch, so a 120s timeout cannot fire before launchd's 120s cycle ends.
- Reduce sub-helper timeouts from 90-120s to 30s each. Total prefetch budget: 60s + 30s + 30s + 30s = 150s worst case.
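
The difference between a blocking `wait` and a `kill -0` poll can be illustrated with a minimal sketch. This is not the code from pulse-wrapper.sh: the function name `wait_with_deadline`, the 1-second poll interval, and the 124 return code are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from pulse-wrapper.sh): wait for a background PID,
# but give up after a deadline instead of blocking in `wait`.
# Returns 0 if the process exited in time, 124 if it had to be killed.
wait_with_deadline() {
    local pid="$1" timeout_s="$2" waited=0
    # kill -0 delivers no signal; it only tests whether the PID still exists,
    # so control returns to this loop on every iteration.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_s" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null || true   # reap the killed child
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid" 2>/dev/null || true   # reap; exit status is ignored in this sketch
    return 0
}

# Demo: a stand-in for a hung fetch, bounded to 1 second.
sleep 5 &
if wait_with_deadline "$!" 1; then
    echo "fetch finished"
else
    echo "fetch timed out"
fi
```

With a plain `wait "$pid"` in place of the loop, the elapsed-time check would only run after the child had already exited, which is the failure mode described in the first bullet.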

Evidence

Production logs show prefetch_state taking 89-182s with the old code. The wait $pid timeout never fired because each wait blocked for the full duration of the hung process. New instances kept starting every 120s (launchd), correctly detecting dead SETUP wrappers, but never completing prefetch before the next cycle.

Closes #4576

Summary by CodeRabbit

Chores

Introduced timeout safeguards for setup and prefetch operations to prevent indefinite hangs and improve process reliability.
Enhanced process cleanup logic with improved handling for setup-phase wrapper detection and stale zombie process termination.
Added per-call execution timeouts to API fetch operations for more predictable behavior and bounded resource usage during initialization.

coderabbitai · 2026-03-14T03:27:26Z

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 17 minutes and 13 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: eccad433-0863-493d-82fc-a25e5955bc8e

📥 Commits

Reviewing files that changed from the base of the PR and between 940af8c and 1524a4f.

📒 Files selected for processing (1)

.agents/scripts/pulse-wrapper.sh

Walkthrough

This PR addresses a race condition in pulse-wrapper.sh where setup-phase PID writes cause subsequent invocations to block when prefetch helpers hang. The fix introduces per-call timeouts for prefetch operations and a SETUP: sentinel in the PIDFILE to distinguish setup phases from main execution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Pulse Wrapper Process Management `.agents/scripts/pulse-wrapper.sh` | Introduces `run_cmd_with_timeout()` function for bounded command execution; modifies `check_dedup()` to parse SETUP sentinels and clean stale zombie processes; enhances `prefetch_state()` with per-call timeouts (90-120s) for gh API calls with fallback logging; updates `main()` to write `SETUP:$$` sentinel instead of plain PID. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

t1482: fix pulse PID self-clash blocks dispatch when prefetch hangs #4575: Directly overlaps this PR—identical changes to pulse-wrapper.sh including run_cmd_with_timeout, SETUP sentinel logic, and per-call timeout integration.
t1425, t1426, t1427: Fix pulse-wrapper zombie accumulation and blocking #4020: Modifies PIDFILE handling and setup-phase dedup logic in pulse-wrapper.sh with zombie cleanup semantics.
t1454: harden pulse pre-run stages and queue accounting #4293: Adds timeout-wrapping infrastructure (run_stage_with_timeout) for pre-run/prefetch stages in pulse-wrapper.sh.

Suggested labels

bug

Poem

Setup sentinels stand guard,

Timeouts whisper "slow down, bard,"

No more PID-clash parade—

Prefetch helpers get timeout aid! 🛡️⏱️

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main changes: introducing poll-based fetch timeout and reducing sub-helper timeouts, which directly aligns with the core objectives of the PR. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all requirements from `#4576`: implements SETUP sentinel for PID self-clash prevention, wraps prefetch sub-helpers with run_cmd_with_timeout, replaces blocking wait with kill -0 polling, and reduces timeouts from 90–120s to 30s. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to resolving the pulse dispatch hang issue: SETUP sentinel introduction, per-helper timeout wrappers, and poll-based fetch timeout mechanism align with the linked issue objectives. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1633-1636: Delete the duplicated/orphaned exit-code comment block
that documents exit codes (the three lines starting with "#   0   - stage
completed successfully" through "#   else- stage exited with command exit code")
since it was a copy-paste artifact and the real documentation for
run_stage_with_timeout follows immediately; remove the duplicate so only the
intended run_stage_with_timeout comment block remains and no stray comment lines
are left above the function.
- Around line 2759-2763: pulse-session-helper.sh's is_pulse_running() and status
display must handle the PIDFILE sentinel "SETUP:$$" so queries during wrapper
initialization don't show false idle or the raw sentinel; update
is_pulse_running() to read the PIDFILE, detect a "SETUP:<num>" prefix, extract
the numeric PID (<num>) and use that PID in the ps check and in status output
(so it shows "running (PID <num>)" rather than "SETUP:<num>"), and ensure any
other helper logic that calls is_pulse_running() (e.g., status formatting) also
strips the "SETUP:" prefix before presenting or testing the PID; alternatively,
if you prefer separate coordination, replace the SETUP sentinel write in
pulse-wrapper.sh with a distinct temporary sentinel file and keep PIDFILE as the
actual numeric PID only.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0cfc07d4-7859-4722-a65b-475b18f60922

📥 Commits

Reviewing files that changed from the base of the PR and between eaa6c2b and 940af8c.

📒 Files selected for processing (1)

.agents/scripts/pulse-wrapper.sh

.agents/scripts/pulse-wrapper.sh

Two systemic bugs caused the pulse to stop dispatching workers:

1. PID self-clash: main() wrote the wrapper's own PID ($$) to the PID file during setup. When prefetch_state hung and the wrapper was killed by launchd, the next invocation saw the PID file with a live wrapper PID and blocked via check_dedup. Fix: use SETUP:$$ sentinel during setup phase; check_dedup treats SETUP: PIDs as non-blocking since the instance lock already prevents true concurrency. Also adds self-PID detection as a safety net.

2. Prefetch sub-helper hangs: sequential helpers called after the parallel gh fetches (pr-salvage-helper, gh-failure-miner-helper) could hang indefinitely on gh API calls, blocking the entire prefetch_state stage and preventing run_pulse() from ever being reached. Fix: wrap slow sub-helpers with run_cmd_with_timeout (90-120s per helper). Timeouts are non-fatal: the pulse proceeds with degraded state rather than blocking forever. Also adds a 120s hard timeout on the parallel gh fetch wait loop.
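
A non-fatal, bounded helper call of the kind described here might look roughly like the sketch below. The actual signature and behavior of run_cmd_with_timeout are not shown in this PR page, so `run_bounded`, its argument order, and the 124 timeout code are hypothetical stand-ins.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real run_cmd_with_timeout may differ.
# Usage: run_bounded <timeout_seconds> <command> [args...]
# Returns the command's exit code, or 124 if the timeout expired.
run_bounded() {
    local timeout_s="$1"
    shift
    "$@" &                       # run the helper in the background
    local pid=$! waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_s" ]; then
            kill "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null || true   # reap the killed helper
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                  # propagate the helper's own exit code
}

# Non-fatal use: a slow stand-in helper is cut off, and the caller continues.
if ! run_bounded 1 sleep 5; then
    echo "helper timed out or failed; continuing with degraded state" >&2
fi
echo "pulse proceeds"
```

The caller treats any non-zero result as degraded state rather than aborting, which matches the non-fatal behavior the commit message describes.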

The wrapper spends ~20s on cleanup/normalize before reaching prefetch_state. With launchd's 120s StartInterval, a 120s fetch timeout means the wrapper gets killed before the timeout fires. Normal fetch completion is <30s, so 60s is generous while leaving ~40s headroom for sub-helpers and pulse launch.

Total prefetch budget: 60s (parallel gh) + 30s (hygiene) + 30s (ci) + 30s (gh-failure) = 150s worst case. Well within the 600s stage timeout and leaves headroom for pulse launch within launchd's 120s cycle. These helpers are non-critical — pulse proceeds with degraded state if they timeout.

marcusquinn · 2026-03-14T03:39:27Z

Rebased onto latest main, resolved conflicts, and addressed the requested changes (duplicate comment block removal + SETUP PID handling remains normalized in session status paths). Re-running review please. @coderabbitai review

coderabbitai · 2026-03-14T03:39:46Z

@marcusquinn: Acknowledged — rebased onto main, conflict resolution, duplicate comment block removal, and normalized SETUP PID handling in session status paths all noted. Kicking off a fresh review of the updated pulse-wrapper.sh changes now.

🧠 Learnings used

Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

github-actions · 2026-03-14T03:40:01Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 414 code smells

[INFO] Recent monitoring activity:
Sat Mar 14 03:39:56 UTC 2026: Code review monitoring started
Sat Mar 14 03:39:57 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 414

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 414
VULNERABILITIES: 0

Generated on: Sat Mar 14 03:39:59 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-03-14T03:40:47Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

alex-solovyev · 2026-03-14T03:50:24Z

CodeRabbit CHANGES_REQUESTED — Verification Complete

Both findings have been addressed in the current branch HEAD (1524a4f):

(1) Minor: Orphaned comment block in `pulse-wrapper.sh` ✅

The orphaned exit-code comment block (lines 1633–1636 in commit 940af8c0) was a copy-paste artifact from run_cmd_with_timeout's documentation that appeared between the function's closing brace and the run_stage_with_timeout header. It is not present in the current HEAD — the rebase cleaned it up. The run_stage_with_timeout header at lines 1654–1666 is the only occurrence and is correctly attached to its function.

(2) Major: `is_pulse_running()` SETUP phase PIDFILE gap ✅

pulse-session-helper.sh is_pulse_running() (lines 96–118) correctly handles the SETUP: sentinel:

- IDLE:* or empty → not running (return 1)
- SETUP:<pid> → strips prefix, extracts numeric PID, checks via ps -p → returns 0 if alive
- Plain numeric PID → standard ps -p check

Status display (lines 432–446) also strips the sentinel for user-facing output, showing setup (PID <num>, pre-flight stages) in yellow during the SETUP window.
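
The three PIDFILE cases above can be sketched as follows. This is an illustration under stated assumptions, not the code from pulse-session-helper.sh: the function names `pid_from_pidfile_content` and `is_running_sketch` are hypothetical, and the rejection of non-numeric content is an added safety check.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of sentinel-aware PID parsing; not the actual
# is_pulse_running() implementation.
# Prints the numeric PID and returns 0, or returns 1 if nothing is running.
pid_from_pidfile_content() {
    local content="$1"
    case "$content" in
        "" | IDLE:*) return 1 ;;                    # empty or idle: not running
        SETUP:*) content="${content#SETUP:}" ;;     # strip the sentinel prefix
    esac
    # Reject anything that is not a plain decimal PID (assumed safety check).
    case "$content" in
        "" | *[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$content"
}

# Liveness check on the extracted PID, using ps -p as described above.
is_running_sketch() {
    local pid
    pid="$(pid_from_pidfile_content "$1")" || return 1
    ps -p "$pid" >/dev/null 2>&1
}

# Demo: the sentinel form and the plain form yield the same numeric PID.
pid_from_pidfile_content "SETUP:1234"
pid_from_pidfile_content "1234"
```

Keeping the parsing separate from the liveness check lets the status display reuse the stripped PID for user-facing output without repeating the sentinel logic.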

ShellCheck

shellcheck .agents/scripts/pulse-wrapper.sh       → 0 violations
shellcheck .agents/scripts/pulse-session-helper.sh → 0 violations

No additional commits needed — both issues are resolved in the current branch state.

Both findings verified as already addressed in current HEAD: (1) orphaned comment block not present, (2) pulse-session-helper.sh already handles SETUP sentinel correctly.

alex-solovyev mentioned this pull request Mar 14, 2026

[Supervisor:alex-solovyev] 0 PRs, 0 assigned, 0 workers at 08:15 UTC #4433

Open

marcusquinn mentioned this pull request Mar 14, 2026

[Supervisor:marcusquinn] 0 PRs, 0 assigned, 1 worker at 07:40 UTC #2645

Open

coderabbitai bot previously requested changes Mar 14, 2026

marcusquinn added 3 commits March 14, 2026 03:38

marcusquinn force-pushed the bugfix/pulse-pid-clash branch from 940af8c to 1524a4f Compare March 14, 2026 03:39

marcusquinn merged commit 8fafe93 into main Mar 14, 2026
32 checks passed

marcusquinn deleted the bugfix/pulse-pid-clash branch March 14, 2026 03:53

github-actions bot mentioned this pull request Mar 14, 2026

t1482: fix: pulse PID self-clash blocks dispatch when prefetch_state hangs #4576

Closed
