t1482: poll-based fetch timeout and reduced sub-helper timeouts#4580
t1482: poll-based fetch timeout and reduced sub-helper timeouts#4580marcusquinn merged 3 commits intomainfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR addresses a race condition in pulse-wrapper.sh where setup-phase PID writes cause subsequent invocations to block when prefetch helpers hang. The fix introduces per-call timeouts for prefetch operations and a Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1633-1636: Delete the duplicated/orphaned exit-code comment block
that documents exit codes (the three lines starting with "# 0 - stage
completed successfully" through "# else- stage exited with command exit code")
since it was a copy-paste artifact and the real documentation for
run_stage_with_timeout follows immediately; remove the duplicate so only the
intended run_stage_with_timeout comment block remains and no stray comment lines
are left above the function.
- Around line 2759-2763: pulse-session-helper.sh's is_pulse_running() and status
display must handle the PIDFILE sentinel "SETUP:$$" so queries during wrapper
initialization don't show false idle or the raw sentinel; update
is_pulse_running() to read the PIDFILE, detect a "SETUP:<num>" prefix, extract
the numeric PID (<num>) and use that PID in the ps check and in status output
(so it shows "running (PID <num>)" rather than "SETUP:<num>"), and ensure any
other helper logic that calls is_pulse_running() (e.g., status formatting) also
strips the "SETUP:" prefix before presenting or testing the PID; alternatively,
if you prefer separate coordination, replace the SETUP sentinel write in
pulse-wrapper.sh with a distinct temporary sentinel file and keep PIDFILE as the
actual numeric PID only.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0cfc07d4-7859-4722-a65b-475b18f60922
📒 Files selected for processing (1)
.agents/scripts/pulse-wrapper.sh
Two systemic bugs caused the pulse to stop dispatching workers: 1. PID self-clash: main() wrote the wrapper's own PID ($$) to the PID file during setup. When prefetch_state hung and the wrapper was killed by launchd, the next invocation saw the PID file with a live wrapper PID and blocked via check_dedup. Fix: use SETUP:$$ sentinel during setup phase; check_dedup treats SETUP: PIDs as non-blocking since the instance lock already prevents true concurrency. Also adds self-PID detection as a safety net. 2. Prefetch sub-helper hangs: sequential helpers called after the parallel gh fetches (pr-salvage-helper, gh-failure-miner-helper) could hang indefinitely on gh API calls, blocking the entire prefetch_state stage and preventing run_pulse() from ever being reached. Fix: wrap slow sub-helpers with run_cmd_with_timeout (90-120s per helper). Timeouts are non-fatal — the pulse proceeds with degraded state rather than blocking forever. Also adds a 120s hard timeout on the parallel gh fetch wait loop.
The wrapper spends ~20s on cleanup/normalize before reaching prefetch_state. With launchd's 120s StartInterval, a 120s fetch timeout means the wrapper gets killed before the timeout fires. Normal fetch completion is <30s, so 60s is generous while leaving ~40s headroom for sub-helpers and pulse launch.
Total prefetch budget: 60s (parallel gh) + 30s (hygiene) + 30s (ci) + 30s (gh-failure) = 150s worst case. Well within the 600s stage timeout and leaves headroom for pulse launch within launchd's 120s cycle. These helpers are non-critical — pulse proceeds with degraded state if they timeout.
940af8c to
1524a4f
Compare
|
Rebased onto latest main, resolved conflicts, and addressed the requested changes (duplicate comment block removal + SETUP PID handling remains normalized in session status paths). Re-running review please. @coderabbitai review |
|
🧠 Learnings used✅ Actions performedReview triggered.
|
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Sat Mar 14 03:39:59 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
CodeRabbit CHANGES_REQUESTED — Verification CompleteBoth findings have been addressed in the current branch HEAD ( (1) Minor: Orphaned comment block in
|
Both findings verified as already addressed in current HEAD: (1) orphaned comment block not present, (2) pulse-session-helper.sh already handles SETUP sentinel correctly.



Summary
Follow-up to PR #4575 (merged). Three additional fixes for the pulse dispatch hang:
wait $pidloop withkill -0poll-based timeout inprefetch_state()parallel gh fetches —waitblocks until the process exits, so the elapsed-time check between waits never firesEvidence
Production logs show
prefetch_statetaking 89-182s with the old code. Thewait $pidtimeout never fired because eachwaitblocked for the full duration of the hung process. New instances kept starting every 120s (launchd), correctly detecting dead SETUP wrappers, but never completing prefetch before the next cycle.Closes #4576
Summary by CodeRabbit
Chores