t1469: adapt stale pulse recovery timeout to underfill severity#4348
t1469: adapt stale pulse recovery timeout to underfill severity#4348marcusquinn merged 1 commit intomainfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
WalkthroughThis PR modifies the pulse wrapper script to implement adaptive timeout recovery for underfilled worker pools. The system now dynamically adjusts recovery timeout based on the deficit percentage—50%+ deficit uses 300s timeout, 25%+ uses 450s—and adds PIDFILE cleanup during underfill recycling. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Fri Mar 13 03:37:41 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.agents/scripts/pulse-wrapper.sh (1)
248-256:⚠️ Potential issue | 🔴 CriticalKeep the GH#4324 PID sentinel invariant on the early-recycle path.
Line 255 still deletes
PIDFILE. That violates the "PID file is NEVER deleted" contract documented at Lines 34-39 and 174-179, and this PR makes that path trigger sooner and more often. Preserve the sentinel here too socheck_dedup()never falls back to its! -ffast path during recovery.🛠️ Suggested fix
- rm -f "$PIDFILE" + echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE" return 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/pulse-wrapper.sh around lines 248 - 256, Early-recycle path currently deletes PIDFILE (rm -f "$PIDFILE") which breaks the GH#4324 "PID file is NEVER deleted" invariant; instead of removing the file after calling _kill_tree/_force_kill_tree, preserve the sentinel so check_dedup() never sees ! -f. Replace the rm -f "$PIDFILE" step in the block containing _kill_tree and _force_kill_tree with logic that leaves PIDFILE present (either by leaving existing content or overwriting it with the agreed sentinel value) so check_dedup() continues to use the sentinel fast path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 237-245: The current logic can raise adaptive_timeout above the
configured PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT; change it so the computed
candidate time (300 for >=50% deficit, 450 for >=25% deficit) only shortens the
configured timeout — do not increase it. In practice, compute candidate time
based on deficit_pct (using the existing deficit_pct branches), then set
adaptive_timeout to the smaller of candidate and the existing adaptive_timeout
(adaptive_timeout = min(candidate, adaptive_timeout)), keeping the original
adaptive_timeout when the candidate would be longer; reference variables:
adaptive_timeout, PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT, deficit_pct,
max_workers, active_workers.
---
Outside diff comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 248-256: Early-recycle path currently deletes PIDFILE (rm -f
"$PIDFILE") which breaks the GH#4324 "PID file is NEVER deleted" invariant;
instead of removing the file after calling _kill_tree/_force_kill_tree, preserve
the sentinel so check_dedup() never sees ! -f. Replace the rm -f "$PIDFILE" step
in the block containing _kill_tree and _force_kill_tree with logic that leaves
PIDFILE present (either by leaving existing content or overwriting it with the
agreed sentinel value) so check_dedup() continues to use the sentinel fast path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: aac7c898-7a52-4786-ae76-8b3695fb174d
📒 Files selected for processing (1)
.agents/scripts/pulse-wrapper.sh
| adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT" | ||
|
|
||
| if [[ "$active_workers" -lt "$max_workers" ]]; then | ||
| deficit_pct=$(((max_workers - active_workers) * 100 / max_workers)) | ||
| if [[ "$deficit_pct" -ge 50 ]]; then | ||
| adaptive_timeout=300 | ||
| elif [[ "$deficit_pct" -ge 25 ]]; then | ||
| adaptive_timeout=450 | ||
| fi |
There was a problem hiding this comment.
Don't let the moderate tier relax a stricter configured timeout.
With the validator at Line 119, PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT=300..449 is valid. Lines 241-244 then force moderate underfill to 450s, which can make recovery slower than the configured baseline instead of faster.
🛠️ Suggested fix
adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT"
if [[ "$active_workers" -lt "$max_workers" ]]; then
deficit_pct=$(((max_workers - active_workers) * 100 / max_workers))
- if [[ "$deficit_pct" -ge 50 ]]; then
+ if [[ "$deficit_pct" -ge 50 && "$adaptive_timeout" -gt 300 ]]; then
adaptive_timeout=300
- elif [[ "$deficit_pct" -ge 25 ]]; then
+ elif [[ "$deficit_pct" -ge 25 && "$adaptive_timeout" -gt 450 ]]; then
adaptive_timeout=450
fi
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT" | |
| if [[ "$active_workers" -lt "$max_workers" ]]; then | |
| deficit_pct=$(((max_workers - active_workers) * 100 / max_workers)) | |
| if [[ "$deficit_pct" -ge 50 ]]; then | |
| adaptive_timeout=300 | |
| elif [[ "$deficit_pct" -ge 25 ]]; then | |
| adaptive_timeout=450 | |
| fi | |
| adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT" | |
| if [[ "$active_workers" -lt "$max_workers" ]]; then | |
| deficit_pct=$(((max_workers - active_workers) * 100 / max_workers)) | |
| if [[ "$deficit_pct" -ge 50 && "$adaptive_timeout" -gt 300 ]]; then | |
| adaptive_timeout=300 | |
| elif [[ "$deficit_pct" -ge 25 && "$adaptive_timeout" -gt 450 ]]; then | |
| adaptive_timeout=450 | |
| fi | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.agents/scripts/pulse-wrapper.sh around lines 237 - 245, The current logic
can raise adaptive_timeout above the configured
PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT; change it so the computed candidate
time (300 for >=50% deficit, 450 for >=25% deficit) only shortens the configured
timeout — do not increase it. In practice, compute candidate time based on
deficit_pct (using the existing deficit_pct branches), then set adaptive_timeout
to the smaller of candidate and the existing adaptive_timeout (adaptive_timeout
= min(candidate, adaptive_timeout)), keeping the original adaptive_timeout when
the candidate would be longer; reference variables: adaptive_timeout,
PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT, deficit_pct, max_workers,
active_workers.



Summary
check_dedup()use an adaptive underfilled stale-recovery timeout based on current worker deficit.Verification
shellcheck .agents/scripts/pulse-wrapper.shbash -n .agents/scripts/pulse-wrapper.shCloses #4347
Refs #4311
Summary by CodeRabbit