
t1469: adapt stale pulse recovery timeout to underfill severity#4348

Merged
marcusquinn merged 1 commit into main from
bugfix/pulse-adaptive-underfill-recovery
Mar 13, 2026

@marcusquinn
Owner

@marcusquinn marcusquinn commented Mar 13, 2026

Summary

  • Make check_dedup() use an adaptive underfilled stale-recovery timeout based on current worker deficit.
  • Recycle stalled pulse sessions after 300s when underfill is severe (>=50%) and after 450s when it is moderate (>=25%); fall back to the configured threshold for minor underfill.
  • Preserve existing behavior for healthy/saturated operation while reducing prolonged underfill windows.
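
The tier selection described in this summary can be sketched as a standalone function. This is a minimal illustration, not the code in pulse-wrapper.sh: the function name `pick_underfill_timeout`, its argument order, and the configured value of 900 seconds are assumptions made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the name and argument order are illustrative only.
# Usage: pick_underfill_timeout MAX_WORKERS ACTIVE_WORKERS CONFIGURED_TIMEOUT
pick_underfill_timeout() {
	local max_workers="$1" active_workers="$2" configured="$3"
	local timeout="$configured" deficit_pct=0

	if [[ "$max_workers" -gt 0 && "$active_workers" -lt "$max_workers" ]]; then
		# Integer division truncates: 3 of 8 workers missing gives 37, not 37.5.
		deficit_pct=$(((max_workers - active_workers) * 100 / max_workers))
		if [[ "$deficit_pct" -ge 50 ]]; then
			timeout=300 # severe underfill
		elif [[ "$deficit_pct" -ge 25 ]]; then
			timeout=450 # moderate underfill
		fi
	fi
	printf '%s\n' "$timeout"
}

# 900 is an assumed configured threshold, used only for illustration.
pick_underfill_timeout 8 2 900 # 75% deficit
pick_underfill_timeout 8 5 900 # 37% deficit
pick_underfill_timeout 8 7 900 # 12% deficit
pick_underfill_timeout 8 8 900 # pool is full
```

Under these assumptions the four calls print 300, 450, 900, and 900. Because the percentage is computed with integer arithmetic, pools that sit just below a tier boundary (for example 49.9%) fall into the lower tier.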

Verification

  • shellcheck .agents/scripts/pulse-wrapper.sh
  • bash -n .agents/scripts/pulse-wrapper.sh

Closes #4347
Refs #4311

Summary by CodeRabbit

  • Improvements
    • Worker pool recovery now uses adaptive timeout mechanisms that dynamically adjust based on resource utilization levels.
    • Enhanced logging with utilization metrics for improved visibility into system state.
    • Optimized cleanup procedures during recovery operations for improved stability.

@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

Walkthrough

This PR modifies the pulse wrapper script to implement adaptive timeout recovery for underfilled worker pools. The system now dynamically adjusts recovery timeout based on the deficit percentage—50%+ deficit uses 300s timeout, 25%+ uses 450s—and adds PIDFILE cleanup during underfill recycling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Adaptive Underfill Recovery Logic: .agents/scripts/pulse-wrapper.sh | Introduces adaptive timeout calculation in check_dedup based on worker pool deficit percentage (50%+ deficit → 300s, 25%+ deficit → 450s). Updates recovery logging to include deficit metrics and adds PIDFILE cleanup during underfill-triggered recycling. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

bug

Poem

🔄 When worker pools grow lean and sparse,
The pulse adapts with steady grace,
From fifty down to twenty-five,
Adaptive healing keeps dreams alive! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main change: adapting the stale pulse recovery timeout to respond dynamically to underfill severity levels. |
| Linked Issues check | ✅ Passed | The code changes implement adaptive timeout logic (300s for ≥50% underfill, 450s for ≥25% underfill) and deficit tracking, directly fulfilling the requirements specified in issue #4347. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on adaptive underfill recovery timing in check_dedup; no extraneous modifications to unrelated code areas. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 412 code smells

[INFO] Recent monitoring activity:
Fri Mar 13 03:37:38 UTC 2026: Code review monitoring started
Fri Mar 13 03:37:39 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 412

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 412
  • VULNERABILITIES: 0

Generated on: Fri Mar 13 03:37:41 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.agents/scripts/pulse-wrapper.sh (1)

248-256: ⚠️ Potential issue | 🔴 Critical

Keep the GH#4324 PID sentinel invariant on the early-recycle path.

Line 255 still deletes PIDFILE. That violates the "PID file is NEVER deleted" contract documented at Lines 34-39 and 174-179, and this PR makes that path trigger sooner and more often. Preserve the sentinel here too so check_dedup() never falls back to its ! -f fast path during recovery.

🛠️ Suggested fix
-		rm -f "$PIDFILE"
+		echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE"
 		return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 248 - 256, Early-recycle path
currently deletes PIDFILE (rm -f "$PIDFILE") which breaks the GH#4324 "PID file
is NEVER deleted" invariant; instead of removing the file after calling
_kill_tree/_force_kill_tree, preserve the sentinel so check_dedup() never sees !
-f. Replace the rm -f "$PIDFILE" step in the block containing _kill_tree and
_force_kill_tree with logic that leaves PIDFILE present (either by leaving
existing content or overwriting it with the agreed sentinel value) so
check_dedup() continues to use the sentinel fast path.
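
To make the invariant concrete, the sketch below contrasts overwriting the PID file with a sentinel against deleting it. Only the `IDLE:<timestamp>` format comes from the suggested fix above; the temporary `PIDFILE` path and the helper names `mark_idle` and `pidfile_state` are assumptions for illustration and do not reflect how check_dedup() is actually written.

```bash
#!/usr/bin/env bash
# Illustrative only: the PIDFILE path and both function names are assumptions.
PIDFILE="$(mktemp)"

# Mark the slot idle without removing the file (mirrors the suggested fix).
mark_idle() {
	echo "IDLE:$(date -u +%Y-%m-%dT%H:%M:%SZ)" >"$PIDFILE"
}

# Classify the PID file: missing, idle sentinel, numeric PID, or unknown content.
pidfile_state() {
	local content
	[[ -f "$PIDFILE" ]] || {
		echo "missing"
		return 0
	}
	content="$(<"$PIDFILE")"
	case "$content" in
	IDLE:*) echo "idle" ;;
	'' | *[!0-9]*) echo "unknown" ;;
	*) echo "pid" ;;
	esac
}

echo "$$" >"$PIDFILE"
pidfile_state # pid
mark_idle
pidfile_state # idle
rm -f "$PIDFILE"
pidfile_state # missing: the state the invariant is meant to rule out
```

With the sentinel in place, a reader of the file can always tell "no pulse running" apart from "file absent", which is the distinction the review comment asks the early-recycle path to preserve.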

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aac7c898-7a52-4786-ae76-8b3695fb174d

📥 Commits

Reviewing files that changed from the base of the PR and between be4b2f6 and 4237cbb.

📒 Files selected for processing (1)
  • .agents/scripts/pulse-wrapper.sh

Comment on lines +237 to +245
	adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT"

	if [[ "$active_workers" -lt "$max_workers" ]]; then
		deficit_pct=$(((max_workers - active_workers) * 100 / max_workers))
		if [[ "$deficit_pct" -ge 50 ]]; then
			adaptive_timeout=300
		elif [[ "$deficit_pct" -ge 25 ]]; then
			adaptive_timeout=450
		fi

⚠️ Potential issue | 🟠 Major

Don't let the moderate tier relax a stricter configured timeout.

With the validator at Line 119, PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT=300..449 is valid. Lines 241-244 then force moderate underfill to 450s, which can make recovery slower than the configured baseline instead of faster.

🛠️ Suggested fix
 	adaptive_timeout="$PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT"

 	if [[ "$active_workers" -lt "$max_workers" ]]; then
 		deficit_pct=$(((max_workers - active_workers) * 100 / max_workers))
-		if [[ "$deficit_pct" -ge 50 ]]; then
+		if [[ "$deficit_pct" -ge 50 && "$adaptive_timeout" -gt 300 ]]; then
 			adaptive_timeout=300
-		elif [[ "$deficit_pct" -ge 25 ]]; then
+		elif [[ "$deficit_pct" -ge 25 && "$adaptive_timeout" -gt 450 ]]; then
 			adaptive_timeout=450
 		fi
 	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 237 - 245, The current logic
can raise adaptive_timeout above the configured
PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT; change it so the computed candidate
time (300 for >=50% deficit, 450 for >=25% deficit) only shortens the configured
timeout — do not increase it. In practice, compute candidate time based on
deficit_pct (using the existing deficit_pct branches), then set adaptive_timeout
to the smaller of candidate and the existing adaptive_timeout (adaptive_timeout
= min(candidate, adaptive_timeout)), keeping the original adaptive_timeout when
the candidate would be longer; reference variables: adaptive_timeout,
PULSE_UNDERFILLED_STALE_RECOVERY_TIMEOUT, deficit_pct, max_workers,
active_workers.
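
The min(candidate, adaptive_timeout) approach described in this prompt can be sketched as follows. The wrapper function `clamped_underfill_timeout` and its arguments are hypothetical; the variable names inside the body follow the review comment, and the configured values 360 and 900 are assumed test inputs.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper; only the variable names in the body come from the review.
# Usage: clamped_underfill_timeout MAX_WORKERS ACTIVE_WORKERS CONFIGURED_TIMEOUT
clamped_underfill_timeout() {
	local max_workers="$1" active_workers="$2" configured="$3"
	local adaptive_timeout="$configured" candidate="$configured" deficit_pct=0

	if [[ "$max_workers" -gt 0 && "$active_workers" -lt "$max_workers" ]]; then
		deficit_pct=$(((max_workers - active_workers) * 100 / max_workers))
		if [[ "$deficit_pct" -ge 50 ]]; then
			candidate=300
		elif [[ "$deficit_pct" -ge 25 ]]; then
			candidate=450
		fi
	fi

	# Only ever shorten: adaptive_timeout = min(candidate, adaptive_timeout).
	if [[ "$candidate" -lt "$adaptive_timeout" ]]; then
		adaptive_timeout="$candidate"
	fi
	printf '%s\n' "$adaptive_timeout"
}

clamped_underfill_timeout 8 5 360 # moderate underfill, configured 360
clamped_underfill_timeout 8 5 900 # moderate underfill, configured 900
clamped_underfill_timeout 8 2 360 # severe underfill, configured 360
```

Under these assumptions the calls print 360, 450, and 300. The first case is the one the review flags: the unclamped logic would have returned 450 there, which is slower than the configured 360.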

@marcusquinn marcusquinn merged commit 5563e59 into main Mar 13, 2026
31 checks passed
@marcusquinn marcusquinn deleted the bugfix/pulse-adaptive-underfill-recovery branch March 13, 2026 03:45

Development

Successfully merging this pull request may close these issues.

t1469: Adapt stale pulse recovery timeout to underfill severity

1 participant