t1429: Separate stats from pulse to prevent rate-limit blocking#4094
alex-solovyev merged 4 commits into main from
Conversation
- Remove run_daily_quality_sweep and update_health_issues from pulse-wrapper.sh main() — these blocked the pulse for hours when GitHub Search API rate limits were exhausted (30 req/min)
- Create stats-wrapper.sh as a separate cron process (every 15 min) with its own PID dedup and timeout for quality/health/person-stats
- Fix contributor-activity-helper.sh infinite sleep loop — bail out with partial results instead of waiting for rate limit reset
- Wire stats-wrapper cron in setup.sh (auto-installed when pulse enabled)
Walkthrough

Decouples pulse housekeeping into a new stats-wrapper scheduled job, alters contributor-activity-helper to return partial results instead of sleeping on GitHub Search API rate limits, and reorders pulse-wrapper execution to prefetch state earlier and remove inline sweeps.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Cron as Cron/LaunchAgent
    participant Pulse as pulse-wrapper.sh
    participant Stats as stats-wrapper.sh
    participant Helper as contributor-activity-helper.sh
    participant GitHub as GitHub API
    rect rgba(100,150,200,0.5)
        Note over Cron,Pulse: Before — pulse performed sweeps inline
        Cron->>Pulse: trigger pulse
        Pulse->>Helper: request contribution data
        Helper->>GitHub: search API call
        GitHub-->>Helper: rate-limit nearing
        Helper->>Helper: sleep until reset
        Pulse->>Pulse: run quality sweep & update health
        Pulse->>Pulse: dispatch workers
    end
    rect rgba(150,200,100,0.5)
        Note over Cron,Stats: After — sweeps decoupled to stats-wrapper
        par Pulse Execution
            Cron->>Pulse: trigger pulse
            Pulse->>Pulse: prefetch state & cleanup
            Pulse->>Helper: request contribution data
            Helper->>GitHub: search API call
            GitHub-->>Helper: rate-limit nearing
            Helper-->>Pulse: return partial results (is_partial=true)
            Pulse->>Pulse: dispatch workers
        and Stats Execution
            Cron->>Stats: trigger stats-wrapper (separate schedule)
            Stats->>Stats: dedup check, run_daily_quality_sweep, update_health_issues
            Stats-->>Stats: log completion
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 4 passed
🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics

Generated on: Tue Mar 10 20:05:43 UTC 2026 by AI DevOps Framework Code Review Monitoring
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where the pulse process was being stalled indefinitely due to GitHub API rate limits encountered during statistics gathering. It introduces a separate cron job for statistics and modifies the rate limit handling in

Highlights
Changelog
Activity
Code Review
This pull request effectively addresses a critical blocking issue by separating long-running statistics generation from the main pulse process. The introduction of stats-wrapper.sh and the modification to contributor-activity-helper.sh to prevent infinite blocking are solid architectural improvements. The changes are well-documented and the setup script is correctly updated. I've identified one high-severity portability issue in the new stats-wrapper.sh script that would cause it to fail on macOS systems due to the use of a non-standard date command option. I have provided a code suggestion to resolve this by using a more portable method to determine process age.
.agents/scripts/stats-wrapper.sh
Outdated
```sh
local start_time
start_time=$(ps -o lstart= -p "$old_pid" 2>/dev/null) || {
    rm -f "$STATS_PIDFILE"
    return 0
}

local start_epoch
start_epoch=$(date -d "$start_time" +%s 2>/dev/null) || start_epoch=0
local now
now=$(date +%s)
local elapsed=$((now - start_epoch))
```
The use of date -d on line 56 introduces a portability issue, as this option is a GNU extension and is not available on macOS/BSD systems, which this repository appears to support. This will cause the script to fail when checking the age of a running process on macOS.
A more portable and simpler approach is to use ps -o etimes= which directly provides the elapsed time of the process in seconds. This avoids the need for parsing dates and works on both Linux and macOS.
```diff
-local start_time
-start_time=$(ps -o lstart= -p "$old_pid" 2>/dev/null) || {
-    rm -f "$STATS_PIDFILE"
-    return 0
-}
-local start_epoch
-start_epoch=$(date -d "$start_time" +%s 2>/dev/null) || start_epoch=0
-local now
-now=$(date +%s)
-local elapsed=$((now - start_epoch))
+local elapsed
+elapsed=$(ps -p "$old_pid" -o etimes= 2>/dev/null | tr -d ' ')
+# If ps fails or the process has disappeared, elapsed will be empty.
+# In this case, we can clean up the PID file and allow a new run.
+if [[ -z "$elapsed" ]]; then
+    rm -f "$STATS_PIDFILE"
+    return 0
+fi
```
Actionable comments posted: 4
🧹 Nitpick comments (1)
.agents/scripts/stats-wrapper.sh (1)
87-99: Point the sourced helper logs at stats.log too.

After sourcing .agents/scripts/pulse-wrapper.sh, the shared helpers still append to LOGFILE, and that file is hardcoded there to ~/.aidevops/logs/pulse.log. stats.log will only show wrapper start/finish, which makes this new cron much harder to troubleshoot. As per coding guidelines, automation scripts should focus on clear logging and feedback.
♻️ Suggested tweak

```diff
 source "${SCRIPT_DIR}/pulse-wrapper.sh" || {
     echo "[stats-wrapper] Failed to source pulse-wrapper.sh" >>"$STATS_LOGFILE"
     return 1
 }
+LOGFILE="$STATS_LOGFILE"

 run_daily_quality_sweep || true
 update_health_issues || true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/stats-wrapper.sh around lines 87 - 99, The helpers in pulse-wrapper.sh are appending to their own LOGFILE (pulse.log); fix by overriding that LOGFILE so helpers write into stats.log: after sourcing pulse-wrapper.sh (source "${SCRIPT_DIR}/pulse-wrapper.sh" ...), set LOGFILE="$STATS_LOGFILE" and export it before calling run_daily_quality_sweep and update_health_issues so those functions use the wrapper's STATS_LOGFILE instead of the hardcoded pulse.log.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/contributor-activity-helper.sh:
- Around line 1035-1040: When rate limit exhaustion is detected in
contributor-activity-helper.sh (the branch that currently echoes "Rate limit
exhausted (${remaining} remaining), returning partial results" and then breaks),
modify the behavior to make partial results explicit: either inject a well-known
JSON field (e.g. "partial": true) or a clear marker line into stdout alongside
the existing output, and/or exit with a non-zero distinct status code (different
from other errors) so callers like pulse-wrapper.sh can detect truncated
payloads; update the branch where the echo and break occur to emit that marker
and set the distinct exit status before returning so callers can preserve and
react to the partial body.
In @.agents/scripts/pulse-wrapper.sh:
- Around line 3268-3273: Update the block comment in pulse-wrapper.sh that
explains why statistics (stats-wrapper.sh) run in a separate process: change the
wording to reflect that contributor-activity-helper.sh no longer uses a blocking
sleep loop but instead bails out with partial results on Search API limits; keep
the separation rationale but frame it as a past behavior or as an ongoing risk
to pulse dispatching due to GitHub Search API latency/budget (30 req/min) and
potential long delays, and mention the new behavior (exits with partial results)
so future readers understand the current failure mode and why separation is
still required.
In @.agents/scripts/stats-wrapper.sh:
- Around line 49-67: The current age calculation using start_time and date -d in
stats-wrapper.sh is GNU-specific and breaks on macOS; update the logic in the
block that computes start_epoch/elapsed (variables start_time, start_epoch, now,
elapsed, and constants STATS_TIMEOUT/STATS_PIDFILE) to use a portable check:
prefer ps -o etimes= to get seconds on Linux, fall back to parsing ps -o etime=
output into seconds via awk on macOS (or use date -j -f when needed), and only
then compute elapsed and compare to STATS_TIMEOUT; ensure the fallback paths
handle errors (set start_epoch to now or 0 appropriately) and keep the same
kill/remove sequence when elapsed > STATS_TIMEOUT.
In `@setup.sh`:
- Around line 993-1015: The stats-wrapper block currently only installs a
crontab entry for stats-wrapper.sh; on macOS (Darwin) you must instead install a
launchd plist like the other schedulers (e.g.,
process-guard/memory-pressure-monitor). Update the stats-wrapper logic that
references stats_script and uses _cron_escape to: detect Darwin via uname -s (or
reuse the existing macOS branch pattern), create/write a LaunchAgents plist
(e.g., com.aidevops.stats-wrapper.plist in ~/Library/LaunchAgents) that runs the
same command with PATH and log redirection, load it with launchctl when
enabling, and unload/remove the plist when disabling (mirror the pulse block
behavior for install/uninstall and use print_info messages). Ensure you still
fall back to crontab on non-Darwin systems and keep the existing grep marker
"aidevops: stats-wrapper" consistent for removal.
---
Nitpick comments:
In @.agents/scripts/stats-wrapper.sh:
- Around line 87-99: The helpers in pulse-wrapper.sh are appending to their own
LOGFILE (pulse.log); fix by overriding that LOGFILE so helpers write into
stats.log: after sourcing pulse-wrapper.sh (source
"${SCRIPT_DIR}/pulse-wrapper.sh" ...), set LOGFILE="$STATS_LOGFILE" and export
it before calling run_daily_quality_sweep and update_health_issues so those
functions use the wrapper's STATS_LOGFILE instead of the hardcoded pulse.log.
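The prompt above suggests preferring `ps -o etimes=` and falling back to parsing `ps -o etime=` where `etimes` is unavailable. A minimal sketch of that conversion (the helper names are hypothetical, not taken from the PR):

```shell
#!/usr/bin/env bash
# Convert the ps etime format ([[dd-]hh:]mm:ss) into seconds.
# Splitting on both '-' and ':' distinguishes the three layouts by field count.
etime_to_seconds() {
  awk -F'[-:]' '{
    if (NF == 4)      print $1*86400 + $2*3600 + $3*60 + $4   # dd-hh:mm:ss
    else if (NF == 3) print $1*3600 + $2*60 + $3              # hh:mm:ss
    else              print $1*60 + $2                        # mm:ss
  }' <<<"$1"
}

# Prefer etimes where supported, fall back to parsing etime.
process_age_seconds() {
  local pid="$1" secs
  secs=$(ps -p "$pid" -o etimes= 2>/dev/null | tr -d ' ')
  if [[ -z "$secs" ]]; then
    secs=$(ps -p "$pid" -o etime= 2>/dev/null | tr -d ' ')
    [[ -n "$secs" ]] && secs=$(etime_to_seconds "$secs")
  fi
  echo "${secs:-}"
}
```

The dedup check can then compare the returned seconds against STATS_TIMEOUT with plain arithmetic on both Linux and macOS.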
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 03d24ae6-7194-4658-b422-58ca3fda6756
⛔ Files ignored due to path filters (1)
todo/tasks/t1429-brief.md is excluded by !todo/tasks/*-brief.md
📒 Files selected for processing (4)
- .agents/scripts/contributor-activity-helper.sh
- .agents/scripts/pulse-wrapper.sh
- .agents/scripts/stats-wrapper.sh
- setup.sh
- Replace date -d (GNU-only) with epoch stored in PID file for
portable process age detection on macOS/Linux
- Add partial flag to person-stats output: JSON gets {partial: true},
markdown gets a footnote when rate-limited
- Pass _ps_partial through python formatter so callers can detect
truncated data in cached results
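The first bullet above replaces GNU-only `date -d` parsing with an epoch written into the PID file at startup. A sketch of that pattern (the default file path is an assumption for illustration):

```shell
#!/usr/bin/env bash
STATS_PIDFILE="${STATS_PIDFILE:-/tmp/stats-wrapper.pid}"  # assumed path

# Record "pid start_epoch" so later age checks need no GNU `date -d`.
write_pidfile() {
  printf '%s %s\n' "$$" "$(date +%s)" >"$STATS_PIDFILE"
}

# Age in seconds of the recorded run; fails if the file is missing.
pidfile_age() {
  local pid epoch
  read -r pid epoch <"$STATS_PIDFILE" || return 1
  echo $(( $(date +%s) - epoch ))
}
```

Because `date +%s` is POSIX, the same arithmetic works on macOS and Linux, and a stale-process check reduces to comparing pidfile_age against STATS_TIMEOUT.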
…omments
- Add Darwin/launchd plist for stats-wrapper.sh (matching pulse pattern)
- Fall back to crontab on Linux
- Update pulse-wrapper main() comment to reflect current helper behavior (bails out with partial results, not infinite blocking)
- Source shared-constants.sh and worker-lifecycle-common.sh for _kill_tree, _force_kill_tree, _validate_int
- Use _kill_tree/_force_kill_tree for proper process tree cleanup
- Add _validate_int for STATS_TIMEOUT env var
- Use zsh-compatible source guard (_stats_is_sourced with ZSH_EVAL_CONTEXT fallback, matching GH#3931)
- Use portable SCRIPT_DIR with return fallback for zsh
@coderabbitai review

✅ Actions performed: Review triggered.
🧹 Nitpick comments (2)
.agents/scripts/stats-wrapper.sh (1)
85-109: Verify external timeout enforcement for long-running stats tasks.

The STATS_TIMEOUT (default 10 min) only affects the dedup stale-process check on subsequent invocations. If a single stats run takes longer than 15 minutes (the cron interval), the dedup check will kill it on the next invocation — but there's no internal watchdog within the current run to enforce the timeout mid-execution.

For tasks like run_daily_quality_sweep that iterate multiple repos, a runaway repo could block indefinitely until the next cron fires. Consider wrapping the task invocations with timeout_sec (from shared-constants.sh) for defense-in-depth.

♻️ Optional: Add internal timeout wrapper

```diff
-run_daily_quality_sweep || true
-update_health_issues || true
+# timeout_sec provides defense-in-depth if a single task stalls
+timeout_sec "$STATS_TIMEOUT" bash -c 'run_daily_quality_sweep' || true
+timeout_sec "$STATS_TIMEOUT" bash -c 'update_health_issues' || true
```

Note: This requires exporting the functions or refactoring to script-based invocation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/scripts/stats-wrapper.sh around lines 85 - 109, The current main() invokes run_daily_quality_sweep and update_health_issues without enforcing STATS_TIMEOUT, so long-running repos can run past the cron interval; fix by sourcing shared-constants.sh (if not already) to get timeout_sec and then run each long task under timeout_sec "$STATS_TIMEOUT" -- e.g. replace direct calls to run_daily_quality_sweep and update_health_issues with invocations wrapped by timeout_sec "$STATS_TIMEOUT" <command>, ensure the wrapper captures non-zero/timeout exits to log to STATS_LOGFILE (including which task timed out) and return a non-zero status if desired; if these functions cannot be executed via timeout_sec directly, export/refactor them into script-invocable functions or add small wrapper scripts that call run_daily_quality_sweep/update_health_issues so timeout_sec can manage them.

setup.sh (1)
1005-1060: Consider regenerating plist on each setup run to pick up config changes.

The pulse scheduler block (lines 881-971) always regenerates its plist to incorporate updated environment variables and thresholds. The stats-wrapper block skips installation entirely if _stats_installed=true, meaning configuration updates (e.g., PATH changes, new env vars) won't propagate until the user manually removes the existing scheduler.

For consistency and to ensure config drift doesn't accumulate, consider always regenerating the plist/cron entry (unload → write → load pattern) when _pulse_lower == "true", similar to the pulse block.

♻️ Suggested approach

```diff
-if [[ "$_stats_installed" == "false" ]]; then
-    if [[ "$(uname -s)" == "Darwin" ]]; then
+if [[ "$(uname -s)" == "Darwin" ]]; then
+    local stats_plist="$HOME/Library/LaunchAgents/${stats_label}.plist"
+    # Unload existing if upgrading
+    if _launchd_has_agent "$stats_label"; then
+        launchctl unload "$stats_plist" 2>/dev/null || true
+    fi
+    # Always regenerate plist to pick up config changes
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 1005 - 1060, The stats-wrapper block currently skips any work when _stats_installed=="true", so updates to stats_label/stats_script/ENV (created via _xml_escape/_cron_escape) never propagate; change the logic to always regenerate the plist/cron entry when installing (same pattern as pulse): if on macOS, if launchd already has the agent unload the existing stats_plist (launchctl unload "$stats_plist"), rewrite the plist using _xml_escape for stats_script/HOME/PATH and then load it (launchctl load "$stats_plist"); otherwise for cron remove any existing "aidevops: stats-wrapper" entry from crontab and re-add the updated crontab line using _cron_escape for stats_script so PATH/HOME changes take effect. Ensure you reference stats_plist, stats_label, stats_script, _xml_escape, _cron_escape and the launchctl/crontab commands when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In @.agents/scripts/stats-wrapper.sh:
- Around line 85-109: The current main() invokes run_daily_quality_sweep and
update_health_issues without enforcing STATS_TIMEOUT, so long-running repos can
run past the cron interval; fix by sourcing shared-constants.sh (if not already)
to get timeout_sec and then run each long task under timeout_sec
"$STATS_TIMEOUT" -- e.g. replace direct calls to run_daily_quality_sweep and
update_health_issues with invocations wrapped by timeout_sec "$STATS_TIMEOUT"
<command>, ensure the wrapper captures non-zero/timeout exits to log to
STATS_LOGFILE (including which task timed out) and return a non-zero status if
desired; if these functions cannot be executed via timeout_sec directly,
export/refactor them into script-invocable functions or add small wrapper
scripts that call run_daily_quality_sweep/update_health_issues so timeout_sec
can manage them.
In `@setup.sh`:
- Around line 1005-1060: The stats-wrapper block currently skips any work when
_stats_installed=="true", so updates to stats_label/stats_script/ENV (created
via _xml_escape/_cron_escape) never propagate; change the logic to always
regenerate the plist/cron entry when installing (same pattern as pulse): if on
macOS, if launchd already has the agent unload the existing stats_plist
(launchctl unload "$stats_plist"), rewrite the plist using _xml_escape for
stats_script/HOME/PATH and then load it (launchctl load "$stats_plist");
otherwise for cron remove any existing "aidevops: stats-wrapper" entry from
crontab and re-add the updated crontab line using _cron_escape for stats_script
so PATH/HOME changes take effect. Ensure you reference stats_plist, stats_label,
stats_script, _xml_escape, _cron_escape and the launchctl/crontab commands when
making the change.
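The unload → write → load pattern the prompt describes can be sketched roughly as follows. The label, script path, and 15-minute StartInterval are assumptions mirroring the PR description, not the actual setup.sh code:

```shell
#!/usr/bin/env bash
stats_label="com.aidevops.stats-wrapper"                 # assumed label
stats_script="$HOME/.aidevops/scripts/stats-wrapper.sh"  # assumed path

# Always (re)write the plist so PATH/env changes propagate on upgrade.
write_stats_plist() {
  local plist="$1"
  cat >"$plist" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"><dict>
  <key>Label</key><string>${stats_label}</string>
  <key>ProgramArguments</key>
  <array><string>/bin/bash</string><string>${stats_script}</string></array>
  <key>StartInterval</key><integer>900</integer>
  <key>StandardOutPath</key><string>${HOME}/.aidevops/logs/stats.log</string>
  <key>StandardErrorPath</key><string>${HOME}/.aidevops/logs/stats.log</string>
</dict></plist>
EOF
}

install_stats_scheduler() {
  if [[ "$(uname -s)" == "Darwin" ]]; then
    local plist="$HOME/Library/LaunchAgents/${stats_label}.plist"
    launchctl unload "$plist" 2>/dev/null || true   # drop any stale copy first
    write_stats_plist "$plist"
    launchctl load "$plist"
  else
    # crontab fallback: strip any previous marker line, then re-add
    { crontab -l 2>/dev/null | grep -v 'aidevops: stats-wrapper'
      echo "*/15 * * * * /bin/bash ${stats_script} # aidevops: stats-wrapper"
    } | crontab -
  fi
}
```

Regenerating unconditionally keeps the installed scheduler in sync with the current config, at the cost of briefly unloading the agent during setup.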
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f77587e1-d62d-484a-b435-70cdbe7d0e4e
📒 Files selected for processing (4)
- .agents/scripts/contributor-activity-helper.sh
- .agents/scripts/pulse-wrapper.sh
- .agents/scripts/stats-wrapper.sh
- setup.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .agents/scripts/contributor-activity-helper.sh
…4108)

When person_stats() hits GitHub Search API rate limits mid-run, it now:
- Returns exit code 75 (EX_TEMPFAIL) instead of 0, so callers like pulse-wrapper.sh can distinguish complete from truncated results
- Emits `<!-- partial-results -->` HTML comment in markdown output for machine-readable detection by callers that suppress stderr
- JSON output already had "partial": true (unchanged)

cross_repo_person_stats() now:
- Handles EX_PARTIAL from sub-calls and propagates the flag
- Correctly extracts .data array from person_stats JSON wrapper
- Includes partial flag in both JSON and markdown output

Addresses CodeRabbit review feedback from PR #4094.
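A caller can key off either signal, the exit status or the HTML comment. A minimal sketch, with a stub standing in for the real person_stats helper:

```shell
#!/usr/bin/env bash
EX_TEMPFAIL=75  # sysexits.h: temporary failure, retry later

# Hypothetical stub; real callers would invoke the person-stats helper.
person_stats_stub() {
  printf '<!-- partial-results -->\n# Stats (truncated)\n'
  return "$EX_TEMPFAIL"
}

# Classify a run as complete, partial, or error using both signals.
classify_stats_output() {
  local out rc=0
  out=$(person_stats_stub) || rc=$?
  if [[ "$rc" -eq "$EX_TEMPFAIL" ]] || grep -q '<!-- partial-results -->' <<<"$out"; then
    echo partial
  elif [[ "$rc" -eq 0 ]]; then
    echo complete
  else
    echo error
  fi
}
```

Checking the marker as well as the exit code covers callers that read output from a cache, where the original exit status is no longer available.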
Summary
- pulse-wrapper.sh ran quality sweep and health issue updates (including person-stats) BEFORE run_pulse(). contributor-activity-helper.sh has a blocking sleep loop that waits for GitHub Search API rate limits (30 req/min). With multiple contributors x repos, this loop ran for hours — the pulse never dispatched workers or merged PRs.
- Stats now run in a separate cron process (stats-wrapper.sh, every 15 min) with its own PID dedup and timeout. The pulse only does cleanup, prefetch, and dispatch/merge.
- contributor-activity-helper.sh now bails out with partial results instead of sleeping indefinitely when rate-limited.

Changes
- .agents/scripts/pulse-wrapper.sh: remove run_daily_quality_sweep and update_health_issues from main()
- .agents/scripts/stats-wrapper.sh: new separate stats process
- .agents/scripts/contributor-activity-helper.sh: bail out with partial results when rate-limited
- setup.sh: wire the stats-wrapper schedule

Root Cause
The `Rate limit low (0 remaining), waiting 56s...` message repeated for hours in pulse-wrapper.log. The watchdog didn't catch it because the sleep loop writes to stderr -> log grows -> progress detection sees "activity". The pulse never reached run_pulse().

Closes #4069
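The bail-out behaviour described above can be sketched as follows. The budget probe is injected as a command for illustration; the real helper queries the GitHub Search API budget (30 req/min), and the function and field names here are hypothetical:

```shell
#!/usr/bin/env bash
EX_TEMPFAIL=75  # sysexits.h convention for "try again later"

# $1: command printing the remaining Search API quota; rest: contributors.
collect_activity() {
  local budget_cmd="$1"; shift
  local partial=false processed=0 c
  for c in "$@"; do
    if [[ "$($budget_cmd)" -le 1 ]]; then
      echo "Rate limit exhausted, returning partial results" >&2
      partial=true
      break   # bail out instead of sleeping until the limit resets
    fi
    processed=$((processed + 1))   # real code would fetch stats for $c here
  done
  printf '{"partial": %s, "processed": %d}\n' "$partial" "$processed"
  [[ "$partial" == "false" ]] || return "$EX_TEMPFAIL"
}
```

With a healthy budget the loop completes and exits 0; once the quota drops to the floor it emits whatever it has, flags the payload as partial, and returns 75 so the pulse can keep dispatching instead of stalling.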
Summary by CodeRabbit
Refactor
New Features