
t1425, t1426, t1427: Fix pulse-wrapper zombie accumulation and blocking#4020

Merged
alex-solovyev merged 3 commits into main from bugfix/pulse-wrapper-stability
Mar 9, 2026

Conversation

alex-solovyev (Collaborator) commented Mar 9, 2026

Summary

  • t1425: Write PID file immediately after check_dedup() to close race window where cron spawns 18+ parallel instances during setup phase
  • t1426: Skip person-stats / cross-repo-person-stats when GitHub Search API rate limit < 10, avoiding 56s+ blocking waits that stall the pulse cycle
  • t1427: Guard empty sqlite3 -json output (returns "" not "[]" on no rows) before piping to Python, preventing JSONDecodeError tracebacks

Closes #4015, closes #4016, closes #4009
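For illustration, the t1427 guard reduces to a two-line coercion before any JSON parsing. This sketch uses an empty string to stand in for what sqlite3 -json emits when a query matches zero rows; the variable name mirrors the snippet under review, everything else is illustrative:

```shell
# sqlite3 -json prints nothing at all -- not "[]" -- when a query matches
# zero rows, so json.load() on the piped output raises JSONDecodeError.
query_result=""   # stand-in for sqlite3's zero-row output

# t1427 guard: coerce empty or non-array output to a valid empty array.
if [[ -z "$query_result" || "${query_result:0:1}" != "[" ]]; then
  query_result="[]"
fi

# Downstream consumer now parses cleanly.
printf '%s' "$query_result" | python3 -c 'import json,sys; json.load(sys.stdin)'
```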

Summary by CodeRabbit

  • New Features

    • Hourly caching of person statistics so activity dashboards use recent cached data and reduce redundant live queries.
  • Bug Fixes

    • Ensure empty SQL query results are treated as empty JSON arrays to avoid downstream JSON errors.
    • Early PID write to guard against parallel workflow runs.
  • Improvements

    • Health/activity outputs now reflect cached person-stats and include clearer behavior when rate-limited.

…g (t1425, t1426, t1427)

- Write PID file immediately after check_dedup() to close race window
  where cron spawns parallel instances during setup phase (t1425)
- Skip person-stats calls when GitHub Search API rate limit < 10,
  avoiding 56s+ blocking waits that stall the pulse cycle (t1426)
- Guard empty sqlite3 -json output before piping to Python,
  preventing JSONDecodeError tracebacks in pulse logs (t1427)

Closes #4015, closes #4016, closes #4009
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the robustness and efficiency of the pulse-wrapper system by addressing several critical issues. It prevents the accumulation of zombie processes by ensuring proper PID file handling, avoids blocking waits caused by GitHub Search API rate limit exhaustion, and enhances data processing stability by guarding against malformed JSON output from SQLite queries. These changes collectively contribute to a more reliable and performant contributor activity reporting mechanism.

Highlights

  • Zombie Process Accumulation: The PID file is now written immediately after check_dedup() in pulse-wrapper.sh to prevent cron from spawning multiple parallel instances during the setup phase, addressing a race condition.
  • GitHub Search API Rate Limit Handling: The person-stats and cross-repo-person-stats calls are now conditionally skipped in pulse-wrapper.sh if the GitHub Search API rate limit falls below 10, preventing long blocking waits that could stall the pulse cycle.
  • Robust JSON Parsing: A guard was added in contributor-activity-helper.sh to handle empty output from sqlite3 -json (which returns "" instead of "[]" for no rows), preventing JSONDecodeError tracebacks when piping to Python.
Changelog
  • .agents/scripts/contributor-activity-helper.sh
    • Added a conditional check to ensure query_result is a valid JSON array ([]) if sqlite3 -json returns an empty string or non-array, preventing JSONDecodeError.
  • .agents/scripts/pulse-wrapper.sh
    • Introduced a rate limit check for the GitHub Search API before calling person-stats, skipping the call if search_remaining is less than 10 to avoid blocking.
    • Implemented a similar rate limit check for cross-repo-person-stats, skipping the call and logging the reason if the GitHub Search API rate limit is low.
    • Moved the PID file writing operation to an earlier point in the main function, immediately after check_dedup(), to prevent multiple instances from running concurrently during initial setup.
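The early-PID-write pattern behind t1425 can be sketched as follows; PID_FILE, check_dedup, and the surrounding logic are illustrative stand-ins for the actual pulse-wrapper.sh code, not the repository's identifiers:

```shell
#!/usr/bin/env bash
# Illustrative sketch of the t1425 fix (names are assumptions).
PID_FILE="${TMPDIR:-/tmp}/pulse-wrapper-demo.pid"

check_dedup() {
  # Another live instance already holds the PID file: refuse to start.
  if [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    return 1
  fi
  return 0
}

check_dedup || exit 0
# t1425: claim the slot immediately after the dedup check, before any
# slow setup work, so cron-spawned siblings fail check_dedup at once.
echo "$$" > "$PID_FILE"
```

Writing the PID before setup (rather than after) closes the window in which several cron-spawned instances can all pass the dedup check.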
Activity
  • No specific activity (comments, reviews, progress updates) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 0 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b86522fa-8100-4e0b-9ba1-b84e8e427e97

📥 Commits

Reviewing files that changed from the base of the PR and between fbbcd70 and c044f23.

📒 Files selected for processing (1)
  • .agents/scripts/pulse-wrapper.sh

Walkthrough

Adds a sqlite3 -json output guard in contributor-activity-helper.sh to coerce empty or non-array results to []. In pulse-wrapper.sh, implements hourly caching of person-stats, an early PID write, and a new cache-refresh function; the _update_health_issue_for_repo signature is updated to accept cross-repo person-stats.

Changes

Cohort / File(s) Summary
JSON Handling Safety
.agents/scripts/contributor-activity-helper.sh
Add guard for sqlite3 -json output: if empty or not starting with [ coerce to [] to ensure downstream JSON parsing remains consistent.
Person-stats Caching & Orchestration
.agents/scripts/pulse-wrapper.sh
Introduce hourly PERSON_STATS cache, new _refresh_person_stats_cache() function, cache directory/files, PERSON_STATS interval/last-run vars, and read-from-cache behavior when composing per-repo and cross-repo outputs. Update _update_health_issue_for_repo(...) signature to accept cross_repo_person_stats_md. Add early PID file write and minor logging.

Sequence Diagram(s)

sequenceDiagram
    participant Pulse as Pulse Wrapper
    participant CacheFS as Cache (filesystem)
    participant Helper as Contributor Activity Helper
    participant GH as GitHub Search API
    rect rgba(200,220,255,0.5)
    Pulse->>CacheFS: check PERSON_STATS_LAST_RUN
    alt cache stale
        Pulse->>Helper: _refresh_person_stats_cache()
        Helper->>GH: fetch per-person stats (search API)
        GH-->>Helper: return person stats
        Helper->>CacheFS: write person-stats cache files
    else cache fresh
        Pulse->>CacheFS: read person-stats cache files
    end
    Pulse->>Helper: run per-repo health flow with cached person-stats
    Helper-->>Pulse: per-repo & cross-repo markdown
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

bug

Poem

Guardians of JSON, caches kept neat,
Hourly records hum, rate limits retreat,
PID files penned early to stop the race,
Shell scripts march onward with steadier pace. 🚀

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. Title clearly references three linked issues (t1425, t1426, t1427) and summarizes the main objective: fixing pulse-wrapper zombie accumulation and blocking problems.
  • Linked Issues Check: Passed. Changes address the early PID write for the race condition (#4015), hourly person-stats caching to reduce API calls (#4016), and the JSON guard for empty sqlite3 output (#4009).
  • Out of Scope Changes Check: Passed. All changes directly support the three linked issues: PID file timing, caching mechanism, and sqlite3 JSON output handling.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/pulse-wrapper-stability

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions bot commented Mar 9, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 393 code smells

[INFO] Recent monitoring activity:
Mon Mar 9 21:44:25 UTC 2026: Code review monitoring started
Mon Mar 9 21:44:29 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 393

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 393
  • VULNERABILITIES: 0

Generated on: Mon Mar 9 21:44:32 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1957-1964: The current gate uses a fixed ">=10" check which
doesn't account for the dynamic cost of person_stats() (it consumes ~4 Search
API calls per login) so the blocking helper can still sleep on repos with 3+
contributors; update the logic to either (A) compute an estimated required
budget before invoking the helper by querying the expected number of logins and
comparing search_remaining to (estimated_logins * 4) or (B) move the skip
decision into the helper functions person_stats() / cross_repo_person_stats() so
they can count actual logins and decide to skip or sleep themselves; modify the
call site that sets person_stats_md and the activity_helper invocation
accordingly so the skip decision uses the estimated_cost or the helper-internal
check rather than the hardcoded ">=10" threshold.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7b23eb7c-0ec1-44be-b595-7ced11cc3869

📥 Commits

Reviewing files that changed from the base of the PR and between 8e2baf6 and 726bd44.

📒 Files selected for processing (2)
  • .agents/scripts/contributor-activity-helper.sh
  • .agents/scripts/pulse-wrapper.sh


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several fixes to improve the robustness of the pulse-wrapper and contributor-activity-helper scripts. The changes address a race condition by writing a PID file earlier, prevent blocking waits by checking the GitHub API rate limit before expensive operations, and handle empty JSON output from sqlite3 to avoid errors.

My review focuses on improving code maintainability and aligning with established repository rules. I've suggested simplifying a conditional check in contributor-activity-helper.sh to ensure valid JSON output and refactoring duplicated code in pulse-wrapper.sh into a helper function to improve readability and reduce redundancy. Overall, the changes are solid and address important stability issues.

Note: Security Review did not run due to the size of the PR.

Comment on lines +636 to +638
if [[ -z "$query_result" || "${query_result:0:1}" != "[" ]]; then
query_result="[]"
fi


medium

The condition to check if query_result is a valid JSON array can be simplified. The current check [[ -z "$query_result" || "${query_result:0:1}" != "[" ]] is a bit verbose. A simpler and more robust way to check if the string starts with [ is to use a glob pattern match [[ "$query_result" != "["* ]]. This single check correctly handles both an empty string and a string that doesn't start with [, making the code more concise. This also aligns with the practice that shell functions expected to return JSON should output a valid empty JSON array ([]) on no results, ensuring consumers like jq do not break.

Suggested change
if [[ -z "$query_result" || "${query_result:0:1}" != "[" ]]; then
query_result="[]"
fi
if [[ "$query_result" != "["* ]]; then
query_result="[]"
fi
References
  1. When a shell function is expected to return JSON, it should output a valid empty JSON array ([]) or object ({}) on no results, rather than returning a non-zero exit code or empty string, to ensure consumers of the output (like pipes to jq) do not break.

Owner


Applied in follow-up PR. Simplified to [[ "$query_result" != "["* ]] as suggested -- handles both empty string and non-array in a single glob match.

Comment on lines +1958 to +1964
local search_remaining
search_remaining=$(gh api rate_limit --jq '.resources.search.remaining' 2>/dev/null) || search_remaining=0
if [[ "$search_remaining" -ge 10 ]]; then
person_stats_md=$(bash "$activity_helper" person-stats "$repo_path" --period month --format markdown || echo "_Person stats unavailable._")
else
person_stats_md="_Person stats skipped (GitHub Search API rate limit: ${search_remaining} remaining)._"
fi


medium

This logic for checking the GitHub search API rate limit is duplicated in the update_health_issues function (lines 2190-2197). To avoid code duplication and centralize the API call, consider creating a helper function. This aligns with the practice of extracting repeated logic into an internal helper function to improve maintainability.

For example, you could define a function like this:
get_search_rate_limit_remaining() {
local remaining
remaining=$(gh api rate_limit --jq '.resources.search.remaining' 2>/dev/null) || remaining=0
echo "$remaining"
}

Then, you can use it in both places like this:
local search_remaining
search_remaining=$(get_search_rate_limit_remaining)
if [[ "$search_remaining" -ge 10 ]]; then
# ...
else
# ...
fi
This would make the code more maintainable by adhering to the DRY (Don't Repeat Yourself) principle.

References
  1. In shell scripts, extract repeated logic into an internal helper function to improve maintainability. This applies even for standalone scripts where external source dependencies are avoided.

Owner


Not applying. The code was refactored in c044f23 -- the original duplicate at the call site no longer exists (replaced with cache reads at line 1964-1970). The remaining rate limit checks in _refresh_person_stats_cache() are in different contexts (pre-flight, per-repo bail-out, cross-repo gate) with different comparison values. Extracting a helper for a one-liner gh api rate_limit --jq ... adds indirection without meaningful DRY benefit.

Add _refresh_person_stats_cache() with 1-hour timestamp guard.
Per-repo and cross-repo person-stats written to cache files,
read by health issue updates each pulse. Replaces rate limit
guards — 44 Search API calls/hr instead of ~1320/hr.
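The 1-hour timestamp guard described in this commit message can be sketched like so; the file path and variable names are illustrative assumptions, not the script's actual identifiers:

```shell
# Hypothetical hourly refresh guard for the person-stats cache.
PERSON_STATS_INTERVAL=3600                                 # one hour, in seconds
PERSON_STATS_LAST_RUN_FILE="${TMPDIR:-/tmp}/person-stats-last-run"

person_stats_cache_stale() {
  local now last
  now=$(date +%s)
  last=$(cat "$PERSON_STATS_LAST_RUN_FILE" 2>/dev/null || echo 0)
  (( now - last >= PERSON_STATS_INTERVAL ))
}

mark_person_stats_refreshed() {
  date +%s > "$PERSON_STATS_LAST_RUN_FILE"
}

# Pulse cycle: refresh at most once per hour; otherwise read cache files.
if person_stats_cache_stale; then
  mark_person_stats_refreshed
  # _refresh_person_stats_cache would run here
fi
```

Subsequent pulses within the hour skip the Search API entirely and read the cached markdown, which is what brings the quoted call volume down.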
@github-actions
Contributor

github-actions bot commented Mar 9, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 393 code smells

[INFO] Recent monitoring activity:
Mon Mar 9 22:13:39 UTC 2026: Code review monitoring started
Mon Mar 9 22:13:39 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 393

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 393
  • VULNERABILITIES: 0

Generated on: Mon Mar 9 22:13:42 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn
Owner

Dispatching worker to address coderabbit CHANGES_REQUESTED.

  • Model: sonnet (anthropic/claude-sonnet-4-6)
  • Branch: bugfix/pulse-wrapper-stability
  • Scope: Fix 1 unresolved coderabbit suggestion in pulse-wrapper.sh — dynamic cost accounting for person_stats() in the gate check
  • Attempt: 1 of 1
  • Direction: Replace fixed >=10 check with dynamic cost calculation accounting for person_stats() consuming ~4 Search API calls

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
.agents/scripts/pulse-wrapper.sh (1)

1963-1971: Minor: slug_safe is already defined earlier in this function.

The variable slug_safe is computed identically at line 1663 for the cache file path. This redefinition is harmless but redundant.

♻️ Remove redundant variable declaration
 	fi
 	# t1426: person-stats from hourly cache (see _refresh_person_stats_cache)
-	local slug_safe="${repo_slug//\//-}"
 	local ps_cache="${PERSON_STATS_CACHE_DIR}/person-stats-cache-${slug_safe}.md"
 	if [[ -f "$ps_cache" ]]; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.agents/scripts/pulse-wrapper.sh around lines 1963 - 1971, The slug_safe
variable is redundantly redefined before computing ps_cache; remove the repeated
declaration and reuse the existing slug_safe already set earlier in the function
so you only build ps_cache via PERSON_STATS_CACHE_DIR and slug_safe once; update
the block that computes ps_cache and person_stats_md (referencing ps_cache,
PERSON_STATS_CACHE_DIR, and person_stats_md) to rely on the previously defined
slug_safe and eliminate the duplicate assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In @.agents/scripts/pulse-wrapper.sh:
- Around line 1963-1971: The slug_safe variable is redundantly redefined before
computing ps_cache; remove the repeated declaration and reuse the existing
slug_safe already set earlier in the function so you only build ps_cache via
PERSON_STATS_CACHE_DIR and slug_safe once; update the block that computes
ps_cache and person_stats_md (referencing ps_cache, PERSON_STATS_CACHE_DIR, and
person_stats_md) to rely on the previously defined slug_safe and eliminate the
duplicate assignment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26008f55-fe9e-4658-8a87-2fa22aaefdb4

📥 Commits

Reviewing files that changed from the base of the PR and between 726bd44 and fbbcd70.

📒 Files selected for processing (1)
  • .agents/scripts/pulse-wrapper.sh

The fixed >=10 threshold didn't account for the dynamic cost of
person_stats() (~4 Search API calls per contributor per repo). With 3+
contributors, the helper's internal sleep branch could still stall the
pulse.

Replace the hardcoded threshold with cost-aware gating:
- Estimate minimum budget (repos × 4 queries) before starting
- Re-check remaining budget before each repo, bail early if exhausted
- Gate cross-repo stats on remaining budget too

Addresses CodeRabbit review suggestion on PR #4020.
@github-actions
Contributor

github-actions bot commented Mar 9, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 393 code smells

[INFO] Recent monitoring activity:
Mon Mar 9 22:18:34 UTC 2026: Code review monitoring started
Mon Mar 9 22:18:34 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 393

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 393
  • VULNERABILITIES: 0

Generated on: Mon Mar 9 22:18:37 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud

sonarqubecloud bot commented Mar 9, 2026

@marcusquinn
Owner

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@marcusquinn
Owner

Dispatching worker to address 1 unresolved CodeRabbit suggestion.

@marcusquinn
Owner

Follow-up PR addressing remaining review suggestions:

Applied:

  • Gemini (contributor-activity-helper.sh:636): Simplified JSON check to [[ "$query_result" != "["* ]] -- single glob match handles both empty string and non-array cases.
  • CodeRabbit (pulse-wrapper.sh:1965): Removed redundant slug_safe redefinition -- reuses the one already defined at line 1663 in the same function.

Dismissed:

  • Gemini (pulse-wrapper.sh): DRY helper for rate limit check -- the original duplication was already eliminated in c044f23. The remaining rate limit checks in _refresh_person_stats_cache() are in different contexts with different comparison values; extracting a helper for a one-liner adds indirection without meaningful benefit.

marcusquinn added a commit that referenced this pull request Mar 9, 2026
- Simplify JSON array check to single glob match (Gemini suggestion)
- Remove redundant slug_safe redefinition (CodeRabbit nitpick)


2 participants