t1449: reallocate pulse slots when product repos are capped #4186
marcusquinn merged 1 commit into main
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly improves the Pulse system's resource utilization and worker management. It addresses a key inefficiency where worker slots remained idle when product repositories hit their daily PR caps, despite available system or tooling work. By making reservation calculations dynamic based on actual dispatchability and implementing a more flexible allocation strategy, Pulse can now intelligently reallocate resources, ensuring workers remain productive and the overall backlog is processed more efficiently. The changes also refine worker termination policies to include a coaching phase, aiming to resolve issues before outright killing a worker.
Caution: Review failed. Pull request was closed or merged during review.
Walkthrough
This PR enhances the pulse dispatch system with smarter worker allocation and improved lifecycle management. It adds coaching-intervention guidance before killing stuck workers, enforces fill-to-cap behavior for active slots, introduces dispatchable product repo tracking that accounts for daily PR caps, and increases the idle timeout threshold to reduce false positives.
Sequence Diagram(s)
sequenceDiagram
participant Pulse as Pulse Wrapper
participant PrCap as Daily PR Cap Check
participant Alloc as Priority Allocator
participant Dispatch as Worker Dispatcher
participant Workers as Active Workers
Pulse->>PrCap: Calculate dispatchable_product_repos<br/>(filter by cap availability)
PrCap-->>Pulse: dispatchable_count
Pulse->>Alloc: compute_priority_allocations<br/>(with dispatchable_count)
alt dispatchable_product_repos > 0
Alloc->>Alloc: Allocate slots by product_min ratio
else dispatchable_product_repos == 0
Alloc->>Alloc: Shift all slots to tooling<br/>(product_min = 0)
end
Alloc-->>Pulse: allocation {product, tooling}
Pulse->>Dispatch: Fill to cap<br/>(dispatch until MAX_WORKERS or no candidates)
Dispatch->>Workers: Dispatch product/tooling<br/>per allocation
Workers-->>Dispatch: Checkpoint reached
Dispatch-->>Pulse: Updated active_workers
alt active_workers < MAX_WORKERS
Pulse-->>Dispatch: Continue fill-to-cap
else active_workers == MAX_WORKERS
Pulse-->>Pulse: Pulse cycle complete
end
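The allocation branch in the diagram above can be sketched as a small shell function. This is only an illustration, not the code in pulse-wrapper.sh: the function name allocate_slots, the percentage-style product_min argument, and the round-up rule are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): split worker slots between
# product and tooling work.
# Args: total slots, product reservation percent, dispatchable product repo count.
# Prints "<product_slots> <tooling_slots>".
allocate_slots() {
    local total="$1" product_min_pct="$2" dispatchable="$3"
    local product=0
    if (( dispatchable > 0 )); then
        # Assumption: reserve the product share, rounding up.
        product=$(( (total * product_min_pct + 99) / 100 ))
        if (( product > total )); then
            product=$total
        fi
    fi
    # With no dispatchable product repo, product stays 0 and every slot
    # goes to tooling.
    echo "$product $(( total - product ))"
}

allocate_slots 6 60 2   # prints "4 2"
allocate_slots 6 60 0   # prints "0 6"
```

The second call shows the case this PR targets: once every product repo is capped, no slot stays reserved for product work.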
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 5 passed
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Thu Mar 12 00:12:53 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces a significant improvement to the pulse worker allocation logic by making it aware of daily PR caps on product repositories. This allows for more efficient use of available worker slots by reallocating them to tooling/system work when product repos are blocked. The accompanying documentation changes in pulse.md are clear and accurately reflect the new policies. My main suggestion is to refactor the logic for counting daily PRs in pulse-wrapper.sh to be more efficient by using a single gh search prs API call instead of making a separate call for each repository in a loop, ensuring all relevant PRs are retrieved by using the --paginate flag.
if [[ "$product_repos" -gt 0 && "$DAILY_PR_CAP" -gt 0 ]]; then
    while IFS= read -r slug; do
        [[ -n "$slug" ]] || continue
        local pr_json daily_pr_count
        pr_json=$(gh pr list --repo "$slug" --state open --json createdAt --limit 100 2>/dev/null) || pr_json="[]"
        daily_pr_count=$(echo "$pr_json" | jq --arg today "$today_utc" '[.[] | select(.createdAt | startswith($today))] | length' 2>/dev/null) || daily_pr_count=0
        [[ "$daily_pr_count" =~ ^[0-9]+$ ]] || daily_pr_count=0
        if [[ "$daily_pr_count" -lt "$DAILY_PR_CAP" ]]; then
            dispatchable_product_repos=$((dispatchable_product_repos + 1))
        fi
    done < <(jq -r '.initialized_repos[] | select(.pulse == true and (.local_only // false) == false and .slug != "" and .priority == "product") | .slug' "$repos_json" 2>/dev/null)
else
    dispatchable_product_repos="$product_repos"
fi
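The date filter in this hunk works because GitHub reports createdAt as an ISO-8601 UTC timestamp, so a PR belongs to "today" exactly when the timestamp starts with the UTC date string. A minimal sketch of that prefix test without jq follows; count_created_on is a hypothetical name, and the timestamps are made-up sample data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): count createdAt timestamps,
# read one per line from stdin, that fall on the given UTC day (YYYY-MM-DD).
count_created_on() {
    local day="$1" ts count=0
    while IFS= read -r ts; do
        # Same idea as jq's startswith($today): match on the date prefix.
        if [[ "$ts" == "$day"* ]]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

printf '%s\n' \
    "2026-03-12T00:05:10Z" \
    "2026-03-11T23:59:59Z" \
    "2026-03-12T09:30:00Z" | count_created_on "2026-03-12"   # prints 2
```

In a real run, the day argument would be a UTC date such as the output of date -u +%Y-%m-%d, so that it lines up with the timestamps.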
This loop makes a gh pr list API call for every product repository, which can be inefficient and slow if there are many repositories. You can significantly improve performance by fetching the PR counts for all relevant repositories in a single API call using gh search prs. Additionally, to ensure all relevant PRs are retrieved, the --paginate flag should be used as per repository guidelines for gh commands.
Here's a suggested refactoring that first collects all product repo slugs, then uses gh search prs with --paginate to get all open PRs created today for those repos, and finally processes the results locally to count dispatchable repos. This reduces N API calls to just one and ensures completeness.
if [[ "$product_repos" -gt 0 && "$DAILY_PR_CAP" -gt 0 ]]; then
    local product_repo_slugs_str
    product_repo_slugs_str=$(jq -r '.initialized_repos[] | select(.pulse == true and (.local_only // false) == false and .slug != "" and .priority == "product") | .slug' "$repos_json" 2>/dev/null)
    if [[ -n "$product_repo_slugs_str" ]]; then
        local search_args=()
        while IFS= read -r slug; do
            [[ -n "$slug" ]] && search_args+=(--repo "$slug")
        done <<< "$product_repo_slugs_str"
        # Get daily PR counts for all product repos in a single, more efficient API call.
        # The date filter must expand the shell variable ($today_utc), not pass a literal string.
        # Before adopting this, check `gh search prs --help` to confirm that --paginate
        # and the `repo` JSON field are supported by the installed gh version.
        local pr_counts_json
        pr_counts_json=$(gh search prs --created ">=$today_utc" --state open "${search_args[@]}" --json repo --paginate | jq 'group_by(.repo.nameWithOwner) | map({(.[0].repo.nameWithOwner): length}) | add' 2>/dev/null) || pr_counts_json="{}"
        # Count dispatchable repos by checking the fetched counts
        while IFS= read -r slug; do
            [[ -n "$slug" ]] || continue
            local daily_pr_count
            daily_pr_count=$(echo "$pr_counts_json" | jq -r --arg slug "$slug" '.[$slug] // 0')
            [[ "$daily_pr_count" =~ ^[0-9]+$ ]] || daily_pr_count=0
            if [[ "$daily_pr_count" -lt "$DAILY_PR_CAP" ]]; then
                dispatchable_product_repos=$((dispatchable_product_repos + 1))
            fi
        done <<< "$product_repo_slugs_str"
    fi
else
    dispatchable_product_repos="$product_repos"
fi
References
- When fetching a list of items from the GitHub API with the gh command, use the --paginate flag to ensure all items are retrieved, not just the first page.
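The "fetch once, count locally" idea in this suggestion can be tried without any API call. The sketch below assumes the PRs created today have already been fetched and reduced to one repo slug per line. The function name count_dispatchable and the acme/* slugs are hypothetical, and it uses grep rather than the jq grouping from the suggestion.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR).
# stdin: one repo slug per PR created today (a repo appears once per PR).
# Args:  daily cap, then the product repo slugs to evaluate.
# Prints how many of those repos are still below the cap.
count_dispatchable() {
    local cap="$1"
    shift
    local pr_slugs slug n dispatchable=0
    pr_slugs=$(cat)
    for slug in "$@"; do
        # grep -c prints the number of exact whole-line matches; it exits
        # non-zero when there are none, hence the "|| true".
        n=$(grep -cxF -- "$slug" <<< "$pr_slugs" || true)
        if (( n < cap )); then
            dispatchable=$((dispatchable + 1))
        fi
    done
    echo "$dispatchable"
}

# acme/shop has reached a cap of 3; acme/site and acme/docs have not.
printf '%s\n' acme/shop acme/shop acme/shop acme/site |
    count_dispatchable 3 acme/shop acme/site acme/docs   # prints 2
```

A repo with no PRs today never appears in the input, so it counts as zero and stays dispatchable, which matches the `// 0` fallback in the suggested jq lookup.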



Summary
Why
Pulse was underutilizing concurrency when product repos hit PR caps, leaving slots idle despite large system/tooling backlog. This change removes that blocker and requires redistribution to keep workers productive.
Closes #4185