feat: Add dynamic token load balancer for API rate limit management #1008
Conversation
Implement intelligent token rotation across multiple PATs and GitHub Apps to prevent API rate limit exhaustion during keepalive operations.

Key features:
- Token registry with capacity tracking for all available tokens
- Dynamic selection based on remaining capacity and task requirements
- Token specialization support (exclusive/primary assignments)
- Proactive rotation before limits are hit
- Graceful degradation when all tokens are low

Token specializations:
- KEEPALIVE_APP: exclusive for keepalive-loop (isolated pool)
- OWNER_PR_PAT: exclusive for PR creation as owner
- SERVICE_BOT_PAT: primary for bot comments/labels
- ACTIONS_BOT_PAT: primary for workflow dispatch
- GH_APP: primary for comment handling

Integration:
- checkRateLimitStatus() added to keepalive_loop.js
- Early rate limit check with automatic deferral
- Rate limit outputs added to workflow

Diagnostics:
- Health-75 updated to check all 6 token types
- Aggregate totals across all token pools
❌ **Sync Manifest Validation Failed**

This PR modifies files that should be synced to consumer repos, but the sync manifest is incomplete.

**Required action:** Update the sync manifest.

**Why this matters:** Files not declared in the manifest won't be synced to consumer repos (Template, Manager-Database, trip-planner, Travel-Plan-Permission), causing features to silently not work in those repos.

**How to fix:** See the workflow logs for the specific files that need to be added.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1bbaea73d5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```js
// Determine if we should defer
result.shouldDefer = tokenLoadBalancer.shouldDefer(minRequired);
result.canProceed = !result.shouldDefer && result.totalRemaining >= minRequired;
```
Initialize token registry before deferring on limits
Because checkRateLimitStatus unconditionally calls tokenLoadBalancer.shouldDefer(minRequired) when the module is present, the keepalive loop will defer even if the primary token has quota whenever the registry is empty (the new module is required but initializeTokenRegistry is never called in this script). shouldDefer returns true when tokenRegistry.tokens is empty (token_load_balancer.js lines 631–637), so this path will always hit the early action: 'defer' return on every run unless forceRetry is set. This effectively stalls the loop in the default configuration.
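A minimal sketch of the guard this implies (the names `tokenRegistry`, `shouldDefer`, and `checkRateLimitStatus` follow this PR, but the bodies here are illustrative only):

```js
// Illustrative only: an uninitialized (empty) registry must not be read as
// "zero capacity", or every run hits the early 'defer' path.
const tokenRegistry = { tokens: new Map() };

function shouldDefer(minRequired) {
  if (tokenRegistry.tokens.size === 0) return true; // current behavior in the module
  let total = 0;
  for (const info of tokenRegistry.tokens.values()) {
    total += info.rateLimit?.remaining ?? 0;
  }
  return total < minRequired;
}

function checkRateLimitStatus(minRequired) {
  // Guard: distinguish "registry never initialized" from "tokens exhausted".
  if (tokenRegistry.tokens.size === 0) {
    return { shouldDefer: false, reason: 'registry-not-initialized' };
  }
  return { shouldDefer: shouldDefer(minRequired), reason: 'capacity-checked' };
}
```

The alternative fix is to call `initializeTokenRegistry` before the first `shouldDefer` check; either way, an empty registry must not stall the loop.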
Pull request overview
This PR implements a comprehensive token load balancer system to prevent API rate limit exhaustion during keepalive operations by intelligently rotating across multiple Personal Access Tokens (PATs) and GitHub Apps.
Changes:
- Introduces a new `token_load_balancer.js` module for dynamic token selection based on rate limit capacity
- Integrates rate limit checking into the keepalive loop with early deferral when all tokens are exhausted
- Enhances diagnostic workflows to monitor all 6 token types (GITHUB_TOKEN, CODESPACES_WORKFLOWS, SERVICE_BOT_PAT, WORKFLOWS_APP, KEEPALIVE_APP, GH_APP)
- Improves task completion analysis to support numbered checklists and issue-only tasks
- Refactors Source section extraction to properly handle nested headings and code blocks
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| templates/consumer-repo/.github/scripts/token_load_balancer.js | New module implementing token registry, rate limit tracking, and optimal token selection with task specialization |
| .github/scripts/token_load_balancer.js | Duplicate of template version for repository use |
| templates/consumer-repo/.github/scripts/keepalive_loop.js | Adds rate limit status checking, enhanced Source section parsing, numbered checklist support, and issue-only task matching |
| .github/scripts/keepalive_loop.js | Duplicate of template version with same enhancements |
| templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml | Adds rate_limit_remaining and rate_limit_recommendation outputs, improves YAML formatting |
| .github/workflows/agents-keepalive-loop.yml | Duplicate of template version with same outputs |
| .github/workflows/health-75-api-rate-diagnostic.yml | Expands token monitoring from 3 to 6 tokens, calculates total remaining capacity |
| .github/workflows/health-72-template-sync.yml | Uses HEAD_REF variable for consistency |
| .github/scripts/tests/keepalive-loop.test.js | Adds comprehensive tests for new features (Source section parsing, numbered checklists, issue-only tasks) |
| agents/codex-1001.md | Bootstrap file for codex on issue #1001 |
```js
const percentUsed = core_limit.limit > 0
  ? ((core_limit.used / core_limit.limit) * 100).toFixed(1)
  : 0;

return {
  limit: core_limit.limit,
  remaining: core_limit.remaining,
  used: core_limit.used,
  reset: core_limit.reset * 1000,
  checked: Date.now(),
  percentUsed: parseFloat(percentUsed),
  percentRemaining: 100 - parseFloat(percentUsed),
};
```
The percentUsed calculation uses toFixed which returns a string, but this string is then parsed back to a number. The percentRemaining calculation on line 308 will subtract a string from 100, which in JavaScript will work due to type coercion but could lead to unexpected behavior. Consider using the numeric value directly instead of converting to string and back.
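One way to keep both fields numeric throughout (a sketch, not the module's actual code) is to round with arithmetic instead of `toFixed`, and only format as a string at display time:

```js
// Sketch: compute percentages as numbers; round to one decimal place without
// a round-trip through a string.
function percentStats(limit, used) {
  const pct = limit > 0 ? (used / limit) * 100 : 0;
  const round1 = (n) => Math.round(n * 10) / 10;
  return { percentUsed: round1(pct), percentRemaining: round1(100 - pct) };
}
```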
```js
tokenInfo.rateLimit.percentUsed = tokenInfo.rateLimit.limit > 0
  ? ((tokenInfo.rateLimit.used / tokenInfo.rateLimit.limit) * 100).toFixed(1)
  : 0;
tokenInfo.rateLimit.percentRemaining = 100 - tokenInfo.rateLimit.percentUsed;
```
The percentUsed calculation uses toFixed which returns a string, but this string is then used in arithmetic operations. The percentRemaining assignment will perform string-to-number coercion. This is similar to the issue in checkTokenRateLimit. Consider calculating both as numbers directly to avoid type inconsistencies.
```js
async function initializeTokenRegistry({ secrets, github, core, githubToken }) {
  tokenRegistry.tokens.clear();

  // Register GITHUB_TOKEN (always available)
  if (githubToken) {
    registerToken({
      id: 'GITHUB_TOKEN',
      token: githubToken,
      type: 'GITHUB_TOKEN',
      source: 'github.token',
      capabilities: TOKEN_CAPABILITIES.GITHUB_TOKEN,
      priority: 0, // Lowest priority (most restricted)
    });
  }

  // Register PATs (check for PAT1, PAT2, etc. pattern as well as named PATs)
  const patSources = [
    { id: 'SERVICE_BOT_PAT', env: secrets.SERVICE_BOT_PAT, account: 'stranske-automation-bot' },
    { id: 'ACTIONS_BOT_PAT', env: secrets.ACTIONS_BOT_PAT, account: 'stranske-automation-bot' },
    { id: 'CODESPACES_WORKFLOWS', env: secrets.CODESPACES_WORKFLOWS, account: 'stranske' },
    { id: 'OWNER_PR_PAT', env: secrets.OWNER_PR_PAT, account: 'stranske' },
    { id: 'AGENTS_AUTOMATION_PAT', env: secrets.AGENTS_AUTOMATION_PAT, account: 'unknown' },
    // Numbered PATs for future expansion
    { id: 'PAT_1', env: secrets.PAT_1, account: 'pool' },
    { id: 'PAT_2', env: secrets.PAT_2, account: 'pool' },
    { id: 'PAT_3', env: secrets.PAT_3, account: 'pool' },
  ];

  for (const pat of patSources) {
    if (pat.env) {
      registerToken({
        id: pat.id,
        token: pat.env,
        type: 'PAT',
        source: pat.id,
        account: pat.account,
        capabilities: TOKEN_CAPABILITIES.PAT,
        priority: 5, // Medium priority
      });
    }
  }
```
The initializeTokenRegistry function does not validate that the secrets object exists before accessing its properties. If secrets is undefined or null, this will throw an error. Consider adding a guard check at the beginning of the function.
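A sketch of such a guard (function shape follows this PR; the body and the helper name `collectTokenIds` are illustrative). Falling back to an empty object still lets GITHUB_TOKEN registration proceed:

```js
// Sketch: validate secrets up front instead of throwing on property access.
function collectTokenIds({ secrets, core, githubToken } = {}) {
  if (!secrets || typeof secrets !== 'object') {
    core?.warning?.('no secrets object provided; PATs will be skipped');
    secrets = {};
  }
  const ids = [];
  if (githubToken) ids.push('GITHUB_TOKEN');
  for (const id of ['SERVICE_BOT_PAT', 'ACTIONS_BOT_PAT', 'OWNER_PR_PAT']) {
    if (secrets[id]) ids.push(id);
  }
  return ids;
}
```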
```diff
   .filter(line => /^\s*[-*+]\s*\[\s*\]/.test(line))
   .map(line => {
-    const match = line.match(/^\s*[-*+]\s*\[\s*\]\s*(.+)$/);
+    const match = line.match(/^\s*(?:[-*+]|\d+[.)])\s*\[\s*\]\s*(.+)$/);
```
The regex pattern now matches both '1.' and '1)' numbered markers in addition to bullet markers. Note that the `\s*` after the marker also matches zero whitespace, so '1.[ ]' (no space) is accepted as well; confirm whether that permissiveness is intended or whether `\s+` should be required.
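A quick check of the pattern's actual behavior (`\s*` matches zero or more whitespace characters, so the no-space form is accepted):

```js
// The checklist pattern from the diff, unchanged; group 1 captures the task text.
const CHECKBOX = /^\s*(?:[-*+]|\d+[.)])\s*\[\s*\]\s*(.+)$/;
```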
```js
const strippedIssueTask = task
  .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
  .replace(/https?:\/\/\S+/gi, '')
  .replace(/[#\d]/g, '')
```
The strippedIssueTask calculation removes all digits with .replace(/[#\d]/g, ''), which could inadvertently remove legitimate digits that are part of the task description, not just the issue number. This might cause false positives for isIssueOnlyTask. Consider a more targeted approach that only removes the specific issue reference pattern.
Suggested change:

```diff
-const strippedIssueTask = task
-  .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
-  .replace(/https?:\/\/\S+/gi, '')
-  .replace(/[#\d]/g, '')
+let strippedIssueTask = task
+  .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
+  .replace(/https?:\/\/\S+/gi, '');
+if (issuePattern) {
+  strippedIssueTask = strippedIssueTask.replace(issuePattern, '');
+}
+strippedIssueTask = strippedIssueTask
```
```js
let token = best.tokenInfo.token;
if (best.tokenInfo.type === 'APP' && !token) {
  token = await mintAppToken({ tokenInfo: best.tokenInfo, core });
  best.tokenInfo.token = token;
}

core?.info?.(`Selected token: ${best.id} (${best.remaining} remaining, ${best.percentRemaining.toFixed(1)}% capacity)${best.isPrimary ? ' [primary]' : ''}`);

return {
  token,
  source: best.id,
  type: best.tokenInfo.type,
  remaining: best.remaining,
  percentRemaining: best.percentRemaining,
  percentUsed: best.tokenInfo.rateLimit?.percentUsed ?? 0,
  isPrimary: best.isPrimary,
  task,
};
```
After minting an app token, if the token is null (minting failed), the function continues and returns a response with token: null. This could cause issues for callers expecting a valid token. Consider adding a check after token minting and returning null from getOptimalToken if the token couldn't be minted, or trying the next candidate.
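A sketch of the try-next-candidate fallback (illustrative only; `mintAppToken` is shown synchronously and assumed to return `null` on failure, whereas the real helper is async):

```js
// Sketch: walk candidates in priority order and skip any whose app token
// cannot be minted; return null only when no candidate is usable.
function selectUsableToken(candidates, mintAppToken) {
  for (const cand of candidates) {
    let token = cand.tokenInfo.token;
    if (cand.tokenInfo.type === 'APP' && !token) {
      token = mintAppToken(cand.tokenInfo); // null => minting failed
      cand.tokenInfo.token = token;
    }
    if (token) return { token, source: cand.id };
  }
  return null; // no usable token at all; the caller can defer
}
```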
```js
if (tokenInfo.type === 'APP' && !token) {
  token = await mintAppToken({ tokenInfo, core });
  tokenInfo.token = token;
}
if (token) {
  return {
    token,
    source: id,
    type: tokenInfo.type,
    remaining: tokenInfo.rateLimit?.remaining ?? 0,
    percentRemaining: tokenInfo.rateLimit?.percentRemaining ?? 0,
    percentUsed: tokenInfo.rateLimit?.percentUsed ?? 0,
    exclusive: true,
    task,
  };
}
```
Similarly to the main token selection path, after minting an app token for an exclusive task (line 384-385), if the token is null (minting failed), the code checks 'if (token)' but this means no token is returned for that exclusive task. The function will then fall through to general token selection, which may violate the exclusivity contract. Consider handling the failed minting case more explicitly.
```js
tokenInfo.rateLimit = {
  limit,
  remaining,
  used: used || (limit - remaining),
  reset: reset ? reset * 1000 : tokenInfo.rateLimit.reset,
  checked: Date.now(),
  percentUsed: ((limit - remaining) / limit * 100).toFixed(1),
  percentRemaining: (remaining / limit * 100).toFixed(1),
};
```
The updateFromHeaders function has inconsistent type handling. Lines 522-523 calculate percentUsed and percentRemaining using toFixed which returns strings, while the rest of the rateLimit object uses numbers. This type inconsistency could cause issues in comparisons and calculations elsewhere in the code.
|
|
```diff
 if (pattern.test(updatedBody)) {
-  updatedBody = updatedBody.replace(pattern, '$1[x]$2');
+  updatedBody = updatedBody.replace(pattern, '$1$2[x]$3');
```
The regex pattern on line 3000 attempts to match numbered checklist items for replacement, but the capture groups have changed from the previous version: the pattern now captures the line start `(^|\n)`, the list-marker prefix, and the task text. The replacement '$1$2[x]$3' may not preserve the original formatting correctly. Specifically, if a match occurs at the start of the string (rather than after a \n), the \n won't be in $1, which could affect the output formatting.
Suggested change:

```diff
-updatedBody = updatedBody.replace(pattern, '$1$2[x]$3');
+updatedBody = updatedBody.replace(
+  pattern,
+  (fullMatch, lineStart, prefix, taskText) => `${lineStart}${prefix}[x]${taskText}`,
+);
```
```js
const prTitle = pr?.title;
const prRef = pr?.head?.ref;
const prMatch = issueMatchesText(issuePattern, prTitle) || issueMatchesText(issuePattern, prRef);
```
In the analyzeTaskCompletion function, the new pr parameter is optional. The optional chaining (`pr?.title`, `pr?.head?.ref`) prevents a throw when pr is missing, but it yields undefined values, so issueMatchesText must tolerate undefined text inputs rather than assuming strings.
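Since both values can be `undefined`, the safest contract is for the text matcher to treat non-strings as "no match". A sketch (the body of `issueMatchesText` here is an assumed shape, not the repo's actual implementation):

```js
// Sketch: tolerate undefined/null text rather than throwing.
function issueMatchesText(issuePattern, text) {
  return typeof text === 'string' && issuePattern.test(text);
}

function prMatchesIssue(issuePattern, pr) {
  return (
    issueMatchesText(issuePattern, pr?.title) ||
    issueMatchesText(issuePattern, pr?.head?.ref)
  );
}
```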
…anifest
- Add SERVICE_BOT_PAT to DISPATCH_TOKEN_KEYS for fallback support
- Add token_load_balancer.js to sync manifest
- Fixes failing keepalive-runner.test.js test
- Fixes sync manifest validation
- Initialize token registry before shouldDefer check to prevent always-defer bug (P1)
- Fix percentUsed/percentRemaining type inconsistency (use numbers instead of strings)
- Add secrets validation in initializeTokenRegistry
- Handle failed app token minting for exclusive tasks (don't fall through)
- Handle failed app token minting in general selection (try next candidate)
- Add pr parameter null check before accessing properties
- Use targeted issue number removal (only #number patterns, not all digits)

Addresses 8 code review comments from Copilot and Codex bots.
Adds isInitialized() helper to check if token registry contains tokens before attempting to use shouldDefer(). Required by keepalive_loop.js P1 fix.
Add .flake8 config with max-line-length=100 to match black/ruff/isort configuration across all repos.
Fixed line-length violations in workflows added/modified in main:
- agents-80-pr-event-hub.yml: split long if conditions (2 long JS lines in the script block are unavoidable)
- agents-pr-meta.yml: multiline if conditions
- agents-81-gate-followups.yml: split long env var expressions
- agents-verify-to-issue-v2.yml: multiline if condition
Automated Status Summary
Head SHA: d6a6935
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope: No scope information available

Tasks
Acceptance criteria
🤖 Keepalive Loop Status
PR #1008 | Agent: Codex | Iteration 0/5

Current State
🔍 Failure Classification

| Error type | infrastructure |
| Status | ✅ autofix updates applied |

Autofix updated these files:
- Removed leftover <<<<<<< HEAD marker at line 2823
- Fixed duplicate strippedIssueTask declaration
- Fixed missing semicolon and improper if statement placement
- Removed duplicate isIssueOnlyTask block

All JavaScript syntax is now valid.
Force-pushed from 2e1be17 to 472fed1
When resolving merge conflicts, accidentally removed the variable declarations for 'confidence' and 'reason' that are used throughout analyzeTaskCompletion. This caused 'ReferenceError: confidence/reason is not defined' in tests.
Force-pushed from ab8225c to fb804c8
This commit adds:
1. New unified setup-api-client action (.github/actions/setup-api-client/action.yml)
   - Combines npm install + token export into one reusable action
   - Pins @octokit/* versions for consistency (20.0.2, 6.0.1, 9.1.5, 6.0.3)
   - Supports both JSON secrets and individual inputs
   - Reports token count for debugging
2. Comprehensive remediation plan (docs/fixes/RATE_LIMIT_REMEDIATION_PLAN.md)
   - Detailed PR history from #1008 to #1182
   - Root cause analysis
   - Implementation phases
   - Testing strategy
   - Handoff protocol

Next steps: Apply the action to high-frequency workflows
* feat: add unified setup-api-client action and remediation plan

  This commit adds:
  1. New unified setup-api-client action (.github/actions/setup-api-client/action.yml)
     - Combines npm install + token export into one reusable action
     - Pins @octokit/* versions for consistency (20.0.2, 6.0.1, 9.1.5, 6.0.3)
     - Supports both JSON secrets and individual inputs
     - Reports token count for debugging
  2. Comprehensive remediation plan (docs/fixes/RATE_LIMIT_REMEDIATION_PLAN.md)
     - Detailed PR history from #1008 to #1182
     - Root cause analysis
     - Implementation phases
     - Testing strategy
     - Handoff protocol

  Next steps: Apply the action to high-frequency workflows

* refactor: apply setup-api-client action to keepalive workflow

  Updates agents-keepalive-loop.yml to use the new unified setup-api-client action instead of separate npm install + export steps.

  Changes:
  - Replace 4 instances of 'npm install' + 'export-load-balancer-tokens' with a single 'setup-api-client' action
  - Remove duplicate export block in evaluate job
  - Add setup-api-client to summary job

  Benefits:
  - Single point of dependency management (pinned versions)
  - Consistent token export across all jobs
  - Reduced workflow file from 965 to 887 lines
  - Easier maintenance: one action to update, not multiple blocks

  Jobs updated:
  - evaluate: lines 78-82
  - mark-running: lines 332-336
  - run-codex: lines 427-431
  - summary: lines 639-643

* chore: add setup-api-client action to sync manifest

  Ensures the new unified setup action will be synced to consumer repos. Also marks export-load-balancer-tokens as deprecated.

* fix: add explicit 'Agent Stopped: API capacity depleted' status

  When rate limits are exhausted, the summary comment now shows:
  ### 🛑 Agent Stopped: API capacity depleted
  This replaces the misleading '🔄 Agent Running' status that previously appeared when the agent was actually blocked by rate limits.

  The new status clearly indicates:
  - All token pools are exhausted
  - This is NOT a code/prompt problem
  - Automatic recovery when limits reset (~1 hour)

  Detection logic:
  - reason === 'rate-limit-exhausted'
  - action === 'defer' with rate-related reason

* chore: sync template scripts

* chore(codex-autofix): apply updates (PR #1183)

* fix: update API wrapper guard to accept setup-api-client action

  The check_api_wrapper_guard.py script now accepts either:
  - export-load-balancer-tokens (old pattern)
  - setup-api-client (new unified action)
  This allows workflows to use the new setup-api-client action without triggering lint errors.

* chore(autofix): formatting/lint

* chore(autofix): formatting/lint

* fix: skip node_modules in API guard scan

  The _collect_all_files fallback was scanning node_modules directories when the git diff failed, causing false positives from third-party library code (e.g., @octokit type definitions containing api.github.com). This fix explicitly skips any path containing 'node_modules' in its parts, which is the correct behavior since we only want to lint project code, not dependencies.

* fix: recognize ensureRateLimitWrapped as valid wrapper pattern

  The github-rate-limited-wrapper.js wrapper internally uses createTokenAwareRetry from github-api-with-retry.js to provide rate limit protection. This change recognizes both wrappers as valid, avoiding false positives for files like keepalive_loop.js that use the higher-level wrapper.

* fix: exclude node_modules from _is_target_file check

  The previous commit added node_modules exclusion only to _collect_all_files(). When running with --base-ref (diff mode), the files come from git diff, which can include node_modules paths if lockfiles were updated. Adding the node_modules check to _is_target_file ensures it applies to both code paths.

* fix: address code review feedback from Copilot

  Changes to setup-api-client action:
  - Fix output description: 'tokens exported to environment' (not 'load balancer')
  - Create .github/scripts dir if it doesn't exist (avoid clutter at workspace root)
  - Capture npm stderr for debugging instead of suppressing it
  - Add jq availability check with warning and fallback to individual inputs
  - Fix double-counting of GITHUB_TOKEN/GH_TOKEN (same value, count once)
  - Add clarifying comment about empty values not being counted

  Documentation updates:
  - Mark Task A.1 and A.2 as DONE
  - Update example to match actual implementation
  - Note that package.json is not needed (inline version pinning)
  - Update 'to be created' to 'created in this PR'

* fix: sync setup-api-client and updated keepalive to templates

  Critical fix: the sync system copies from templates/consumer-repo/, not from .github/workflows/. Without this commit:
  - setup-api-client action wouldn't sync to consumer repos
  - agents-keepalive-loop.yml updates wouldn't sync to consumer repos
  Consumer repos would continue failing with 'Token registry initialized with 0 tokens'.

  Remaining work (documented in RATE_LIMIT_REMEDIATION_PLAN.md):
  - Update other workflows to use setup-api-client (belt dispatcher, worker, conveyor, etc.)
  - Currently 9 other workflows still use export-load-balancer-tokens

* docs: add remaining work section to remediation plan

  Documents:
  - 9 workflows still using the old export-load-balancer-tokens pattern
  - Priority order for updates
  - Template sync architecture (templates/consumer-repo/ is the source)
  - Update pattern to follow
  - Verification steps after merge

* refactor: migrate ALL workflows to setup-api-client with simplified params

  This comprehensive update migrates all 70 workflow files from the legacy export-load-balancer-tokens action to the new setup-api-client action.

  Changes:
  - Replace all sparse-checkout paths from export-load-balancer-tokens to setup-api-client
  - Replace all action usages to use the simplified parameter interface:
    - secrets: ${{ toJSON(secrets) }}
    - github_token: ${{ github.token }}
  - Remove individual secret parameters (service_bot_pat, token_rotation_json, etc.)
  - Update consumer-repo templates to match

  The new setup-api-client action:
  - Accepts secrets via toJSON(secrets) for automatic discovery
  - Properly counts tokens without double-counting GH_TOKEN
  - Includes jq availability check
  - Creates .github/scripts directory if needed
  - Preserves backward compatibility via optional individual secret inputs

  Workflow files updated: 60+ workflows
  Template files updated: 10+ consumer templates
  Action synced: templates/consumer-repo/.github/actions/setup-api-client/

* fix: escape template expressions in action description

  GitHub Actions doesn't allow template expressions in the description field. Changed from ${{ }} to '{{ }}' for the usage examples.

* fix: sync agents-auto-pilot.yml template from main workflow

  The consumer template was significantly out of date and still had direct gh api calls. Synced from the main workflow, which uses the rate-limit wrapped helpers.

* chore(codex-autofix): apply updates (PR #1183)

* chore: sync template scripts

* docs: add Rate Limiting Architecture section to CLAUDE.md

  Documents the relationship between:
  - setup-api-client action (exports tokens to GITHUB_ENV)
  - github-api-with-retry.js (reads env vars, creates token-aware wrapper)
  - token_load_balancer.js (token registry and selection)
  This ensures future changes to rate limiting understand the full component chain.

* ci: add template drift check workflow

  Fails CI when templates in templates/consumer-repo/ drift more than 50 lines from their main workflow counterparts in .github/workflows/. This prevents the situation where consumers receive outdated versions because templates weren't updated when main workflows changed.

  Triggered on:
  - Push/PR touching agents-*.yml workflows
  - Push/PR touching consumer templates

* docs: add copilot-instructions.md with mandatory read-first rule

  Explicitly instructs Copilot to read CLAUDE.md before any work. Also documents the template sync requirement.

* fix: rename template drift check to follow naming convention

  - Renamed ci-template-drift.yml → health-74-template-drift.yml
  - Added to EXPECTED_NAMES in test_workflow_naming.py
  - Changed to warn-only mode (pre-existing drift shouldn't block PRs)

* docs: add health-74-template-drift to workflow inventory

  Required for the test_inventory_docs_list_all_workflows test

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
Implement intelligent token rotation across multiple PATs and GitHub Apps to prevent API rate limit exhaustion during keepalive operations.
Problem
The keepalive loop was failing due to API rate limit exhaustion, even though multiple token sources (PATs and Apps) were available with unused capacity. GITHUB_TOKEN was 100% exhausted while WORKFLOWS_APP only had 3.2% usage.
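In other words, the selection step just needs to prefer whichever pool has the most remaining quota. A few illustrative lines (not the module's actual API):

```js
// Sketch: capacity-based selection instead of always using GITHUB_TOKEN.
// Each pool is { id, remaining }; pick the one with the most remaining requests.
function pickByCapacity(pools) {
  return pools.reduce((best, p) => (p.remaining > best.remaining ? p : best));
}
```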
Solution
New `token_load_balancer.js` Module

A comprehensive token management system that:
Integration
- `checkRateLimitStatus()` added to keepalive_loop.js
- New workflow outputs: `rate_limit_remaining`, `rate_limit_recommendation`

Enhanced Diagnostics
Health-75 workflow now checks all 6 token types:
Files Changed
- `.github/scripts/token_load_balancer.js` (new)
- `.github/scripts/keepalive_loop.js` (rate limit integration)
- `.github/workflows/agents-keepalive-loop.yml` (new outputs)
- `.github/workflows/health-75-api-rate-diagnostic.yml` (all 6 tokens)

Testing