feat: Add dynamic token load balancer for API rate limit management by stranske · Pull Request #1008 · stranske/Workflows

stranske · 2026-01-21T06:20:26Z

Summary

Implement intelligent token rotation across multiple PATs and GitHub Apps to prevent API rate limit exhaustion during keepalive operations.

Problem

The keepalive loop was failing due to API rate limit exhaustion, even though multiple token sources (PATs and Apps) were available with unused capacity. GITHUB_TOKEN was 100% exhausted while WORKFLOWS_APP only had 3.2% usage.

Solution

New `token_load_balancer.js` Module

A comprehensive token management system that:

Token Registry - Tracks all available tokens and their rate limit status
Capacity-based Selection - Selects token with highest remaining capacity
Task Specialization - Routes specific tasks to appropriate tokens:

| Token | Primary Tasks | Exclusive? |
| --- | --- | --- |
| KEEPALIVE_APP | keepalive-loop | ✅ Yes |
| OWNER_PR_PAT | PR creation as owner | ✅ Yes |
| SERVICE_BOT_PAT | bot comments, labels | Primary |
| ACTIONS_BOT_PAT | workflow dispatch | Primary |
| GH_APP | comment handling | Primary |
| WORKFLOWS_APP | general ops | Fallback |

Proactive Rotation - Switches tokens before hitting limits
Graceful Degradation - Defers when all tokens exhausted
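
The selection rules listed above could be sketched roughly as follows. This is an illustration only, not the code in `token_load_balancer.js`: the `selectToken` function, the `exclusiveFor`/`primaryFor` fields, the task names, and the quota numbers are all hypothetical, and the sketch assumes that an exclusive token is reserved for its task in both directions.

```javascript
// Hypothetical registry entries; ids mirror the PR description, numbers are invented.
const tokens = [
  { id: 'KEEPALIVE_APP', remaining: 4800, exclusiveFor: ['keepalive-loop'], primaryFor: [] },
  { id: 'SERVICE_BOT_PAT', remaining: 1200, exclusiveFor: [], primaryFor: ['bot-comment'] },
  { id: 'WORKFLOWS_APP', remaining: 4840, exclusiveFor: [], primaryFor: [] },
  { id: 'GITHUB_TOKEN', remaining: 0, exclusiveFor: [], primaryFor: [] },
];

// Returns the chosen registry entry, or null when the caller should defer.
function selectToken(registry, task, minRemaining = 100) {
  // 1. A token marked exclusive for this task is the only candidate for it.
  const exclusive = registry.find((t) => t.exclusiveFor.includes(task));
  if (exclusive) return exclusive.remaining >= minRemaining ? exclusive : null;

  // 2. Tokens reserved for other tasks, and tokens below the floor, leave the pool.
  const pool = registry.filter(
    (t) => t.exclusiveFor.length === 0 && t.remaining >= minRemaining,
  );

  // 3. Prefer a token that lists the task as primary, then the highest remaining quota.
  pool.sort((a, b) => {
    const aPrimary = a.primaryFor.includes(task) ? 1 : 0;
    const bPrimary = b.primaryFor.includes(task) ? 1 : 0;
    return bPrimary - aPrimary || b.remaining - a.remaining;
  });

  return pool[0] ?? null;
}
```

Returning `null` rather than throwing keeps the "defer when everything is exhausted" decision with the caller, which matches the graceful-degradation goal stated above.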

Integration

checkRateLimitStatus() added to keepalive_loop.js
Early rate limit check at evaluation start
Automatic deferral when capacity exhausted
New workflow outputs: rate_limit_remaining, rate_limit_recommendation

Enhanced Diagnostics

Health-75 workflow now checks all 6 token types:

GITHUB_TOKEN
CODESPACES_WORKFLOWS
SERVICE_BOT_PAT
WORKFLOWS_APP
KEEPALIVE_APP
GH_APP

Files Changed

.github/scripts/token_load_balancer.js (new)
.github/scripts/keepalive_loop.js (rate limit integration)
.github/workflows/agents-keepalive-loop.yml (new outputs)
.github/workflows/health-75-api-rate-diagnostic.yml (all 6 tokens)
Templates synced

Testing

All JavaScript files pass syntax check
All YAML files valid
Pre-commit hooks pass
Ready for integration testing via Health-75 workflow

Implement intelligent token rotation across multiple PATs and GitHub Apps to prevent API rate limit exhaustion during keepalive operations.

Key features:
- Token registry with capacity tracking for all available tokens
- Dynamic selection based on remaining capacity and task requirements
- Token specialization support (exclusive/primary assignments)
- Proactive rotation before limits are hit
- Graceful degradation when all tokens low

Token specializations:
- KEEPALIVE_APP: Exclusive for keepalive-loop (isolated pool)
- OWNER_PR_PAT: Exclusive for PR creation as owner
- SERVICE_BOT_PAT: Primary for bot comments/labels
- ACTIONS_BOT_PAT: Primary for workflow dispatch
- GH_APP: Primary for comment handling

Integration:
- checkRateLimitStatus() added to keepalive_loop.js
- Early rate limit check with automatic deferral
- Rate limit outputs added to workflow

Diagnostics:
- Health-75 updated to check all 6 token types
- Aggregate totals across all token pools

github-actions · 2026-01-21T06:20:38Z

❌ Sync Manifest Validation Failed

This PR modifies files that should be synced to consumer repos, but the sync manifest is incomplete.

Required action: Update .github/sync-manifest.yml to include the new/modified files.

Why this matters

Files not declared in the manifest won't be synced to consumer repos (Template, Manager-Database, trip-planner, Travel-Plan-Permission), causing features to silently not work in those repos.

How to fix

Open .github/sync-manifest.yml
Add your new files to the appropriate section (workflows, prompts, scripts, etc.)
Include a description explaining what the file does
If the file should NOT be synced, add it to excluded: with a reason

See the workflow logs for specific files that need to be added.

agents-workflows-bot · 2026-01-21T06:20:39Z

⚠️ Action Required: Unable to determine source issue for PR #1008. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1bbaea73d5


chatgpt-codex-connector · 2026-01-21T06:23:45Z

.github/scripts/keepalive_loop.js

+      // Determine if we should defer
+      result.shouldDefer = tokenLoadBalancer.shouldDefer(minRequired);
+      result.canProceed = !result.shouldDefer && result.totalRemaining >= minRequired;


Initialize token registry before deferring on limits

Because checkRateLimitStatus unconditionally calls tokenLoadBalancer.shouldDefer(minRequired) when the module is present, the keepalive loop will defer even if the primary token has quota whenever the registry is empty (the new module is required but initializeTokenRegistry is never called in this script). shouldDefer returns true when tokenRegistry.tokens is empty (token_load_balancer.js lines 631–637), so this path will always hit the early action: 'defer' return on every run unless forceRetry is set. This effectively stalls the loop in the default configuration.
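
The failure mode described here, and one possible guard, can be sketched in isolation. The `makeBalancer` stand-in below models only the behaviour reported in this comment (an empty registry always defers) plus the `isInitialized()` helper named in a later commit on this PR; the `decide` function and the fallback to the primary token's own quota are assumptions made for illustration, not the fix that was merged.

```javascript
// Stand-in for the real module: only the two calls discussed in the review are modeled.
function makeBalancer(tokens = new Map()) {
  return {
    isInitialized: () => tokens.size > 0,
    // Mirrors the reported behaviour: an empty registry always says "defer".
    shouldDefer: (minRequired) =>
      tokens.size === 0 ||
      [...tokens.values()].every((t) => t.remaining < minRequired),
  };
}

// Hypothetical decision helper for the keepalive loop.
function decide(balancer, primaryRemaining, minRequired) {
  // Only trust the balancer once its registry has been populated;
  // otherwise judge by the primary token's own remaining quota.
  const shouldDefer = balancer.isInitialized()
    ? balancer.shouldDefer(minRequired)
    : primaryRemaining < minRequired;
  return { shouldDefer, canProceed: !shouldDefer };
}
```

With an empty registry, calling `shouldDefer` directly reports a deferral even when the primary token has quota, while `decide` lets the run proceed.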


Copilot

Pull request overview

This PR implements a comprehensive token load balancer system to prevent API rate limit exhaustion during keepalive operations by intelligently rotating across multiple Personal Access Tokens (PATs) and GitHub Apps.

Changes:

Introduces new token_load_balancer.js module for dynamic token selection based on rate limit capacity
Integrates rate limit checking into the keepalive loop with early deferral when all tokens are exhausted
Enhances diagnostic workflows to monitor all 6 token types (GITHUB_TOKEN, CODESPACES_WORKFLOWS, SERVICE_BOT_PAT, WORKFLOWS_APP, KEEPALIVE_APP, GH_APP)
Improves task completion analysis to support numbered checklists and issue-only tasks
Refactors Source section extraction to properly handle nested headings and code blocks

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| templates/consumer-repo/.github/scripts/token_load_balancer.js | New module implementing token registry, rate limit tracking, and optimal token selection with task specialization |
| .github/scripts/token_load_balancer.js | Duplicate of template version for repository use |
| templates/consumer-repo/.github/scripts/keepalive_loop.js | Adds rate limit status checking, enhanced Source section parsing, numbered checklist support, and issue-only task matching |
| .github/scripts/keepalive_loop.js | Duplicate of template version with same enhancements |
| templates/consumer-repo/.github/workflows/agents-keepalive-loop.yml | Adds rate_limit_remaining and rate_limit_recommendation outputs, improves YAML formatting |
| .github/workflows/agents-keepalive-loop.yml | Duplicate of template version with same outputs |
| .github/workflows/health-75-api-rate-diagnostic.yml | Expands token monitoring from 3 to 6 tokens, calculates total remaining capacity |
| .github/workflows/health-72-template-sync.yml | Uses HEAD_REF variable for consistency |
| .github/scripts/tests/keepalive-loop.test.js | Adds comprehensive tests for new features (Source section parsing, numbered checklists, issue-only tasks) |
| agents/codex-1001.md | Bootstrap file for codex on issue #1001 |

Copilot · 2026-01-21T06:25:22Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+  const percentUsed = core_limit.limit > 0 
+    ? ((core_limit.used / core_limit.limit) * 100).toFixed(1)
+    : 0;
+
+  return {
+    limit: core_limit.limit,
+    remaining: core_limit.remaining,
+    used: core_limit.used,
+    reset: core_limit.reset * 1000,
+    checked: Date.now(),
+    percentUsed: parseFloat(percentUsed),
+    percentRemaining: 100 - parseFloat(percentUsed),
+  };


The percentUsed calculation uses toFixed which returns a string, but this string is then parsed back to a number. The percentRemaining calculation on line 308 will subtract a string from 100, which in JavaScript will work due to type coercion but could lead to unexpected behavior. Consider using the numeric value directly instead of converting to string and back.

Copilot · 2026-01-21T06:25:23Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+    tokenInfo.rateLimit.percentUsed = tokenInfo.rateLimit.limit > 0
+      ? ((tokenInfo.rateLimit.used / tokenInfo.rateLimit.limit) * 100).toFixed(1)
+      : 0;
+    tokenInfo.rateLimit.percentRemaining = 100 - tokenInfo.rateLimit.percentUsed;


The percentUsed calculation uses toFixed which returns a string, but this string is then used in arithmetic operations. The percentRemaining assignment will perform string-to-number coercion. This is similar to the issue in checkTokenRateLimit. Consider calculating both as numbers directly to avoid type inconsistencies.
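
A minimal sketch of the numeric alternative: `Number.prototype.toFixed` returns a string, so one option is to round with `Math.round` and keep both fields as numbers. The `roundTo1` and `percentages` names are hypothetical, and the sample values are chosen only to echo the 3.2% usage figure mentioned in the PR description.

```javascript
// toFixed yields a string; subtraction coerces it, but `+` would concatenate.
const asString = ((160 / 5000) * 100).toFixed(1);
console.log(typeof asString); // "string"
console.log(asString + 1);    // "3.21" (concatenation, not addition)

// Hypothetical helpers that keep everything numeric.
function roundTo1(x) {
  return Math.round(x * 10) / 10;
}

function percentages(used, limit) {
  const percentUsed = limit > 0 ? roundTo1((used / limit) * 100) : 0;
  return { percentUsed, percentRemaining: roundTo1(100 - percentUsed) };
}
```

For example, `percentages(160, 5000)` produces two numbers, so later comparisons and arithmetic do not depend on implicit coercion.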

Copilot · 2026-01-21T06:25:23Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+async function initializeTokenRegistry({ secrets, github, core, githubToken }) {
+  tokenRegistry.tokens.clear();
+
+  // Register GITHUB_TOKEN (always available)
+  if (githubToken) {
+    registerToken({
+      id: 'GITHUB_TOKEN',
+      token: githubToken,
+      type: 'GITHUB_TOKEN',
+      source: 'github.token',
+      capabilities: TOKEN_CAPABILITIES.GITHUB_TOKEN,
+      priority: 0, // Lowest priority (most restricted)
+    });
+  }
+
+  // Register PATs (check for PAT1, PAT2, etc. pattern as well as named PATs)
+  const patSources = [
+    { id: 'SERVICE_BOT_PAT', env: secrets.SERVICE_BOT_PAT, account: 'stranske-automation-bot' },
+    { id: 'ACTIONS_BOT_PAT', env: secrets.ACTIONS_BOT_PAT, account: 'stranske-automation-bot' },
+    { id: 'CODESPACES_WORKFLOWS', env: secrets.CODESPACES_WORKFLOWS, account: 'stranske' },
+    { id: 'OWNER_PR_PAT', env: secrets.OWNER_PR_PAT, account: 'stranske' },
+    { id: 'AGENTS_AUTOMATION_PAT', env: secrets.AGENTS_AUTOMATION_PAT, account: 'unknown' },
+    // Numbered PATs for future expansion
+    { id: 'PAT_1', env: secrets.PAT_1, account: 'pool' },
+    { id: 'PAT_2', env: secrets.PAT_2, account: 'pool' },
+    { id: 'PAT_3', env: secrets.PAT_3, account: 'pool' },
+  ];
+
+  for (const pat of patSources) {
+    if (pat.env) {
+      registerToken({
+        id: pat.id,
+        token: pat.env,
+        type: 'PAT',
+        source: pat.id,
+        account: pat.account,
+        capabilities: TOKEN_CAPABILITIES.PAT,
+        priority: 5, // Medium priority
+      });
+    }
+  }


The initializeTokenRegistry function does not validate that the secrets object exists before accessing its properties. If secrets is undefined or null, this will throw an error. Consider adding a guard check at the beginning of the function.
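
One way to add the suggested guard is to default `secrets` to an empty object before reading from it. The `collectPatSources` helper below is hypothetical and covers only the PAT lookup; the secret names are taken from the hunk above.

```javascript
// Hypothetical helper: tolerate a missing `secrets` object instead of throwing.
function collectPatSources(secrets) {
  const safe = secrets ?? {}; // undefined or null becomes an empty object
  const ids = ['SERVICE_BOT_PAT', 'ACTIONS_BOT_PAT', 'CODESPACES_WORKFLOWS', 'OWNER_PR_PAT'];
  return ids
    .filter((id) => Boolean(safe[id])) // skip unset or empty secrets
    .map((id) => ({ id, token: safe[id], type: 'PAT' }));
}
```

With this shape, a missing `secrets` argument results in an empty list of PATs rather than a `TypeError`.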

Copilot · 2026-01-21T06:25:23Z

templates/consumer-repo/.github/scripts/keepalive_loop.js

-    .filter(line => /^\s*[-*+]\s*\[\s*\]/.test(line))
    .map(line => {
-      const match = line.match(/^\s*[-*+]\s*\[\s*\]\s*(.+)$/);
+      const match = line.match(/^\s*(?:[-*+]|\d+[.)])\s*\[\s*\]\s*(.+)$/);


The regex pattern used to match numbered checklist items includes both '1.' and '1)' formats. However, the pattern requires whitespace after the number/marker with '\s*', which means '1.[ ]' (without space) won't match. Consider if this is the intended behavior or if the pattern should be more flexible.
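
A quick way to settle what the new pattern accepts is to run it against a few sample lines. The regex below is copied from the `+` line in the hunk; the sample strings and the `matchTask` helper are made up for this check. Note that `\s*` permits zero whitespace, so the tight form `1.[ ]` is matched as well.

```javascript
// Pattern copied from the diff hunk above.
const unchecked = /^\s*(?:[-*+]|\d+[.)])\s*\[\s*\]\s*(.+)$/;

// Hypothetical helper: returns the task text, or null when the line is not an unchecked item.
function matchTask(line) {
  const m = line.match(unchecked);
  return m ? m[1] : null;
}

const samples = [
  '- [ ] bullet item',
  '1. [ ] dotted number',
  '2) [ ] paren number',
  '1.[ ] no space after marker',
  '- [x] already checked',
  '3. plain numbered line',
];

for (const line of samples) {
  console.log(JSON.stringify(line), '=>', matchTask(line));
}
```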

Copilot · 2026-01-21T06:25:23Z

templates/consumer-repo/.github/scripts/keepalive_loop.js

+    const strippedIssueTask = task
+      .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
+      .replace(/https?:\/\/\S+/gi, '')
+      .replace(/[#\d]/g, '')


The strippedIssueTask calculation removes all digits with .replace(/[#\d]/g, ''), which could inadvertently remove legitimate digits that are part of the task description, not just the issue number. This might cause false positives for isIssueOnlyTask. Consider a more targeted approach that only removes the specific issue reference pattern.

Suggested change

-    const strippedIssueTask = task
-      .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
-      .replace(/https?:\/\/\S+/gi, '')
-      .replace(/[#\d]/g, '')
+    let strippedIssueTask = task
+      .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
+      .replace(/https?:\/\/\S+/gi, '');
+    if (issuePattern) {
+      strippedIssueTask = strippedIssueTask.replace(issuePattern, '');
+    }
+    strippedIssueTask = strippedIssueTask
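
The difference between the broad and the targeted stripping can be seen on a task that contains a legitimate number. In this sketch the issue number, the shape of `issuePattern`, and the `stripBroad`/`stripTargeted` names are assumptions; the real `issuePattern` in `keepalive_loop.js` may be built differently.

```javascript
// Assumed shape of the issue pattern: a literal "#<number>" reference.
const issueNumber = 1001; // hypothetical issue
const issuePattern = new RegExp(`#${issueNumber}\\b`, 'g');

// Shared first steps from the hunk: unwrap markdown links, then drop URLs.
function stripLinks(task) {
  return task
    .replace(/\[[^\]]*\]\(([^)]+)\)/g, '$1')
    .replace(/https?:\/\/\S+/gi, '');
}

// Original approach: removes every '#' and every digit.
function stripBroad(task) {
  return stripLinks(task).replace(/[#\d]/g, '').trim();
}

// Targeted approach: removes only the issue reference itself.
function stripTargeted(task) {
  return stripLinks(task).replace(issuePattern, '').trim();
}

console.log(stripBroad('Upgrade to Node 20 (#1001)'));    // the "20" is lost
console.log(stripTargeted('Upgrade to Node 20 (#1001)')); // the "20" survives
```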

Copilot · 2026-01-21T06:25:24Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+  let token = best.tokenInfo.token;
+  if (best.tokenInfo.type === 'APP' && !token) {
+    token = await mintAppToken({ tokenInfo: best.tokenInfo, core });
+    best.tokenInfo.token = token;
+  }
+
+  core?.info?.(`Selected token: ${best.id} (${best.remaining} remaining, ${best.percentRemaining.toFixed(1)}% capacity)${best.isPrimary ? ' [primary]' : ''}`);
+
+  return {
+    token,
+    source: best.id,
+    type: best.tokenInfo.type,
+    remaining: best.remaining,
+    percentRemaining: best.percentRemaining,
+    percentUsed: best.tokenInfo.rateLimit?.percentUsed ?? 0,
+    isPrimary: best.isPrimary,
+    task,
+  };


After minting an app token, if the token is null (minting failed), the function continues and returns a response with token: null. This could cause issues for callers expecting a valid token. Consider adding a check after token minting and returning null from getOptimalToken if the token couldn't be minted, or trying the next candidate.
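
The "try the next candidate" option mentioned here might look like the loop below. It is a sketch under assumptions: `pickUsableToken` is a hypothetical name, `candidates` is assumed to be already sorted best-first, and `mintAppToken` is passed in as a function that resolves to `null` on failure.

```javascript
// Hypothetical selection loop: skip candidates whose app token cannot be minted.
async function pickUsableToken(candidates, mintAppToken) {
  for (const candidate of candidates) {
    let token = candidate.token;
    if (candidate.type === 'APP' && !token) {
      token = await mintAppToken(candidate); // may resolve to null
      candidate.token = token;
    }
    if (token) {
      return { token, source: candidate.id, type: candidate.type };
    }
    // No usable token for this candidate: fall through to the next one.
  }
  return null; // nothing usable, the caller decides whether to defer
}
```

Callers then only ever see either a result with a non-empty `token` or `null`.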

Copilot · 2026-01-21T06:25:24Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+          if (tokenInfo.type === 'APP' && !token) {
+            token = await mintAppToken({ tokenInfo, core });
+            tokenInfo.token = token;
+          }
+          if (token) {
+            return {
+              token,
+              source: id,
+              type: tokenInfo.type,
+              remaining: tokenInfo.rateLimit?.remaining ?? 0,
+              percentRemaining: tokenInfo.rateLimit?.percentRemaining ?? 0,
+              percentUsed: tokenInfo.rateLimit?.percentUsed ?? 0,
+              exclusive: true,
+              task,
+            };
+          }
+        }


Similarly to the main token selection path, after minting an app token for an exclusive task (line 384-385), if the token is null (minting failed), the code checks 'if (token)' but this means no token is returned for that exclusive task. The function will then fall through to general token selection, which may violate the exclusivity contract. Consider handling the failed minting case more explicitly.
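
One way to make the exclusive path explicit is to return a flag that tells the caller whether the task was handled, so a failed mint cannot silently fall through to general selection. Everything in this sketch (`resolveExclusive`, the `exclusiveMap` lookup, the `handled` flag) is hypothetical.

```javascript
// Hypothetical resolver for exclusive tasks.
// handled: false -> task is not exclusive, general selection may run.
// handled: true  -> use `result`; a null result means "defer", never "fall through".
async function resolveExclusive(task, exclusiveMap, registry, mintAppToken) {
  const id = exclusiveMap[task];
  if (!id) return { handled: false, result: null };

  const info = registry.get(id);
  let token = info?.token ?? null;
  if (info && info.type === 'APP' && !token) {
    token = await mintAppToken(info); // may resolve to null
  }

  const result = token ? { token, source: id, exclusive: true, task } : null;
  return { handled: true, result };
}
```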

Copilot · 2026-01-21T06:25:24Z

templates/consumer-repo/.github/scripts/token_load_balancer.js

+    tokenInfo.rateLimit = {
+      limit,
+      remaining,
+      used: used || (limit - remaining),
+      reset: reset ? reset * 1000 : tokenInfo.rateLimit.reset,
+      checked: Date.now(),
+      percentUsed: ((limit - remaining) / limit * 100).toFixed(1),
+      percentRemaining: (remaining / limit * 100).toFixed(1),
+    };


The updateFromHeaders function has inconsistent type handling. Lines 522-523 calculate percentUsed and percentRemaining using toFixed which returns strings, while the rest of the rateLimit object uses numbers. This type inconsistency could cause issues in comparisons and calculations elsewhere in the code.
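
String percentages are a problem mainly in comparisons, where JavaScript compares two strings lexicographically. The sketch below shows that pitfall and a numeric variant of the header parsing. The `x-ratelimit-*` header names are GitHub's documented rate limit headers; the `rateLimitFromHeaders` function and its return shape are assumptions modeled on the hunk above.

```javascript
// Lexicographic comparison: "9.5" sorts after "10.0" because '9' > '1'.
console.log('9.5' > '10.0');                 // true
console.log(Number('9.5') > Number('10.0')); // false

// Hypothetical numeric-only variant of the header parsing.
function rateLimitFromHeaders(headers, now = Date.now()) {
  const limit = Number(headers['x-ratelimit-limit']);
  const remaining = Number(headers['x-ratelimit-remaining']);
  if (!Number.isFinite(limit) || !Number.isFinite(remaining) || limit <= 0) {
    return null; // headers missing or unusable
  }
  const round1 = (x) => Math.round(x * 10) / 10;
  return {
    limit,
    remaining,
    used: limit - remaining,
    reset: Number(headers['x-ratelimit-reset']) * 1000, // seconds to milliseconds
    checked: now,
    percentUsed: round1(((limit - remaining) / limit) * 100),
    percentRemaining: round1((remaining / limit) * 100),
  };
}
```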

Copilot · 2026-01-21T06:25:24Z

templates/consumer-repo/.github/scripts/keepalive_loop.js


    if (pattern.test(updatedBody)) {
-      updatedBody = updatedBody.replace(pattern, '$1[x]$2');
+      updatedBody = updatedBody.replace(pattern, '$1$2[x]$3');


The regex pattern on line 3000 attempts to match numbered checklist items for replacement. However, the capture groups have changed from the previous version. The pattern now captures '(^|\n)(\s*(?:[-+]|\d+[.)])\s)\\s*\', but the replacement '$1$2[x]$3' may not preserve the original formatting correctly. Specifically, if a match occurs at the start of a line (not after \n), the \n won't be in $1, which could affect the output formatting.

Suggested change

-      updatedBody = updatedBody.replace(pattern, '$1$2[x]$3');
+      updatedBody = updatedBody.replace(
+        pattern,
+        (fullMatch, lineStart, prefix, taskText) => `${lineStart}${prefix}[x]${taskText}`,
+      );
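
To compare the two replacement styles, the sketch below runs both against a stand-in pattern with three capture groups. The pattern is an assumption (the real one in `keepalive_loop.js` is not shown in full here), and the task text "Write docs" is invented. When the match is at the very start of the body, the first group is simply the empty string in both styles.

```javascript
// Assumed three-group pattern: line start, list prefix, then the task text.
const pattern = /(^|\n)(\s*(?:[-*+]|\d+[.)])\s*)\[\s*\](\s*Write docs)/;

// String replacement using numbered group references.
const viaString = (body) => body.replace(pattern, '$1$2[x]$3');

// Callback replacement with named parameters, as in the suggested change.
const viaCallback = (body) =>
  body.replace(
    pattern,
    (fullMatch, lineStart, prefix, taskText) => `${lineStart}${prefix}[x]${taskText}`,
  );

for (const body of ['1. [ ] Write docs', 'Intro\n- [ ] Write docs']) {
  console.log(JSON.stringify(viaString(body)), JSON.stringify(viaCallback(body)));
}
```

Under this assumed pattern both forms produce the same output; the callback mainly makes the role of each group readable.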

Copilot · 2026-01-21T06:25:24Z

templates/consumer-repo/.github/scripts/keepalive_loop.js

+      const prTitle = pr?.title;
+      const prRef = pr?.head?.ref;
+      const prMatch = issueMatchesText(issuePattern, prTitle) || issueMatchesText(issuePattern, prRef);


In the analyzeTaskCompletion function, the pr parameter is added but there's no null/undefined check before accessing pr.title and pr.head.ref on lines 2844-2845. If pr is not provided (which is optional based on the function signature), this will throw an error.
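
Whether the quoted lines throw depends on the optional chaining shown in the hunk: `pr?.title` and `pr?.head?.ref` evaluate to `undefined` when `pr` is missing, so the remaining risk is in how the downstream helper treats `undefined`. In the sketch below, `prFields` is a hypothetical wrapper around the two quoted expressions, and the body of `issueMatchesText` is an assumption, since its real implementation is not shown in this PR excerpt.

```javascript
// The two expressions from the hunk, wrapped for testing.
function prFields(pr) {
  return { prTitle: pr?.title, prRef: pr?.head?.ref };
}

// Assumed implementation: treat non-string input as "no match".
function issueMatchesText(pattern, text) {
  return typeof text === 'string' && pattern.test(text);
}

const issuePattern = /#1001\b/; // hypothetical, non-global so test() is stateless
const { prTitle, prRef } = prFields(undefined);
console.log(issueMatchesText(issuePattern, prTitle) || issueMatchesText(issuePattern, prRef)); // false
```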

…anifest
- Add SERVICE_BOT_PAT to DISPATCH_TOKEN_KEYS for fallback support
- Add token_load_balancer.js to sync manifest
- Fixes failing keepalive-runner.test.js test
- Fixes sync manifest validation

- Initialize token registry before shouldDefer check to prevent always-defer bug (P1)
- Fix percentUsed/percentRemaining type inconsistency (use numbers instead of strings)
- Add secrets validation in initializeTokenRegistry
- Handle failed app token minting for exclusive tasks (don't fall through)
- Handle failed app token minting in general selection (try next candidate)
- Add pr parameter null check before accessing properties
- Use targeted issue number removal (only #number patterns, not all digits)

Addresses 8 code review comments from Copilot and Codex bots.

Adds isInitialized() helper to check if token registry contains tokens before attempting to use shouldDefer(). Required by keepalive_loop.js P1 fix.

Add .flake8 config with max-line-length=100 to match black/ruff/isort configuration across all repos.

Fixed line length violations in workflows added/modified in main:
- agents-80-pr-event-hub.yml: Split long if conditions (2 long JS lines in script block are unavoidable)
- agents-pr-meta.yml: Multiline if conditions
- agents-81-gate-followups.yml: Split long env var expressions
- agents-verify-to-issue-v2.yml: Multiline if condition

github-actions · 2026-01-21T06:48:32Z

Automated Status Summary

Head SHA: d6a6935
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| (no jobs reported) | ⏳ pending | — |

Coverage Overview

Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 93.12% |
| Baseline | 85.00% |
| Delta | +8.12% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| `src/cli_parser.py` | 81.8% | 4 |
| `src/percentile_calculator.py` | 95.0% | 1 |
| `src/aggregator.py` | 95.0% | 2 |
| `src/__init__.py` | 100.0% | 0 |
| `src/ndjson_parser.py` | 100.0% | 0 |

Updated automatically; will refresh on subsequent CI/Docker completions.

Keepalive checklist

Scope

No scope information available

Tasks

No tasks defined

Acceptance criteria

No acceptance criteria defined

github-actions · 2026-01-21T06:48:52Z

🤖 Keepalive Loop Status

PR #1008 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/5 complete |
| Timeout | 45 min (default) |
| Timeout usage | 2m elapsed (5%, 43m remaining) |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

github-actions · 2026-01-21T06:49:17Z

Autofix updated these files:

tests/test_workflow_validator.py

- Removed leftover <<<<<<< HEAD marker at line 2823
- Fixed duplicate strippedIssueTask declaration
- Fixed missing semicolon and improper if statement placement
- Removed duplicate isIssueOnlyTask block

All JavaScript syntax now valid.

When resolving merge conflicts, accidentally removed the variable declarations for 'confidence' and 'reason' that are used throughout analyzeTaskCompletion. This caused 'ReferenceError: confidence/reason is not defined' in tests.

* feat: add unified setup-api-client action and remediation plan

This commit adds:
1. New unified setup-api-client action (.github/actions/setup-api-client/action.yml)
   - Combines npm install + token export into one reusable action
   - Pins @octokit/* versions for consistency (20.0.2, 6.0.1, 9.1.5, 6.0.3)
   - Supports both JSON secrets and individual inputs
   - Reports token count for debugging
2. Comprehensive remediation plan (docs/fixes/RATE_LIMIT_REMEDIATION_PLAN.md)
   - Detailed PR history from #1008 to #1182
   - Root cause analysis
   - Implementation phases
   - Testing strategy
   - Handoff protocol

Next steps: Apply the action to high-frequency workflows

* refactor: apply setup-api-client action to keepalive workflow

This commit updates agents-keepalive-loop.yml to use the new unified setup-api-client action instead of separate npm install + export steps.

Changes:
- Replace 4 instances of 'npm install' + 'export-load-balancer-tokens' with single 'setup-api-client' action
- Remove duplicate export block in evaluate job
- Add setup-api-client to summary job

Benefits:
- Single point of dependency management (pinned versions)
- Consistent token export across all jobs
- Reduced workflow file from 965 to 887 lines
- Easier maintenance - one action to update, not multiple blocks

Jobs updated:
- evaluate: lines 78-82
- mark-running: lines 332-336
- run-codex: lines 427-431
- summary: lines 639-643

* chore: add setup-api-client action to sync manifest

This ensures the new unified setup action will be synced to consumer repos. Also marks export-load-balancer-tokens as deprecated.

* fix: add explicit 'Agent Stopped: API capacity depleted' status

When rate limits are exhausted, the summary comment now shows:

### 🛑 Agent Stopped: API capacity depleted

This replaces the misleading '🔄 Agent Running' status that previously appeared when the agent was actually blocked by rate limits.

The new status clearly indicates:
- All token pools are exhausted
- This is NOT a code/prompt problem
- Automatic recovery when limits reset (~1 hour)

Detection logic:
- reason === 'rate-limit-exhausted'
- action === 'defer' with rate-related reason

* chore: sync template scripts

* chore(codex-autofix): apply updates (PR #1183)

* fix: update API wrapper guard to accept setup-api-client action

The check_api_wrapper_guard.py script now accepts either:
- export-load-balancer-tokens (old pattern)
- setup-api-client (new unified action)

This allows workflows to use the new setup-api-client action without triggering lint errors.

* chore(autofix): formatting/lint

* chore(autofix): formatting/lint

* fix: skip node_modules in API guard scan

The _collect_all_files fallback was scanning node_modules directories when the git diff failed, causing false positives from third-party library code (e.g., @octokit type definitions containing api.github.com). This fix explicitly skips any path containing 'node_modules' in its parts, which is the correct behavior since we only want to lint project code, not dependencies.

* fix: recognize ensureRateLimitWrapped as valid wrapper pattern

The github-rate-limited-wrapper.js wrapper internally uses createTokenAwareRetry from github-api-with-retry.js to provide rate limit protection. This change recognizes both wrappers as valid, avoiding false positives for files like keepalive_loop.js that use the higher-level wrapper.

* fix: exclude node_modules from _is_target_file check

The previous commit added node_modules exclusion only to _collect_all_files(). When running with --base-ref (diff mode), the files come from git diff which can include node_modules paths if lockfiles were updated. Adding the node_modules check to _is_target_file ensures it applies to both code paths.

* fix: address code review feedback from Copilot

Changes to setup-api-client action:
- Fix output description: 'tokens exported to environment' (not 'load balancer')
- Create .github/scripts dir if it doesn't exist (avoid clutter at workspace root)
- Capture npm stderr for debugging instead of suppressing it
- Add jq availability check with warning and fallback to individual inputs
- Fix double-counting of GITHUB_TOKEN/GH_TOKEN (same value, count once)
- Add clarifying comment about empty values not being counted

Documentation updates:
- Mark Task A.1 and A.2 as DONE
- Update example to match actual implementation
- Note that package.json is not needed (inline version pinning)
- Update 'to be created' to 'created in this PR'

* fix: sync setup-api-client and updated keepalive to templates

Critical fix: The sync system copies from templates/consumer-repo/, not from .github/workflows/.

Without this commit:
- setup-api-client action wouldn't sync to consumer repos
- agents-keepalive-loop.yml updates wouldn't sync to consumer repos

Consumer repos would continue failing with 'Token registry initialized with 0 tokens'

Remaining work (documented in RATE_LIMIT_REMEDIATION_PLAN.md):
- Update other workflows to use setup-api-client (belt dispatcher, worker, conveyor, etc.)
- Currently 9 other workflows still use export-load-balancer-tokens

* docs: add remaining work section to remediation plan

Documents:
- 9 workflows still using old export-load-balancer-tokens pattern
- Priority order for updates
- Template sync architecture (templates/consumer-repo/ is the source)
- Update pattern to follow
- Verification steps after merge

* refactor: migrate ALL workflows to setup-api-client with simplified params

This comprehensive update migrates all 70 workflow files from the legacy export-load-balancer-tokens action to the new setup-api-client action.

Changes:
- Replace all sparse-checkout paths from export-load-balancer-tokens to setup-api-client
- Replace all action usages to use simplified parameter interface:
  - secrets: ${{ toJSON(secrets) }}
  - github_token: ${{ github.token }}
- Remove individual secret parameters (service_bot_pat, token_rotation_json, etc)
- Update consumer-repo templates to match

The new setup-api-client action:
- Accepts secrets via toJSON(secrets) for automatic discovery
- Properly counts tokens without double-counting GH_TOKEN
- Includes jq availability check
- Creates .github/scripts directory if needed
- Preserves backward compatibility via optional individual secret inputs

Workflow files updated: 60+ workflows
Template files updated: 10+ consumer templates
Action synced: templates/consumer-repo/.github/actions/setup-api-client/

* fix: escape template expressions in action description

GitHub Actions doesn't allow template expressions in the description field. Changed from ${{ }} to '{{ }}' for the usage examples.

* fix: sync agents-auto-pilot.yml template from main workflow

The consumer template was significantly out of date and still had direct gh api calls. Synced from the main workflow which uses the rate-limit wrapped helpers.

* chore(codex-autofix): apply updates (PR #1183)

* chore: sync template scripts

* docs: add Rate Limiting Architecture section to CLAUDE.md

Documents the relationship between:
- setup-api-client action (exports tokens to GITHUB_ENV)
- github-api-with-retry.js (reads env vars, creates token-aware wrapper)
- token_load_balancer.js (token registry and selection)

This ensures future changes to rate limiting understand the full component chain.

* ci: add template drift check workflow

Fails CI when templates in templates/consumer-repo/ drift more than 50 lines from their main workflow counterparts in .github/workflows/. This prevents the situation where consumers receive outdated versions because templates weren't updated when main workflows changed.

Triggered on:
- Push/PR touching agents-*.yml workflows
- Push/PR touching consumer templates

* docs: add copilot-instructions.md with mandatory read-first rule

Explicitly instructs Copilot to read CLAUDE.md before any work. Also documents the template sync requirement.

* fix: rename template drift check to follow naming convention

- Renamed ci-template-drift.yml → health-74-template-drift.yml
- Added to EXPECTED_NAMES in test_workflow_naming.py
- Changed to warn-only mode (pre-existing drift shouldn't block PRs)

* docs: add health-74-template-drift to workflow inventory

Required for test_inventory_docs_list_all_workflows test

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

github-actions bot and others added 11 commits January 21, 2026 04:43

chore(codex): bootstrap PR for issue #1001

09cd8ec

Fix keepalive source section extraction

c68db63

chore: sync template scripts

b2213d7

Fix source extraction for indented headings

d210ead

chore: sync template scripts

423126f

Fix keepalive reconciliation for numbered tasks

b6f13db

Improve issue-link task reconciliation

4b48ff6

chore: sync template scripts

6cd85b7

Fix actionlint head_ref usage

5554ae6

Defer keepalive summary on rate limits

3aed141

Copilot AI review requested due to automatic review settings January 21, 2026 06:20

stranske temporarily deployed to agent-high-privilege January 21, 2026 06:20 — with GitHub Actions Inactive

Copilot started reviewing on behalf of stranske January 21, 2026 06:20

chatgpt-codex-connector bot reviewed Jan 21, 2026

Copilot AI reviewed Jan 21, 2026

fix: Add SERVICE_BOT_PAT to dispatch fallback chain and update sync m…

778b0cb

stranske temporarily deployed to agent-high-privilege January 21, 2026 06:29 — with GitHub Actions Inactive

stranske added 4 commits January 21, 2026 06:32

fix: Add isInitialized method for registry state check

db3a9c8

fix: Set flake8 max line length to 100

9f9b96d

stranske temporarily deployed to agent-high-privilege January 21, 2026 06:47 — with GitHub Actions Inactive

github-actions bot added autofix Opt-in automated formatting & lint remediation autofix:patch labels Jan 21, 2026

agents-workflows-bot bot temporarily deployed to agent-high-privilege January 21, 2026 06:48 Inactive

stranske force-pushed the feature/token-load-balancer branch from 2e1be17 to 472fed1 Compare January 21, 2026 06:55

stranske temporarily deployed to agent-high-privilege January 21, 2026 06:56 — with GitHub Actions Inactive

agents-workflows-bot bot temporarily deployed to agent-high-privilege January 21, 2026 06:57 Inactive

stranske force-pushed the feature/token-load-balancer branch from ab8225c to fb804c8 Compare January 21, 2026 07:04

stranske temporarily deployed to agent-high-privilege January 21, 2026 07:04 — with GitHub Actions Inactive

chore(autofix): formatting/lint

5593dee

agents-workflows-bot bot temporarily deployed to agent-high-privilege January 21, 2026 07:05 Inactive

stranske merged commit 8a1f5ca into main Jan 21, 2026
39 checks passed

stranske deleted the feature/token-load-balancer branch January 21, 2026 07:07

stranske mentioned this pull request Feb 2, 2026

feat: add unified setup-api-client action and remediation plan #1183

Merged
