feat: add unified setup-api-client action and remediation plan#1183

Merged
stranske merged 25 commits into main from fix/rate-limit-comprehensive-remediation
Feb 2, 2026

@stranske stranske commented Feb 2, 2026

Rate Limit Comprehensive Remediation

This PR addresses the API rate limit issues across all workflows through the changes listed below.

Changes

  1. New setup-api-client Action (.github/actions/setup-api-client/)

    • Unified action that handles npm install + token export
    • Accepts secrets via toJSON(secrets) for automatic discovery
    • Properly counts tokens without double-counting GH_TOKEN
    • Includes jq availability check
    • Creates .github/scripts directory if needed
    • Preserves backward compatibility via optional individual secret inputs
  2. Workflow Migration (60+ workflows)

    • Replaced all export-load-balancer-tokens references with setup-api-client
    • Updated sparse-checkout paths
    • Simplified parameters to just secrets and github_token
    • Removed individual secret parameters
  3. Consumer Template Updates (10+ templates)

    • Synced templates/consumer-repo/.github/actions/setup-api-client/
    • Updated all consumer workflow templates
    • Fixed out-of-date agents-auto-pilot.yml template
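
The token counting described for the setup-api-client action (item 1) can be illustrated with a small sketch. This is not the action's code: the action is a composite action and mentions jq, so it presumably does this in shell. The function name `countTokens` and the `_TOKEN`/`_PAT` naming filter are assumptions made for illustration. The idea is to de-duplicate by value, so that a `GH_TOKEN` secret equal to the `github_token` input is counted once.

```javascript
// Hypothetical sketch: count distinct tokens in a toJSON(secrets) payload.
// The name filter and the function name are assumptions, not the action's logic.
function countTokens(secretsJson, githubToken) {
  const secrets = JSON.parse(secretsJson || "{}");
  const distinct = new Set();
  for (const [name, value] of Object.entries(secrets)) {
    // Assumed convention: token-like secrets end in _TOKEN or _PAT.
    const looksLikeToken = /(_TOKEN|_PAT)$/.test(name);
    if (looksLikeToken && typeof value === "string" && value !== "") {
      distinct.add(value);
    }
  }
  // Adding to a Set by value means a GH_TOKEN secret that equals the
  // github_token input is only counted once.
  if (githubToken) distinct.add(githubToken);
  return distinct.size;
}

const payload = JSON.stringify({ GH_TOKEN: "abc", SERVICE_BOT_PAT: "def", OTHER: "x" });
console.log(countTokens(payload, "abc")); // 2
```

Here `SERVICE_BOT_PAT` and `OTHER` are placeholder secret names; `OTHER` is ignored by the assumed filter and `abc` is counted once even though it appears twice.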

Expected CI Behavior

⚠️ Some CI checks may fail because:

  • Reusable workflows (like reusable-18-autofix.yml) check out from the main branch
  • The setup-api-client action doesn't exist on main yet
  • Once this PR is merged, these checks will pass

Testing

  • Main workflows that checkout from ./.github/actions/setup-api-client
  • Consumer templates updated ✅
  • YAML syntax validated ✅

Files Changed

  • 70 files changed total
  • 478 insertions, 607 deletions (net simplification)

This commit adds:

1. New unified setup-api-client action (.github/actions/setup-api-client/action.yml)
   - Combines npm install + token export into one reusable action
   - Pins @octokit/* versions for consistency (20.0.2, 6.0.1, 9.1.5, 6.0.3)
   - Supports both JSON secrets and individual inputs
   - Reports token count for debugging

2. Comprehensive remediation plan (docs/fixes/RATE_LIMIT_REMEDIATION_PLAN.md)
   - Detailed PR history from #1008 to #1182
   - Root cause analysis
   - Implementation phases
   - Testing strategy
   - Handoff protocol

Next steps: Apply the action to high-frequency workflows
Copilot AI review requested due to automatic review settings February 2, 2026 01:34

github-actions bot commented Feb 2, 2026

Automated Status Summary

Head SHA: 7901fcd
Latest Runs: ⏳ pending — Gate
Required contexts: Gate / gate, Health 45 Agents Guard / Enforce agents workflow protections
Required: core tests (3.11): ⏳ pending, core tests (3.12): ⏳ pending, docker smoke: ⏳ pending, gate: ⏳ pending

| Workflow / Job | Result | Logs |
| --- | --- | --- |
| (no jobs reported) | ⏳ pending | |

Coverage Overview

  • Coverage history entries: 1

Coverage Trend

| Metric | Value |
| --- | --- |
| Current | 93.12% |
| Baseline | 85.00% |
| Delta | +8.12% |
| Minimum | 70.00% |
| Status | ✅ Pass |

Top Coverage Hotspots (lowest coverage)

| File | Coverage | Missing |
| --- | --- | --- |
| src/cli_parser.py | 81.8% | 4 |
| src/percentile_calculator.py | 95.0% | 1 |
| src/aggregator.py | 95.0% | 2 |
| src/__init__.py | 100.0% | 0 |
| src/ndjson_parser.py | 100.0% | 0 |

Updated automatically; will refresh on subsequent CI/Docker completions.


Keepalive checklist

Scope

No scope information available

Tasks

  • No tasks defined

Acceptance criteria

  • No acceptance criteria defined


github-actions bot commented Feb 2, 2026

🤖 Keepalive Loop Status

PR #1183 | Agent: Codex | Iteration 0/5

Current State

| Metric | Value |
| --- | --- |
| Iteration progress | [----------] 0/5 |
| Action | wait (missing-agent-label) |
| Disposition | skipped (transient) |
| Gate | success |
| Tasks | 0/19 complete |
| Timeout | 45 min (default) |
| Timeout usage | 0m elapsed (1%, 45m remaining) |
| Keepalive | ❌ disabled |
| Autofix | ❌ disabled |

🔍 Failure Classification

| Error type | infrastructure |
| Error category | resource |
| Suggested recovery | Confirm the referenced resource exists (repo, PR, branch, workflow, or file). |

Copilot AI left a comment

Pull request overview

This pull request introduces infrastructure to address persistent API rate limiting issues by creating a unified action for dependency installation and token management, accompanied by a comprehensive remediation plan documenting 40+ prior attempts to fix these issues.

Changes:

  • Added new setup-api-client composite action that combines npm dependency installation with token export functionality
  • Created detailed remediation plan documenting the complete history of rate limit fixes from PR #1008 through #1182, root cause analysis, and phased implementation strategy
  • Established pinned versions for @octokit packages (rest@20.0.2, plugin-retry@6.0.1, plugin-paginate-rest@9.1.5, auth-app@6.0.3)
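
As a rough illustration of what pinning buys, the four versions listed above can be kept in one map and turned into a single install command. The helper name `npmInstallArgs` and the choice of `--no-save` are assumptions for this sketch; the action's real install step is not shown in this conversation.

```javascript
// Pinned versions as stated in the review overview above.
const PINNED = {
  "@octokit/rest": "20.0.2",
  "@octokit/plugin-retry": "6.0.1",
  "@octokit/plugin-paginate-rest": "9.1.5",
  "@octokit/auth-app": "6.0.3",
};

// Hypothetical helper: build the argument list for one `npm install` call.
// `--no-save` (leave package.json untouched) is an assumed choice.
function npmInstallArgs(pins) {
  const specs = Object.entries(pins).map(([name, version]) => `${name}@${version}`);
  return ["install", "--no-save", ...specs];
}

console.log(["npm", ...npmInstallArgs(PINNED)].join(" "));
```

Keeping the map in one place means a version bump is a one-line change rather than an edit in every workflow.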

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| .github/actions/setup-api-client/action.yml | New unified composite action that installs @octokit dependencies and exports all available tokens to environment variables, supporting both JSON secrets input and individual token inputs |
| docs/fixes/RATE_LIMIT_REMEDIATION_PLAN.md | Comprehensive 623-line remediation plan documenting PR history, root cause analysis, implementation phases, success criteria, testing strategy, and handoff protocol |


This commit updates agents-keepalive-loop.yml to use the new unified
setup-api-client action instead of separate npm install + export steps.

Changes:
- Replace 4 instances of 'npm install' + 'export-load-balancer-tokens'
  with single 'setup-api-client' action
- Remove duplicate export block in evaluate job
- Add setup-api-client to summary job

Benefits:
- Single point of dependency management (pinned versions)
- Consistent token export across all jobs
- Reduced workflow file from 965 to 887 lines
- Easier maintenance - one action to update, not multiple blocks

Jobs updated:
- evaluate: lines 78-82
- mark-running: lines 332-336
- run-codex: lines 427-431
- summary: lines 639-643
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 01:43 — with GitHub Actions Inactive
This ensures the new unified setup action will be synced to consumer repos.
Also marks export-load-balancer-tokens as deprecated.
stranske and others added 3 commits February 2, 2026 01:51
When rate limits are exhausted, the summary comment now shows:

### 🛑 Agent Stopped: API capacity depleted

This replaces the misleading '🔄 Agent Running' status that previously
appeared when the agent was actually blocked by rate limits.

The new status clearly indicates:
- All token pools are exhausted
- This is NOT a code/prompt problem
- Automatic recovery when limits reset (~1 hour)

Detection logic:
- reason === 'rate-limit-exhausted'
- action === 'defer' with rate-related reason
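
The two detection conditions above could be expressed roughly as follows. This is a sketch, not the code from the commit: the function name `isApiCapacityDepleted` is hypothetical, and the commit message does not define "rate-related", so the regular expression used for it is an assumption.

```javascript
// Hypothetical sketch of the detection logic described above.
function isApiCapacityDepleted({ action, reason } = {}) {
  const text = String(reason || "");
  // Condition 1: the explicit exhaustion reason.
  if (text === "rate-limit-exhausted") return true;
  // Condition 2: a deferral whose reason mentions rate limiting.
  // The pattern below is an assumed reading of "rate-related".
  return action === "defer" && /rate[- ]?limit/i.test(text);
}

console.log(isApiCapacityDepleted({ reason: "rate-limit-exhausted" }));                  // true
console.log(isApiCapacityDepleted({ action: "defer", reason: "secondary rate limit" })); // true
console.log(isApiCapacityDepleted({ action: "defer", reason: "waiting for label" }));    // false
```

When the check is true, the summary would show the "🛑 Agent Stopped: API capacity depleted" heading instead of "🔄 Agent Running".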

github-actions bot commented Feb 2, 2026

✅ Codex Completion Checkpoint

Commit: 26a074d
Recorded: 2026-02-02T02:51:21.035Z

No new completions recorded this round.

About this comment

This comment is automatically generated to track task completions.
The Automated Status Summary reads these checkboxes to update PR progress.
Do not edit this comment manually.

The check_api_wrapper_guard.py script now accepts either:
- export-load-balancer-tokens (old pattern)
- setup-api-client (new unified action)

This allows workflows to use the new setup-api-client action
without triggering lint errors.
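
The "accept either pattern" rule can be sketched as below. The real guard is the Python script check_api_wrapper_guard.py, whose contents are not shown here; this JavaScript version only illustrates the rule, and the function name `hasTokenSetup` and the line-based matching are assumptions.

```javascript
// Action names the guard accepts, per the commit message above.
const ACCEPTED_ACTIONS = ["export-load-balancer-tokens", "setup-api-client"];

// Hypothetical check: does the workflow text reference either action
// in a `uses:` line? Commented-out lines are ignored.
function hasTokenSetup(workflowText) {
  return workflowText.split("\n").some((line) => {
    const trimmed = line.trim();
    if (trimmed.startsWith("#")) return false;
    return /\buses:/.test(trimmed) &&
      ACCEPTED_ACTIONS.some((name) => trimmed.includes(name));
  });
}

console.log(hasTokenSetup("steps:\n  - uses: ./.github/actions/setup-api-client")); // true
console.log(hasTokenSetup("steps:\n  # - uses: ./.github/actions/setup-api-client")); // false
```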
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 01:56 — with GitHub Actions Inactive
@github-actions github-actions bot added the autofix Opt-in automated formatting & lint remediation label Feb 2, 2026

github-actions bot commented Feb 2, 2026

Status | ✅ no new diagnostics
History points | 1
Timestamp | 2026-02-02 02:30:09 UTC
Report artifact | autofix-report-pr-1183
Remaining | 0
New | 0
No additional artifacts


github-actions bot commented Feb 2, 2026

Autofix updated these files:

  • node_modules/.package-lock.json

@agents-workflows-bot

⚠️ Action Required: Unable to determine source issue for PR #1183. The PR title, branch name, or body must contain the issue number (e.g. #123, branch: issue-123, or the hidden marker ).

github-actions bot and others added 2 commits February 2, 2026 02:51
Documents the relationship between:
- setup-api-client action (exports tokens to GITHUB_ENV)
- github-api-with-retry.js (reads env vars, creates token-aware wrapper)
- token_load_balancer.js (token registry and selection)

This ensures future changes to rate limiting understand the full component chain.
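
The component chain above can be pictured with a toy model. Nothing here is taken from github-api-with-retry.js or token_load_balancer.js: `readTokensFromEnv`, `TokenRegistry`, the environment variable names, and the "most remaining quota wins" policy are all assumptions chosen to show how tokens exported to the environment might flow into a registry and a selection step.

```javascript
// Stage 1 (stand-in for the wrapper): collect tokens that the setup
// action exported into the environment. Variable names are hypothetical.
function readTokensFromEnv(env, names) {
  return names
    .filter((name) => env[name])
    .map((name) => ({ name, token: env[name], remaining: Infinity }));
}

// Stage 2 (stand-in for the load balancer): registry plus selection.
class TokenRegistry {
  constructor(entries) {
    this.entries = entries;
  }

  // Record the remaining quota last observed for a token.
  recordRemaining(name, remaining) {
    const entry = this.entries.find((e) => e.name === name);
    if (entry) entry.remaining = remaining;
  }

  // Assumed policy: pick the token with the most remaining quota;
  // return null when every token is exhausted.
  select() {
    let best = null;
    for (const entry of this.entries) {
      if (entry.remaining > 0 && (best === null || entry.remaining > best.remaining)) {
        best = entry;
      }
    }
    return best;
  }
}

const registry = new TokenRegistry(
  readTokensFromEnv({ GH_TOKEN: "a", BOT_PAT: "b" }, ["GH_TOKEN", "BOT_PAT", "MISSING"])
);
registry.recordRemaining("GH_TOKEN", 10);
registry.recordRemaining("BOT_PAT", 4000);
console.log(registry.select().name); // BOT_PAT
```

A `null` result from `select()` corresponds to the "all token pools are exhausted" state in this model.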
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 02:55 — with GitHub Actions Inactive
Fails CI when templates in templates/consumer-repo/ drift more than 50 lines
from their main workflow counterparts in .github/workflows/.

This prevents the situation where consumers receive outdated versions
because templates weren't updated when main workflows changed.

Triggered on:
- Push/PR touching agents-*.yml workflows
- Push/PR touching consumer templates
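
A minimal sketch of a drift check follows. The commit message does not say how the workflow measures drift, so this assumes a simple measure: the number of lines that appear in only one of the two files. The function names `driftLines` and `checkDrift` are hypothetical; only the 50-line threshold comes from the description above.

```javascript
// Assumed drift measure: lines present in one file but not the other,
// counted as a multiset difference (order is ignored).
function driftLines(templateText, sourceText) {
  const counts = new Map();
  for (const line of sourceText.split("\n")) {
    counts.set(line, (counts.get(line) || 0) + 1);
  }
  let drift = 0;
  for (const line of templateText.split("\n")) {
    const available = counts.get(line) || 0;
    if (available > 0) counts.set(line, available - 1);
    else drift += 1; // line exists only in the template
  }
  for (const leftover of counts.values()) drift += leftover; // only in the source
  return drift;
}

// Compare against the 50-line limit from the description.
function checkDrift(templateText, sourceText, limit = 50) {
  const drift = driftLines(templateText, sourceText);
  return { drift, ok: drift <= limit };
}

console.log(checkDrift("a\nb\nc", "a\nb\nd")); // { drift: 2, ok: true }
```

A real check would more likely shell out to `diff` or `git diff --stat`; the point here is only the threshold comparison.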
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 02:57 — with GitHub Actions Inactive
Explicitly instructs Copilot to read CLAUDE.md before any work.
Also documents the template sync requirement.
- Renamed ci-template-drift.yml → health-74-template-drift.yml
- Added to EXPECTED_NAMES in test_workflow_naming.py
- Changed to warn-only mode (pre-existing drift shouldn't block PRs)
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 03:03 — with GitHub Actions Inactive
Required for test_inventory_docs_list_all_workflows test
@stranske stranske temporarily deployed to agent-high-privilege February 2, 2026 03:05 — with GitHub Actions Inactive
@stranske stranske merged commit 13f8289 into main Feb 2, 2026
40 of 41 checks passed
@stranske stranske deleted the fix/rate-limit-comprehensive-remediation branch February 2, 2026 03:08
stranske added a commit that referenced this pull request Feb 2, 2026
Systematic audit found 8 jobs with github-script that make API calls
but were missing setup-api-client for rate limit mitigation.

Fixed jobs:
- agents-autofix-loop.yml / metrics
- agents-bot-comment-handler.yml / cleanup
- reusable-10-ci-python.yml / logs_summary
- reusable-16-agents.yml / preflight
- reusable-20-pr-meta.yml / keepalive_orchestrator
- reusable-20-pr-meta.yml / keepalive_from_gate
- reusable-20-pr-meta.yml / pr_body_update
- reusable-bot-comment-handler.yml / dispatch

Identified false positive (no fix needed):
- reusable-16-agents.yml / verify_issue_summary (uses core.summary only)

Audit tracked in docs/fixes/setup-api-client-coverage-audit.csv

Refs: #1183
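
The audit rule described in this commit (a job uses github-script but has no setup-api-client step) can be sketched as below. The function name `findGaps` and the already-parsed workflow shape are assumptions; the audit's actual method is not shown here. Matching on `uses` alone cannot tell whether a script really calls the API, which is how a false positive such as a job that only uses core.summary can arise.

```javascript
// Hypothetical audit: list jobs that run actions/github-script without
// a setup-api-client step. `jobs` is an already-parsed workflow `jobs` map.
function findGaps(workflowName, jobs) {
  const gaps = [];
  for (const [jobName, job] of Object.entries(jobs)) {
    const uses = (job.steps || []).map((step) => step.uses || "");
    const hasScript = uses.some((u) => u.startsWith("actions/github-script@"));
    const hasSetup = uses.some((u) => u.endsWith("/setup-api-client"));
    if (hasScript && !hasSetup) gaps.push(`${workflowName} / ${jobName}`);
  }
  return gaps;
}

// Placeholder workflow and job names, for illustration only.
const exampleJobs = {
  covered: {
    steps: [
      { uses: "./.github/actions/setup-api-client" },
      { uses: "actions/github-script@v7" },
    ],
  },
  uncovered: { steps: [{ uses: "actions/github-script@v7" }] },
  no_script: { steps: [{ run: "echo hello" }] },
};
console.log(findGaps("example.yml", exampleJobs)); // [ 'example.yml / uncovered' ]
```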
stranske added a commit that referenced this pull request Feb 2, 2026
* docs: add setup-api-client coverage audit spreadsheet

Tracks which workflow jobs have github-script but lack setup-api-client.
Identifies 10 gaps requiring fixes for complete rate limit remediation.

Columns track fix status, PR number, and date for audit trail.

* fix: add setup-api-client to all jobs making GitHub API calls

Systematic audit found 8 jobs with github-script that make API calls
but were missing setup-api-client for rate limit mitigation.

Fixed jobs:
- agents-autofix-loop.yml / metrics
- agents-bot-comment-handler.yml / cleanup
- reusable-10-ci-python.yml / logs_summary
- reusable-16-agents.yml / preflight
- reusable-20-pr-meta.yml / keepalive_orchestrator
- reusable-20-pr-meta.yml / keepalive_from_gate
- reusable-20-pr-meta.yml / pr_body_update
- reusable-bot-comment-handler.yml / dispatch

Identified false positive (no fix needed):
- reusable-16-agents.yml / verify_issue_summary (uses core.summary only)

Audit tracked in docs/fixes/setup-api-client-coverage-audit.csv

Refs: #1183

* docs: update audit spreadsheet with PR#1189

* fix: address review comment - use correct checkout path per job

Review pointed out that in reusable-20-pr-meta.yml, jobs that check out
workflows-lib should use ./workflows-lib/.github/actions/setup-api-client,
while jobs that check out the consumer repo first should use ./consumer/.github/...

Corrected:
- keepalive_orchestrator, keepalive_from_gate, pr_body_update: use workflows-lib
  (Workflows repo is checked out to workflows-lib/ with setup-api-client)
- keepalive_dispatch: kept using consumer checkout
  (consumer repo is checked out first, workflows-lib comes later)

Updated audit spreadsheet to reflect the two different patterns.