fix: add retry with backoff for setup-api-client npm install #1656
…query ERROR_CATEGORIES.RESOURCE was undefined because the error_classifier exports lowercase keys. This caused the isNotFound check to always fail, so 404/resource errors from missing workflows were reported as "api_error" instead of being handled as expected not-found responses. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Transient npm registry 403 errors (e.g. on safe-buffer) can kill the keepalive loop chain with no recovery. Replace the single-retry logic with a 3-attempt loop using exponential backoff (5s, 10s, 20s). The first failure still tries --legacy-peer-deps as before. https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
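The retry behaviour described above can be sketched as a small shell helper. This is a minimal illustration, not the code in `action.yml`: the function name `retry_with_backoff` and its `rwb_*` variables are hypothetical, and the `--legacy-peer-deps` fallback is left out to keep the loop itself visible.

```shell
# Hypothetical helper (not from the PR):
#   retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or MAX_ATTEMPTS is exhausted, doubling the
# delay after each failed attempt. No sleep happens after the last attempt.
retry_with_backoff() {
  rwb_max=$1
  rwb_delay=$2
  shift 2
  rwb_attempt=1
  while [ "$rwb_attempt" -le "$rwb_max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$rwb_attempt" -lt "$rwb_max" ]; then
      echo "attempt $rwb_attempt/$rwb_max failed; retrying in ${rwb_delay}s" >&2
      sleep "$rwb_delay"
      rwb_delay=$((rwb_delay * 2))
    fi
    rwb_attempt=$((rwb_attempt + 1))
  done
  echo "all $rwb_max attempts failed" >&2
  return 1
}
```

A call such as `retry_with_backoff 3 5 npm install` would then make up to three attempts. Because the helper sleeps only between attempts, three attempts wait at most 5s and then 10s.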
Automated Status Summary
Head SHA: 43324c3
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist
Scope
Scope section missing from source issue.
Tasks
Acceptance criteria
🤖 Keepalive Loop Status
PR #1656 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
Pull request overview
Improves workflow resilience and CI signal quality by hardening the setup-api-client dependency install against transient npm failures and aligning verifier CI error-category casing with the shared error classifier.
Changes:
- Add an exponential-backoff retry loop (plus `--legacy-peer-deps` fallback) around `npm install` in `setup-api-client`.
- Fix `ERROR_CATEGORIES` constant casing from `RESOURCE` to `resource` in verifier CI query logic (in both main + template scripts).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.github/actions/setup-api-client/action.yml` | Adds multi-attempt retry + backoff to npm install for pinned Octokit dependencies. |
| `.github/scripts/verifier_ci_query.js` | Uses `ERROR_CATEGORIES.resource` to correctly classify 404/workflow-not-found as “resource”. |
| `templates/consumer-repo/.github/scripts/verifier_ci_query.js` | Mirrors the same `ERROR_CATEGORIES.resource` fix in the consumer template script. |
    NPM_MAX_RETRIES=3
    NPM_BACKOFF=5 # seconds; doubles each retry (5, 10, 20)
    npm_installed=false
The backoff comment/behavior doesn’t match the stated plan: with NPM_MAX_RETRIES=3 the loop only sleeps twice (5s then 10s) and never reaches 20s, even though the comment says "(5, 10, 20)" and the PR description mentions 5/10/20. Either increase the retry count (e.g. 4 attempts / 3 retries) or adjust the comment/plan so it accurately reflects the actual delays.
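The arithmetic behind this comment can be checked with a short sketch. It assumes, as the review describes, that the loop sleeps only between attempts and doubles the delay each time. The function name `backoff_schedule` is hypothetical and does not appear in the PR.

```shell
# Hypothetical helper: print the delays a retry loop would sleep for,
# given a total number of attempts and an initial delay in seconds.
# With N attempts there are only N-1 sleeps, because nothing follows
# the final attempt.
backoff_schedule() {
  bs_attempts=$1
  bs_delay=$2
  bs_out=""
  bs_i=1
  while [ "$bs_i" -lt "$bs_attempts" ]; do
    bs_out="${bs_out:+$bs_out }$bs_delay"
    bs_delay=$((bs_delay * 2))
    bs_i=$((bs_i + 1))
  done
  echo "$bs_out"
}

backoff_schedule 3 5   # prints "5 10"
backoff_schedule 4 5   # prints "5 10 20"
```

Under these assumptions, reaching a 20s delay needs four attempts. With three attempts, the comment should read "(5, 10)".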
    fi
    rm -f "$npm_output"
If the --legacy-peer-deps fallback install fails, its stderr is currently discarded (the temp file is removed without being logged). That makes diagnosing persistent failures harder, especially when the first failure was transient but the legacy attempt fails for a different reason. Capture and log the legacy attempt’s stderr (at least once) before continuing to the backoff retries.
    fi
    rm -f "$npm_output"
    fi
    npm_err_legacy=$(cat "$npm_output")
    rm -f "$npm_output"
    echo "::warning::npm install with --legacy-peer-deps failed: $npm_err_legacy"
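The pattern in the suggestion, reading the captured stderr before deleting the temp file, can be sketched as a reusable wrapper. The function name `run_and_log_stderr` and its label argument are hypothetical. Only the `::warning::` workflow command and the capture-then-remove ordering come from the review thread.

```shell
# Hypothetical helper: run_and_log_stderr LABEL CMD [ARGS...]
# Captures CMD's stderr in a temp file. On failure, the contents are read
# BEFORE the file is removed and emitted as a GitHub Actions warning.
run_and_log_stderr() {
  rls_label=$1
  shift
  rls_file=$(mktemp)
  if "$@" 2>"$rls_file"; then
    rm -f "$rls_file"
    return 0
  fi
  rls_err=$(cat "$rls_file")
  rm -f "$rls_file"
  echo "::warning::$rls_label failed: $rls_err"
  return 1
}
```

Wrapping the fallback as `run_and_log_stderr "npm install with --legacy-peer-deps" npm install --legacy-peer-deps` would surface its error output in the CI log while still letting the caller continue to the backoff retries.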
…deps stderr
- Fix backoff comment: with 3 retries, only 2 sleeps occur (5s, 10s), not 3 (5, 10, 20) as previously stated
- Log stderr from --legacy-peer-deps fallback when it fails, so persistent failures are diagnosable in CI logs

https://claude.ai/code/session_01JhCWWDJG8PqwaSbVPCGfm6
Automated Status Summary
Scope
Scope section missing from source issue.
Tasks
Acceptance criteria
Head SHA: 85edc46
Latest Runs: ✅ success — Gate
Required: gate: ✅ success