fix: bypass rate-limit-only Gate cancellations - proceed with work #702
…ately

Rate limits are infrastructure noise, not code quality issues. When Gate is cancelled only due to API rate limits (not actual test failures), the keepalive loop should proceed with work immediately rather than deferring or waiting.

This change:
- Detects when Gate cancellation was due to rate limits only
- Immediately continues with 'run' action instead of 'defer'
- Sets reason as 'bypass-rate-limit-gate' for tracking
- Preserves the defer fallback only for non-rate-limit cancellations

This prevents PRs from getting stuck in 'defer' state waiting for scheduled retry workflows when the underlying issue is just temporary rate limiting from GitHub APIs.

Affected PRs (examples):
- #696, #698, #699 were stuck with 'gate-cancelled-rate-limit-transient'
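The behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in .github/scripts/keepalive_loop.js: the function name `decideAfterCancelledGate` and the fallback reason `'gate-cancelled'` are hypothetical, and the real script takes more inputs than shown here.

```javascript
// Hypothetical helper: a simplified stand-in for the decision made in
// .github/scripts/keepalive_loop.js when the Gate run was cancelled.
function decideAfterCancelledGate({ gateRateLimit, tasksRemaining }) {
  if (gateRateLimit && tasksRemaining) {
    // Rate-limit-only cancellation: continue immediately instead of deferring.
    return { action: 'run', reason: 'bypass-rate-limit-gate' };
  }
  // Any other cancellation keeps the defer fallback.
  // 'gate-cancelled' is a placeholder reason, not taken from the PR.
  return { action: 'defer', reason: 'gate-cancelled' };
}

console.log(decideAfterCancelledGate({ gateRateLimit: true, tasksRemaining: true }));
console.log(decideAfterCancelledGate({ gateRateLimit: false, tasksRemaining: true }));
```

The first call takes the bypass branch and the second falls through to the defer fallback.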
Automated Status Summary
Head SHA: bc44e0e
Coverage Overview
Coverage Trend
Top Coverage Hotspots (lowest coverage)
Updated automatically; will refresh on subsequent CI/Docker completions.
Keepalive checklist
Scope
Part of Phase 3 workflow rollout validation per langchain-post-code-rollout.md
Context for Agent
Design Decisions & Constraints
Related Issues/PRs
References
Blockers & Dependencies
Tasks
Acceptance criteria
🤖 Keepalive Loop Status
PR #702 | Agent: Codex | Iteration 0/5
Current State
🔍 Failure Classification
| Error type | infrastructure |
Pull request overview
This PR modifies the keepalive loop to automatically bypass Gate workflow cancellations that are caused solely by GitHub API rate limits, allowing work to continue immediately instead of deferring to a scheduled retry. The change treats rate limits as temporary infrastructure constraints rather than code quality issues.
- Adds automatic bypass logic for rate-limit-only Gate cancellations
- Introduces new action reason 'bypass-rate-limit-gate' for tracking
- Updates decision flow to prioritize rate limit bypass over the existing forceRetry mechanism
    if (gateRateLimit && tasksRemaining) {
      action = 'run';
      reason = 'bypass-rate-limit-gate';
      if (core) core.info('Gate cancelled due to rate limits only - proceeding with work (rate limits are not code quality issues)');
    } else if (forceRetry && tasksRemaining) {
      action = 'run';
      reason = 'force-retry-cancelled';
      if (core) core.info(`Force retry enabled: bypassing cancelled gate (rate_limit=${gateRateLimit})`);
The new rate limit bypass logic takes precedence over the forceRetry flag. When both gateRateLimit and forceRetry are true, this code returns action='run' with reason='bypass-rate-limit-gate' instead of reason='force-retry-cancelled'. This changes the behavior for the existing test case at lines 493-517 in keepalive-loop.test.js, which expects reason='force-retry-cancelled' when forceRetry is enabled with a rate-limited cancellation.
Consider checking forceRetry first (swap the order of these two conditions) to preserve the existing forceRetry behavior and maintain backward compatibility with the existing test expectations.
Suggested change

    - if (gateRateLimit && tasksRemaining) {
    -   action = 'run';
    -   reason = 'bypass-rate-limit-gate';
    -   if (core) core.info('Gate cancelled due to rate limits only - proceeding with work (rate limits are not code quality issues)');
    - } else if (forceRetry && tasksRemaining) {
    -   action = 'run';
    -   reason = 'force-retry-cancelled';
    -   if (core) core.info(`Force retry enabled: bypassing cancelled gate (rate_limit=${gateRateLimit})`);
    + if (forceRetry && tasksRemaining) {
    +   action = 'run';
    +   reason = 'force-retry-cancelled';
    +   if (core) core.info(`Force retry enabled: bypassing cancelled gate (rate_limit=${gateRateLimit})`);
    + } else if (gateRateLimit && tasksRemaining) {
    +   action = 'run';
    +   reason = 'bypass-rate-limit-gate';
    +   if (core) core.info('Gate cancelled due to rate limits only - proceeding with work (rate limits are not code quality issues)');
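To make the ordering question concrete, here is a small sketch that runs both branch orders against the same input. The function names `rateLimitFirst` and `forceRetryFirst` are hypothetical, and each returns only the reason string; the surrounding logic of the real script is omitted.

```javascript
// Hypothetical stand-ins for the two branch orderings discussed in the review.
// Only the reason is returned; other branches of the real script are omitted.
function rateLimitFirst({ gateRateLimit, forceRetry, tasksRemaining }) {
  if (gateRateLimit && tasksRemaining) return 'bypass-rate-limit-gate';
  if (forceRetry && tasksRemaining) return 'force-retry-cancelled';
  return null;
}

function forceRetryFirst({ gateRateLimit, forceRetry, tasksRemaining }) {
  if (forceRetry && tasksRemaining) return 'force-retry-cancelled';
  if (gateRateLimit && tasksRemaining) return 'bypass-rate-limit-gate';
  return null;
}

// Both flags set: the only input where the two orderings disagree.
const both = { gateRateLimit: true, forceRetry: true, tasksRemaining: true };
console.log(rateLimitFirst(both));  // bypass-rate-limit-gate
console.log(forceRetryFirst(both)); // force-retry-cancelled
```

When only one of the two flags is set, both orderings return the same reason, so the choice affects only the reported reason for the combined case; the action is 'run' either way.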
.github/scripts/keepalive_loop.js
Outdated
    if (gateRateLimit && tasksRemaining) {
      action = 'run';
      reason = 'bypass-rate-limit-gate';
      if (core) core.info('Gate cancelled due to rate limits only - proceeding with work (rate limits are not code quality issues)');
This change will cause existing tests to fail. The tests at lines 423-469 in keepalive-loop.test.js expect action='defer' and reason='gate-cancelled-rate-limit' when rate limit cancellations are detected. With this new code, when tasksRemaining is true, the action will be 'run' and reason will be 'bypass-rate-limit-gate' instead. The tests need to be updated to reflect this new behavior, or new tests should be added to verify the bypass logic works as intended.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…704)

The reusable CI workflow had a bug where it assumed dev tools (black, ruff, mypy, pytest, etc.) were included in consumer repos' lock files. This caused CI failures with 'black: command not found' errors.

Root cause: When has_lock_file=true, the workflow only recorded tools as 'from lock' for reporting but didn't actually install them. Consumer repos' lock files only contain runtime dependencies, not dev tools.

This fix:
- Always installs dev tools (black, ruff, mypy, pytest, etc.)
- Removes the has_lock_file conditional for tool installation
- Lock files still work for runtime dependencies
- Affects all 4 CI jobs: lint-format, lint-ruff, typecheck-mypy, tests

Impact: Fixes CI failures in Travel-Plan-Permission, Template, trip-planner, Collab-Admin and all other consumer repos with lock files.
Tests now expect action='run' with reason='bypass-rate-limit-gate' instead of action='defer' with reason='gate-cancelled-rate-limit'.

Rate limits are infrastructure noise, not code quality issues. Work should proceed automatically when Gate cancellation is due to rate limits.

Rate limit bypass takes precedence over forceRetry since:
1. Rate limit bypass is automatic infrastructure handling
2. forceRetry is still honored for non-rate-limit cases (cancelled, failed)
Aligns with JS test updates - rate limits are infrastructure noise that should be bypassed immediately rather than causing deferrals.
| Status | ✅ no new diagnostics |
Automated Status Summary
Scope
Part of Phase 3 workflow rollout validation per langchain-post-code-rollout.md
Context for Agent
Design Decisions & Constraints
Related Issues/PRs
References
Blockers & Dependencies
Tasks
Acceptance criteria
Head SHA: 9c87503
Latest Runs: ✅ success — Gate
Required: gate: ✅ success