fix(workflows): surface Claude usage-limit failures#1464
fix(workflows): surface Claude usage-limit failures#1464kenaku wants to merge 1 commit intocoleam00:devfrom
Conversation
|
Caution Review failedPull request was closed or merged during review 📝 WalkthroughWalkthroughThe changes enhance SDK error handling in the DAG executor by centralizing error message formatting that incorporates rate-limit details, and adding a test case that validates proper failure reporting when the SDK reports a misleading success subtype despite rate-limit rejection. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b0a44dc0ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| function formatRateLimitReset(rateLimitInfo: Record<string, unknown>): string { | ||
| const resetsAt = typeof rateLimitInfo.resetsAt === 'number' ? rateLimitInfo.resetsAt : undefined; |
There was a problem hiding this comment.
Normalize rate_limit_info keys before formatting
formatRateLimitReset/formatSdkErrorMessage read camelCase fields (resetsAt, rateLimitType, overageStatus), but the Claude provider passes through rate_limit_info from the SDK without key normalization. In the common case where that payload is snake_case, this branch will miss the reset/type/overage metadata and produce reset time unknown, so usage-limit failures still lose the actionable details this change is intended to surface.
Useful? React with 👍 / 👎.
Summary
When Claude SDK emits a rejected usage-limit event, the DAG executor currently ignores that event and later reports the final SDK result as
SDK returned success. This makes exhausted Claude usage quota look like an internal workflow contradiction instead of an actionable quota/reset problem.This PR stores the latest rejected Claude
rate_limitevent while streaming a DAG node or loop iteration, then uses it when formatting an SDK error result. The user-facing failure now saysClaude usage limit hit, includes the SDK-provided limit type when present, and shows the reset timestamp plus remaining minutes.The implementation is generic over
rateLimitType; it does not assume a five-hour limit.Validation
bun test packages/workflows/src/dag-executor.test.ts -t "uses rejected Claude usage-limit details"bun --filter @archon/workflows type-checkbun x prettier --check packages/workflows/src/dag-executor.ts packages/workflows/src/dag-executor.test.tsI also ran the full
bun test packages/workflows/src/dag-executor.test.ts. The new usage-limit test passes, but the file still has 3 pre-existing loader/discovery failures wherediscoverWorkflows(..., { loadDefaults: false })returns 2 workflows instead of the expected 1.Notes
I did not find an existing issue for this exact Claude usage-limit reporting bug. Related symptoms exist in #1439 and #1425, but those cover context-limit / stop-sequence cases rather than rejected usage-limit events.
Summary by CodeRabbit
Bug Fixes
Tests