
fix(workflows): surface Claude usage-limit failures #1464

Closed
kenaku wants to merge 1 commit into coleam00:dev from kenaku:fix/claude-usage-limit-errors


kenaku commented Apr 28, 2026

Summary

When the Claude SDK emits a rejected usage-limit event, the DAG executor currently ignores that event and later reports the final SDK result as "SDK returned success". This makes an exhausted Claude usage quota look like an internal workflow contradiction instead of an actionable quota/reset problem.

This PR stores the latest rejected Claude rate_limit event while streaming a DAG node or loop iteration, then uses it when formatting an SDK error result. The user-facing failure now reads "Claude usage limit hit", includes the SDK-provided limit type when present, and shows the reset timestamp plus the remaining minutes.

The implementation is generic over rateLimitType; it does not assume a five-hour limit.
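As a rough illustration of the message described above, the sketch below turns a rejected usage-limit payload into a user-facing failure string and passes the limit type through without assuming any particular window. It is not the PR's code: the `RateLimitInfo` shape, the field names `rateLimitType` and `resetsAt`, the assumption that `resetsAt` is a Unix timestamp in seconds, and the helper name `formatUsageLimitMessage` are all hypothetical.

```typescript
// ASSUMPTION: hypothetical shape of the info carried by a rejected rate_limit event.
interface RateLimitInfo {
  status?: string; // e.g. 'rejected' (assumed value)
  rateLimitType?: string; // SDK-provided limit type; passed through as-is, never assumed to be five-hour
  resetsAt?: number; // ASSUMPTION: Unix epoch time in seconds
}

// Hypothetical helper: builds the user-facing failure message from the latest rejected event.
function formatUsageLimitMessage(info: RateLimitInfo, nowMs: number = Date.now()): string {
  // Include the limit type only when the SDK provided one.
  const type = info.rateLimitType ? ` (${info.rateLimitType})` : '';
  if (typeof info.resetsAt !== 'number') {
    return `Claude usage limit hit${type}; reset time unknown`;
  }
  const resetMs = info.resetsAt * 1000;
  // Round up so a reset 30 seconds away reads as "1 min"; clamp resets in the past to 0.
  const minutesLeft = Math.max(0, Math.ceil((resetMs - nowMs) / 60_000));
  return `Claude usage limit hit${type}; resets at ${new Date(resetMs).toISOString()} (in ${minutesLeft} min)`;
}

// Usage: a reset one hour after "now".
console.log(
  formatUsageLimitMessage(
    { status: 'rejected', rateLimitType: 'example_limit', resetsAt: 1_700_003_600 },
    1_700_000_000_000,
  ),
);
// Claude usage limit hit (example_limit); resets at 2023-11-14T23:13:20.000Z (in 60 min)
```

Taking `nowMs` as a parameter keeps the remaining-minutes arithmetic deterministic in tests.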

Validation

  • bun test packages/workflows/src/dag-executor.test.ts -t "uses rejected Claude usage-limit details"
  • bun --filter @archon/workflows type-check
  • bun x prettier --check packages/workflows/src/dag-executor.ts packages/workflows/src/dag-executor.test.ts

I also ran the full bun test packages/workflows/src/dag-executor.test.ts. The new usage-limit test passes, but the file still has 3 pre-existing loader/discovery failures where discoverWorkflows(..., { loadDefaults: false }) returns 2 workflows instead of the expected 1.

Notes

I did not find an existing issue for this exact Claude usage-limit reporting bug. Related symptoms exist in #1439 and #1425, but those cover context-limit / stop-sequence cases rather than rejected usage-limit events.

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of rate-limit failures: workflows now fail with accurate rate-limit details in their error messages, even when the SDK reports a misleading success subtype.
  • Tests

    • Added test coverage for rate-limit failure scenarios.


coderabbitai Bot commented Apr 28, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

The changes enhance SDK error handling in the DAG executor by centralizing error message formatting that incorporates rate-limit details, and adding a test case that validates proper failure reporting when the SDK reports a misleading success subtype despite rate-limit rejection.

Changes

  • Rate-limit-aware error formatting (packages/workflows/src/dag-executor.ts): Introduces centralized error message formatting that captures the most recent rate_limit chunk and incorporates reset timestamp, remaining minutes, and overage details. Tracks lastRateLimitInfo during streaming, logs warnings when rejection is detected, and attaches rate-limit information to error log payloads.
  • Rate-limit error handling test (packages/workflows/src/dag-executor.test.ts): Adds test case validating that workflow failure is properly recorded and failure messages are constructed from rate-limit rejection details rather than the SDK's misleading success subtype.
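The tracking half of the change can be sketched as a synchronous fold over one node's stream: keep only the most recent rejected rate_limit chunk, and consult it when an error result arrives. Everything here is an assumption for illustration: the chunk shapes, the 'rejected' status string, the `isError`/`subtype` fields, and the names `StreamChunk` and `summarizeNodeStream` are hypothetical, and real SDK streams are asynchronous, so production code would iterate with `for await`.

```typescript
// ASSUMPTION: simplified, hypothetical chunk shapes; not the SDK's actual types.
type RateLimitPayload = Record<string, unknown>;
type StreamChunk =
  | { type: 'rate_limit'; rateLimitInfo: RateLimitPayload }
  | { type: 'assistant'; text: string }
  | { type: 'result'; isError: boolean; subtype: string };

interface NodeOutcome {
  failed: boolean;
  message?: string;
  rateLimitInfo?: RateLimitPayload; // attached so it can be logged with the error payload
}

// Hypothetical: consumes the chunks of one DAG node (or one loop iteration).
// lastRateLimitInfo is local, so each node/iteration starts without a stale rejection.
function summarizeNodeStream(chunks: Iterable<StreamChunk>): NodeOutcome {
  let lastRateLimitInfo: RateLimitPayload | undefined;
  for (const chunk of chunks) {
    if (chunk.type === 'rate_limit' && chunk.rateLimitInfo.status === 'rejected') {
      // Keep only the most recent rejected event; later ones overwrite earlier ones.
      lastRateLimitInfo = chunk.rateLimitInfo;
      console.warn('Claude rate limit rejected', chunk.rateLimitInfo);
    } else if (chunk.type === 'result' && chunk.isError) {
      // An error result may still carry subtype 'success'; prefer rate-limit details when present.
      const limitType = lastRateLimitInfo?.rateLimitType;
      const type = typeof limitType === 'string' ? ` (${limitType})` : '';
      const message = lastRateLimitInfo ? `Claude usage limit hit${type}` : `SDK returned ${chunk.subtype}`;
      return { failed: true, message, rateLimitInfo: lastRateLimitInfo };
    }
  }
  return { failed: false };
}
```

Without a preceding rejection the sketch falls back to the old "SDK returned success" wording, which is the confusing case the PR describes.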

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


Poem

🐰 When SDK whispers success through the wire,
But rate limits spark and the system must tire,
Our error formats now catch every clue,
Rejections are clear—no more fuzzy "true"!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

  • Description check: ⚠️ Warning. The description covers the problem, solution, and validation steps. However, it is missing most sections of the required template, including UX Journey, Architecture Diagram, Label Snapshot, Change Metadata, Security Impact, Compatibility, Human Verification, and Rollback Plan. Resolution: Fill out the missing template sections, including UX Journey (before/after flow), Architecture Diagram, Label Snapshot (risk/size/scope/module), Change Metadata, Security Impact, Compatibility assessment, Human Verification details, Rollback Plan, and Risks/Mitigations.
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, which is insufficient; the required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Title check: ✅ Passed. The title 'fix(workflows): surface Claude usage-limit failures' directly and clearly summarizes the main change: making Claude usage-limit failures user-facing in the workflows executor.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


kenaku closed this Apr 28, 2026
kenaku deleted the fix/claude-usage-limit-errors branch April 28, 2026 10:35

chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0a44dc0ab


function formatRateLimitReset(rateLimitInfo: Record<string, unknown>): string {
  const resetsAt = typeof rateLimitInfo.resetsAt === 'number' ? rateLimitInfo.resetsAt : undefined;

P2: Normalize rate_limit_info keys before formatting

formatRateLimitReset/formatSdkErrorMessage read camelCase fields (resetsAt, rateLimitType, overageStatus), but the Claude provider passes through rate_limit_info from the SDK without key normalization. In the common case where that payload is snake_case, this branch will miss the reset/type/overage metadata and produce reset time unknown, so usage-limit failures still lose the actionable details this change is intended to surface.
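One way to act on this suggestion is to normalize the payload's top-level keys once, before any formatting helper reads them, so that code expecting resetsAt, rateLimitType, and overageStatus works whichever casing arrives. This is a hypothetical sketch: the helper names `snakeToCamel` and `normalizeRateLimitInfo` are illustrative, and that the SDK emits snake_case here is the review's claim, not something verified.

```typescript
// Hypothetical helper: 'resets_at' -> 'resetsAt'; keys without underscores pass through unchanged.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_match: string, letter: string) => letter.toUpperCase());
}

// Hypothetical: returns a shallow copy of rate_limit_info with camelCase top-level keys.
// If both casings of the same key are present, the one iterated last wins.
function normalizeRateLimitInfo(raw: Record<string, unknown>): Record<string, unknown> {
  const normalized: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(raw)) {
    normalized[snakeToCamel(key)] = value;
  }
  return normalized;
}

const info = normalizeRateLimitInfo({ status: 'rejected', resets_at: 1_700_003_600, rate_limit_type: 'example_limit' });
console.log(info.resetsAt, info.rateLimitType); // 1700003600 example_limit
```

Normalizing at the point where the chunk is captured keeps every downstream formatter on a single casing instead of checking both spellings in each helper.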

