Skip to content

revert: use durable objects for sync ratelimiting#2813

Merged
chronark merged 4 commits intomainfrom
do-ratelimits
Jan 13, 2025
Merged

revert: use durable objects for sync ratelimiting#2813
chronark merged 4 commits intomainfrom
do-ratelimits

Conversation

@chronark
Copy link
Collaborator

@chronark chronark commented Jan 13, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new Durable Object-based rate limiting mechanism for improved request management
    • Enhanced rate limit configuration and testing across different scenarios
  • Improvements

    • Updated rate limit test cases to cover more edge cases and real-world scenarios
    • Refined rate limiting configuration in deployment settings
  • Technical Updates

    • Added new rate limiting classes and supporting infrastructure
    • Updated environment configuration to support new rate limiting approach

@vercel
Copy link

vercel bot commented Jan 13, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
engineering ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 13, 2025 8:11pm
play ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 13, 2025 8:11pm
www ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 13, 2025 8:11pm
1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
dashboard ⬜️ Ignored (Inspect) Visit Preview Jan 13, 2025 8:11pm

@changeset-bot
Copy link

changeset-bot bot commented Jan 13, 2025

⚠️ No Changeset found

Latest commit: 4843736

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 13, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

This pull request introduces a comprehensive implementation of a Durable Object-based rate limiting mechanism for the API. The changes span multiple files, including environment configuration, middleware initialization, rate limiter client, and durable object implementation. The new approach replaces the previous agent-based rate limiting with a more robust, Cloudflare Durable Object-native solution that provides more precise and scalable rate limit management across different request scenarios.

Changes

File Change Summary
apps/api/src/pkg/env.ts Added DO_RATELIMIT property to zEnv object schema
apps/api/src/pkg/middleware/init.ts Updated rate limiter import and instantiation from AgentRatelimiter to DurableRateLimiter
apps/api/src/pkg/ratelimit/do_client.ts Introduced DurableRateLimiter class implementing rate limiting logic
apps/api/src/pkg/ratelimit/durable_object.ts Added DurableObjectRatelimiter class for managing rate limit state
apps/api/src/pkg/ratelimit/index.ts Added exports for new durable object and client modules
apps/api/wrangler.custom.toml Added new durable object binding and migration for DO_RATELIMIT
apps/api/wrangler.toml Added durable object binding across environments and updated migration tags

Sequence Diagram

sequenceDiagram
    participant Client
    participant RateLimiter
    participant DurableObject
    participant Storage

    Client->>RateLimiter: Request rate limit check
    RateLimiter->>RateLimiter: Generate unique request ID
    RateLimiter->>DurableObject: Check limit via HTTP request
    DurableObject->>Storage: Retrieve current state
    DurableObject->>DurableObject: Validate request against limit
    alt Request within limit
        DurableObject->>Storage: Update state
        DurableObject-->>RateLimiter: Allow request
        RateLimiter-->>Client: Process request
    else Request exceeds limit
        DurableObject-->>RateLimiter: Reject request
        RateLimiter-->>Client: Rate limit exceeded
    end
Loading

Possibly related PRs

Suggested reviewers

  • mcstepp
  • perkinsjr
  • MichaelUnkey

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3be1cb0 and 4843736.

📒 Files selected for processing (1)
  • apps/api/src/pkg/ratelimit/do_client.ts (1 hunks)

Finishing Touches

  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 13, 2025

Thank you for following the naming conventions for pull request titles! 🙏

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
apps/api/src/pkg/ratelimit/do_client.ts (2)

145-148: Use this.logger.error instead of console.error for error logging

To maintain consistent logging practices and utilize the provided logger, replace console.error with this.logger.error.

Apply this diff:

- console.error(res.err.message);
+ this.logger.error(res.err.message);

231-236: Handle potential undefined err.cause when logging errors

When logging errors in the catch block, err.cause may be undefined. Consider checking if err.cause exists before including it in the logged data to avoid logging undefined values.

Apply this diff:

this.logger.error("ratelimit failed", {
  identifier: req.identifier,
  error: err.message,
  stack: err.stack,
- cause: err.cause,
+ ...(err.cause && { cause: err.cause }),
});
apps/api/src/pkg/env.ts (1)

21-22: Enhance validation for DO_RATELIMIT and DO_USAGELIMIT environment variables

The current validation for DO_RATELIMIT and DO_USAGELIMIT checks only if the value is of type object, which may be insufficient to ensure that the environment variables are correctly configured. Consider improving the validation to more accurately reflect the structure of a DurableObjectNamespace.

Additionally, the inline comment // pretty loose check but it'll do I think is informal. Consider removing or rephrasing it for clarity and professionalism.

Apply this diff:

- DO_RATELIMIT: z.custom<DurableObjectNamespace>((ns) => typeof ns === "object"), // pretty loose check but it'll do I think
+ DO_RATELIMIT: z.custom<DurableObjectNamespace>((ns) => ns instanceof DurableObjectNamespace),
- DO_USAGELIMIT: z.custom<DurableObjectNamespace>((ns) => typeof ns === "object"),
+ DO_USAGELIMIT: z.custom<DurableObjectNamespace>((ns) => ns instanceof DurableObjectNamespace),
apps/api/src/pkg/ratelimit/durable_object.ts (1)

69-71: Reset this.memory after clearing storage in alarm method

After deleting all data from storage in the alarm method, this.memory still holds the previous values in memory. Resetting this.memory ensures that the in-memory state remains consistent with the cleared storage.

Apply this diff:

public async alarm(): Promise<void> {
  await this.state.storage.deleteAll();
+ this.memory = { current: 0 };
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc90b36 and 7da3c67.

📒 Files selected for processing (14)
  • apps/api/src/pkg/env.ts (1 hunks)
  • apps/api/src/pkg/middleware/init.ts (2 hunks)
  • apps/api/src/pkg/ratelimit/do_client.ts (1 hunks)
  • apps/api/src/pkg/ratelimit/durable_object.ts (1 hunks)
  • apps/api/src/pkg/ratelimit/index.ts (1 hunks)
  • apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts (2 hunks)
  • apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2 hunks)
  • apps/api/src/worker.ts (1 hunks)
  • apps/api/wrangler.custom.toml (3 hunks)
  • apps/api/wrangler.toml (9 hunks)
  • packages/api/package.json (1 hunks)
  • packages/hono/package.json (1 hunks)
  • packages/nextjs/package.json (1 hunks)
  • packages/ratelimit/package.json (1 hunks)
✅ Files skipped from review due to trivial changes (5)
  • packages/api/package.json
  • packages/hono/package.json
  • packages/ratelimit/package.json
  • apps/api/src/pkg/ratelimit/index.ts
  • packages/nextjs/package.json
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: Test Packages / Test ./packages/nextjs
  • GitHub Check: Test Packages / Test ./packages/hono
  • GitHub Check: Test Packages / Test ./packages/cache
  • GitHub Check: Test Packages / Test ./packages/api
  • GitHub Check: Test Packages / Test ./internal/clickhouse
  • GitHub Check: Test Packages / Test ./internal/resend
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Test Packages / Test ./internal/billing
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Build / Build
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts (2)

20-74: Comprehensive test coverage for rate limiting scenarios.

The test cases effectively cover various critical scenarios:

  • Very short windows with high throughput
  • Burst traffic patterns
  • Steady-state traffic
  • Edge cases (tiny and high limits)
  • Real-world API scenarios

115-115: Verify the impact of tighter rate limiting bounds.

The upperLimit multiplier has been reduced from 2.5x to 1.5x, making the rate limiting more strict. Ensure this tighter bound doesn't cause false positives in the tests.

Run this script to analyze historical test results:

apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2)

20-74: LGTM! Consistent test scenarios with key verification tests.

The test cases maintain consistency with the key verification rate limiting tests, which is crucial for ensuring uniform behavior across different rate-limiting endpoints.


119-119: Good addition of request count in logs.

Adding the total request count to the logging output will help with debugging and monitoring rate limiting behavior.

apps/api/src/pkg/middleware/init.ts (2)

5-5: LGTM! Updated import for new rate limiter implementation.

Correctly imports the new DurableRateLimiter class.


103-107: Verify the configuration of DurableRateLimiter.

The rate limiter now uses durable objects instead of an agent-based approach. Ensure that:

  1. The DO_RATELIMIT namespace is properly configured in your environment
  2. The cache map is appropriately shared between worker instances

Run this script to check the configuration:

✅ Verification successful

DurableRateLimiter configuration verified successfully

The configuration is correct:

  • DO_RATELIMIT namespace is properly configured in wrangler.toml for all environments
  • Cache map is appropriately shared between worker instances using a persistent module-level Map
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the durable object configuration
# Check if DO_RATELIMIT is properly configured in wrangler.toml
rg -l "DO_RATELIMIT" | grep -E "wrangler.*\.toml"

Length of output: 102


Script:

#!/bin/bash
# Check the content of wrangler files for DO_RATELIMIT configuration
echo "=== Main wrangler.toml ==="
rg -A 5 -B 5 "DO_RATELIMIT" "apps/api/wrangler.toml" || true

echo -e "\n=== Custom wrangler.toml ==="
rg -A 5 -B 5 "DO_RATELIMIT" "apps/api/wrangler.custom.toml" || true

echo -e "\n=== Searching for cache map implementation ==="
ast-grep --pattern 'const $_ = new Map<$_,$_>()' || true
rg "new Map\(\)" -A 2 -B 2 || true

Length of output: 10506

apps/api/src/worker.ts (1)

26-26: LGTM! Exported DurableObjectRatelimiter for Cloudflare Workers.

The export is correctly placed alongside other durable object exports, allowing Cloudflare Workers to manage the rate limiting state.

apps/api/wrangler.custom.toml (1)

31-33: Verify migration sequence for production safety

The migration sequence shows that DurableObjectRatelimiter was previously added (v1), then deleted (v2), and is now being re-added (v3). This back-and-forth migration pattern could cause issues during deployment if not properly coordinated.

Run this script to check the deployment history and current state:

✅ Verification successful

Migration sequence is properly configured

The migration sequence is safe as the DurableObjectRatelimiter class is properly implemented and all necessary bindings are correctly configured across environments.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check deployment history and current state of DurableObjectRatelimiter
# across environments

# Check current deployments
gh api -X GET repos/unkey/unkey/deployments \
  -f environment=production \
  -f per_page=5 | jq '.[].ref'

# Check if DurableObjectRatelimiter is currently in use
rg -A 5 "DurableObjectRatelimiter" 

Length of output: 5759

apps/api/wrangler.toml (2)

Line range hint 274-275: Verify rate limit fallback behavior in production

The production environment has SYNC_RATELIMIT_ON_NO_DATA=1. Since we're reverting to durable objects for rate limiting, verify that this setting is still applicable and won't conflict with the durable object implementation.

Run this script to check the usage of this environment variable:

✅ Verification successful

Rate limit fallback behavior verified and compatible

The SYNC_RATELIMIT_ON_NO_DATA setting is still applicable with durable objects. It's used as a cache optimization mechanism in ratelimit/client.ts and doesn't conflict with the core rate limiting implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check how SYNC_RATELIMIT_ON_NO_DATA is used with durable objects

# Search for usage of the environment variable
rg -A 5 "SYNC_RATELIMIT_ON_NO_DATA"

# Search for related fallback logic
ast-grep --pattern 'if ($env.SYNC_RATELIMIT_ON_NO_DATA) {
  $$$
}'

Length of output: 2237


281-282: Verify durable object initialization across environments

The DO_RATELIMIT binding is added consistently across all environments (development, preview, canary, and production). However, since this is a revert PR, we should verify the initialization and state handling.

Run this script to check for proper initialization handling:

Also applies to: 202-203, 122-123, 42-43, 13-14

✅ Verification successful

DurableObject implementation verified and safe for all environments

The DurableObjectRatelimiter implementation follows all best practices:

  • Proper state initialization and persistence
  • Safe concurrent access handling
  • Request validation and routing
  • Cleanup mechanism through alarms
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for proper initialization and state handling in the DurableObjectRatelimiter implementation

# Look for initialization code
ast-grep --pattern 'class DurableObjectRatelimiter {
  $$$
  constructor($state) {
    $$$
  }
  $$$
}'

# Check for state management
rg -A 10 "class DurableObjectRatelimiter"

Length of output: 1052


Script:

#!/bin/bash
# Check for Memory interface and complete implementation
rg -p "type Memory" apps/api/src/pkg/ratelimit/durable_object.ts
rg -p "interface Memory" apps/api/src/pkg/ratelimit/durable_object.ts

# Look for storage write operations and request handling
rg -p "state\.storage\.put" apps/api/src/pkg/ratelimit/durable_object.ts
rg -A 5 "this\.hono\." apps/api/src/pkg/ratelimit/durable_object.ts

# Check error handling
rg "throw|catch" apps/api/src/pkg/ratelimit/durable_object.ts

Length of output: 714

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (1)

36-73: Well-structured test coverage across different scenarios!

The test cases now cover a good range of real-world scenarios and edge cases. The combination of different windows, limits, and RPS values provides comprehensive coverage.

Consider adding JSDoc comments for each test case group to document:

  • The scenario being tested
  • Expected behavior
  • Why these specific values were chosen
+  /**
+   * Medium window test case
+   * Tests steady traffic pattern with 7.5x the limit
+   * Duration: 30s allows for stable measurements while keeping test time reasonable
+   */
   {
     limit: 200,
     duration: 30000, // 30s window
     rps: 50, // 7.5x the limit
     seconds: 420, // 14 windows
   },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7da3c67 and 3be1cb0.

📒 Files selected for processing (2)
  • apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts (2 hunks)
  • apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (17)
  • GitHub Check: Test Packages / Test ./packages/rbac
  • GitHub Check: Test Packages / Test ./packages/nextjs
  • GitHub Check: Test Packages / Test ./packages/hono
  • GitHub Check: Test Packages / Test ./packages/cache
  • GitHub Check: Test Packages / Test ./packages/api
  • GitHub Check: Test Packages / Test ./internal/clickhouse
  • GitHub Check: Test Packages / Test ./internal/resend
  • GitHub Check: Test Packages / Test ./internal/keys
  • GitHub Check: Test Packages / Test ./internal/id
  • GitHub Check: Test Packages / Test ./internal/hash
  • GitHub Check: Test Packages / Test ./internal/encryption
  • GitHub Check: Test Packages / Test ./internal/billing
  • GitHub Check: Build / Build
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2)

117-119: Verify impact of stricter error margin.

The reduction in upperLimit from 2.5x to 1.5x suggests improved rate limiting accuracy. However, we should verify that this stricter limit doesn't cause test flakiness.

Run this script to analyze the actual vs. expected limits in recent test runs:

✅ Verification successful

Stricter rate limit margins are appropriate

The test assertions verify that request counts fall within reasonable bounds:

  • Lower limit: 95% of expected (5% margin for under-limiting)
  • Upper limit: 150% of expected (50% margin for over-limiting)

This provides sufficient flexibility for normal variations while catching significant deviations. The previous 2.5x upper limit was overly permissive.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for test results to analyze actual vs expected limits
rg -A 2 "requests.*passed.*exactLimit" | \
  awk -F'[{},]' '{
    for(i=1;i<=NF;i++) {
      if($i ~ /passed/) {split($i,p,":");passed=p[2]}
      if($i ~ /exactLimit/) {split($i,e,":");exact=e[2]}
    }
    if(passed && exact) {
      ratio=passed/exact;
      printf "Ratio: %.2f (Passed: %d, Expected: %d)\n", ratio, passed, exact
    }
  }'

Length of output: 342


Script:

#!/bin/bash
# Search for the exact logging pattern and test assertions
echo "=== Logging patterns ==="
rg "console.info.*requests.*exactLimit" -A 2

echo -e "\n=== Test assertions and limits ==="
rg "expect.*toBeGreaterThan|expect.*toBeLessThan" apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts

Length of output: 790


20-34: Consider increasing test duration for short window scenarios.

The very short window test (1s) with 15 windows might not provide enough samples for reliable results. Consider:

  1. Increasing the test duration to at least 30 windows
  2. Adding jitter between requests to simulate real-world conditions
  3. Documenting the expected error margin in the test description

This would help reduce test flakiness and provide more reliable results.

Run this script to check if these test cases have been flaky in recent runs:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant