revert: use durable objects for sync ratelimiting#2813
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
1 Skipped Deployment
|
|
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis pull request introduces a comprehensive implementation of a Durable Object-based rate limiting mechanism for the API. The changes span multiple files, including environment configuration, middleware initialization, rate limiter client, and durable object implementation. The new approach replaces the previous agent-based rate limiting with a more robust, Cloudflare Durable Object-native solution that provides more precise and scalable rate limit management across different request scenarios. Changes
Sequence DiagramsequenceDiagram
participant Client
participant RateLimiter
participant DurableObject
participant Storage
Client->>RateLimiter: Request rate limit check
RateLimiter->>RateLimiter: Generate unique request ID
RateLimiter->>DurableObject: Check limit via HTTP request
DurableObject->>Storage: Retrieve current state
DurableObject->>DurableObject: Validate request against limit
alt Request within limit
DurableObject->>Storage: Update state
DurableObject-->>RateLimiter: Allow request
RateLimiter-->>Client: Process request
else Request exceeds limit
DurableObject-->>RateLimiter: Reject request
RateLimiter-->>Client: Rate limit exceeded
end
Possibly related PRs
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Thank you for following the naming conventions for pull request titles! 🙏 |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
apps/api/src/pkg/ratelimit/do_client.ts (2)
145-148: Usethis.logger.errorinstead ofconsole.errorfor error loggingTo maintain consistent logging practices and utilize the provided logger, replace
console.errorwiththis.logger.error.Apply this diff:
- console.error(res.err.message); + this.logger.error(res.err.message);
231-236: Handle potential undefinederr.causewhen logging errorsWhen logging errors in the
catchblock,err.causemay beundefined. Consider checking iferr.causeexists before including it in the logged data to avoid loggingundefinedvalues.Apply this diff:
this.logger.error("ratelimit failed", { identifier: req.identifier, error: err.message, stack: err.stack, - cause: err.cause, + ...(err.cause && { cause: err.cause }), });apps/api/src/pkg/env.ts (1)
21-22: Enhance validation forDO_RATELIMITandDO_USAGELIMITenvironment variablesThe current validation for
DO_RATELIMITandDO_USAGELIMITchecks only if the value is of typeobject, which may be insufficient to ensure that the environment variables are correctly configured. Consider improving the validation to more accurately reflect the structure of aDurableObjectNamespace.Additionally, the inline comment
// pretty loose check but it'll do I thinkis informal. Consider removing or rephrasing it for clarity and professionalism.Apply this diff:
- DO_RATELIMIT: z.custom<DurableObjectNamespace>((ns) => typeof ns === "object"), // pretty loose check but it'll do I think + DO_RATELIMIT: z.custom<DurableObjectNamespace>((ns) => ns instanceof DurableObjectNamespace), - DO_USAGELIMIT: z.custom<DurableObjectNamespace>((ns) => typeof ns === "object"), + DO_USAGELIMIT: z.custom<DurableObjectNamespace>((ns) => ns instanceof DurableObjectNamespace),apps/api/src/pkg/ratelimit/durable_object.ts (1)
69-71: Resetthis.memoryafter clearing storage inalarmmethodAfter deleting all data from storage in the
alarmmethod,this.memorystill holds the previous values in memory. Resettingthis.memoryensures that the in-memory state remains consistent with the cleared storage.Apply this diff:
public async alarm(): Promise<void> { await this.state.storage.deleteAll(); + this.memory = { current: 0 }; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
apps/api/src/pkg/env.ts(1 hunks)apps/api/src/pkg/middleware/init.ts(2 hunks)apps/api/src/pkg/ratelimit/do_client.ts(1 hunks)apps/api/src/pkg/ratelimit/durable_object.ts(1 hunks)apps/api/src/pkg/ratelimit/index.ts(1 hunks)apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts(2 hunks)apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts(2 hunks)apps/api/src/worker.ts(1 hunks)apps/api/wrangler.custom.toml(3 hunks)apps/api/wrangler.toml(9 hunks)packages/api/package.json(1 hunks)packages/hono/package.json(1 hunks)packages/nextjs/package.json(1 hunks)packages/ratelimit/package.json(1 hunks)
✅ Files skipped from review due to trivial changes (5)
- packages/api/package.json
- packages/hono/package.json
- packages/ratelimit/package.json
- apps/api/src/pkg/ratelimit/index.ts
- packages/nextjs/package.json
⏰ Context from checks skipped due to timeout of 90000ms (12)
- GitHub Check: Test Packages / Test ./packages/nextjs
- GitHub Check: Test Packages / Test ./packages/hono
- GitHub Check: Test Packages / Test ./packages/cache
- GitHub Check: Test Packages / Test ./packages/api
- GitHub Check: Test Packages / Test ./internal/clickhouse
- GitHub Check: Test Packages / Test ./internal/resend
- GitHub Check: Test Agent Local / test_agent_local
- GitHub Check: Test Packages / Test ./internal/billing
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: autofix
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (10)
apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts (2)
20-74: Comprehensive test coverage for rate limiting scenarios.The test cases effectively cover various critical scenarios:
- Very short windows with high throughput
- Burst traffic patterns
- Steady-state traffic
- Edge cases (tiny and high limits)
- Real-world API scenarios
115-115: Verify the impact of tighter rate limiting bounds.The upperLimit multiplier has been reduced from 2.5x to 1.5x, making the rate limiting more strict. Ensure this tighter bound doesn't cause false positives in the tests.
Run this script to analyze historical test results:
apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2)
20-74: LGTM! Consistent test scenarios with key verification tests.The test cases maintain consistency with the key verification rate limiting tests, which is crucial for ensuring uniform behavior across different rate-limiting endpoints.
119-119: Good addition of request count in logs.Adding the total request count to the logging output will help with debugging and monitoring rate limiting behavior.
apps/api/src/pkg/middleware/init.ts (2)
5-5: LGTM! Updated import for new rate limiter implementation.Correctly imports the new DurableRateLimiter class.
103-107: Verify the configuration of DurableRateLimiter.The rate limiter now uses durable objects instead of an agent-based approach. Ensure that:
- The DO_RATELIMIT namespace is properly configured in your environment
- The cache map is appropriately shared between worker instances
Run this script to check the configuration:
✅ Verification successful
DurableRateLimiter configuration verified successfully
The configuration is correct:
- DO_RATELIMIT namespace is properly configured in wrangler.toml for all environments
- Cache map is appropriately shared between worker instances using a persistent module-level Map
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the durable object configuration # Check if DO_RATELIMIT is properly configured in wrangler.toml rg -l "DO_RATELIMIT" | grep -E "wrangler.*\.toml"Length of output: 102
Script:
#!/bin/bash # Check the content of wrangler files for DO_RATELIMIT configuration echo "=== Main wrangler.toml ===" rg -A 5 -B 5 "DO_RATELIMIT" "apps/api/wrangler.toml" || true echo -e "\n=== Custom wrangler.toml ===" rg -A 5 -B 5 "DO_RATELIMIT" "apps/api/wrangler.custom.toml" || true echo -e "\n=== Searching for cache map implementation ===" ast-grep --pattern 'const $_ = new Map<$_,$_>()' || true rg "new Map\(\)" -A 2 -B 2 || trueLength of output: 10506
apps/api/src/worker.ts (1)
26-26: LGTM! Exported DurableObjectRatelimiter for Cloudflare Workers.The export is correctly placed alongside other durable object exports, allowing Cloudflare Workers to manage the rate limiting state.
apps/api/wrangler.custom.toml (1)
31-33: Verify migration sequence for production safetyThe migration sequence shows that
DurableObjectRatelimiterwas previously added (v1), then deleted (v2), and is now being re-added (v3). This back-and-forth migration pattern could cause issues during deployment if not properly coordinated.Run this script to check the deployment history and current state:
✅ Verification successful
Migration sequence is properly configured
The migration sequence is safe as the
DurableObjectRatelimiterclass is properly implemented and all necessary bindings are correctly configured across environments.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check deployment history and current state of DurableObjectRatelimiter # across environments # Check current deployments gh api -X GET repos/unkey/unkey/deployments \ -f environment=production \ -f per_page=5 | jq '.[].ref' # Check if DurableObjectRatelimiter is currently in use rg -A 5 "DurableObjectRatelimiter"Length of output: 5759
apps/api/wrangler.toml (2)
Line range hint
274-275: Verify rate limit fallback behavior in productionThe production environment has
SYNC_RATELIMIT_ON_NO_DATA=1. Since we're reverting to durable objects for rate limiting, verify that this setting is still applicable and won't conflict with the durable object implementation.Run this script to check the usage of this environment variable:
✅ Verification successful
Rate limit fallback behavior verified and compatible
The
SYNC_RATELIMIT_ON_NO_DATAsetting is still applicable with durable objects. It's used as a cache optimization mechanism inratelimit/client.tsand doesn't conflict with the core rate limiting implementation.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check how SYNC_RATELIMIT_ON_NO_DATA is used with durable objects # Search for usage of the environment variable rg -A 5 "SYNC_RATELIMIT_ON_NO_DATA" # Search for related fallback logic ast-grep --pattern 'if ($env.SYNC_RATELIMIT_ON_NO_DATA) { $$$ }'Length of output: 2237
281-282: Verify durable object initialization across environmentsThe
DO_RATELIMITbinding is added consistently across all environments (development, preview, canary, and production). However, since this is a revert PR, we should verify the initialization and state handling.Run this script to check for proper initialization handling:
Also applies to: 202-203, 122-123, 42-43, 13-14
✅ Verification successful
DurableObject implementation verified and safe for all environments
The DurableObjectRatelimiter implementation follows all best practices:
- Proper state initialization and persistence
- Safe concurrent access handling
- Request validation and routing
- Cleanup mechanism through alarms
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for proper initialization and state handling in the DurableObjectRatelimiter implementation # Look for initialization code ast-grep --pattern 'class DurableObjectRatelimiter { $$$ constructor($state) { $$$ } $$$ }' # Check for state management rg -A 10 "class DurableObjectRatelimiter"Length of output: 1052
Script:
#!/bin/bash # Check for Memory interface and complete implementation rg -p "type Memory" apps/api/src/pkg/ratelimit/durable_object.ts rg -p "interface Memory" apps/api/src/pkg/ratelimit/durable_object.ts # Look for storage write operations and request handling rg -p "state\.storage\.put" apps/api/src/pkg/ratelimit/durable_object.ts rg -A 5 "this\.hono\." apps/api/src/pkg/ratelimit/durable_object.ts # Check error handling rg "throw|catch" apps/api/src/pkg/ratelimit/durable_object.tsLength of output: 714
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (1)
36-73: Well-structured test coverage across different scenarios!The test cases now cover a good range of real-world scenarios and edge cases. The combination of different windows, limits, and RPS values provides comprehensive coverage.
Consider adding JSDoc comments for each test case group to document:
- The scenario being tested
- Expected behavior
- Why these specific values were chosen
+ /** + * Medium window test case + * Tests steady traffic pattern with 7.5x the limit + * Duration: 30s allows for stable measurements while keeping test time reasonable + */ { limit: 200, duration: 30000, // 30s window rps: 50, // 7.5x the limit seconds: 420, // 14 windows },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts(2 hunks)apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- apps/api/src/routes/v1_keys_verifyKey.ratelimit_accuracy.test.ts
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: Test Packages / Test ./packages/rbac
- GitHub Check: Test Packages / Test ./packages/nextjs
- GitHub Check: Test Packages / Test ./packages/hono
- GitHub Check: Test Packages / Test ./packages/cache
- GitHub Check: Test Packages / Test ./packages/api
- GitHub Check: Test Packages / Test ./internal/clickhouse
- GitHub Check: Test Packages / Test ./internal/resend
- GitHub Check: Test Packages / Test ./internal/keys
- GitHub Check: Test Packages / Test ./internal/id
- GitHub Check: Test Packages / Test ./internal/hash
- GitHub Check: Test Packages / Test ./internal/encryption
- GitHub Check: Test Packages / Test ./internal/billing
- GitHub Check: Build / Build
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Agent Local / test_agent_local
- GitHub Check: autofix
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
apps/api/src/routes/v1_ratelimits_limit.accuracy.test.ts (2)
117-119: Verify impact of stricter error margin.The reduction in
upperLimitfrom 2.5x to 1.5x suggests improved rate limiting accuracy. However, we should verify that this stricter limit doesn't cause test flakiness.Run this script to analyze the actual vs. expected limits in recent test runs:
✅ Verification successful
Stricter rate limit margins are appropriate
The test assertions verify that request counts fall within reasonable bounds:
- Lower limit: 95% of expected (5% margin for under-limiting)
- Upper limit: 150% of expected (50% margin for over-limiting)
This provides sufficient flexibility for normal variations while catching significant deviations. The previous 2.5x upper limit was overly permissive.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for test results to analyze actual vs expected limits rg -A 2 "requests.*passed.*exactLimit" | \ awk -F'[{},]' '{ for(i=1;i<=NF;i++) { if($i ~ /passed/) {split($i,p,":");passed=p[2]} if($i ~ /exactLimit/) {split($i,e,":");exact=e[2]} } if(passed && exact) { ratio=passed/exact; printf "Ratio: %.2f (Passed: %d, Expected: %d)\n", ratio, passed, exact } }'Length of output: 342
Script:
#!/bin/bash # Search for the exact logging pattern and test assertions echo "=== Logging patterns ===" rg "console.info.*requests.*exactLimit" -A 2 echo -e "\n=== Test assertions and limits ===" rg "expect.*toBeGreaterThan|expect.*toBeLessThan" apps/api/src/routes/v1_ratelimits_limit.accuracy.test.tsLength of output: 790
20-34: Consider increasing test duration for short window scenarios.The very short window test (1s) with 15 windows might not provide enough samples for reliable results. Consider:
- Increasing the test duration to at least 30 windows
- Adding jitter between requests to simulate real-world conditions
- Documenting the expected error margin in the test description
This would help reduce test flakiness and provide more reliable results.
Run this script to check if these test cases have been flaky in recent runs:
Summary by CodeRabbit
Release Notes
New Features
Improvements
Technical Updates