## Walkthrough

The changes involve a significant restructuring of the clock and rate limiting functionalities within the application.
## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant RateLimiter
    participant Clock
    User->>Agent: Request
    Agent->>RateLimiter: Check Rate Limit
    RateLimiter->>Clock: Get Current Time
    Clock-->>RateLimiter: Current Time
    RateLimiter-->>Agent: Rate Limit Status
    Agent-->>User: Response
```
Thank you for following the naming conventions for pull request titles! 🙏
Actionable comments posted: 7
Outside diff range and nitpick comments (3)
apps/agent/services/ratelimit/ratelimit_test.go (1)
**70-70: Remove redundant error check for `err`**

At line 70, there is a redundant `require.NoError(t, err)` statement. The error `err` has not been modified since the previous check at line 68. This extra check is unnecessary. Apply this diff to remove the redundant line:

```diff
- require.NoError(t, err)
```

apps/api/src/pkg/ratelimit/client.ts (1)
**Line range hint 130-202: Consider using a more efficient cache eviction strategy**

In the `setCacheMax` method, cache eviction is performed by iterating over entries and deleting those that have expired when the cache size exceeds `maxEntries`. This could become inefficient as the cache grows. Consider using a Least Recently Used (LRU) cache or a similar data structure that handles eviction more efficiently. This can improve performance and reduce latency caused by cache maintenance.
Example using an LRU cache:
Implement an LRU cache mechanism or utilize an existing library to manage cache entries based on their usage and expiry.
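To make the suggestion concrete, here is a minimal LRU sketch in Go (the production code under review is TypeScript; the `lruCache` and `entry` names are hypothetical, and a vetted library would usually be preferable to hand-rolling this):

```go
package main

import (
	"container/list"
	"fmt"
)

// entry pairs a cache key with its stored counter.
type entry struct {
	key     string
	current int64
}

// lruCache is a minimal LRU sketch: a doubly linked list tracks recency
// (front = most recently used) and a map gives O(1) lookup, so eviction
// never has to scan the whole cache.
type lruCache struct {
	capacity int
	order    *list.List
	items    map[string]*list.Element
}

func newLRU(capacity int) *lruCache {
	return &lruCache{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the stored value and marks the key as recently used.
func (c *lruCache) Get(key string) (int64, bool) {
	el, ok := c.items[key]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).current, true
}

// Set inserts or updates a key, evicting the least recently used entry
// when the cache is full.
func (c *lruCache) Set(key string, current int64) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).current = current
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, current: current})
}

func main() {
	c := newLRU(2)
	c.Set("a", 1)
	c.Set("b", 2)
	c.Set("c", 3) // evicts "a", the least recently used entry
	_, ok := c.Get("a")
	fmt.Println(ok)
}
```

Both `Get` and `Set` are O(1), versus the O(n) scan of the current expiry-based eviction.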
apps/agent/services/ratelimit/sliding_window.go (1)
**113-114: Typo in comment: 'cachelayer' should be 'cache layer'**

In the comment "we are reverting this to fixed-window until we can get rid of the cloudflare cachelayer", "cachelayer" should be "cache layer" for clarity.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (9)
- apps/agent/pkg/clock/real_clock.go (1 hunks)
- apps/agent/pkg/clock/test_clock.go (1 hunks)
- apps/agent/services/ratelimit/mitigate.go (2 hunks)
- apps/agent/services/ratelimit/ratelimit_mitigation_test.go (2 hunks)
- apps/agent/services/ratelimit/ratelimit_replication_test.go (1 hunks)
- apps/agent/services/ratelimit/ratelimit_test.go (1 hunks)
- apps/agent/services/ratelimit/service.go (2 hunks)
- apps/agent/services/ratelimit/sliding_window.go (4 hunks)
- apps/api/src/pkg/ratelimit/client.ts (9 hunks)
Additional comments not posted (27)
apps/agent/pkg/clock/real_clock.go (4)
**5-6: LGTM!**

The `RealClock` struct is defined correctly as an empty struct, indicating that it does not maintain any internal state. This aligns with the transition from a mockable clock to a real-time clock implementation.

**8-10: LGTM!**

The `New()` function is implemented correctly as a constructor for creating instances of `RealClock`. It returns a pointer to a new `RealClock` instance without any parameters, which aligns with the AI-generated summary.

**12-12: LGTM!**

The variable declaration `var _ Clock = &RealClock{}` is used correctly to ensure that `RealClock` implements the `Clock` interface at compile-time. This aligns with the AI-generated summary and is a common pattern in Go for interface checks.

**14-16: LGTM!**

The `Now()` method on the `RealClock` struct is implemented correctly to return the current time using `time.Now()`. This aligns with the AI-generated summary and provides the expected functionality for a real-time clock.

apps/agent/pkg/clock/test_clock.go (6)
**5-6: LGTM!**

The `TestClock` struct is well-defined and serves the purpose of mocking time in tests. The `now` field accurately represents the current time of the test clock.

**9-13: LGTM!**

The `NewTestClock` constructor function is implemented correctly. It properly handles the optional `now` parameter and defaults to the current time when no initial time is provided. The function returns a pointer to the newly created `TestClock` instance, which is the expected behavior.

**16-16: LGTM!**

The interface implementation at line 16 correctly verifies that `TestClock` satisfies the `Clock` interface. This ensures that `TestClock` can be used wherever a `Clock` is expected.

**18-20: LGTM!**

The `Now` method is implemented correctly. It returns the current time of the test clock by returning the value of the `now` field. The method logic is straightforward and has no issues.
**23-26: LGTM!**

The `Tick` method is implemented correctly. It advances the clock by the given duration, updates the `now` field to reflect the new time, and returns the updated time. The method logic is sound and serves the purpose of simulating the passage of time in tests.

**29-31: LGTM!**

The `Set` method is implemented correctly. It sets the clock to the given time, updates the `now` field to reflect the new time, and returns the updated time. The method logic is straightforward and serves the purpose of setting the test clock to a specific time for testing scenarios.

apps/agent/services/ratelimit/ratelimit_mitigation_test.go (4)
**27-27: LGTM!**

The expanded range of cluster sizes improves the test coverage by including both small and large clusters. This change enhances the robustness of the rate limiting tests.

**97-97: Good catch!**

The modified loop condition fixes an off-by-one error and ensures that the rate limit is saturated with exactly `limit` requests. This change improves the accuracy of the test.

**103-103: Excellent fix!**

The modified assertion correctly checks that the rate limit response is unsuccessful after saturation. This change improves the correctness and reliability of the test by validating the expected rate limiting behavior.

**111-115: Nice touch!**

Correcting the typo in the comment improves the clarity and readability of the code. While it doesn't affect the functionality, it enhances the overall code quality and maintainability.
apps/agent/services/ratelimit/service.go (2)
**41-42: LGTM!**

The addition of the `mitigateCircuitBreaker` field is a good enhancement to handle mitigation requests using a dedicated circuit breaker. This can improve the resilience and fault tolerance of the service when dealing with mitigation requests. It's also good to see that the existing `syncCircuitBreaker` field is retained, ensuring that the circuit breaker functionality for sync requests remains intact.
**68-76: LGTM!**

The initialization and configuration of the `mitigateCircuitBreaker` field look good. The chosen parameters for the circuit breaker seem reasonable for handling mitigation requests:
- The cyclic period of 10 seconds allows for periodic health checks and state adjustments.
- The timeout of 1 minute provides a sufficient window for the service to respond to mitigation requests.
- The maximum requests limit of 100 and the trip threshold of 50 help prevent overload and trigger the open state when necessary.
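For reference, the parameters above could be modeled as in the following simplified sketch; the type and field names here are illustrative, not the repository's actual circuit breaker API:

```go
package main

import (
	"fmt"
	"time"
)

// breakerConfig models the parameters called out in the review.
type breakerConfig struct {
	CyclicPeriod  time.Duration // how often rolling counters reset
	Timeout       time.Duration // how long the breaker stays open before probing again
	MaxRequests   uint32        // requests allowed through while half-open
	TripThreshold uint32        // failures within a cycle that open the breaker
}

// shouldTrip reports whether the observed failure count reaches the
// configured threshold.
func (c breakerConfig) shouldTrip(failures uint32) bool {
	return failures >= c.TripThreshold
}

// mitigateBreaker mirrors the values described in the review comment.
var mitigateBreaker = breakerConfig{
	CyclicPeriod:  10 * time.Second,
	Timeout:       time.Minute,
	MaxRequests:   100,
	TripThreshold: 50,
}

func main() {
	fmt.Println(mitigateBreaker.shouldTrip(49), mitigateBreaker.shouldTrip(50))
}
```

The 50-failure threshold trips well before the 100-request budget is exhausted, so a consistently failing peer is cut off within a single cycle.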
It's also good to see that the `syncCircuitBreaker` initialization remains unchanged, indicating that its configuration is still valid.

apps/agent/services/ratelimit/ratelimit_replication_test.go (1)
**Line range hint 27-138: LGTM!**

The changes to the test function look good:

- The renaming of the function from `TestReplication` to `TestSync` improves clarity.
- The removal of `t.Skip()` ensures that the test is executed as part of the test suite, helping catch any regressions in the rate limit synchronization functionality.

The test logic remains unchanged and comprehensive, testing the synchronization of rate limit data across multiple nodes in a cluster.
apps/agent/services/ratelimit/mitigate.go (1)
**53-60: Good use of circuit breaker to enhance resilience**

Wrapping the `peer.client.Mitigate` call with `s.mitigateCircuitBreaker.Do` introduces a circuit breaker pattern, which enhances the resilience of the system by preventing cascading failures when peers are unresponsive or experiencing errors.
**149-152: Verify the calculation of `upper` limit in rate limiting test**

Between lines 149-152, the calculation of `upper` and its use might not align with the intended test logic. The comment mentions:

```go
// At most 150% + 75% per additional ingress node should pass
```

However, the calculation is:

```go
upper := 1.50 + 1.0*float64(len(ingressNodes)-1)
```

Verify whether this formula accurately represents the intended upper limit based on the comment. There may be a discrepancy that could affect the test's validity. To ensure the calculation aligns with expectations, please double-check the formula and adjust it or the comment accordingly.
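To make the discrepancy concrete, the two readings can be compared directly (the helper names below are hypothetical):

```go
package main

import "fmt"

// upperFromCode reproduces the formula currently in the test.
func upperFromCode(ingressNodes int) float64 {
	return 1.50 + 1.0*float64(ingressNodes-1)
}

// upperFromComment reproduces what the comment describes:
// 150% plus 75% per additional ingress node.
func upperFromComment(ingressNodes int) float64 {
	return 1.50 + 0.75*float64(ingressNodes-1)
}

func main() {
	// For three ingress nodes the two readings disagree:
	// code allows 3.5x the limit, the comment implies 3.0x.
	fmt.Println(upperFromCode(3), upperFromComment(3))
}
```

The formulas agree only for a single ingress node, so either the `1.0` factor or the comment's "75%" needs to change.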
apps/api/src/pkg/ratelimit/client.ts (7)
**6-6: Import statement is appropriate and necessary**

The addition of the `retry` utility is correct and aligns with the implementation of retry logic in the code.

**18-18: Cache structure updated appropriately**

The `cache` property now holds entries with `reset` and `current` values, which simplifies the caching mechanism by removing the `blocked` state. This change enhances clarity and maintainability.

**24-24: Constructor parameters updated accordingly**

The constructor now accepts the updated `cache` structure, ensuring consistency throughout the class.
**58-62: Verify cache update logic to prevent stale data**

In the `setCacheMax` method, the cache is updated only when `current > cached.current`. If `current` is less than or equal to `cached.current`, the cache remains unchanged. This could potentially lead to stale cache data if `current` decreases over time.

Please confirm if this behavior is intentional. If the goal is to always have the most recent `current` value in the cache, consider updating the cache regardless of whether `current` is greater than `cached.current`:

```diff
-    if (current > cached.current) {
-      this.cache.set(id, { reset, current });
-      return current;
-    }
+    this.cache.set(id, { reset, current });
+    return current;
```
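For reference, the keep-the-maximum semantics currently implemented can be sketched as follows (illustrated in Go with hypothetical names; the actual implementation is TypeScript):

```go
package main

import "fmt"

// cacheEntry mirrors the reviewed cache shape: a reset timestamp and a counter.
type cacheEntry struct {
	reset   int64
	current int64
}

// setCacheMax keeps the larger counter, so concurrent or out-of-order
// updates never move the cached value backwards, and returns whichever
// value won.
func setCacheMax(cache map[string]cacheEntry, id string, current, reset int64) int64 {
	cached, ok := cache[id]
	if !ok || current > cached.current {
		cache[id] = cacheEntry{reset: reset, current: current}
		return current
	}
	return cached.current
}

func main() {
	cache := map[string]cacheEntry{}
	setCacheMax(cache, "id_1", 5, 100)
	fmt.Println(setCacheMax(cache, "id_1", 3, 100)) // stale lower value loses
}
```

The trade-off raised in the review is visible here: a legitimately lower `current` (e.g. after a window reset) is ignored unless the entry is replaced by other means.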
**168-168: Cache updated after successful agent call**

Updating the cache with the latest `current` and `reset` values from the agent ensures consistency in rate limiting decisions.

**179-179: Cache updated in asynchronous operation**

The cache is updated within the `waitUntil` asynchronous context. This ensures that even when operating asynchronously, the cache remains accurate.

**202-202: Consistent cache update after local increment**

After incrementing `cached.current` with `cost`, the cache is updated via `setCacheMax`. This maintains consistency in the cached values.

apps/agent/services/ratelimit/sliding_window.go (1)
**283-283: Addition of `Sequence` field to `Window` struct looks good**

Adding the `Sequence` field to the `Window` struct enhances sequence tracking and aligns with the changes made elsewhere in the code. This update appears appropriate.
**Consider using defer for unlocking to ensure lock is always released**

Replacing `defer bucket.Unlock()` with an explicit `bucket.Unlock()` may lead to the lock not being released if a panic occurs between the lock and unlock calls. Using `defer` ensures that the lock is always released, even in the event of an error or panic. Apply this diff to revert to using `defer`:
```diff
 func (s *service) Mitigate(ctx context.Context, req *ratelimitv1.MitigateRequest) (*ratelimitv1.MitigateResponse, error) {
 	ctx, span := tracing.Start(ctx, "ratelimit.Mitigate")
 	defer span.End()
 	s.logger.Info().Interface("req", req).Msg("mitigating")
 	duration := time.Duration(req.Duration) * time.Millisecond
 	bucket, _ := s.getBucket(bucketKey{req.Identifier, req.Limit, duration})
 	bucket.Lock()
+	defer bucket.Unlock()
 	bucket.windows[req.Window.GetSequence()] = req.Window
-	bucket.Unlock()
 	return &ratelimitv1.MitigateResponse{}, nil
 }
```

Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
func (s *service) Mitigate(ctx context.Context, req *ratelimitv1.MitigateRequest) (*ratelimitv1.MitigateResponse, error) {
	ctx, span := tracing.Start(ctx, "ratelimit.Mitigate")
	defer span.End()
	s.logger.Info().Interface("req", req).Msg("mitigating")
	duration := time.Duration(req.Duration) * time.Millisecond
	bucket, _ := s.getBucket(bucketKey{req.Identifier, req.Limit, duration})
	bucket.Lock()
	defer bucket.Unlock()
	bucket.windows[req.Window.GetSequence()] = req.Window
	return &ratelimitv1.MitigateResponse{}, nil
}
```
```go
_, err := s.mitigateCircuitBreaker.Do(ctx, func(innerCtx context.Context) (*connect.Response[ratelimitv1.MitigateResponse], error) {
	return peer.client.Mitigate(innerCtx, connect.NewRequest(&ratelimitv1.MitigateRequest{
		Identifier: req.identifier,
		Limit:      req.limit,
		Duration:   req.duration.Milliseconds(),
		Window:     req.window,
	}))
})
```
**Consider using a context with timeout to prevent hanging calls**

Currently, `context.Background()` is used without any timeout or cancellation, which may lead to hanging calls if a peer does not respond. Consider using a context with a timeout to ensure that the `Mitigate` calls to peers do not block indefinitely. Apply this diff to use a context with timeout:
```diff
 func (s *service) broadcastMitigation(req mitigateWindowRequest) {
-	ctx := context.Background()
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
 	node, err := s.cluster.FindNode(bucketKey{req.identifier, req.limit, req.duration}.toString())
 	if err != nil {
 		s.logger.Err(err).Msg("failed to find node")
 		return
 	}
```

Ensure that the context with timeout is used in the circuit breaker call:
```diff
 for _, peer := range peers {
 	_, err := s.mitigateCircuitBreaker.Do(ctx, func(innerCtx context.Context) (*connect.Response[ratelimitv1.MitigateResponse], error) {
-		return peer.client.Mitigate(innerCtx, connect.NewRequest(&ratelimitv1.MitigateRequest{
+		return peer.client.Mitigate(ctx, connect.NewRequest(&ratelimitv1.MitigateRequest{
 			Identifier: req.identifier,
 			Limit:      req.limit,
 			Duration:   req.duration.Milliseconds(),
 			Window:     req.window,
 		}))
 	})
```

Committable suggestion was skipped due to low confidence.
```go
// FIXED-WINDOW
// uncomment
// currentWindowPercentage := float64(req.Time.UnixMilli()-currentWindow.Start) / float64(req.Duration.Milliseconds())
// previousWindowPercentage := 1.0 - currentWindowPercentage

// Calculate the current count including all leases
fromPreviousWindow := float64(previousWindow.Counter) * previousWindowPercentage
fromCurrentWindow := float64(currentWindow.Counter)
current := int64(math.Ceil(fromCurrentWindow + fromPreviousWindow))
// FIXED-WINDOW
// uncomment
// fromPreviousWindow := float64(previousWindow.Counter) * previousWindowPercentage
// fromCurrentWindow := float64(currentWindow.Counter)

// FIXED-WINDOW
// replace this with the following line
// current := int64(math.Ceil(fromCurrentWindow + fromPreviousWindow))
```
**Consider removing commented-out code or using feature flags**

The large block of commented-out code annotated with "FIXED-WINDOW" can clutter the codebase and reduce readability. Consider removing it or using a feature flag or configuration to toggle between sliding window and fixed-window implementations, rather than leaving code commented out.
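A minimal sketch of the feature-flag idea, assuming a simple mode switch (the names here are illustrative, not the repository's actual code):

```go
package main

import (
	"fmt"
	"math"
)

// windowMode toggles the two strategies explicitly instead of leaving
// one of them behind comments.
type windowMode int

const (
	slidingWindow windowMode = iota
	fixedWindow
)

// estimate computes the effective counter. In sliding mode the previous
// window contributes in proportion to how much of it still overlaps the
// current interval; in fixed mode only the current window counts.
func estimate(mode windowMode, prevCounter, curCounter int64, elapsedFraction float64) int64 {
	fromCurrent := float64(curCounter)
	fromPrevious := 0.0
	if mode == slidingWindow {
		fromPrevious = float64(prevCounter) * (1.0 - elapsedFraction)
	}
	return int64(math.Ceil(fromCurrent + fromPrevious))
}

func main() {
	// Halfway through the current window, with 10 hits in the previous
	// window and 5 so far in this one.
	fmt.Println(estimate(slidingWindow, 10, 5, 0.5), estimate(fixedWindow, 10, 5, 0.5))
}
```

A config-driven `windowMode` makes the revert described in the comments a one-line change and keeps both code paths compiled and tested.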
…nd services feat(workflows): add job_test_agent_local.yaml for testing agent locally feat(workflows): add test_agent_local job to pr.yaml for local agent testing
…nnections test(ratelimit): fix loop condition in TestAccuracy_fixed_time to iterate correctly
…ent Integration' feat(workflows): update test job to run on integration tests directory feat(workflows): add environment variables for cluster test and agent base URL
…in job_test_agent_local.yaml
fix(ratelimit_test.go): fix calculation of upper limit in test
…s to include only 1, 3, and 5 nodes
…tion fix(ratelimit_replication_test): correct index to call Ratelimit on correct node
## Summary by CodeRabbit

**Release Notes**

**New Features**
- `RealClock` implementation for accurate timekeeping.

**Improvements**
- Simplified rate-limit caching by removing the `blocked` state.

**Bug Fixes**

**Tests**