
ratelimit clustering #2986

Merged

chronark merged 5 commits into main from ratelimit-clustering on Mar 18, 2025
Conversation

@chronark (Collaborator) commented Mar 18, 2025

  • ci: release agent
  • perf: add permission and key caches
  • test: run clustered ratelimit tests

Summary by CodeRabbit

  • New Features

    • Introduced enhanced observability with integrated telemetry and improved RPC functionality.
  • Refactor

    • Streamlined version extraction in CI workflows.
    • Standardized cluster identifiers by renaming “node” to “instance” for clearer semantics.
    • Enhanced caching, tracing, and middleware for more robust performance monitoring.
  • Tests

    • Expanded integration tests, including multi-node rate limiting and a new API testing harness.
  • Chores

    • Revised task definitions and utility configurations for improved consistency.

vercel bot commented Mar 18, 2025

The latest updates on your projects.

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| engineering | ✅ Ready | Mar 18, 2025 4:00pm |
| play | ✅ Ready | Mar 18, 2025 4:00pm |
| www | ✅ Ready | Mar 18, 2025 4:00pm |

1 Skipped Deployment

| Name | Status | Updated (UTC) |
| --- | --- | --- |
| dashboard | ⬜️ Ignored | Mar 18, 2025 4:00pm |

changeset-bot commented Mar 18, 2025

⚠️ No Changeset found

Latest commit: c1ff76a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types
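For reference, a changeset is a small markdown file committed under `.changeset/`; a minimal sketch (the package name and description here are hypothetical, not taken from this PR) looks like:

```
---
"@unkey/api": patch
---

Distribute rate limit state across cluster instances.
```

Each entry pairs a package with a semver bump type (patch, minor, or major) and a short changelog line.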


coderabbitai bot (Contributor) commented Mar 18, 2025

📝 Walkthrough

This pull request introduces multiple updates spanning workflow enhancements, service observability, and refactoring. Version extraction in the agent build workflow is simplified using environment variables. A new OpenTelemetry service is added to the Docker Compose configuration, and tasks in the Taskfile have been updated accordingly. Field and variable renames from “node” to “instance” are applied across API tests, internal services, caching, cluster management, and membership. New integration components (a test harness, HTTP utilities, and a rate-limited attack function) improve testing, while enhanced tracing, caching, and a new RPC server are introduced.

Changes

File(s) and summary of changes:

  • .github/workflows/agent_build_publish.yaml: Modified version extraction: now directly removes refs/tags/agent/ from GITHUB_REF instead of using a git command.
  • deployment/docker-compose.yaml: Introduced a new otel service (with specified ports and image); added dependencies and environment variables in apiv2 for OpenTelemetry.
  • go/Taskfile.yml: Renamed test-e2e to test-integration; replaced SIMULATON_TEST with INTEGRATION_TEST; added new env vars and a dependency/task for otel.
  • go/apps/api/cancel_test.go, go/apps/api/config.go, go/apps/api/run_test.go: Renamed field from ClusterNodeID to ClusterInstanceID and updated UID generation to use uid.New(uid.InstancePrefix).
  • go/apps/api/integration/harness.go, go/apps/api/integration/http.go, go/apps/api/integration/multi_node_ratelimiting/accuracy_test.go: Added new test harness and HTTP client utilities for API cluster tests; introduced multi-node rate limiting accuracy tests.
  • go/apps/api/routes/v2_ratelimit_*/*.go (200/400/403/404 tests): Updated access to WorkspaceID: changed from direct property access (h.Resources.UserWorkspace.ID) to method call (h.Resources().UserWorkspace.ID).
  • go/apps/api/run.go: Renamed nodeID to instanceID; integrated new RPC service (with logging, ratelimit service) and instantiated a permissions service.
  • go/internal/services/keys/*, go/internal/services/permissions/*, go/internal/services/ratelimit/*: Introduced caching and tracing in key verification, permission checking, and ratelimit functions; updated method signatures to include context and added tracing spans.
  • go/pkg/assert/all.go, go/pkg/attack/attack.go, go/pkg/cache/*: Renamed assertion helper from Multi to All; added a new rate-limited attack function; refactored cache methods to set entries directly and use an op function for error handling.
  • go/pkg/cluster/*, go/pkg/membership/*, go/pkg/uid/uid.go, go/schema.json: Renamed “node” references to “instance” consistently across cluster, membership, discovery, and UID packages (including schema property update).
  • go/pkg/rpc/rpc.go: Introduced a new RPC server package featuring a Server struct, configuration settings, and Listen/Shutdown methods with tracing and error wrapping.
  • go/pkg/testutil/*: Renamed package from testflags to testutil; refactored HTTP test harness to use a seeder; added a new seeding utility (seed.go) to initialize test resources.
  • go/pkg/zen/*: Updated OpenAPI middleware and server configuration: validation methods now accept a context and NodeID renamed to InstanceID.
  • go/proto/ratelimit/v1/service.proto, tools/migrate/migrate_subscription.ts: Minor changes: added a blank line in proto message for clarity and updated a workspace ID value in the migration script.
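The workflow change above replaces a git invocation with plain prefix removal on GITHUB_REF (in shell this is typically the `${GITHUB_REF#refs/tags/agent/}` parameter expansion). A minimal sketch of the same logic, written in Go for illustration only — the actual workflow step is shell, and the helper name here is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// versionFromRef strips the tag prefix from a GITHUB_REF value,
// mirroring the simplified CI step. Refs that don't carry the
// prefix pass through unchanged.
func versionFromRef(githubRef string) string {
	return strings.TrimPrefix(githubRef, "refs/tags/agent/")
}

func main() {
	fmt.Println(versionFromRef("refs/tags/agent/v1.4.2")) // v1.4.2
}
```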

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant C as Client
    participant RPC as RPC Server
    participant RLS as Rate Limit Service
    C->>RPC: Send RPC Request
    RPC->>RPC: Start tracing span
    alt Rate Limit Service provided
      RPC->>RLS: Forward request for rate limiting
      RLS-->>RPC: Return rate limit decision
    else No Rate Limit Service
      RPC->>RPC: Process request normally
    end
    RPC-->>C: Return RPC Response
```
```mermaid
sequenceDiagram
    participant T as Test Suite
    participant A as Attack Function
    participant W as Worker Goroutines
    T->>A: Invoke Attack(rate, duration, fn)
    A->>W: Spawn 256 workers driven by tick channel
    loop For each tick
      W->>A: Execute provided function & send response
    end
    A-->>T: Aggregate and return responses
```
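The Attack flow in the second diagram — a tick channel fanned out to worker goroutines whose results are aggregated — can be sketched as below. This is an illustrative reconstruction, not the code in go/pkg/attack: the real function is driven by a Rate and duration, while this sketch uses a fixed tick count so it terminates deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// attack fans a stream of ticks out to worker goroutines, runs fn
// once per tick, and aggregates the responses.
func attack[T any](ticks int, workers int, fn func() T) []T {
	tickCh := make(chan struct{})
	results := make(chan T)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range tickCh {
				results <- fn()
			}
		}()
	}

	go func() {
		// Emit ticks, then shut the pipeline down in order:
		// stop workers, wait for them, close the results stream.
		for i := 0; i < ticks; i++ {
			tickCh <- struct{}{}
		}
		close(tickCh)
		wg.Wait()
		close(results)
	}()

	out := make([]T, 0, ticks)
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	responses := attack(100, 8, func() int { return 1 })
	fmt.Println(len(responses)) // 100
}
```

Note the consumer (the final range loop) drains results while ticks are still being produced; a stalled consumer would block the workers, which is exactly the concern raised in the review comment on pkg/attack below.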

Suggested reviewers

  • mcstepp
  • ogzhanolguncu
  • perkinsjr
  • chronark

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 golangci-lint (1.62.2)

Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.0)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24.0)


Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

github-actions bot (Contributor) commented Mar 18, 2025

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification, and it looks like your proposed title needs to be adjusted.
Here is an example:

<type>[optional scope]: <description>
fix: I fixed something for Unkey

Details:

No release type found in pull request title "ratelimit clustering". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit
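Given the commit list in the PR description, any of the following titles would satisfy the check (illustrative suggestions only, not the title that was ultimately chosen):

```
feat: ratelimit clustering
perf: add permission and key caches
```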

@vercel vercel bot temporarily deployed to Preview – engineering March 18, 2025 15:49 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard March 18, 2025 15:51 Inactive
@vercel vercel bot temporarily deployed to Preview – play March 18, 2025 15:55 Inactive
@vercel vercel bot temporarily deployed to Preview – www March 18, 2025 15:55 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering March 18, 2025 16:00 Inactive
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 10

🧹 Nitpick comments (24)
go/internal/services/keys/service.go (1)

28-40: Implemented keyCache initialization with proper error handling

The cache is configured with appropriate TTL values and wrapped with tracing middleware. The error handling is properly implemented.

However, the resource name is set to "permissions" while this is a key cache. This could lead to confusion in tracing and monitoring.

```diff
-		Resource: "permissions",
+		Resource: "keys",
```
go/pkg/discovery/redis.go (1)

118-120: Consider updating parameter name for consistency

While you've renamed the field from nodeID to instanceID, the parameter name in the key method still uses nodeID. Consider updating this for complete consistency with the terminology change.

```diff
-func (r *Redis) key(nodeID string) string {
+func (r *Redis) key(instanceID string) string {
-	return fmt.Sprintf("discovery::nodes::%s", nodeID)
+	return fmt.Sprintf("discovery::nodes::%s", instanceID)
```
go/apps/api/routes/v2_ratelimit_limit/handler.go (2)

54-60: Span usage for namespace query is appropriate.
Consider adding contextual tags (e.g., namespaceName, workspaceID) to the span attributes for better trace visibility and diagnosing potential query issues.


102-108: Span usage for override matches is appropriate.
Similar to the previous span, consider adding key attributes (e.g., namespaceID, identifier) to improve observability and streamline troubleshooting.

go/pkg/db/replica.go (4)

16-17: New mode field for Replica is informative.
Using a separate field to distinguish read-only vs. read-write replicas helps maintain clarity. Consider documenting possible string values for mode (e.g., "ro", "rw") to guide future maintainers.


26-33: Tracing in ExecContext is well-integrated.
All relevant attributes (query, mode) are set. If errors occur, consider adding an error event or status to the span to improve debugging.


48-56: Tracing for QueryContext is correctly implemented.
Reiterating the best practice of adding an error event to the span in case of query failures would help pinpoint issues faster.


72-78: Span creation in Begin is succinct.
The additional mode attribute is useful. Consider capturing the transaction details or status if any transaction-level errors arise later.

go/internal/services/keys/verify.go (1)

26-41: SWR-based caching logic is implemented correctly.
Great use of a fallback function to classify errors into cache write or null writes. In case of persistent or intermittent DB issues, consider logging or adding span events to track the frequency of DB fallback lookups.

go/pkg/zen/middleware_openapi_validation.go (1)

28-31: Clarify naming for the returned struct
You're assigning the validation result to an err variable, but it's actually an openapi.BadRequestError struct, which may cause confusion at a glance. Consider renaming it to be more descriptive.

```diff
- err, valid := validator.Validate(ctx, s.r)
+ badReqErr, valid := validator.Validate(ctx, s.r)

- err.RequestId = s.requestID
+ badReqErr.RequestId = s.requestID
```
go/pkg/cache/middleware/tracing.go (1)

90-90: Switching from %+v to %v reduces detail in logs, but remains acceptable for key tracing.

Consider reverting to %+v if deeper debugging information is needed.

go/apps/api/integration/multi_node_ratelimiting/accuracy_test.go (1)

20-127: Comprehensive multi-node rate limit test scenario.
This test thoroughly exercises a variety of load factors, limits, and durations, ensuring robust coverage. Some recommendations:

  1. Consider shorter test durations (e.g., less than one minute) to reduce test execution time for frequent CI runs.
  2. Evaluate adding edge cases (e.g., zero-limit or extremely large limits).
go/internal/services/ratelimit/service.go (1)

104-119: Remote origin check introduced via askOrigin.
Replacing the direct local rate limit logic with an askOrigin call is a design shift. Ensure latency and network error handling are appropriate for production usage. Also consider removing commented-out code if no longer needed.

go/apps/api/integration/http.go (2)

14-19: Consider documenting response fields.

While this TestResponse type is straightforward, adding brief doc comments for each field may help new contributors understand the expected structure and usage in tests.


57-63: Consider seeding or controlling the random selection of nodes.

rand.IntN is used to pick a node at random. If test reproducibility is desired, set a deterministic seed for the random generator. Otherwise, the presence of nondeterministic randomness might complicate debugging test failures.
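The reproducibility point can be illustrated with math/rand/v2 (the package that provides the rand.IntN mentioned above): constructing a generator from a seeded PCG source yields a deterministic node-selection sequence. The helper name is hypothetical:

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// pickNodes simulates selecting a node index per request from a
// seeded source, so failing test runs can be replayed exactly.
func pickNodes(seed uint64, numNodes, picks int) []int {
	r := rand.New(rand.NewPCG(seed, 0))
	out := make([]int, picks)
	for i := range out {
		out[i] = r.IntN(numNodes) // deterministic given the seed
	}
	return out
}

func main() {
	a := pickNodes(42, 3, 5)
	b := pickNodes(42, 3, 5)
	fmt.Println(a, b) // identical sequences for identical seeds
}
```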

go/pkg/attack/attack.go (2)

10-13: Validate rate input.

Consider validating Freq and Per to prevent pathological cases (like zero or negative durations). Rejecting invalid inputs early can prevent potential panics or logical errors.


19-23: Clarify concurrency constraints in doc comment.

You require the consumer to process results promptly. Document in the function comment that stalling the consumer can cause indefinite blocking in worker goroutines.

go/apps/api/run.go (1)

139-145: Avoid panicking on listen errors.

While calling panic on listenErr ensures quick feedback in tests or dev, consider more graceful error handling or returning the error. This helps maintain server responsiveness in production.

```diff
-	if listenErr != nil {
-		panic(listenErr)
-	}
+	if listenErr != nil {
+		logger.Error("Failed to listen on RPC service", "error", listenErr)
+		// Possibly trigger an orderly shutdown
+	}
```
go/pkg/testutil/seed/seed.go (1)

85-135: Potential id collision in the key creation.

The code uses the same prefix (test_root_key) for both the key value and its ID in the database, which could lead to confusion or collision.

Consider using different prefixes to clearly distinguish between the two:

```diff
-	key := uid.New("test_root_key")
+	key := uid.New("test_key")

-		ID:                uid.New("test_root_key"),
+		ID:                uid.New("test_key_id"),
```
go/pkg/testutil/http.go (1)

185-189: Consider adding a test helper with proper error context.

The UnmarshalBody function could be enhanced to provide better error context when unmarshalling fails.

```diff
 func UnmarshalBody[Body any](t *testing.T, r *httptest.ResponseRecorder, body *Body) {
-	err := json.Unmarshal(r.Body.Bytes(), &body)
-	require.NoError(t, err)
+	responseData := r.Body.Bytes()
+	err := json.Unmarshal(responseData, &body)
+	require.NoErrorf(t, err, "Failed to unmarshal response body: %s", string(responseData))
 }
```
go/internal/services/ratelimit/replay.go (2)

18-18: Consider passing the parent context rather than context.Background().

Using context.Background() here discards any cancellation or deadlines from upstream. Forwarding the existing ctx or a derived context can help maintain proper request lifecycle management.

```diff
- s.replayToOrigin(context.Background(), req)
+ s.replayToOrigin(ctx, req)
```

24-24: Leverage the incoming context for tracing instead of context.Background().

Within replayToOrigin, a new context.Background() is passed to tracing.Start, preventing any upstream context cancellation or deadlines from propagating. Consider using the provided ctx to ensure consistent trace propagation.

```diff
- ctx, span := tracing.Start(context.Background(), "replayToOrigin")
+ ctx, span := tracing.Start(ctx, "replayToOrigin")
```
go/pkg/cache/cache.go (2)

170-179: Consider utilizing ctx in SetNull to allow for logging or trace instrumentation.

Currently, _ context.Context discards all contextual information. Leveraging this context can enable logging or other instrumentation if needed in the future.

```diff
-func (c *cache[K, V]) SetNull(_ context.Context, key K) {
+func (c *cache[K, V]) SetNull(ctx context.Context, key K) {
   ...
 }
```

182-191: Consider utilizing ctx in Set to enable potential logging or tracing logic.

Similar to SetNull, ignoring the ctx parameter forfeits potential instrumentation or cancellation flow.

```diff
-func (c *cache[K, V]) Set(_ context.Context, key K, value V) {
+func (c *cache[K, V]) Set(ctx context.Context, key K, value V) {
   ...
 }
```
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2b4258 and c1ff76a.

📒 Files selected for processing (71)
  • .github/workflows/agent_build_publish.yaml (1 hunks)
  • deployment/docker-compose.yaml (3 hunks)
  • go/Taskfile.yml (1 hunks)
  • go/apps/api/cancel_test.go (1 hunks)
  • go/apps/api/config.go (2 hunks)
  • go/apps/api/integration/harness.go (1 hunks)
  • go/apps/api/integration/http.go (1 hunks)
  • go/apps/api/integration/multi_node_ratelimiting/accuracy_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_delete_override/200_test.go (4 hunks)
  • go/apps/api/routes/v2_ratelimit_delete_override/400_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_delete_override/403_test.go (3 hunks)
  • go/apps/api/routes/v2_ratelimit_delete_override/404_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_get_override/200_test.go (3 hunks)
  • go/apps/api/routes/v2_ratelimit_get_override/400_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_get_override/403_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_get_override/404_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/200_test.go (4 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/400_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/403_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/404_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/accuracy_test.go (2 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/handler.go (3 hunks)
  • go/apps/api/routes/v2_ratelimit_set_override/200_test.go (5 hunks)
  • go/apps/api/routes/v2_ratelimit_set_override/400_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_set_override/403_test.go (1 hunks)
  • go/apps/api/routes/v2_ratelimit_set_override/404_test.go (1 hunks)
  • go/apps/api/run.go (9 hunks)
  • go/apps/api/run_test.go (3 hunks)
  • go/internal/services/keys/service.go (1 hunks)
  • go/internal/services/keys/verify.go (2 hunks)
  • go/internal/services/permissions/check.go (1 hunks)
  • go/internal/services/permissions/service.go (2 hunks)
  • go/internal/services/ratelimit/bucket.go (3 hunks)
  • go/internal/services/ratelimit/interface.go (2 hunks)
  • go/internal/services/ratelimit/peers.go (4 hunks)
  • go/internal/services/ratelimit/replay.go (3 hunks)
  • go/internal/services/ratelimit/service.go (8 hunks)
  • go/pkg/assert/all.go (1 hunks)
  • go/pkg/attack/attack.go (1 hunks)
  • go/pkg/cache/cache.go (6 hunks)
  • go/pkg/cache/interface.go (3 hunks)
  • go/pkg/cache/middleware/tracing.go (1 hunks)
  • go/pkg/cache/noop.go (1 hunks)
  • go/pkg/cluster/cluster.go (9 hunks)
  • go/pkg/cluster/cluster_test.go (6 hunks)
  • go/pkg/cluster/interface.go (1 hunks)
  • go/pkg/cluster/noop.go (3 hunks)
  • go/pkg/db/database.go (2 hunks)
  • go/pkg/db/replica.go (2 hunks)
  • go/pkg/discovery/doc.go (1 hunks)
  • go/pkg/discovery/redis.go (5 hunks)
  • go/pkg/membership/member.go (3 hunks)
  • go/pkg/membership/membership_test.go (16 hunks)
  • go/pkg/membership/serf.go (4 hunks)
  • go/pkg/otel/grafana.go (2 hunks)
  • go/pkg/otel/metrics/metrics.go (2 hunks)
  • go/pkg/otel/tracing/trace.go (2 hunks)
  • go/pkg/ring/ring.go (5 hunks)
  • go/pkg/ring/ring_test.go (1 hunks)
  • go/pkg/rpc/rpc.go (1 hunks)
  • go/pkg/testutil/flags.go (1 hunks)
  • go/pkg/testutil/http.go (7 hunks)
  • go/pkg/testutil/seed/seed.go (1 hunks)
  • go/pkg/uid/doc.go (2 hunks)
  • go/pkg/uid/uid.go (1 hunks)
  • go/pkg/zen/middleware_openapi_validation.go (1 hunks)
  • go/pkg/zen/server.go (4 hunks)
  • go/pkg/zen/validation/validator.go (2 hunks)
  • go/proto/ratelimit/v1/service.proto (1 hunks)
  • go/schema.json (1 hunks)
  • tools/migrate/migrate_subscription.ts (1 hunks)
🧰 Additional context used
🧬 Code Definitions (21)
go/pkg/cache/noop.go (2)
go/pkg/cache/cache.go (11)
  • c (98:144)
  • c (146:168)
  • c (170:180)
  • c (182:191)
  • c (193:203)
  • c (205:209)
  • c (211:226)
  • c (228:245)
  • c (247:249)
  • c (251:282)
  • c (284:337)
go/pkg/cache/interface.go (1)
  • Op (50:50)
go/pkg/db/database.go (1)
go/pkg/db/replica.go (1)
  • Replica (15:18)
go/internal/services/keys/verify.go (7)
go/internal/services/keys/service.go (1)
  • service (19:24)
go/internal/services/keys/interface.go (1)
  • VerifyResponse (14:17)
go/pkg/otel/tracing/trace.go (1)
  • Start (59:62)
go/pkg/assert/assert.go (1)
  • NotEmpty (138:147)
go/pkg/hash/sha256.go (1)
  • Sha256 (28:33)
go/pkg/cache/interface.go (5)
  • Key (33:35)
  • Op (50:50)
  • WriteValue (56:56)
  • WriteNull (59:59)
  • Noop (54:54)
go/pkg/cache/cache.go (1)
  • cache (20:34)
go/internal/services/ratelimit/interface.go (1)
go/gen/proto/ratelimit/v1/service.pb.go (3)
  • Window (212:234)
  • Window (247:247)
  • Window (262:264)
go/apps/api/routes/v2_ratelimit_delete_override/404_test.go (3)
go/pkg/testutil/http.go (3)
  • h (122:131)
  • h (134:136)
  • h (138:140)
go/apps/api/integration/harness.go (4)
  • h (97:99)
  • h (102:159)
  • h (162:164)
  • h (167:172)
go/pkg/testutil/seed/seed.go (1)
  • Resources (16:20)
go/apps/api/routes/v2_ratelimit_set_override/403_test.go (3)
go/pkg/testutil/http.go (3)
  • h (122:131)
  • h (134:136)
  • h (138:140)
go/apps/api/integration/harness.go (4)
  • h (97:99)
  • h (102:159)
  • h (162:164)
  • h (167:172)
go/pkg/testutil/seed/seed.go (1)
  • Resources (16:20)
go/pkg/db/replica.go (1)
go/pkg/otel/tracing/trace.go (1)
  • Start (59:62)
go/apps/api/cancel_test.go (4)
go/pkg/cluster/cluster.go (1)
  • New (35:70)
go/pkg/membership/serf.go (1)
  • New (48:101)
go/apps/api/integration/harness.go (1)
  • New (48:95)
go/pkg/uid/uid.go (1)
  • InstancePrefix (34:34)
go/internal/services/keys/service.go (5)
go/pkg/otel/logging/interface.go (1)
  • Logger (11:116)
go/pkg/cache/interface.go (1)
  • Cache (7:31)
go/pkg/cache/cache.go (2)
  • New (58:96)
  • Config (36:53)
go/pkg/db/database.go (2)
  • New (39:84)
  • Config (16:26)
go/pkg/cache/middleware/tracing.go (1)
  • WithTracing (16:18)
go/internal/services/permissions/service.go (4)
go/pkg/cache/cache.go (14)
  • cache (20:34)
  • New (58:96)
  • Config (36:53)
  • c (98:144)
  • c (146:168)
  • c (170:180)
  • c (182:191)
  • c (193:203)
  • c (205:209)
  • c (211:226)
  • c (228:245)
  • c (247:249)
  • c (251:282)
  • c (284:337)
go/pkg/cache/interface.go (1)
  • Cache (7:31)
go/internal/services/keys/service.go (3)
  • service (19:24)
  • New (26:46)
  • Config (13:17)
go/pkg/cache/middleware/tracing.go (1)
  • WithTracing (16:18)
go/pkg/testutil/seed/seed.go (3)
go/pkg/db/database.go (2)
  • New (39:84)
  • database (30:34)
go/pkg/uid/uid.go (1)
  • New (63:83)
go/pkg/db/replica.go (1)
  • _ (21:21)
go/apps/api/run.go (11)
go/pkg/discovery/redis.go (1)
  • RedisConfig (36:49)
go/pkg/cluster/cluster.go (1)
  • Config (18:27)
go/pkg/membership/serf.go (1)
  • Config (20:27)
go/pkg/zen/server.go (1)
  • Config (33:39)
go/pkg/otel/grafana.go (1)
  • Config (30:46)
go/apps/api/config.go (1)
  • Config (8:74)
go/pkg/rpc/rpc.go (1)
  • Config (28:33)
go/apps/api/routes/services.go (2)
  • Services (17:25)
  • EventBuffer (13:15)
go/pkg/zen/middleware_metrics.go (1)
  • EventBuffer (16:18)
go/pkg/zen/validation/validator.go (1)
  • Validator (24:26)
go/pkg/cluster/interface.go (1)
  • Instance (10:16)
go/pkg/membership/member.go (1)
go/pkg/membership/serf.go (8)
  • m (118:155)
  • m (157:159)
  • m (161:163)
  • m (165:167)
  • m (169:175)
  • m (177:240)
  • m (242:259)
  • m (261:263)
go/apps/api/integration/multi_node_ratelimiting/accuracy_test.go (7)
go/apps/api/routes/v2_ratelimit_limit/handler.go (5)
  • limit (115:115)
  • duration (116:116)
  • New (32:162)
  • Request (20:20)
  • Response (21:21)
go/pkg/testutil/flags.go (1)
  • SkipUnlessIntegration (32:37)
go/apps/api/integration/harness.go (5)
  • h (97:99)
  • h (102:159)
  • h (162:164)
  • h (167:172)
  • New (48:95)
go/pkg/testutil/seed/seed.go (2)
  • New (30:36)
  • Resources (16:20)
go/pkg/uid/uid.go (1)
  • New (63:83)
go/pkg/attack/attack.go (2)
  • Attack (24:67)
  • Rate (10:13)
go/apps/api/integration/http.go (2)
  • TestResponse (14:19)
  • res (45:45)
go/internal/services/ratelimit/service.go (2)
go/internal/services/ratelimit/interface.go (2)
  • RatelimitRequest (17:44)
  • RatelimitResponse (49:79)
go/internal/services/ratelimit/bucket.go (4)
  • b (41:43)
  • b (82:95)
  • b (98:112)
  • bucket (24:30)
go/pkg/testutil/http.go (1)
go/pkg/testutil/seed/seed.go (2)
  • Seeder (23:27)
  • Resources (16:20)
go/apps/api/run_test.go (2)
go/pkg/testutil/flags.go (1)
  • SkipUnlessIntegration (32:37)
go/pkg/uid/uid.go (1)
  • InstancePrefix (34:34)
go/pkg/cache/cache.go (3)
go/pkg/cache/noop.go (7)
  • c (9:12)
  • c (13:13)
  • c (14:14)
  • c (16:16)
  • c (18:20)
  • v (10:10)
  • v (26:26)
go/pkg/cache/entry.go (1)
  • swrEntry (7:16)
go/pkg/cache/interface.go (2)
  • Hit (43:43)
  • Null (41:41)
go/apps/api/integration/harness.go (2)
go/apps/api/config.go (1)
  • Config (8:74)
go/apps/api/run.go (1)
  • Run (34:182)
go/internal/services/ratelimit/replay.go (3)
go/internal/services/ratelimit/service.go (1)
  • service (33:49)
go/gen/proto/ratelimit/v1/service.pb.go (12)
  • ReplayRequest (297:315)
  • ReplayRequest (328:328)
  • ReplayRequest (343:345)
  • ReplayResponse (370:386)
  • ReplayResponse (399:399)
  • ReplayResponse (414:416)
  • RatelimitRequest (27:53)
  • RatelimitRequest (66:66)
  • RatelimitRequest (81:83)
  • RatelimitResponse (116:142)
  • RatelimitResponse (155:155)
  • RatelimitResponse (170:172)
go/internal/services/ratelimit/interface.go (2)
  • RatelimitRequest (17:44)
  • RatelimitResponse (49:79)
go/pkg/cluster/cluster.go (2)
go/pkg/cluster/interface.go (2)
  • Instance (10:16)
  • Cluster (26:63)
go/pkg/ring/ring.go (9)
  • r (145:175)
  • r (192:208)
  • r (213:219)
  • r (231:242)
  • r (264:292)
  • New (111:121)
  • Config (59:70)
  • Node (47:56)
  • Ring (86:94)
⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Build / Build
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Packages / Test ./internal/billing
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: autofix
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (184)
go/pkg/otel/tracing/trace.go (2)

6-8: LGTM: OpenTelemetry imports have been added.

These imports support the new propagation functionality.


23-23: Enhanced tracing with text map propagator.

Setting up the OpenTelemetry text map propagator enables proper trace context propagation across service boundaries, which is crucial for distributed tracing in a microservices environment.

go/pkg/ring/ring_test.go (1)

172-173: Consistent terminology change from node to instance.

The test has been updated to use "instanceID" instead of "nodeID" to maintain consistency with the terminology change throughout the codebase.

go/pkg/ring/ring.go (7)

77-79: Renamed field from nodeID to instanceID.

The token struct field has been updated to use the new terminology, which aligns with the broader renaming initiative across the codebase.


90-91: Updated comment to reflect terminology change.

The comment now refers to "instanceIDs" instead of "nodeIDs", maintaining consistency with the field renaming.


154-154: Updated log message to use new terminology.

Log message now refers to "newInstanceID" instead of "newNodeID", maintaining consistency in terminology throughout the application logs.


161-162: Updated token creation to use instanceID.

Token creation now uses the instanceID field instead of nodeID, reflecting the terminology change.


192-195: Updated method signature and log message.

The RemoveNode method parameter has been renamed from nodeID to instanceID, and the log message has been updated accordingly to maintain consistency with the terminology change.


197-204: Updated node removal logic to use instanceID.

The node deletion and token filtering now use instanceID instead of nodeID, maintaining consistency with the terminology change.


285-287: Updated node lookup and error message to use instanceID.

The node lookup and error message now use instanceID instead of nodeID, completing the consistent terminology change throughout the Ring implementation.

go/pkg/cluster/cluster.go (14)

20-20: Consistent terminology change from Node to Instance

The renaming from Node to Instance in the Config struct is part of a broader terminology standardization across the codebase. This improves clarity and consistency.


37-37: Type parameter update to use Instance in ring initialization

The ring initialization now properly uses the Instance type parameter instead of Node, aligning with the type changes in the rest of the file.


51-52: Updated event topic types for clarity

Event topics now use Instance type parameter instead of Node, maintaining consistency with the rest of the refactoring.


62-65: Consistent type usage in AddNode call

The AddNode call correctly uses ring.Node[Instance] type, showing how the underlying ring still uses Node terminology while the cluster package has transitioned to Instance.


75-78: Updated field types in cluster struct

Field types have been updated from Node to Instance for the self field, and the ring is now typed as *ring.Ring[Instance]. This ensures type consistency throughout the struct.


80-81: Updated event topic type parameters

Event topics for join and leave events now properly use the Instance type, maintaining consistency with the rest of the codebase.


87-89: Updated method signature for Self()

The Self() method now correctly returns Instance instead of Node, completing the type transition.


100-100: Updated metric attribute name

The metric attribute has been renamed from "nodeID" to "instanceID" to maintain consistency with the terminology changes throughout the codebase.


111-114: Updated method documentation and signature for SubscribeJoin

Both the method documentation and signature have been updated to use Instance instead of Node, providing clear and consistent API documentation.


117-120: Updated method documentation and signature for SubscribeLeave

Both the method documentation and signature have been updated to use Instance instead of Node, providing clear and consistent API documentation.


131-141: Updated join event handling with new terminology

Join event handling has been updated to use the Instance terminology, including updating log messages and the struct creation for adding nodes to the ring.


148-152: Updated leave event handling with new terminology

Leave event handling has been updated to use the instance variable and appropriate log messages with "instanceID" instead of "nodeID".


163-165: Renamed method from FindNode to FindInstance

The method has been renamed from FindNode to FindInstance to maintain consistency with the terminology changes, while preserving the same functionality.


168-168: Updated return type for FindInstance

The return type for FindInstance has been updated to Instance{} from Node{} to maintain type consistency.

go/pkg/testutil/flags.go (1)

1-1:


✅ Verification successful

Package renamed from testflags to testutil

The package has been renamed from testflags to testutil, which suggests a broader scope for this utility package beyond just flags. This change requires updating all imports accordingly across the codebase.


🏁 Script executed:

#!/bin/bash
# Check if all imports for this package have been updated from testflags to testutil

echo "Checking for any remaining imports of the old testflags package"
rg -l 'import\s+\(.+\"github.com/unkeyed/unkey/go/pkg/testflags\"' --type go || echo "No files importing old package found - good!"

echo "Verifying new imports of testutil package"
rg -l 'import\s+\(.+\"github.com/unkeyed/unkey/go/pkg/testutil\"' --type go

Length of output: 760


Action Required: Verify All Imports Reflect the Rename to testutil

The package has been correctly renamed from testflags to testutil. However, our initial verification script encountered regex errors due to escape issues. To help ensure that all import statements have been updated across the codebase, please run the revised verification script below which uses fixed-string matching:

  • Confirm that no files reference the old package (testflags).
  • Verify that all intended imports now use the new package (testutil).

Please manually verify the output from this script and adjust any remaining import statements as needed.


🏁 Script executed:

#!/bin/bash
echo "Checking for old package references (testflags)"
rg -F 'github.com/unkeyed/unkey/go/pkg/testflags' --type go || echo "No old package references found."

echo "Checking for new package references (testutil)"
rg -F 'github.com/unkeyed/unkey/go/pkg/testutil' --type go || echo "No new package references found."

Length of output: 3286


Renaming Verified: All import statements have been correctly updated to use the new testutil package.
The revised verification script confirms that no file references the old testflags package, and all occurrences now point to testutil (including its subpackages such as containers and seed). Please ensure that the new package structure is intentional regarding subpackage use.

go/proto/ratelimit/v1/service.proto (1)

117-117: Added blank line for better code structure

Added a blank line after the denied field in the ReplayRequest message to improve code readability and structure. This is a minor formatting improvement and doesn't affect functionality.

go/pkg/uid/doc.go (2)

12-12: Consistent renaming from "node" to "instance"

This change from "node_" to "ins_" prefix is consistent with the broader codebase renaming effort from "node" to "instance" as mentioned in the PR.


23-23:

Details

✅ Verification successful

Verify the example usage of uid.New matches implementation

The example has been changed from uid.New("custom_") to uid.New("custom"), but the expected result still shows "custom_1z4UVH4C7Bgt8NsssqZxTTVIiWf" with an underscore. This seems inconsistent.


🏁 Script executed:

#!/bin/bash
# Check if the uid.New implementation automatically adds an underscore
# or if the example is inconsistent

# Look for the implementation of uid.New
rg -p "func New\(" go/pkg/uid/

Length of output: 86


Clarify Example Usage of uid.New
The implementation of uid.New in go/pkg/uid/uid.go automatically appends an underscore between the provided prefix and the generated ID. As a result, calling uid.New("custom") correctly produces an ID that starts with "custom_" (e.g., "custom_1z4UVH4C7Bgt8NsssqZxTTVIiWf"). No changes to the example are needed.

go/apps/api/routes/v2_ratelimit_delete_override/400_test.go (1)

86-86: Updated to use Resources() method instead of field

This change adapts the test to use the new method-based approach for accessing Resources, which is consistent with similar changes across the codebase.

go/apps/api/routes/v2_ratelimit_set_override/404_test.go (1)

26-26: Updated to use Resources() method instead of field

This change adapts the test to use the new method-based approach for accessing Resources, which is consistent with similar changes across the codebase.

go/pkg/discovery/doc.go (1)

24-24: Consistent renaming from NodeID to InstanceID

This change is aligned with the broader codebase renaming effort from "node" to "instance" and updates the example accordingly with the new "ins_" prefix.

go/apps/api/routes/v2_ratelimit_get_override/200_test.go (3)

26-26: Method call usage instead of property access

The change from h.Resources.UserWorkspace.ID to h.Resources().UserWorkspace.ID aligns with the refactoring across the codebase where Resources is now accessed via a method call instead of direct property access. This change maintains consistency with the updated API.


40-40: Method call usage instead of property access

Change from property access to method call is consistent with the updated API pattern.


58-58: Method call usage instead of property access

Consistent change to use h.Resources() method call pattern, maintaining the same functionality while following the updated API design.

go/apps/api/routes/v2_ratelimit_get_override/404_test.go (2)

27-27: Method call usage instead of property access

The change from direct property access to method call follows the consistent pattern being applied throughout the codebase, aligning with the updated API design.


41-41: Method call usage instead of property access

Consistent change to use the Resources() method instead of direct property access, maintaining functional equivalence while following updated API patterns.

go/apps/api/routes/v2_ratelimit_delete_override/404_test.go (2)

27-27: Method call usage instead of property access

Consistent with the broader refactoring to access resources via the Resources() method instead of direct property access.


42-42: Method call usage instead of property access

Proper implementation of the updated API pattern using h.Resources().UserWorkspace.ID instead of direct property access.

go/apps/api/routes/v2_ratelimit_limit/403_test.go (1)

27-27: Method call usage instead of property access

The change from direct property access h.Resources.UserWorkspace.ID to method call h.Resources().UserWorkspace.ID aligns with the refactoring pattern applied throughout the codebase. Based on the relevant code snippets, this matches the updated method implementation in testutil/http.go and integration/harness.go.

go/apps/api/routes/v2_ratelimit_limit/404_test.go (2)

32-32: Consistent method access to Resources detected

The code has been updated to use method call syntax h.Resources() instead of direct property access. This change appears to be part of a larger refactoring effort to use a method-based approach for accessing resources throughout the test suite.


67-67: Same method-based Resources access pattern applied consistently

The pattern of using h.Resources() method call rather than direct property access is consistently applied here as well. This ensures uniform resource access throughout the test.

go/apps/api/routes/v2_ratelimit_set_override/200_test.go (3)

26-26: Consistent Resources() method usage

This change follows the same pattern seen in other files, using the method call h.Resources() instead of direct property access, providing a consistent approach to resource access.


41-41: Method-based resource access for root key creation

The root key creation now uses h.Resources() method call with proper permission scope added. The consistent application of this pattern enhances the maintainability of the test suite.


64-65: Consistent application of Resources() method pattern in verification steps

All verification steps for override queries have been updated to use the method-based resource access pattern. This consistent approach helps maintain code uniformity throughout the test suite.

Also applies to: 90-91, 116-117

go/apps/api/routes/v2_ratelimit_delete_override/200_test.go (3)

26-26: Consistent resource access pattern in setup

The namespace and override creation steps use the method-based resource access pattern (h.Resources()) consistently, which aligns with the changes observed in other test files.

Also applies to: 37-38


55-55: Method-based resource access for root key with specific permission

The root key creation with a namespace-specific permission now uses the method-based approach for accessing resources. This consistency helps maintain a uniform codebase.


74-75: Consistent method usage in verification step

The verification step for the override query continues the pattern of using method-based resource access, ensuring consistency throughout the test case.

go/apps/api/routes/v2_ratelimit_delete_override/403_test.go (1)

27-27: Consistent implementation of method-based resource access

All instances of resource access in this workspace permission test have been updated to use the method call pattern (h.Resources()) consistently. This is part of a broader refactoring pattern seen across the test suite.

The refactoring from direct property access to method calls suggests an architectural change in how test resources are managed. This approach could provide benefits such as lazy loading, centralized resource management, or additional resource initialization logic.

Also applies to: 38-38, 80-81

go/apps/api/routes/v2_ratelimit_limit/400_test.go (2)

166-166: Access pattern changed to method call.

The code now uses h.Resources() method instead of directly accessing h.Resources field. This change improves encapsulation and allows for potential lazy loading or other optimizations in resource handling.


175-175: Same method call pattern consistently applied.

The method call pattern is consistently applied here as well, maintaining code consistency throughout the test.

go/apps/api/routes/v2_ratelimit_get_override/403_test.go (2)

27-27: Consistent resource access pattern applied.

The code has been updated to use the method call h.Resources() instead of direct field access, which aligns with the changes seen in other test files. This improves encapsulation and allows for more flexible resource management.


38-38: Method call pattern consistently applied.

Same resource access pattern change applied here, ensuring consistency throughout the codebase.

go/pkg/otel/grafana.go (2)

31-33: Documentation updated to reflect terminology change.

The comments have been updated to use "InstanceID" instead of "NodeID", maintaining consistent terminology throughout the codebase. This aligns with the prefix changes seen in the uid package.


86-86: Consistent use of updated field name.

The code now references config.InstanceID instead of config.NodeID when calling semconv.ServiceInstanceID, ensuring that the terminology change is consistently applied in both definitions and usage.

go/apps/api/routes/v2_ratelimit_set_override/403_test.go (1)

27-27: Good refactoring to use method call instead of direct field access.

The change from accessing Resources as a direct field to calling it as a method (h.Resources()) improves encapsulation and aligns with good object-oriented design principles. This change is consistent with the updates in other test files and provides better flexibility for future changes to the resource retrieval mechanism.

go/pkg/assert/all.go (1)

3-4: Function renamed for better semantic clarity.

Renaming from Multi to All improves the clarity of the function's purpose - it performs multiple assertions and requires that all of them pass. The documentation has been consistently updated to reflect this change. The function continues to combine multiple assertion checks and returns the first error encountered.

Also applies to: 7-7, 11-11, 19-19

go/apps/api/routes/v2_ratelimit_limit/200_test.go (4)

26-27: Good refactoring to use method call instead of direct field access.

The change from accessing Resources as a direct field to calling it as a method (h.Resources()) improves encapsulation and aligns with good object-oriented design principles. This approach provides better abstraction of the underlying implementation.


42-42: Consistent method usage for resource access.

This change maintains consistency with other refactored code by using the method call pattern for accessing resources. This consistency makes the codebase more maintainable and easier to understand.


99-100: Consistent application of the resource access pattern.

Consistently applying the method call pattern for resource access throughout the codebase. This systematic approach to refactoring helps prevent bugs that could occur from mixed access patterns.


159-160: Consistent method usage across multiple test cases.

The consistent application of using h.Resources() as a method call across all test cases in this file demonstrates a thorough refactoring approach. This consistency is crucial for maintainability.

go/pkg/db/database.go (3)

53-55: Enhanced replica configuration with explicit mode.

The addition of the mode field improves code clarity by explicitly stating the intended purpose of the database replica. Setting it to "rw" clearly indicates this is a read-write connection.


59-61: Explicit mode declaration for default read replica.

Setting the mode to "rw" when using the write connection as the default read replica provides clear documentation of how the connection is configured. This improves code readability and maintainability.


74-76: Appropriate mode for dedicated read-only replica.

Setting the mode to "ro" for the dedicated read-only connection is correct and clearly documents the intended purpose of this database connection. This explicit labeling will help with debugging and monitoring in production environments.

go/pkg/cache/noop.go (1)

25-25: Parameter type updated for SWR method signature

The parameter type for the error handling function has been changed from translateError func(err error) CacheHit to op func(err error) Op. This aligns with the changes in the Cache interface and the standard implementation in cache.go.

go/apps/api/cancel_test.go (1)

39-39: Updated field name from ClusterNodeID to ClusterInstanceID

This change maintains consistency with the codebase-wide refactoring from "node" to "instance" terminology. The use of the predefined uid.InstancePrefix constant instead of a hardcoded string is also a good practice.

go/internal/services/keys/service.go (4)

4-9: Added necessary imports for caching implementation

The imports have been properly added to support the new caching functionality.


16-16: Added Clock field to Config struct

The Clock interface will be used for time-based operations in the cache, enabling better testability by allowing injection of a mock clock during tests.


22-23: Added keyCache field to service struct

This adds a type-safe cache for keys, which should improve performance by reducing database queries for frequently accessed keys.


41-45: Updated service initialization to include keyCache

The service struct is properly initialized with the new keyCache field, and the cache is wrapped with tracing middleware for better observability.

go/apps/api/routes/v2_ratelimit_get_override/400_test.go (1)

88-88: Updated Resources access from direct field to method call

Changed from h.Resources.UserWorkspace.ID to h.Resources().UserWorkspace.ID, which suggests that Resources is now a method rather than a field. This is a good practice that allows for lazy initialization, validation, or other logic when accessing resources.
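One way the method-based accessor can pay off — lazy, once-only initialization — looks like this. Names (`Harness`, `Resources`, `ws_123`) are hypothetical stand-ins, not the actual harness implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Resources is a stand-in for the harness's test fixtures.
type Resources struct {
	UserWorkspaceID string
}

// Harness exposes Resources() as a method so the accessor can
// lazily initialize fixtures exactly once on first use.
type Harness struct {
	once      sync.Once
	resources *Resources
}

func (h *Harness) Resources() *Resources {
	h.once.Do(func() {
		// Expensive setup (e.g. DB seeding) would happen here.
		h.resources = &Resources{UserWorkspaceID: "ws_123"}
	})
	return h.resources
}

func main() {
	h := &Harness{}
	fmt.Println(h.Resources().UserWorkspaceID) // ws_123
}
```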

.github/workflows/agent_build_publish.yaml (1)

21-23: Improved version extraction approach with helpful comments

The change replaces what was likely a git-command-based approach with direct use of an environment variable via bash parameter expansion. This simplifies version extraction by eliminating an external command dependency.
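The parameter-expansion pattern looks roughly like this; the variable name and tag value are illustrative assumptions, not the workflow's exact contents:

```shell
# GitHub Actions exposes the tag name on tag builds; no git call needed.
GITHUB_REF_NAME="v1.4.2"

# "#v" strips the shortest leading match of "v" via bash parameter expansion.
VERSION="${GITHUB_REF_NAME#v}"

echo "$VERSION" # 1.4.2
```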

go/apps/api/routes/v2_ratelimit_set_override/400_test.go (1)

162-162: Updated Resource access pattern

The code has been updated to use h.Resources() as a method call instead of directly accessing it as a property. This change ensures consistency with the refactored harness implementation.

go/pkg/discovery/redis.go (5)

27-28: Consistent renaming from "node" to "instance"

Field has been appropriately renamed to maintain consistency with the broader terminology change throughout the codebase.


40-41: Updated field name and documentation

The field and its documentation have been updated to consistently use "instance" terminology instead of "node".


75-75: Constructor updated for renamed field

Constructor initialization properly uses the renamed field.


125-125: Updated method call with renamed field

The advertise method now uses the renamed instanceID field.


192-192: Updated method call with renamed field

The Shutdown method now uses the renamed instanceID field.

go/apps/api/routes/v2_ratelimit_limit/accuracy_test.go (2)

19-19: Updated test utility import

The code now uses testutil.SkipUnlessIntegration(t) instead of testflags.SkipUnlessIntegration(t), aligning with package reorganization.


75-75: Updated Resource access pattern

Access to UserWorkspace.ID has been updated to use h.Resources() as a method call instead of accessing Resources as a property, consistent with the refactored test harness implementation.

Also applies to: 81-81

go/apps/api/routes/v2_ratelimit_limit/handler.go (1)

15-15: New tracing import looks good.
The addition of the tracing import aligns well with the new spans introduced.

go/pkg/db/replica.go (2)

37-44: Good use of tracing in PrepareContext.
Including the mode and query attributes in the span helps identify performance bottlenecks in statement preparation.


60-67: Effective context instrumentation for QueryRowContext.
No issues found. Continue to ensure that any pertinent runtime errors are captured in the span.

go/internal/services/keys/verify.go (3)

9-13: Updated imports to include cache and tracing.
These additions align with the new caching mechanism and observability improvements.


17-18: Tracing in key verification is a helpful addition.
This provides better visibility into the performance of the verification process.


68-68: Minor layout change.
The extra blank line doesn't pose any functional issues. It's acceptable to leave as is.

go/pkg/zen/validation/validator.go (1)

57-59:


❓ Verification inconclusive

Add context for improved tracing
The updated function signature to include ctx context.Context and the new tracing span is a solid best practice for observability and resource management.

Please verify that all references to Validate have been updated to call the new signature:


🏁 Script executed:

#!/bin/bash
rg -A 3 'Validate('

Length of output: 90


Manual Verification Needed for Updated Validate Signature

The updated function signature correctly adds ctx context.Context and a new tracing span, which is a solid best practice for observability and resource management. However, the verification command (rg -A 3 'Validate(') produced a regex parse error ("unclosed group") due to the unescaped parenthesis. Please manually verify that every reference to Validate has been updated to call the new signature. For example, you could try running:

rg -A 3 'Validate\('

to correctly capture the calls. Once confirmed, this change can be approved.

go/internal/services/permissions/check.go (1)

15-28: Good use of trace instrumentation and caching
The approach of checking the cache first and then the database, together with the new tracing span, is a neat enhancement to performance and observability.

go/internal/services/ratelimit/interface.go (2)

6-7: Import addition looks good.

Added the necessary import for the new Window type from the ratelimit proto package.


76-78: Great enhancement to the rate limit response.

Adding the CurrentWindow and PreviousWindow fields significantly improves the API by providing clients with detailed window information. This will enable better client-side decisions based on rate limit statuses and improve observability of rate limiting behavior.

go/pkg/membership/member.go (3)

9-13: Terminology update looks good.

The change from NodeID to InstanceID is consistent with modern cloud terminology and improves clarity. The comment has been appropriately updated to reflect this change.


36-36: Map key updated correctly.

The ToMap method has been properly updated to use "instanceID" as the key, maintaining consistency with the field name change.


48-52: Error handling properly updated.

The memberFromMap function has been correctly updated to check for the "instanceID" field and provide appropriate error messages when it's missing.

go/Taskfile.yml (2)

21-30: Enhanced integration testing setup.

The renaming from test-e2e to test-integration more accurately reflects the test type. The addition of OpenTelemetry environment variables and integration test flag improves observability during cluster testing. The 30-minute timeout accommodates complex cluster tests.


31-33: OpenTelemetry support is a great addition.

Adding a dedicated task for starting the OpenTelemetry service simplifies the testing workflow and ensures proper instrumentation for monitoring cluster behavior during tests.

go/apps/api/config.go (2)

25-26: Terminology update is consistent.

Changing ClusterNodeID to ClusterInstanceID maintains consistency with the naming changes seen in other files, improving clarity throughout the codebase.


79-84: Validation method updated appropriately.

The change from assert.Multi to assert.All while maintaining the same validation checks ensures that the configuration validation remains robust. The checks for non-empty instance ID and valid port numbers are essential for proper cluster configuration.

go/pkg/zen/server.go (3)

34-35: Consistent terminology change from NodeID to InstanceID

The renaming from NodeID to InstanceID in both the struct field and example code is part of a codebase-wide terminology standardization. This change improves clarity and consistency.

Also applies to: 51-51


5-5: Improved error handling for server shutdown

Adding the errors import and using errors.Is() to check for http.ErrServerClosed is a good improvement. This change allows the server to exit gracefully during intentional shutdowns without reporting an error, which is a standard Go practice for HTTP servers.

Also applies to: 152-152


149-149: Enhanced logging with server type information

The logging statement now includes more context by specifying the server type as "http". This additional information improves observability and debugging.

go/pkg/cache/interface.go (2)

21-21: Enhanced cache control with operation-based error handling

The SWR method signature has been refactored to use an op func(error) Op parameter instead of translateError func(error) CacheHit. This change provides more explicit control over cache operations when errors occur during revalidation.


50-60: New Op type for fine-grained cache operation control

The introduction of the Op type with its three constants (Noop, WriteValue, WriteNull) improves the cache interface by providing explicit operation control. This approach allows more specific handling of different cache scenarios:

  • Noop: Skip any cache updates
  • WriteValue: Store the value in the cache
  • WriteNull: Mark the entry as not existing in the origin

This change enhances the flexibility and clarity of the caching system.

go/pkg/membership/serf.go (2)

21-21: Consistent terminology change from NodeID to InstanceID

The renaming from NodeID to InstanceID is consistently applied throughout the code, including in the Config struct, the Member struct instantiation, Serf configuration, and all logging statements. This change maintains terminology consistency across the codebase.

Also applies to: 57-57, 65-65, 125-125, 131-131, 138-139, 143-143, 146-146


6-6: Suppressing Serf internal logs

Using io.Discard for Serf's log output reduces log noise by suppressing internal Serf logs. This is appropriate since the application already handles relevant Serf events through its own logging system, as seen in the handleEvents method.

Also applies to: 72-73

deployment/docker-compose.yaml (1)

55-55: Enhanced observability with OpenTelemetry integration

The addition of the OpenTelemetry service and configuration for the apiv2 service significantly improves the application's observability. Using the Grafana OTel LGTM stack (Loki for logs, Grafana, Tempo for traces, Mimir for metrics) provides comprehensive monitoring capabilities that will be valuable for:

  1. Debugging performance issues
  2. Monitoring rate limiting behavior
  3. Tracing requests across clustered services
  4. Visualizing metrics through Grafana dashboards

This integration aligns well with the PR objectives to enhance performance monitoring and testing capabilities for rate limiting.

Also applies to: 65-67, 176-183

go/internal/services/permissions/service.go (6)

4-4: No issues with the new time import.


6-8: New imports for cache, middleware, and clock are appropriate.


19-20: Introducing a dedicated cache field for permissions is a good approach.


28-28: Adding a Clock field aligns well with the cache's stale-while-revalidate mechanism.


31-46: Error handling and initialization of the cache look correct.


52-53: Applying WithTracing ensures cache operations are instrumented for observability.

go/pkg/otel/metrics/metrics.go (4)

157-157: Renaming to instanceID improves consistency with updated terminology.


162-162: Example usage updated to match the new attribute name.


168-170: Introducing Ratelimit struct for tracking origins is a clear extension of metrics coverage.


267-272: Properly initializing Ratelimit.Origin follows the same metric pattern as existing counters.

go/pkg/cache/middleware/tracing.go (2)

87-87: Adopting the op parameter aligns with the updated SWR signature in the cache package.


92-97: Adding a nested tracing span for refreshFromOrigin improves observability without altering logic.

go/pkg/rpc/rpc.go (6)

1-16: New package and imports are well-organized and match the file's functionality.


18-26: Server struct uses a mutex to manage concurrency, which effectively prevents race conditions.


28-33: Config struct is concise and focuses on logger and rate limit handler needs.


35-54: New function correctly sets timeouts to avoid leaking connections and includes tracing interceptors.


81-112: Listen method locks to avoid multiple starts, providing graceful handling of repeated calls.


114-133: Shutdown gracefully stops the server, ensuring in-flight requests are not dropped mid-processing.

go/apps/api/run_test.go (4)

13-13: Use of testutil import looks good.
Switching to testutil aligns with the unified approach for testing helpers.


21-21: Integration test skipping is correctly updated.
Replacing testflags.SkipUnlessIntegration(t) with testutil.SkipUnlessIntegration(t) is consistent with the new library usage.


37-37: Updated instance ID prefix usage is consistent.
Using uid.New(uid.InstancePrefix) helps standardize the naming convention across the cluster. Verify that any references to the old prefix are removed.


46-46: Renamed field to ClusterInstanceID.
Renaming ClusterNodeID to ClusterInstanceID clarifies the new node/instance terminology and improves code consistency.

go/apps/api/integration/multi_node_ratelimiting/accuracy_test.go (1)

1-19: Imports and package declaration are properly structured.
The new test package name multi_node_ratelimiting_test and imported modules align with the existing integration test framework.

go/internal/services/ratelimit/service.go (5)

10-10: Connect-based interface import integration confirmed.
Importing ratelimitv1connect is necessary for implementing RatelimitServiceHandler. No concerns.


51-51: Service now implements RatelimitServiceHandler.
Declaring the interface conformance ensures type-safety and discoverability.


89-97: Aggregated assertions enforced via assert.All.
Switching to assert.All is a clear approach to validate all constraints at once.
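An `assert.All`-style helper can be approximated with `errors.Join` from the standard library; the helper names below are illustrative stand-ins, not the project's actual `assert` package:

```go
package main

import (
	"errors"
	"fmt"
)

// all runs every check and joins the failures, so a caller sees all
// violated constraints at once instead of stopping at the first one.
func all(checks ...error) error {
	return errors.Join(checks...) // nil inputs are dropped; nil if all pass
}

// check converts a boolean condition into an error with a message.
func check(ok bool, msg string) error {
	if ok {
		return nil
	}
	return errors.New(msg)
}

func main() {
	limit, duration := int64(0), int64(-1)
	err := all(
		check(limit > 0, "limit must be positive"),
		check(duration > 0, "duration must be positive"),
	)
	fmt.Println(err) // both failures reported together
}
```

Reporting all violations at once is especially useful in request validation, where fixing constraints one round-trip at a time is slow.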


125-130: Fallback to local ratelimit logic if remote response is nil.
This branch gracefully handles the scenario when the node is the origin or remote calls fail. Implementation looks good.
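The local fallback presumably computes a sliding-window decision from the two windows it already has. A sketch of the standard sliding-window-counter technique (the real service's arithmetic may differ in detail):

```go
package main

import "fmt"

// slidingWindowAllowed approximates the effective request count by
// weighting the previous window by the fraction of it still covered
// by the sliding window, then adding the current window's count.
func slidingWindowAllowed(prevCount, currCount, limit int64, elapsedFrac float64) bool {
	effective := float64(prevCount)*(1-elapsedFrac) + float64(currCount)
	return effective < float64(limit)
}

func main() {
	// Halfway through the current window, with 8 hits in the previous
	// window and 5 so far: effective = 8*0.5 + 5 = 9 < 10, so allowed.
	fmt.Println(slidingWindowAllowed(8, 5, 10, 0.5))
}
```

Because the estimate only needs the two window counters, a node can always produce a local answer even when the origin is unreachable, at the cost of some accuracy until the replay reconciles the counts.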


153-154: Context-aware window retrieval is correct.
Passing ctx to getCurrentWindow and getPreviousWindow ensures correct tracing. No issues here.

go/apps/api/integration/http.go (1)

1-12:


✅ Verification successful

Confirm usage of math/rand/v2.

The import path math/rand/v2 appears non-standard. Please ensure that this versioned path is intentional and that the correct module is in use; otherwise, this may cause unexpected compilation issues.

Would you like to verify whether math/rand/v2 is truly available, or switch to the standard math/rand package if not?


🏁 Script executed:

#!/bin/bash
# Description: Check the imports referencing `math/rand/v2` across the codebase.
rg 'math/rand/v2'

Length of output: 267


Confirmed: Continued usage of math/rand/v2 is intentional.

math/rand/v2 has been part of the Go standard library since Go 1.22. After verifying the codebase, we found that it is used consistently across multiple files (e.g., in go/apps/api/integration/http.go, go/pkg/sim/events.go, go/pkg/sim/rng.go, etc.), so the versioned import is a deliberate part of the design. No changes are required.

go/apps/api/run.go (7)

26-26: No concerns with new import.

The addition of the rpc package import aligns with the new RPC service functionality.


49-49: Use consistent naming for instance identifiers.

Assigning InstanceID from cfg.ClusterInstanceID is consistent with the rest of the code. No issues here.


61-61: Consistent logging field.

Including "instanceID" in the logger fields improves traceability of logs across distributed systems.


137-137: Conditional cluster logic is appropriate.

Enabling RPC services only when cfg.ClusterEnabled is true is a clear separation of concerns.


155-162: Good introduction of a clock parameter.

Passing clk to permissions.New fosters testability and time-based logic control. No changes needed.


276-279: Fallback to static addresses.

Using static addresses when ClusterDiscoveryRedisURL is empty is consistent. No concerns here.


263-275:


✅ Verification successful

Redis-based discovery looks correct.

Registering the rd.Shutdown in shutdowns is a clean approach. Ensure you have integration tests verifying Redis discovery flows.

Do you want to run a quick script to confirm references to ClusterDiscoveryRedisURL across the project, ensuring no leftover or contradictory usage that might cause confusion?


🏁 Script executed:

#!/bin/bash
rg 'ClusterDiscoveryRedisURL' 

Length of output: 501


Redis discovery configuration verified

All references to ClusterDiscoveryRedisURL across the codebase are consistent:

  • Usage in go/apps/api/run.go uses the URL to initialize Redis discovery as expected.
  • Flag declaration in go/cmd/api/main.go and its configuration in go/apps/api/config.go confirm the intended behavior.
  • The integration harness in go/apps/api/integration/harness.go appropriately sets it to an empty string when not testing Redis flows.

No contradictory or leftover usages were found. Please ensure that integration tests cover the Redis discovery flows thoroughly.

go/pkg/cluster/noop.go (6)

10-10: Consistent renaming of Node to Instance.

The self field type has been correctly updated from Node to Instance, aligning with the broader refactoring effort across the codebase.


30-33: Initialization with the new Instance type looks good.

The struct initialization has been properly updated to use the Instance type instead of Node.


36-38: Return type correctly updated to Instance.

The Self() method's return type has been appropriately changed from Node to Instance.


41-46: Method renamed from FindNode to FindInstance.

The method has been consistently renamed and its return type updated to match the new terminology. The documentation comments have also been adjusted accordingly.


53-56: Channel type updated for SubscribeJoin.

The return type has been correctly changed from <-chan Node to <-chan Instance.


58-61: Channel type updated for SubscribeLeave.

The return type has been correctly changed from <-chan Node to <-chan Instance.

go/pkg/membership/membership_test.go (12)

24-25: Consistent renaming from NodeID to InstanceID.

The field name has been updated from NodeID to InstanceID with appropriate values for the test nodes.

Also applies to: 34-35


103-108: Updated instance naming convention.

Node identifiers have been updated from "node_X" to "instance_X" format, maintaining consistency with the broader refactoring effort.


117-124: Updated event tracking in join events.

The code now correctly tracks instances by InstanceID instead of NodeID in the join event handler.


160-171: Updated event tracking in leave events.

References and tracking have been properly updated to use InstanceID instead of NodeID in the leave event handler.


198-212: Consistent terminology in RPC port test.

Configuration in the RPC port propagation test has been updated to use InstanceID consistently.


240-244: Updated member identification.

Member identification in the test now correctly uses InstanceID for comparison.


263-277: Consistent usage in rejoin test.

References in the rejoin test have been properly updated to use InstanceID consistently.


304-337: Updated configuration in concurrent joins test.

The concurrent joins test now uses InstanceID consistently for node identification.


414-423: Updated member comparison logic.

The member comparison logic in the consistent member list test has been properly updated to use InstanceID.


430-457: Consistent terminology in invalid join addresses test.

The configuration in the invalid join addresses test has been updated to use InstanceID consistently.


481-489: Updated self-discovery test.

The self-discovery test now correctly uses InstanceID for member identification and comparison.


543-548: Updated instance creation in runMany helper.

The helper function for creating multiple instances now uses the consistent naming convention with InstanceID.

go/pkg/cluster/cluster_test.go (6)

25-37: Updated instance configuration and creation.

Configuration has been properly updated to use InstanceID instead of NodeID, and the cluster node creation now uses Instance instead of Node with the appropriate ID.


52-55: Method call updated to FindInstance.

The method call has been correctly updated from FindNode to FindInstance with appropriate variable references.


63-69: Consistent terminology in second node creation.

The second membership node creation now uses InstanceID consistently.


81-84: Updated node lookup test.

The node lookup test now uses FindInstance and the correct instance ID format.


95-100: Updated map and assertions for instance tracking.

The map used for tracking has been renamed from nodeIds to instanceIDs, and the assertions have been updated to check for the new instance ID format.


116-120: Updated noop cluster test.

The test for the noop cluster implementation now uses FindInstance instead of FindNode.

go/pkg/cluster/interface.go (6)

7-16: Updated type definition and documentation.

The Node type has been renamed to Instance with all associated documentation properly updated to reflect this change.


18-30: Updated interface documentation and Self() method.

The Cluster interface documentation and the Self() method have been properly updated to use "instance" terminology consistently.


31-42: Renamed FindNode to FindInstance.

The method has been renamed and all associated documentation has been updated to use the new terminology consistently.


43-49: Updated Shutdown method documentation.

Documentation for the Shutdown method has been properly updated to reference "instance" rather than "node".


50-56: Updated SubscribeJoin method.

The method return type and documentation have been properly updated to use Instance instead of Node.


57-63: Updated SubscribeLeave method.

The method return type and documentation have been properly updated to use Instance instead of Node.

go/pkg/testutil/seed/seed.go (1)

38-83: Excellent implementation of database seeding.

The Seed method provides a clean way to initialize test data with proper error handling using require.NoError. The clear organization of seeding different entities (root workspace, keyring, and user workspace) makes the test setup maintainable.

go/internal/services/ratelimit/bucket.go (2)

58-79: Improved tracing with clean context propagation.

Great addition of OpenTelemetry tracing to the bucket management logic. The Boolean attribute showing whether a bucket was created is valuable for monitoring and debugging.


82-95: Appropriate tracing integration for window management.

The getCurrentWindow method now correctly uses context and tracing, with a useful attribute to track window creation.
The getCurrentWindow method now correctly uses context and tracing, with a useful attribute to track window creation.

go/internal/services/ratelimit/peers.go (3)

19-22: Clean terminology update from 'node' to 'instance'.

This change aligns the peer struct with broader codebase terminology changes, improving consistency.


38-67: Enhanced observability with metrics collection.

The addition of metrics for tracking the origin instance ID is valuable for monitoring distributed rate limiting operations. Good use of deferred function to ensure metrics are recorded regardless of execution path.
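The deferred-metrics pattern the comment praises looks roughly like this. The function and metric names are illustrative, not the service's actual identifiers:

```go
package main

import "fmt"

// record simulates a metrics sink; in the real service this would be a
// metrics client call.
func record(origin string, success bool) {
	fmt.Printf("ratelimit_origin_request origin=%s success=%v\n", origin, success)
}

// askOrigin records the origin instance ID on every return path by
// deferring the metric emission: early returns, errors, and the happy
// path all fire the same deferred function with the final state.
func askOrigin(originID string) (err error) {
	success := false
	defer func() { record(originID, success) }()

	// ... perform the remote call; set success only once it completes.
	success = true
	return nil
}

func main() {
	_ = askOrigin("ins_abc123")
}
```

Capturing `success` by reference in the closure is what lets the deferred call observe the value at return time rather than at defer time.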


69-98: Consistent terminology in peer creation.

The changes appropriately update all references from "node" to "instance" in the peer creation logic.

go/pkg/testutil/http.go (4)

41-42: Good modularization with the new seeder field.

Adding the seeder field to the Harness struct improves modularity by separating test data initialization from the test harness itself.


90-95: Cleaner test setup with the new seeding approach.

The use of a dedicated seeder simplifies the NewHarness function by delegating database initialization, making the code more maintainable.


134-136: Improved delegation in CreateRootKey method.

The method now properly delegates to the seeder's implementation, following good design principles.


138-140: Good accessor method for resources.

The Resources() method provides a clean way to access the seeded resources, maintaining encapsulation while exposing necessary test data.

go/internal/services/ratelimit/replay.go (1)

155-165:


❓ Verification inconclusive

Verify that Current and Previous windows are consistently populated.

While errors are handled above, ensure res.CurrentWindow and res.PreviousWindow within ReplayResult are always non-nil. Otherwise, code referencing these fields may panic.


🏁 Script executed:

#!/bin/bash
# Description: Verify if the ratelimit method could return nil windows.
# We'll search for ratelimit method implementations to see if they guarantee non-nil windows.
ast-grep --pattern $'func ($_).ratelimit($_, $_, $_) ($_, error) { $$$ }'

Length of output: 74


Below is a new pair of shell scripts you can run to better locate the definition of the ReplayResult struct and inspect how its CurrentWindow and PreviousWindow fields are initialized. These commands should help verify whether these fields are guaranteed to be non-nil.

Action:
Verify that the fields CurrentWindow and PreviousWindow in the ReplayResult struct (used in go/internal/services/ratelimit/replay.go at lines 155-165) are always populated with valid (non-nil) values. The previous automated search did not yield conclusive results, so please run the commands below or perform a manual check to ensure that no execution path would result in these fields being nil, which could lead to a panic.


🏁 Script executed:

#!/bin/bash
# Step 1: Locate the ReplayResult type definition to see how the windows are declared.
rg -n "type ReplayResult" .

# Step 2: Find instances where ReplayResult is initialized to check if nil values might be assigned.
rg -n "ReplayResult{" .

Length of output: 267


Action Required: Verify Initialization of ReplayResult Windows
The current investigation couldn’t conclusively confirm that res.CurrentWindow and res.PreviousWindow in the ReplayResult struct are always non-nil. Please run the following revised shell scripts to search for the type definition and initialization of ReplayResult with correctly escaped patterns, and then manually verify that no execution path permits nil values:

Review the resulting matches to ensure that the fields used in the response (lines 155-165 of go/internal/services/ratelimit/replay.go) are always populated. If any potential nil values are detected along any initialization path, please address them to avoid panics at runtime.


🏁 Script executed:

#!/bin/bash
# Step 1: Locate the ReplayResult type definition (using proper escaping)
rg -n "type ReplayResult\s*\\{" .

# Step 2: Locate instances where ReplayResult is initialized (using proper escaping)
rg -n "ReplayResult\s*\\{" .

Length of output: 186


Action Required: Manual Verification of Replay Window Initialization
The automated checks did not yield conclusive results regarding the initialization of res.CurrentWindow and res.PreviousWindow in the ReplayResult struct. Please manually inspect the code to confirm that these fields are always populated (non-nil) before being used in the response constructed in go/internal/services/ratelimit/replay.go (lines 155-165).

  • Verify the ReplayResult Type: Check the definition of ReplayResult to understand the data types and default values for CurrentWindow and PreviousWindow.
  • Review All Construction Paths: Review all places where ReplayResult is instantiated to ensure no execution path assigns a nil value to these fields.

If there is any possibility that CurrentWindow or PreviousWindow might be nil, please update the code to either enforce non-nil defaults or add appropriate nil checks to prevent runtime panics.

go/apps/api/integration/harness.go (1)

1-172: All changes look good.

The harness creation and node management approach is coherent, with checks for liveness to ensure each node is ready. The concurrency model for starting nodes appears safe, and the test cleanup routine is clearly defined. Great work!

go/pkg/cache/cache.go (2)

273-280: New error-based operation switch looks good.

The op(error) Op pattern simplifies handling different failure scenarios. This approach clearly designates how to proceed with writes or no-ops based on error conditions.


326-333: Neat usage of the error-based operation switch here as well.

Reusing the same WriteValue, WriteNull, and Noop logic keeps the behavior consistent across the refresh logic and the SWR approach.

 const db = mysqlDrizzle(conn, { schema, mode: "default" });

-const workspaceId = "ws_wB4SmWrYkhSbWE2rH61S6gMseWw";
+const workspaceId = "ws_4RN4KfMm8oahuthjfVnWu7bAs4mv";

💡 Verification agent

🧩 Analysis chain

Workspace ID has been updated.

The workspace identifier has been changed to a new value. Ensure this is intentional and that all dependent systems are aware of this change.

Are there any other references to the previous workspace ID that might need to be updated? Consider running a search across the codebase to verify:


🏁 Script executed:

#!/bin/bash
# Check for any other references to the old workspace ID
rg "ws_wB4SmWrYkhSbWE2rH61S6gMseWw" -A 2 -B 2

Length of output: 322


Action: Update all workspace ID references for consistency

The new workspace ID in tools/migrate/migrate_subscription.ts is correctly updated to "ws_4RN4KfMm8oahuthjfVnWu7bAs4mv". However, the search revealed that tools/migrate/ch_logs.ts still references the old workspace ID "ws_wB4SmWrYkhSbWE2rH61S6gMseWw". Please verify if this is intentional. If not, update this reference to maintain consistency across the codebase, and ensure that all dependent systems are aligned with the change.

 "default": "7072"
 },
-"nodeId": {
+"instanceID": {

💡 Verification agent

🧩 Analysis chain

Property renamed from nodeId to instanceID

The property has been renamed from nodeId to instanceID, which:

  1. Maintains consistency with the broader codebase terminology change from "node" to "instance"
  2. Updates the camelCase convention to use uppercase for ID (instanceID vs nodeId), which is more consistent with Go conventions

This change requires updating all references to this field in configuration files and code that parses this schema.


🏁 Script executed:

#!/bin/bash
# Check for any remaining references to nodeId in the codebase
echo "Checking for any remaining references to nodeId in the codebase"
rg "nodeId" --type go
rg "nodeId" --type json
rg "nodeId" --type yaml

echo "Verifying the adoption of instanceID in the codebase"
rg "instanceID" --type go

Length of output: 8144


Action Required: Inconsistent Renaming of 'nodeId' to 'instanceID'

The schema file correctly reflects the new property "instanceID": { in go/schema.json (line 79), which aligns with the intended change. However, multiple references to nodeId still exist throughout the codebase, including in configuration files and test files. To ensure consistency, please update all references, such as:

  • Go Source Files & Tests:

    • apps/agent/services/ratelimit/ratelimit_test.go
    • apps/agent/pkg/membership/member.go
    • Other similar Go files where nodeId is used (e.g., logging, error messages, function parameters)
  • Configuration Files:

    • apps/agent/config.staging.json
    • apps/agent/config.production.json
    • apps/agent/config.docker.json
    • YAML definition files in apps/agent/gen/proto/...

Please ensure that all instances of nodeId are updated to instanceID to maintain consistency across the entire codebase.

 AuditLogBucketPrefix Prefix = "buk"
 AuditLogPrefix Prefix = "log"
-NodePrefix Prefix = "node"
+InstancePrefix Prefix = "ins"

💡 Verification agent

🧩 Analysis chain

Terminology change from "node" to "instance".

The constant has been renamed from NodePrefix to InstancePrefix with a new value of "ins" (previously "node"). This terminology change improves consistency and the shorter prefix makes generated IDs more compact.


🏁 Script executed:

#!/bin/bash
# Verify this terminology change is applied consistently across the codebase
echo "Checking for remaining references to NodePrefix..."
rg -i "NodePrefix|node_" --type go

echo -e "\nChecking for new InstancePrefix references..."
rg -i "InstancePrefix|ins_" --type go

Length of output: 3996


Action Required: Incomplete Terminology Update from "node" to "instance"

The change in go/pkg/uid/uid.go—updating the prefix from "node" to "ins" with the new constant InstancePrefix—is a positive step toward consistency. However, the verification reveals that several parts of the codebase still reference the legacy "node" terminology. In particular:

  • go/pkg/uid/uid.go: The file still contains an outdated definition and usage of NodePrefix (e.g., NodePrefix Prefix = "node" and its subsequent use in return New(string(NodePrefix))).
  • go/pkg/membership/membership_test.go: Test cases are still using IDs with the "rpc_node_" prefix.
  • go/cmd/api/main.go: Comments and CLI flag examples continue to refer to "node", which may lead to inconsistencies.

Please update these remaining references to align with the new terminology. If any of these legacy references are intentionally retained (for backward compatibility, for example), kindly provide clarification in the code or documentation.

Comment on lines +147 to 148
b, _ := r.getOrCreateBucket(ctx, key)


🛠️ Refactor suggestion

Error return from r.getOrCreateBucket is ignored.
If this method can fail in certain edge cases, consider handling or logging the error to avoid silent failures.
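One way to surface that error instead of discarding it, as a self-contained sketch — the bucket type, the empty-key failure mode, and the map store are all illustrative assumptions, not the service's real implementation:

```go
package main

import (
	"errors"
	"fmt"
)

type bucket struct{ key string }

var buckets = map[string]*bucket{}

// getOrCreateBucket returns an error instead of silently failing, so
// callers can log or propagate it rather than dropping it with `_`.
func getOrCreateBucket(key string) (*bucket, error) {
	if key == "" {
		return nil, errors.New("empty bucket key")
	}
	b, ok := buckets[key]
	if !ok {
		b = &bucket{key: key}
		buckets[key] = b
	}
	return b, nil
}

func main() {
	if _, err := getOrCreateBucket(""); err != nil {
		fmt.Println("handled:", err) // instead of b, _ := getOrCreateBucket(...)
	}
}
```

Even if the current implementation can never fail, checking the error keeps the call site safe against future changes to the method's failure modes.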

for _, req := range requests {
key := bucketKey{req.Identifier, req.Limit, req.Duration}
-bucket, _ := r.getOrCreateBucket(key)
+bucket, _ := r.getOrCreateBucket(ctx, key)

🛠️ Refactor suggestion

Revisited bucket creation without error handling.
Here too, consider reporting or handling potential errors from getOrCreateBucket for better robustness.

Comment on lines +21 to +55
func CallNode[Req any, Res any](h *Harness, node ClusterNode, method string, path string, headers http.Header, req Req) TestResponse[Res] {
h.t.Helper()

url := fmt.Sprintf("http://localhost:%d%s", node.HttpPort, path)

body := new(bytes.Buffer)
err := json.NewEncoder(body).Encode(req)
require.NoError(h.t, err)

httpReq, err := http.NewRequest(method, url, body)
require.NoError(h.t, err)

httpReq.Header = headers
if httpReq.Header == nil {
httpReq.Header = http.Header{}
}

httpRes, err := http.DefaultClient.Do(httpReq)
require.NoError(h.t, err)
defer httpRes.Body.Close()

resBody, err := io.ReadAll(httpRes.Body)
require.NoError(h.t, err)

var res Res
err = json.Unmarshal(resBody, &res)
require.NoError(h.t, err, fmt.Sprintf("failed to decode response body: %s", string(resBody)))

return TestResponse[Res]{
Status: httpRes.StatusCode,
Headers: httpRes.Header,
Body: res,
RawBody: string(resBody),
}
}

🛠️ Refactor suggestion

Handle non-200 status code responses.

Currently, the function decodes JSON and checks for errors only in I/O operations. If an endpoint returns an error status (e.g., HTTP 500), the JSON unmarshal still executes. Consider explicitly checking and handling unexpected status codes to provide clearer failure messages in your tests.

 if httpRes.StatusCode < 200 || httpRes.StatusCode >= 300 {
-   // currently unhandled
+   require.FailNowf(h.t, "unexpected status code", "got %d", httpRes.StatusCode)
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +53 to +62
go func() {
wg.Wait()

close(ticks)
pending := len(responses)
for pending > 0 {
t.Logf("waiting for responses to be processed: %d", pending)
time.Sleep(100 * time.Millisecond)
}
close(responses)

⚠️ Potential issue

Fix the unchanging loop condition to avoid potential hang.

In this block, pending is never updated inside the loop. This means it won’t exit even if items are consumed from the channel. Consider re-checking len(responses) each iteration or using synchronization to ensure all responses are read before closing the channel.

-for pending > 0 {
+for {
+   pending = len(responses)
+   if pending == 0 {
+     break
+   }
    t.Logf("waiting for responses to be processed: %d", pending)
    time.Sleep(100 * time.Millisecond)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
go func() {
wg.Wait()
close(ticks)
pending := len(responses)
for pending > 0 {
t.Logf("waiting for responses to be processed: %d", pending)
time.Sleep(100 * time.Millisecond)
}
close(responses)
go func() {
wg.Wait()
close(ticks)
pending := len(responses)
for {
pending = len(responses)
if pending == 0 {
break
}
t.Logf("waiting for responses to be processed: %d", pending)
time.Sleep(100 * time.Millisecond)
}
close(responses)

Comment on lines +98 to 112
func (b *bucket) getPreviousWindow(ctx context.Context, now time.Time) *ratelimitv1.Window {
ctx, span := tracing.Start(ctx, "getCurrentWindow")
defer span.End()

sequence := calculateSequence(now, b.duration) - 1

-w, ok := b.windows[sequence]
-if !ok {
+w, exists := b.windows[sequence]
+if !exists {
w = newWindow(sequence, now.Add(-b.duration).Truncate(b.duration), b.duration)
b.windows[sequence] = w
}
span.SetAttributes(attribute.Bool("created", !exists))

return w
}

⚠️ Potential issue

Incorrect span name in getPreviousWindow method.

The span name in getPreviousWindow incorrectly uses "getCurrentWindow" instead of "getPreviousWindow", which will make tracing analysis confusing.

-	ctx, span := tracing.Start(ctx, "getCurrentWindow")
+	ctx, span := tracing.Start(ctx, "getPreviousWindow")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (b *bucket) getPreviousWindow(ctx context.Context, now time.Time) *ratelimitv1.Window {
ctx, span := tracing.Start(ctx, "getCurrentWindow")
defer span.End()
sequence := calculateSequence(now, b.duration) - 1
w, ok := b.windows[sequence]
if !ok {
w, exists := b.windows[sequence]
if !exists {
w = newWindow(sequence, now.Add(-b.duration).Truncate(b.duration), b.duration)
b.windows[sequence] = w
}
span.SetAttributes(attribute.Bool("created", !exists))
return w
}
func (b *bucket) getPreviousWindow(ctx context.Context, now time.Time) *ratelimitv1.Window {
ctx, span := tracing.Start(ctx, "getPreviousWindow")
defer span.End()
sequence := calculateSequence(now, b.duration) - 1
w, exists := b.windows[sequence]
if !exists {
w = newWindow(sequence, now.Add(-b.duration).Truncate(b.duration), b.duration)
b.windows[sequence] = w
}
span.SetAttributes(attribute.Bool("created", !exists))
return w
}

Comment on lines +87 to +89
 if p.instance.ID == s.cluster.Self().ID {
 	// we're the origin, nothing to replay...
-	return
+	return nil, nil

🛠️ Refactor suggestion

Return a valid response object or confirm the caller can handle a nil response.

This block returns (nil, nil) when the instance is the origin, which can cause a downstream nil pointer. Provide a meaningful response or ensure the caller checks for nil.

 if p.instance.ID == s.cluster.Self().ID {
-   return nil, nil
+   // Return an empty replay response instead of nil
+   return &ratelimitv1.ReplayResponse{
+       Current:  &ratelimitv1.Window{},
+       Previous: &ratelimitv1.Window{},
+       Response: &ratelimitv1.RatelimitResponse{},
+   }, nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if p.instance.ID == s.cluster.Self().ID {
// we're the origin, nothing to replay...
return
return nil, nil
if p.instance.ID == s.cluster.Self().ID {
// we're the origin, nothing to replay...
// Return an empty replay response instead of nil
return &ratelimitv1.ReplayResponse{
Current: &ratelimitv1.Window{},
Previous: &ratelimitv1.Window{},
Response: &ratelimitv1.RatelimitResponse{},
}, nil
}

Comment on lines +31 to +47
err = s.SetWindows(ctx,
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetCurrent().GetCounter(),
Sequence: res.GetCurrent().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetPrevious().GetCounter(),
Sequence: res.GetPrevious().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},

⚠️ Potential issue

Check for a nil res before accessing res.GetCurrent() and res.GetPrevious().

When askOrigin returns (nil, nil)—for instance, if the current instance is the origin—this code will panic on res.GetCurrent(). Ensure res is not nil or handle the scenario to avoid a nil pointer dereference.

 res, err := s.askOrigin(ctx, req)
 if err != nil {
    ...
 }

+ if res == nil {
+   s.logger.Warn("no replay response received from origin", "identifier", req.GetRequest().GetIdentifier())
+   return
+ }

 err = s.SetWindows(ctx,
   setWindowRequest{
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
err = s.SetWindows(ctx,
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetCurrent().GetCounter(),
Sequence: res.GetCurrent().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetPrevious().GetCounter(),
Sequence: res.GetPrevious().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},
res, err := s.askOrigin(ctx, req)
if err != nil {
// existing error handling
}
if res == nil {
s.logger.Warn("no replay response received from origin", "identifier", req.GetRequest().GetIdentifier())
return
}
err = s.SetWindows(ctx,
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetCurrent().GetCounter(),
Sequence: res.GetCurrent().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},
setWindowRequest{
Identifier: req.GetRequest().GetIdentifier(),
Limit: req.GetRequest().GetLimit(),
Counter: res.GetPrevious().GetCounter(),
Sequence: res.GetPrevious().GetSequence(),
Duration: time.Duration(req.GetRequest().GetDuration()) * time.Millisecond,
Time: now,
},
)

@chronark chronark merged commit 03063e1 into main Mar 18, 2025
26 of 30 checks passed
@chronark chronark deleted the ratelimit-clustering branch March 18, 2025 16:21
@coderabbitai coderabbitai bot mentioned this pull request Aug 20, 2025
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Oct 6, 2025
17 tasks
@coderabbitai coderabbitai bot mentioned this pull request Oct 21, 2025
18 tasks