feat: rate limit config per key [WIP] #2089

Closed
SkArchon wants to merge 1 commit into main from
milinda/eng-7165-rate-limiting-configuration-for-subgraphs

@SkArchon SkArchon commented Jul 28, 2025

We can already specify dynamic rate limiting keys; we now want to be able to specify rate limits per subgraph. Since the expression context can contain the subgraph name, we use that information to let the user build the rate limit key with an expression and specify a configuration per key.

Example Configuration

rate_limit:
  enabled: true
  storage:
    cluster_enabled: false # set to true to use a Redis Cluster
    urls:
      - redis://localhost:6379
  key_suffix_expression: "subgraph.name"
  simple_strategy:
    rate: 10
    burst: 10
    period: 10s
    key_mapping:
      employees:
        enabled: true
        rate: 1
        burst: 1
        period: 5s

Summary by CodeRabbit

  • New Features

    • Introduced support for per-key rate limiting configuration, allowing individual rate limit settings for different keys.
    • Added a global option to hide rate limit stats from response extensions.
  • Improvements

    • Enhanced rate limiting flexibility with a new configuration structure, enabling more granular control.
    • Increased the refresh frequency for unknown JWKS key IDs, improving token validation responsiveness.
  • Bug Fixes

    • Rate limiting logic now correctly skips or applies limits based on the enabled status of each configuration entry.

Checklist

coderabbitai Bot commented Jul 28, 2025

Walkthrough

This change introduces a more flexible and configurable rate limiting system. The configuration schema and codebase are refactored to support per-key rate limiting via a new key_mapping structure, replacing hardcoded parameters with dynamic lookup and evaluation. Related logic, tests, and documentation are updated accordingly.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Core Rate Limiter Refactor (router/core/ratelimiter.go) | Refactors the rate limiter to use a new RateLimitConfig and BaseRateLimitKey for dynamic, per-key rate limiting. Removes static parameters, updates key generation logic, and changes how rate limit parameters are resolved and applied. Updates struct fields and constructor accordingly. |
| Rate Limiter Configuration Schema (router/pkg/config/config.go, router/pkg/config/config.schema.json) | Adds a key_mapping property to the rate limit configuration, introducing the RateLimitSimpleStrategyEntry struct for per-key settings. Moves global HideStatsFromResponseExtension to the top-level config. Updates JSON schema to reflect new structure, allowing multiple named key entries with individual rate limit parameters. |
| Graph Server and Handler Integration (router/core/graph_server.go, router/core/graphql_handler.go) | Updates how rate limiter options are constructed and passed. Removes references to static configuration fields, switches to using the new config-driven approach, and updates error response handling to use the new global hide-stats flag. Removes or comments out now-obsolete code referencing the old structure. |
| Rate Limiter Tests (router/core/ratelimiter_test.go) | Updates test calls to generateKey to match the new method signature, passing an additional argument. No changes to test logic or assertions. |
| JWKS Token Decoder (router/pkg/authentication/jwks_token_decoder.go) | Changes the refresh rate for unknown KIDs in the JWKS HTTP client from 5 minutes to 2 seconds. No structural or logic changes beyond this rate adjustment. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


@SkArchon SkArchon marked this pull request as draft July 28, 2025 19:47
@SkArchon SkArchon changed the title feat: rate limit config per key feat: rate limit config per key [WIP] Jul 28, 2025
@github-actions

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-327ebd984fe6fee67e818727524bf7c41919c0f6

@github-actions

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-327ebd984fe6fee67e818727524bf7c41919c0f6-nonroot

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🔭 Outside diff range comments (3)
router/core/ratelimiter.go (3)

37-56: Remove dead code for rejectStatusCode initialization.

The check if rl.rejectStatusCode == 0 at lines 46-48 is redundant because rejectStatusCode is never initialized before this check, so it will always be 0. This appears to be leftover code from the refactoring.

Apply this diff to remove the dead code:

 	rl = &CosmoRateLimiter{
 		client:           opts.RedisClient,
 		limiter:          limiter,
 		debug:            opts.Debug,
 		baseRateLimitKey: opts.BaseRateLimitKey,
 		rateLimitConfig:  opts.RateLimitConfig,
 	}
-	if rl.rejectStatusCode == 0 {
-		rl.rejectStatusCode = 200
-	}
 	if opts.KeySuffixExpression != "" {

58-70: Remove unused rejectStatusCode field.

The rejectStatusCode field at line 63 appears to be unused after the refactoring, as this is now part of the per-key configuration in RateLimitSimpleStrategyEntry.

Apply this diff to remove the unused field:

 type CosmoRateLimiter struct {
 	client  rd.RDCloser
 	limiter *redis_rate.Limiter
 	debug   bool
 
-	rejectStatusCode int
-
 	keySuffixProgram *vm.Program
 
 	// TODO: To decouple from the config
 	rateLimitConfig  config.RateLimitSimpleStrategy
 	baseRateLimitKey string
 }

137-139: RejectStatusCode() is still referenced in graphql_handler.go
The RejectStatusCode() method isn’t dead code—it's invoked here:

  • router/core/graphql_handler.go (around the call to httpWriter.WriteHeader(h.rateLimiter.RejectStatusCode()))

If you’ve moved rejectStatusCode into a per-key config API, update this call to use the new per-key accessor (e.g. h.rateLimiter.ConfigFor(key).RejectStatusCode()) before deleting the method and field.
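A per-key accessor along the lines suggested above could look like the following. This is a minimal sketch, not the router's actual API: the `entry` and `limiter` types and the `rejectStatusCodeFor` method are hypothetical names, and the fallback order (key entry, then global default, then 200) is an assumption based on the documented default of 200.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a per-key configuration entry.
type entry struct {
	RejectStatusCode int // 0 means "not set"
}

// limiter is a hypothetical stand-in for the rate limiter.
type limiter struct {
	defaults   entry
	keyMapping map[string]entry
}

// rejectStatusCodeFor returns the status code for a key, preferring the
// per-key entry, then the global default, then 200.
func (l *limiter) rejectStatusCodeFor(key string) int {
	if e, ok := l.keyMapping[key]; ok && e.RejectStatusCode != 0 {
		return e.RejectStatusCode
	}
	if l.defaults.RejectStatusCode != 0 {
		return l.defaults.RejectStatusCode
	}
	return 200
}

func main() {
	l := &limiter{
		keyMapping: map[string]entry{"employees": {RejectStatusCode: 429}},
	}
	fmt.Println(l.rejectStatusCodeFor("employees"), l.rejectStatusCodeFor("products"))
}
```

With an accessor like this, the handler would need the resolved key at the point where it writes the header, which is the coupling the comment above asks to resolve before deleting the old method.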

🧹 Nitpick comments (1)
router/pkg/config/config.schema.json (1)

1832-1835: Misleading description for enabled field

The description for enabled currently repeats the text meant for rate, which is confusing.

-        "enabled": {
-          "type": "boolean",
-          "description": "The rate at which the requests are allowed. The rate is specified as a number of requests per second."
+        "enabled": {
+          "type": "boolean",
+          "description": "Enable / disable this specific key mapping entry. When omitted, the global `enabled` setting applies."
         },
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 584e1b1 and e6d721a.

📒 Files selected for processing (7)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_handler.go (2 hunks)
  • router/core/ratelimiter.go (5 hunks)
  • router/core/ratelimiter_test.go (8 hunks)
  • router/pkg/authentication/jwks_token_decoder.go (1 hunks)
  • router/pkg/config/config.go (2 hunks)
  • router/pkg/config/config.schema.json (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
router/pkg/authentication/jwks_token_decoder.go (1)

Learnt from: SkArchon
PR: #2067
File: router/pkg/authentication/jwks_token_decoder.go:80-106
Timestamp: 2025-07-21T14:46:34.879Z
Learning: In the Cosmo router project, required field validation for JWKS configuration (Secret, Algorithm, KeyId) is handled at the JSON schema level in config.schema.json rather than through runtime validation in the Go code at router/pkg/authentication/jwks_token_decoder.go.

router/pkg/config/config.schema.json (4)

Learnt from: SkArchon
PR: #2067
File: router/pkg/config/config.schema.json:1637-1644
Timestamp: 2025-07-21T15:06:36.664Z
Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues.

Learnt from: SkArchon
PR: #2067
File: router/pkg/authentication/jwks_token_decoder.go:80-106
Timestamp: 2025-07-21T14:46:34.879Z
Learning: In the Cosmo router project, required field validation for JWKS configuration (Secret, Algorithm, KeyId) is handled at the JSON schema level in config.schema.json rather than through runtime validation in the Go code at router/pkg/authentication/jwks_token_decoder.go.

Learnt from: endigma
PR: #2009
File: router/pkg/config/config.go:0-0
Timestamp: 2025-07-03T10:33:25.778Z
Learning: The CardinalityLimit field in the Metrics struct (router/pkg/config/config.go) is validated at the JSON schema level in config.schema.json with a minimum value constraint of 1, preventing zero or negative values without requiring runtime validation.

Learnt from: SkArchon
PR: #1929
File: router/internal/circuit/manager.go:16-25
Timestamp: 2025-06-30T20:39:02.387Z
Learning: In the Cosmo router project, parameter validation for circuit breaker configuration is handled at the JSON schema level rather than through runtime validation methods on structs. The config.schema.json file contains comprehensive validation constraints for circuit breaker parameters.

router/pkg/config/config.go (1)

Learnt from: endigma
PR: #2009
File: router/pkg/config/config.go:0-0
Timestamp: 2025-07-03T10:33:25.778Z
Learning: The CardinalityLimit field in the Metrics struct (router/pkg/config/config.go) is validated at the JSON schema level in config.schema.json with a minimum value constraint of 1, preventing zero or negative values without requiring runtime validation.

🔇 Additional comments (9)
router/core/ratelimiter_test.go (1)

43-43: LGTM! Test updates correctly match the new method signature.

The addition of nil as the second parameter to generateKey calls properly reflects the updated method signature that now accepts both *resolve.Context and *resolve.FetchInfo. The tests continue to verify the key generation logic under various scenarios (headers, claims, fallback logic) and the nil FetchInfo parameter is appropriate for these test cases.

Also applies to: 54-54, 65-65, 76-76, 87-87, 98-98, 109-109, 120-120

router/core/graph_server.go (1)

1318-1325: LGTM! Clean transition to the new rate limiting configuration.

The changes properly update the rate limiter options construction to support the new per-key rate limiting system:

  • The commented-out RejectStatusCode indicates this field was removed from the options struct
  • RateLimitConfig and BaseRateLimitKey replace the previous static parameters with a more flexible configuration approach
  • The new structure supports dynamic key resolution and per-key rate limit mappings as outlined in the PR objectives
router/core/graphql_handler.go (2)

272-273: LGTM! Proper simplification of rate limiting configuration access.

The changes correctly remove the nested SimpleStrategy references and access configuration fields directly from rateLimitConfig. The commented-out RateLimitKey assignment aligns with the new per-key rate limiting system where keys are generated dynamically rather than set statically.


316-316: LGTM! Consistent configuration access pattern.

The update to check rateLimitConfig.HideStatsFromResponseExtension directly maintains the same functionality while aligning with the simplified configuration structure that removes nested SimpleStrategy references.

router/pkg/config/config.go (2)

510-514: LGTM! Good refactoring to make stats visibility a global setting.

Moving HideStatsFromResponseExtension to the top-level RateLimitConfiguration makes sense as it's a global setting that should apply across all rate limit strategies.


527-545: Good modular design for per-key rate limiting configuration.

The refactoring effectively supports per-key rate limiting by:

  1. Embedding RateLimitSimpleStrategyEntry for default/global configuration
  2. Adding KeyMapping for per-key overrides

This design maintains backward compatibility while enabling the new per-subgraph rate limiting feature.
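The embedding described above can be sketched roughly as follows. The struct names come from the review; the individual fields and YAML tags are assumptions inferred from the example configuration in the PR description, not the actual definitions in config.go.

```go
package main

import (
	"fmt"
	"time"
)

// RateLimitSimpleStrategyEntry holds settings usable both globally and per key.
// Fields and tags are assumed from the example YAML.
type RateLimitSimpleStrategyEntry struct {
	Enabled bool          `yaml:"enabled"`
	Rate    int           `yaml:"rate"`
	Burst   int           `yaml:"burst"`
	Period  time.Duration `yaml:"period"`
}

// RateLimitSimpleStrategy embeds the entry as the default configuration and
// adds per-key overrides.
type RateLimitSimpleStrategy struct {
	RateLimitSimpleStrategyEntry `yaml:",inline"`

	KeyMapping map[string]RateLimitSimpleStrategyEntry `yaml:"key_mapping"`
}

// exampleStrategy mirrors the simple_strategy block of the example configuration.
func exampleStrategy() RateLimitSimpleStrategy {
	return RateLimitSimpleStrategy{
		RateLimitSimpleStrategyEntry: RateLimitSimpleStrategyEntry{
			Rate: 10, Burst: 10, Period: 10 * time.Second,
		},
		KeyMapping: map[string]RateLimitSimpleStrategyEntry{
			"employees": {Enabled: true, Rate: 1, Burst: 1, Period: 5 * time.Second},
		},
	}
}

func main() {
	s := exampleStrategy()
	// Embedded fields are promoted, so existing code that reads s.Rate keeps working.
	fmt.Println(s.Rate, s.KeyMapping["employees"].Rate)
}
```

Field promotion through embedding is what lets callers that read the old top-level fields compile unchanged, which is the backward-compatibility point made above.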

router/core/ratelimiter.go (3)

25-35: LGTM! Options struct properly updated for the new configuration model.

The addition of BaseRateLimitKey and RateLimitConfig aligns well with the new per-key rate limiting feature.


72-113: Well-implemented dynamic rate limit resolution logic.

The implementation correctly handles all scenarios:

  • Key-specific configurations take precedence
  • Falls back to default configuration when appropriate
  • Properly checks enabled flags at both levels

The code is clear and maintainable.
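To make the precedence rules concrete, here is a minimal sketch of such a resolution step. The `entry`, `strategy`, and `resolve` names are hypothetical, and the choice that a disabled key entry turns limiting off for that key (rather than falling back to the default) is an assumption, since the PR does not spell it out.

```go
package main

import "fmt"

// entry is a hypothetical, trimmed-down rate limit entry.
type entry struct {
	Enabled bool
	Rate    int
	Burst   int
}

// strategy pairs a default entry with per-key overrides.
type strategy struct {
	Default    entry
	KeyMapping map[string]entry
}

// resolve returns the entry to apply for suffixKey and whether rate limiting
// should be applied at all.
func resolve(s strategy, suffixKey string) (entry, bool) {
	if e, ok := s.KeyMapping[suffixKey]; ok {
		// A key-specific entry takes precedence, including when it is disabled.
		return e, e.Enabled
	}
	// No override: fall back to the default entry and its own enabled flag.
	return s.Default, s.Default.Enabled
}

func main() {
	s := strategy{
		Default:    entry{Enabled: true, Rate: 10, Burst: 10},
		KeyMapping: map[string]entry{"employees": {Enabled: true, Rate: 1, Burst: 1}},
	}
	e, apply := resolve(s, "employees")
	fmt.Println(e.Rate, apply)
}
```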


115-135: Excellent implementation of dynamic key generation.

The method properly:

  1. Returns both the raw suffix key (for config lookup) and the full key (for Redis)
  2. Clones the expression context to avoid side effects
  3. Augments the context with subgraph information

This enables the per-subgraph rate limiting feature as designed.
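The two return values can be illustrated with a simplified sketch. It replaces the expression evaluation with a plain string argument, and both the `generateKey` signature shown here and the `:` separator are assumptions for illustration, not the router's actual implementation.

```go
package main

import "fmt"

// generateKey returns the raw suffix (used to look up a key_mapping entry)
// and the full key (used as the Redis key). The suffix stands in for the
// result of evaluating key_suffix_expression, e.g. the subgraph name.
func generateKey(baseKey, suffix string) (suffixKey, fullKey string) {
	if suffix == "" {
		// No suffix: every request shares the base key.
		return "", baseKey
	}
	return suffix, baseKey + ":" + suffix
}

func main() {
	suffixKey, fullKey := generateKey("cosmo", "employees")
	fmt.Println(suffixKey, fullKey)
}
```

Returning the suffix separately avoids having to parse it back out of the Redis key when looking up the per-key configuration.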

 },
 PrioritizeHTTP: true,
-RefreshUnknownKID: rate.NewLimiter(rate.Every(5*time.Minute), 1),
+RefreshUnknownKID: rate.NewLimiter(rate.Every(2*time.Second), 1),
💡 Verification agent

🧩 Analysis chain

Verify the dramatic increase in JWKS refresh frequency.

The refresh rate for unknown Key IDs has been increased from once every 5 minutes to once every 2 seconds - a 150x increase in frequency. This could significantly impact performance and potentially overwhelm JWKS providers with requests.

Please verify this change is intentional and consider the following:

  • Will external JWKS providers handle this increased request frequency?
  • Has this been tested with production-like traffic patterns?
  • Is there documentation explaining why this aggressive refresh rate is necessary?

🏁 Script executed:

#!/bin/bash
# Description: Search for any configuration or documentation explaining the JWKS refresh rate change
# Expected: Find configuration options, comments, or documentation explaining the rationale

# Search for JWKS refresh rate configurations or comments
rg -A 5 -B 5 "RefreshUnknownKID|JWKS.*refresh|refresh.*JWKS" --type go

Length of output: 1639


Document and validate JWKS unknown‐KID refresh rate change

The hard-coded RefreshUnknownKID limiter was lowered from 5 minutes to 2 seconds (150× increase), but there’s no accompanying comment or configuration entry explaining why. This could lead to a surge of HTTP requests against JWKS endpoints.

Please address the following in router/pkg/authentication/jwks_token_decoder.go:99:

  • Confirm that external JWKS providers can sustain refresh attempts every 2 seconds under production load.
  • Add a code comment or expose this interval via configuration (e.g., in authentication.JWKSConfig) with a clear rationale.
  • Update any related documentation (README or design docs) to explain why such an aggressive refresh rate is necessary.
🤖 Prompt for AI Agents
In router/pkg/authentication/jwks_token_decoder.go at line 99, the
RefreshUnknownKID rate limiter was changed from 5 minutes to 2 seconds without
explanation. Confirm with the team or documentation that external JWKS providers
can handle refreshes every 2 seconds in production. Add a clear code comment
near this line explaining the reason for this aggressive refresh rate.
Additionally, expose this interval as a configurable parameter in
authentication.JWKSConfig instead of hardcoding it. Finally, update any relevant
documentation or README files to describe the rationale behind this change and
its impact.
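The 150x figure and the idea of making the interval configurable can be sketched as follows. This uses only the standard library; `refreshInterval` and the notion of a configured value are hypothetical, since no such configuration field exists in the source.

```go
package main

import (
	"fmt"
	"time"
)

const (
	oldInterval = 5 * time.Minute // previous hard-coded value
	newInterval = 2 * time.Second // value introduced by this PR
)

// refreshInterval returns the configured interval, falling back to the
// previous default when none is set. The configuration source is hypothetical.
func refreshInterval(configured time.Duration) time.Duration {
	if configured <= 0 {
		return oldInterval
	}
	return configured
}

// maxRefreshesPerHour is the steady-state ceiling for a limiter that allows
// one refresh per interval with a burst of 1.
func maxRefreshesPerHour(interval time.Duration) int64 {
	return int64(time.Hour / interval)
}

func main() {
	fmt.Println(maxRefreshesPerHour(oldInterval), maxRefreshesPerHour(newInterval))
	fmt.Println(int64(oldInterval / newInterval)) // ratio between the two intervals
}
```

Per router instance, the ceiling moves from 12 to 1800 unknown-KID refreshes per hour, which is the load increase the comment above asks to be confirmed with JWKS providers.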

Comment on lines +1823 to +1827
},
"key_mapping": {
"type": "object",
"description": "The configuration per key entry.",
"additionalProperties": {
🛠️ Refactor suggestion

Add basic constraints to key_mapping to avoid silent configuration errors

key_mapping is currently declared as a totally free-form object. A typo such as "raet": 50 or an empty object {} will happily pass validation but later explode at runtime.
Consider tightening validation with e.g. patternProperties, propertyNames, or at least a minProperties: 1 so that key_mapping, when present, cannot be empty.

 "key_mapping": {
   "type": "object",
   "description": "The configuration per key entry.",
+  "minProperties": 1,
   "additionalProperties": {
🤖 Prompt for AI Agents
In router/pkg/config/config.schema.json around lines 1823 to 1827, the
key_mapping object lacks constraints, allowing empty or misspelled entries to
pass validation and cause runtime errors. Add validation rules such as
patternProperties or propertyNames to restrict allowed keys, and include
minProperties: 1 to ensure the object is not empty, thereby preventing silent
configuration errors.

Comment on lines +1836 to +1862
"rate": {
"type": "integer",
"description": "The rate at which the requests are allowed. The rate is specified as a number of requests per second.",
"minimum": 1
},
"burst": {
"type": "integer",
"description": "The maximum number of requests that are allowed to exceed the rate. The burst is specified as a number of requests.",
"minimum": 1
},
"period": {
"type": "string",
"description": "The period of time over which the rate limit is enforced. The period is specified as a string with a number and a unit, e.g. 10ms, 1s, 1m, 1h. The supported units are 'ms', 's', 'm', 'h'.",
"duration": {
"minimum": "1s"
}
},
"reject_exceeding_requests": {
"type": "boolean",
"description": "Reject the requests that exceed the rate limit. If the value is true, the requests that exceed the rate limit are rejected."
},
"reject_status_code": {
"type": "integer",
"description": "The status code to return when the request is rejected. The default value is 200 (OK) as we're returning a well formed GraphQL response.",
"default": 200
}
}
⚠️ Potential issue

Require at least one rate-limiting parameter inside each mapping entry

All properties inside a single entry are optional, so a definition like:

rate_limit:
  simple_strategy:
    key_mapping:
      employees: {}

would pass schema validation yet leave the router with undefined values at runtime.
At minimum, require at least one of the trio used by the global defaults:

 "additionalProperties": false,
 ...
-"description": "The configuration for all subgraphs. The configuration is used to configure the traffic shaping for all subgraphs.",
+"description": "Per-key rate-limit settings. At least one of rate | burst | period must be supplied.",
 ...
-                "properties": {
+                "properties": {
 ...
                 },
+                "anyOf": [
+                  { "required": ["rate"] },
+                  { "required": ["burst"] },
+                  { "required": ["period"] }
+                ]

Alternatively, replicate the global required: ["rate","burst","period"] if you don’t want to inherit omitted values.

🤖 Prompt for AI Agents
In router/pkg/config/config.schema.json around lines 1836 to 1862, the
rate-limiting entries currently have all properties optional, allowing empty
objects that cause undefined runtime values. To fix this, add a "required" field
to each rate-limiting entry schema that enforces at least the "rate", "burst",
and "period" properties to be present, matching the global defaults. This
ensures each entry has the necessary parameters defined and prevents invalid
configurations.
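For readers less familiar with JSON Schema, the `anyOf` rule suggested above is equivalent to the following check, written in Go purely for illustration. The project validates configuration at the schema level, so this is not a proposal for runtime validation; `keyEntry` and `validateKeyMapping` are hypothetical names, and a zero value is treated as "omitted" because the schema requires a minimum of 1 for rate and burst and 1s for period.

```go
package main

import (
	"fmt"
	"time"
)

// keyEntry is a hypothetical per-key entry; zero values mean "omitted".
type keyEntry struct {
	Rate   int
	Burst  int
	Period time.Duration
}

// validateKeyMapping rejects entries that set none of rate, burst, or period,
// mirroring the anyOf constraint from the schema suggestion.
func validateKeyMapping(m map[string]keyEntry) error {
	for name, e := range m {
		if e.Rate == 0 && e.Burst == 0 && e.Period == 0 {
			return fmt.Errorf("key_mapping entry %q sets none of rate, burst, period", name)
		}
	}
	return nil
}

func main() {
	err := validateKeyMapping(map[string]keyEntry{"employees": {}})
	fmt.Println(err != nil)
}
```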

@SkArchon

Closing till requirements are reclarified

@SkArchon SkArchon closed this Jul 30, 2025