Closed
37 commits
37f742a
proof of concept
Sam-tesouro Mar 5, 2025
89923f0
add failOpen log on client init
Sam-tesouro Mar 5, 2025
f7aa17c
linter moving things around
Sam-tesouro Mar 5, 2025
f5b23f0
Update rdcloser_test.go
Sam-tesouro Mar 5, 2025
30fdbc0
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 6, 2025
5206888
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 7, 2025
91bd663
mvp
Sam-tesouro Mar 8, 2025
36fab7d
remove impossible condition
Sam-tesouro Mar 8, 2025
ce3bb3c
add envDefault
Sam-tesouro Mar 8, 2025
ea3002c
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 10, 2025
28bc3bb
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 10, 2025
2caeb38
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 13, 2025
68beaf5
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 13, 2025
77c3909
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 17, 2025
7086c1e
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 19, 2025
ee001b7
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 24, 2025
6fbebc3
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 26, 2025
b92c17e
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 28, 2025
0b42772
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Mar 28, 2025
128fcdc
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 1, 2025
72df63c
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 3, 2025
20b983e
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 4, 2025
42ad877
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 7, 2025
4f2bddb
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 8, 2025
9648ead
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 10, 2025
cf6e041
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 11, 2025
e1277bd
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 14, 2025
b95f851
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 15, 2025
3d09e8c
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 15, 2025
9546cd7
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 16, 2025
0cbe10d
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 17, 2025
8a3ddc3
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 21, 2025
df7eb96
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Apr 28, 2025
3da170a
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro May 19, 2025
ba9dc95
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro May 20, 2025
896f9c0
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Jun 9, 2025
3898392
Merge branch 'main' into feat(router)--enable-fail-open-on-rate-limit…
Sam-tesouro Jul 3, 2025
1 change: 1 addition & 0 deletions router/core/graph_server.go
@@ -1179,6 +1179,7 @@ func (s *graphServer) buildGraphMux(ctx context.Context,
Debug: s.rateLimit.Debug,
RejectStatusCode: s.rateLimit.SimpleStrategy.RejectStatusCode,
KeySuffixExpression: s.rateLimit.KeySuffixExpression,
FailOpen: s.rateLimit.FailOpen,
Member

Do we want to keep this a "generic" FailOpen (which could be fine), or do we want to make the feature more specific?
Naming is hard, but something like: IgnoreRedisUnavailableErrors

Author

I'm happy to name this however best fits the specific use case; more specificity is almost certainly a good idea. I like IgnoreRedisUnavailableErrors, IgnoreRateLimitsWhenRedisUnavailable, or FailOpenRateLimitsDependencyError.

ExprManager: exprManager,
})
if err != nil {
16 changes: 12 additions & 4 deletions router/core/ratelimiter.go
@@ -6,11 +6,12 @@ import (
"encoding/json"
"errors"
"fmt"
rd "github.com/wundergraph/cosmo/router/internal/persistedoperation/operationstorage/redis"
"io"
"reflect"
"sync"

rd "github.com/wundergraph/cosmo/router/internal/persistedoperation/operationstorage/redis"

"github.com/expr-lang/expr/vm"
"github.com/go-redis/redis_rate/v10"
"github.com/wundergraph/cosmo/router/internal/expr"
@@ -28,6 +29,7 @@ type CosmoRateLimiterOptions struct {
RejectStatusCode int

KeySuffixExpression string
FailOpen bool
ExprManager *expr.Manager
}

@@ -38,6 +40,7 @@ func NewCosmoRateLimiter(opts *CosmoRateLimiterOptions) (rl *CosmoRateLimiter, e
limiter: limiter,
debug: opts.Debug,
rejectStatusCode: opts.RejectStatusCode,
failOpen: opts.FailOpen,
}
if rl.rejectStatusCode == 0 {
rl.rejectStatusCode = 200
@@ -55,10 +58,11 @@ type CosmoRateLimiter struct {
client rd.RDCloser
limiter *redis_rate.Limiter
debug bool

rejectStatusCode int

keySuffixProgram *vm.Program

failOpen bool
}

func (c *CosmoRateLimiter) RateLimitPreFetch(ctx *resolve.Context, info *resolve.FetchInfo, input json.RawMessage) (result *resolve.RateLimitDeny, err error) {
@@ -72,11 +76,15 @@ func (c *CosmoRateLimiter) RateLimitPreFetch(ctx *resolve.Context, info *resolve
Period: ctx.RateLimitOptions.Period,
}
key, err := c.generateKey(ctx)
if err != nil {
if err != nil && c.failOpen{
Member

generateKey does not interact with Redis, so why do you skip here as well?

Author

I'll spin up my test environment, sort this out, and get back to you!

return nil, nil
} else if err != nil {
return nil, err
}
allow, err := c.limiter.AllowN(ctx.Context(), key, limit, requestRate)
if err != nil {
if err != nil && c.failOpen{
return nil, nil
Member

Can you check here whether Redis is truly unavailable?
If you're treating all errors equally, I'm wondering what the downsides could be.

Author

That's an entirely fair callout; we should be wary of treating all errors equally here. When implementing this, my thinking was that anything is better than dropping the request with an internal service error, and I didn't encounter unexpected side effects at the time. I wonder whether strict integration testing would be enough, or whether adding complexity to this solution is the better path forward.

} else if err != nil {
return nil, err
}
c.setRateLimitStats(ctx, key, requestRate, allow.Remaining, allow.RetryAfter.Milliseconds(), allow.ResetAfter.Milliseconds())
2 changes: 2 additions & 0 deletions router/core/router.go
@@ -798,6 +798,7 @@ func (r *Router) bootstrap(ctx context.Context) error {
URLs: r.Config.rateLimit.Storage.URLs,
ClusterEnabled: r.Config.rateLimit.Storage.ClusterEnabled,
Logger: r.logger,
FailOpen: r.Config.rateLimit.FailOpen,
})
if err != nil {
return fmt.Errorf("failed to create redis client: %w", err)
@@ -1253,6 +1254,7 @@ func (r *Router) Start(ctx context.Context) error {
zap.Int("burst", r.rateLimit.SimpleStrategy.Burst),
zap.Duration("duration", r.Config.rateLimit.SimpleStrategy.Period),
zap.Bool("rejectExceeding", r.Config.rateLimit.SimpleStrategy.RejectExceedingRequests),
zap.Bool("failOpen", r.Config.rateLimit.FailOpen),
)
}

@@ -3,10 +3,11 @@ package rd
import (
"context"
"fmt"
"github.com/redis/go-redis/v9"
"go.uber.org/zap"
"net/url"
"strings"

"github.com/redis/go-redis/v9"
"go.uber.org/zap"
)

// RDCloser is an interface that combines the redis.Cmdable and io.Closer interfaces, ensuring that we can close the
@@ -20,6 +21,7 @@ type RedisCloserOptions struct {
URLs []string
ClusterEnabled bool
Password string
FailOpen bool
}

func NewRedisCloser(opts *RedisCloserOptions) (RDCloser, error) {
@@ -73,6 +75,10 @@ func NewRedisCloser(opts *RedisCloserOptions) (RDCloser, error) {
}

if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
if(opts.FailOpen) {
opts.Logger.Warn(fmt.Sprintf("Ratelimit Fail Open activated: redis client is currently not responding with provided URLs: %q", err))
return rdb, nil
}
return rdb, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
}

1 change: 1 addition & 0 deletions router/pkg/config/config.go
@@ -473,6 +473,7 @@ type RateLimitConfiguration struct {
Debug bool `yaml:"debug" envDefault:"false" env:"RATE_LIMIT_DEBUG"`
KeySuffixExpression string `yaml:"key_suffix_expression,omitempty" env:"RATE_LIMIT_KEY_SUFFIX_EXPRESSION"`
ErrorExtensionCode RateLimitErrorExtensionCode `yaml:"error_extension_code"`
FailOpen bool `yaml:"fail_open" envDefault:"false" env:"RATE_LIMIT_FAIL_OPEN"`
}

type RateLimitErrorExtensionCode struct {
5 changes: 5 additions & 0 deletions router/pkg/config/config.schema.json
@@ -1810,6 +1810,11 @@
"default": "RATE_LIMIT_EXCEEDED"
}
}
},
"fail_open": {
"type": "boolean",
"description": "Enable Rate Limit fail open on redis availability failure. This interacts with Redis timeout configuration parameters, essentially adding to each requests latency in failure.",
"default": false
}
Comment on lines +1813 to 1818
Contributor

🛠️ Refactor suggestion

Add conditional schema logic to prevent silent misconfiguration

fail_open is a welcome addition, but with the current schema a user can set
rate_limit.enabled: true, omit storage.urls, and forget to flip fail_open to true.
This passes schema validation but fails at runtime when Redis is missing.

Guard against that class of error by expressing the relationship directly in the
schema:

         "fail_open": {
           "type": "boolean",
           "description": "Enable Rate Limit fail open on redis availability failure. This interacts with Redis timeout configuration parameters, essentially adding to each requests latency in failure.",
           "default": false
         }
+      },
+      "allOf": [
+        {
+          "if": {
+            "properties": {
+              "enabled": { "const": true },
+              "fail_open": { "const": false }
+            },
+            "required": ["enabled"]
+          },
+          "then": {
+            "required": ["storage"]
+          }
+        }
+      ]

This ensures:

  1. When rate limiting is enabled and fail_open is false (or omitted) → storage must be present.
  2. When fail_open is true → storage is optional, mirroring the intended “Redis optional” behaviour.

The "required": ["enabled"] clause stops the condition from matching vacuously when enabled is omitted, because "properties" only constrains keys that are present.

This keeps configuration errors in CI instead of production.
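A complementary sketch of the same rule as a Go check at config-load time, assuming a simplified, hypothetical rateLimitConfig struct rather than the router's real RateLimitConfiguration:

```go
package main

import "errors"

// rateLimitConfig is a simplified, hypothetical mirror of the fields the
// schema rule refers to; it is not the router's RateLimitConfiguration.
type rateLimitConfig struct {
	Enabled     bool
	FailOpen    bool
	StorageURLs []string
}

var errStorageRequired = errors.New("rate_limit.storage.urls is required unless fail_open is true")

// validateRateLimit applies the rule from the schema suggestion: storage is
// mandatory only when rate limiting is enabled and fail_open is false.
func validateRateLimit(c rateLimitConfig) error {
	if c.Enabled && !c.FailOpen && len(c.StorageURLs) == 0 {
		return errStorageRequired
	}
	return nil
}
```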

}
},
3 changes: 2 additions & 1 deletion router/pkg/config/testdata/config_defaults.json
@@ -248,7 +248,8 @@
"ErrorExtensionCode": {
"Enabled": true,
"Code": "RATE_LIMIT_EXCEEDED"
}
},
"FailOpen": false
},
"LocalhostFallbackInsideDocker": true,
"CDN": {
3 changes: 2 additions & 1 deletion router/pkg/config/testdata/config_full.json
@@ -503,7 +503,8 @@
"ErrorExtensionCode": {
"Enabled": true,
"Code": "RATE_LIMIT_EXCEEDED"
}
},
"FailOpen": false
},
"LocalhostFallbackInsideDocker": true,
"CDN": {