fix: jwt validation blocks on multiple requests#2216
Conversation
WalkthroughAdds configurable rate limiting for unknown JWT KID refreshes, extends JWKSConfig with AllowedAlgorithms and RefreshUnknownKID, wires config through supervisor and config schema, and updates tests and JWKS test server API (TokenForKID gains a boolean flag) to cover new behaviors including algorithm filtering and burst/wait handling. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
Comment |
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
router-tests/testenv/testenv.go (1)
2535-2554: Timeout formatting + parent context; consider cancelability to avoid goroutine leaks
- Format specifier: time.Duration with %d prints raw ns. Prefer human‑readable.
- Parent context: tie the timeout to the environment’s context.
- Optional: testFunction never observes cancellation; on timeout it keeps running in a goroutine. Expose a ctx to it or document the risk.
Apply this minimal diff:
-func (e *Environment) WaitForTest(timeout time.Duration, testFunction func()) { +func (e *Environment) WaitForTest(timeout time.Duration, testFunction func()) { e.t.Helper() - ctx, cancel := context.WithTimeout(context.Background(), timeout) + ctx, cancel := context.WithTimeout(e.Context, timeout) defer cancel() @@ case <-ctx.Done(): - e.t.Fatalf("test timed out, want %d", timeout) - return + e.t.Fatalf("test timed out after %s", timeout) } }Optional follow-up (nice-to-have): add a ctx variant and forward it into the closure to enable cooperative cancellation.
// wrapper keeps current signature for call sites func (e *Environment) WaitForTestCtx(timeout time.Duration, fn func(ctx context.Context)) { ctx, cancel := context.WithTimeout(e.Context, timeout) defer cancel() done := make(chan struct{}) go func() { defer close(done); fn(ctx) }() select { case <-done: return; case <-ctx.Done(): e.t.Fatalf("test timed out after %s", timeout) } }router-tests/jwks/jwks.go (1)
53-69: Stabilize provider selection when useInvalidKid=trueCurrent selection iterates over a map, which is randomized in Go. If future tests depend on the specific algorithm chosen for the “invalid kid” token, this may cause flakiness. Pick a deterministic provider (e.g., smallest KID) and guard the empty-map case.
func (s *Server) TokenForKID(kid string, claims map[string]any, useInvalidKid bool) (string, error) { provider, ok := s.providers[kid] if !useInvalidKid && !ok { return "", jwt.ErrInvalidKey } else if useInvalidKid { - // If we don't care about the kid we don't care about the provider - // we just get the first random provider provided - for _, pr := range s.providers { - provider = pr - break - } + if len(s.providers) == 0 { + return "", jwt.ErrInvalidKey + } + // Choose deterministically + keys := make([]string, 0, len(s.providers)) + for k := range s.providers { keys = append(keys, k) } + sort.Strings(keys) + provider = s.providers[keys[0]] }(Remember to import "sort".)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
router-tests/authentication_test.go(5 hunks)router-tests/header_set_test.go(1 hunks)router-tests/jwks/jwks.go(1 hunks)router-tests/testenv/testenv.go(1 hunks)router/pkg/authentication/jwks_token_decoder.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/authentication_test.go (4)
router-tests/testenv/testenv.go (3)
Run(107-124)Config(286-342)Environment(1729-1765)router-tests/utils.go (1)
ConfigureAuth(45-57)router/core/router.go (2)
Option(172-172)WithAccessController(1815-1819)router/core/access_controller.go (1)
NewAccessController(23-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: build_test
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: Analyze (go)
🔇 Additional comments (7)
router/pkg/authentication/jwks_token_decoder.go (1)
92-99: LGTM: removing on-demand refresh on unknown KID to prevent request blockingThe HTTP client now only uses scheduled refresh via jwkset storage. This should avoid stampedes/deadlocks on invalid tokens. One behavioral change to acknowledge: tokens signed with a newly rotated KID may fail until the next RefreshInterval elapses. Make sure this is documented and the interval is acceptable for your deployments.
Would you like a brief doc blurb added to JWKSConfig explaining the trade-off and guidance on RefreshInterval?
router-tests/header_set_test.go (1)
278-279: LGTM: updated TokenForKID call matches new APIExplicit false reads clearly. No concerns.
router-tests/authentication_test.go (5)
1222-1224: LGTM: API migration to 3-arg TokenForKIDMatches new helper semantics; clear intent with false.
1347-1351: LGTM: API migration to 3-arg TokenForKID in OIDC testsUsage is correct.
1461-1465: LGTM: API migration to 3-arg TokenForKID in multiple-keys testsNo issues.
1645-1647: LGTM: API migration to 3-arg TokenForKID in SupportedAlgorithmsWorks as intended.
831-867: Use a classic for-loop for Go <1.22 and close res.Body explicitly per iterationVerification failed — go.mod not found when running the check; confirm the project’s Go version. If the toolchain is <1.22, apply this portable loop + explicit close diff:
- for range 5 { + for i := 0; i < 5; i++ { func() { @@ - defer func() { - _ = res.Body.Close() - }() + // Close explicitly at end of iteration to avoid stacking defers. @@ - data, err := io.ReadAll(res.Body) + data, err := io.ReadAll(res.Body) require.NoError(t, err) require.JSONEq(t, unauthorizedExpectedData, string(data)) + _ = res.Body.Close() }() }
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (5)
router-tests/authentication_test.go (5)
831-872: Good regression test to prevent request‑path blocking; consider a slightly larger timeout.To reduce flakiness on slow CI, bump maxDuration from 5s to 10s. Optional.
- maxDuration := 5 * time.Second + maxDuration := 10 * time.Second
1227-1227: Boolean arg is unclear at call site.TokenForKID(..., false) lacks intent. Consider an options pattern or a tiny helper wrapper in tests to document the flag.
1352-1352: Boolean arg is unclear at call site.Same nit: replace the bare false with a named option/helper for readability in tests.
1466-1466: Boolean arg is unclear at call site.Prefer a named option/helper over a raw false.
1650-1650: Boolean arg is unclear at call site.Use a helper/option to avoid “magic boolean” literals in tests.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
router-tests/authentication_test.go(5 hunks)router-tests/jwks/jwks.go(1 hunks)router/go.mod(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router-tests/jwks/jwks.go
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/authentication_test.go (5)
router-tests/testenv/testenv.go (3)
Run(107-124)Config(286-342)Environment(1729-1765)router-tests/utils.go (1)
ConfigureAuth(45-57)router/core/router.go (2)
Option(172-172)WithAccessController(1815-1819)router/core/access_controller.go (1)
NewAccessController(23-28)router-tests/testenv/utils.go (1)
AwaitChannelWithT(10-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: build_test
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router/go.mod (1)
168-168: x/time moved to indirect — no lingering direct usage found.Search found only redis_rate.NewLimiter in router/core/ratelimiter.go (NewCosmoRateLimiter); no golang.org/x/time/rate imports or RefreshUnknownKID references detected.
This reverts commit e0f1e53.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
router-tests/jwks/jwks.go (1)
58-63: Fix nondeterministic provider selection (map iteration order is random).Randomly picking a provider will cause flaky tests and unpredictable alg selection when multiple providers exist.
Apply this diff for deterministic behavior (pick the sole provider, otherwise the lexicographically smallest KID):
- // If we don't care about the kid we don't care about the provider - // we just get the first random provider provided - for _, pr := range s.providers { - provider = pr - break - } + // Deterministic selection to avoid flakiness + if len(s.providers) == 1 { + for _, pr := range s.providers { provider = pr } + } else { + keys := make([]string, 0, len(s.providers)) + for k := range s.providers { keys = append(keys, k) } + sort.Strings(keys) + provider = s.providers[keys[0]] + }Add this import as well:
import ( "context" "encoding/json" "fmt" "log" "net" "net/http" "net/http/httptest" "testing" "time" + "sort"
🧹 Nitpick comments (2)
router-tests/jwks/jwks.go (2)
66-69: Guard against nil provider to avoid panic.If no providers are configured (or selection fails), calling methods on a nil Crypto will panic.
- token := jwt.NewWithClaims(provider.SigningMethod(), jwt.MapClaims(claims)) + if provider == nil { + return "", jwt.ErrInvalidKey + } + token := jwt.NewWithClaims(provider.SigningMethod(), jwt.MapClaims(claims))
53-53: Clarify API intent: avoid boolean flag parameter.Boolean flags in public APIs reduce readability. Consider adding a clear wrapper and using it in tests.
Add alongside the existing method:
// TokenForUnknownKID creates a token whose header KID is not present in the JWKS. func (s *Server) TokenForUnknownKID(kid string, claims map[string]any) (string, error) { return s.TokenForKID(kid, claims, true) }Optionally deprecate the boolean form in tests over time.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
router-tests/authentication_test.go(5 hunks)router-tests/jwks/jwks.go(1 hunks)router/go.mod(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router-tests/authentication_test.go
🔇 Additional comments (1)
router/go.mod (1)
168-168: LGTM — golang.org/x/time is indirect; no direct imports found.Scanned repo: no occurrences of golang.org/x/time or golang.org/x/time/rate; go.mod shows it only as indirect (router/go.mod:168, router-tests/go.mod:178, demo/go.mod:174). router/core/ratelimiter.go:36 calls redis_rate.NewLimiter (transitive dependency).
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
router-tests/authentication_test.go (1)
846-865: Stronger signal: issue the 5 calls concurrentlyThe regression involves blocking under concurrent load. Consider running the 5 requests in parallel (WaitGroup + per-call assertions) within the same
AwaitFuncto better reflect the real failure mode. Keep the existing sequential check if you prefer both.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
router-tests/authentication_test.go(5 hunks)router-tests/testenv/utils.go(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/authentication_test.go (4)
router-tests/testenv/testenv.go (3)
Run(107-124)Config(286-342)Environment(1729-1765)router-tests/utils.go (1)
ConfigureAuth(45-57)router/core/access_controller.go (1)
NewAccessController(23-28)router-tests/testenv/utils.go (1)
AwaitFunc(32-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/authentication_test.go (2)
1223-1223: LGTM: updated TokenForKID signature usageCalls correctly pass the new boolean parameter (
false) to preserve prior behavior.Also applies to: 1348-1348, 1462-1462, 1645-1645
841-843: No action required — 'wg_static_kid' not present in JWKS fixtures.rg over router-tests/jwks and router-tests returned the KID only in router-tests/authentication_test.go (lines 839–843); no matches in JWKS fixtures, so the test KID is effectively unknown to the JWKS servers.
|
Great PR description! I recommend making this configurable. I agree that, by default, we should be pessimistic and abort requests where the KID can't be found. That means only the Example authentication:
jwt:
jwks:
- refresh_interval: 10m
url: "https://www.googleapis.com/oauth2/v3/certs"
refresh_unknown_kid:
enabled: true # Disabled by default
max_wait: 10s # RateLimitWaitMax (Something reasonable like 10s)
interval: 1m # rate.Every (Something reasonable like 1m)
burst: 10 # burst of rate.Every (Something reasonable like 2)RateLimitWaitMax is an option of |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (7)
router/pkg/config/testdata/config_full.json (1)
483-487: Example config uses zero interval/burst; prefer realistic, schema‑compliant values.To avoid copy/paste of ineffective settings, set non‑zero examples matching the schema defaults (interval ≥ 1s, burst ≥ 1). Example:
"RefreshUnknownKID": { - "Enabled": false, - "Interval": 0, - "Burst": 0 + "Enabled": false, + "Interval": 60000000000, + "Burst": 2 },Ensure this file isn’t used by schema validation; if it is, zero values will violate the new constraints.
Also applies to: 500-504, 514-518
router/core/supervisor_instance.go (1)
268-272: Add defensive guards for bad limiter settings.If users set enabled=true with interval ≤ 0 or burst < 1, the limiter can behave unexpectedly. Consider clamping or disabling with a warning before passing to the decoder.
router/pkg/config/config.go (1)
484-488: Optional: allow env overrides for nested fields.If you want environment control, add an env prefix on the nested block.
-type RefreshUnknownKIDConfig struct { +type RefreshUnknownKIDConfig struct { Enabled bool `yaml:"enabled" envDefault:"false"` Interval time.Duration `yaml:"interval" envDefault:"1m"` Burst int `yaml:"burst" envDefault:"2"` }And (optional) on the parent field:
-RefreshUnknownKID RefreshUnknownKIDConfig `yaml:"refresh_unknown_kid"` +RefreshUnknownKID RefreshUnknownKIDConfig `yaml:"refresh_unknown_kid" envPrefix:"REFRESH_UNKNOWN_KID_"`router/pkg/config/config.schema.json (1)
1719-1745: Schema block looks good; suggest adding max_wait and disallowing for secret mode.
- Add max_wait to cap per‑request wait on the limiter (matches jwkset HTTPClientOptions RateLimitWaitMax).
- Optionally forbid refresh_unknown_kid on secret-based JWK entries to avoid confusion.
"refresh_unknown_kid": { "type": "object", "description": "Controls rate-limited refresh behavior when a JWT KID is unknown.", "additionalProperties": false, "properties": { "enabled": { "type": "boolean", "description": "Enable refresh attempts on unknown KID.", "default": false }, "interval": { "type": "string", "description": "Token refill interval for the rate limiter.", "default": "1m", "duration": { "minimum": "1s" } }, + "max_wait": { + "type": "string", + "description": "Maximum time a request may wait for a refresh slot before skipping the refresh.", + "default": "10s", + "duration": { "minimum": "0s" } + }, "burst": { "type": "integer", "description": "Burst size for the rate limiter.", "default": 2, "minimum": 1 } } }And in the secret-based branch of the oneOf add:
- "not": { "anyOf": [ + "not": { "anyOf": [ { "required": ["url"] }, { "required": ["algorithms"] }, - { "required": ["refresh_interval"] } + { "required": ["refresh_interval"] }, + { "required": ["refresh_unknown_kid"] } ] }router/pkg/authentication/jwks_token_decoder.go (1)
108-114: Optional: wire a max wait to prevent long per‑request blocks.If you add refresh_unknown_kid.max_wait to config, pass it through to the HTTP client options.
- jwksetHTTPClientOptions := jwkset.HTTPClientOptions{ + jwksetHTTPClientOptions := jwkset.HTTPClientOptions{ HTTPURLs: map[string]jwkset.Storage{ c.URL: store }, PrioritizeHTTP: true, RefreshUnknownKID: refreshLimiter, + // RateLimitWaitMax caps per-request wait when refresh is enabled. + // Set when the config grows a corresponding field. + // RateLimitWaitMax: c.RefreshUnknownKID.MaxWait, }router-tests/authentication_test.go (2)
59-107: Stabilize timing assertion to avoid CI flakinessWith Interval=1s and Burst=1, the second request should wait ~1s. A 600ms lower bound is brittle on slower runners. Bump to ~900ms (or similar) to reduce flakes.
- require.True(t, elapsed >= 600*time.Millisecond) + require.True(t, elapsed >= 900*time.Millisecond, "expected refresh limiter to block ~1s; got %v", elapsed)
109-158: Relax non‑blocking threshold; 100ms can be too tight under loadEven when not blocked, end‑to‑end request times can exceed 100ms in CI. Loosen the bound a bit to avoid false failures.
- require.True(t, elapsed < 100*time.Millisecond) + require.True(t, elapsed < 250*time.Millisecond, "should not block when within burst; elapsed=%v", elapsed)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
router-tests/authentication_test.go(5 hunks)router/core/supervisor_instance.go(1 hunks)router/pkg/authentication/jwks_token_decoder.go(3 hunks)router/pkg/config/config.go(2 hunks)router/pkg/config/config.schema.json(1 hunks)router/pkg/config/testdata/config_full.json(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2224
File: router/pkg/authentication/jwks_token_decoder.go:176-199
Timestamp: 2025-09-17T18:13:31.021Z
Learning: In router/pkg/authentication/jwks_token_decoder.go, when AllowEmptyAlgorithm is true, the allowedAlgorithms slice is populated by getSupportedAlgorithms() from validation_store.go, ensuring it's never empty and preventing unintended token rejections.
Learnt from: SkArchon
PR: wundergraph/cosmo#2224
File: router/pkg/authentication/jwks_token_decoder.go:176-199
Timestamp: 2025-09-17T18:13:31.021Z
Learning: In router/pkg/authentication/jwks_token_decoder.go, when AllowEmptyAlgorithm is true, the allowedAlgorithms slice is populated by getSupportedAlgorithms() from validation_store.go, ensuring it's never empty and preventing unintended token rejections. The validation store maintains its own internal supported algorithms map that gets returned regardless of input parameters.
📚 Learning: 2025-09-17T13:51:50.823Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2216
File: router-tests/authentication_test.go:847-847
Timestamp: 2025-09-17T13:51:50.823Z
Learning: Go 1.22 introduced the ability to range over integers, making `for range n` syntax valid for iterating n times in loops.
Applied to files:
router-tests/authentication_test.go
🧬 Code graph analysis (4)
router/core/supervisor_instance.go (2)
router/pkg/authentication/jwks_token_decoder.go (1)
RefreshUnknownKIDConfig(56-60)router/pkg/config/config.go (1)
RefreshUnknownKIDConfig(484-488)
router/pkg/config/config.go (1)
router/pkg/authentication/jwks_token_decoder.go (1)
RefreshUnknownKIDConfig(56-60)
router/pkg/authentication/jwks_token_decoder.go (1)
router/pkg/config/config.go (1)
RefreshUnknownKIDConfig(484-488)
router-tests/authentication_test.go (6)
router-tests/testenv/testenv.go (3)
Run(107-124)Config(286-342)Environment(1729-1765)router-tests/jwks/jwks.go (1)
NewServer(174-180)router-tests/utils.go (1)
ConfigureAuthWithJwksConfig(59-70)router/pkg/authentication/jwks_token_decoder.go (2)
JWKSConfig(42-54)RefreshUnknownKIDConfig(56-60)router/core/access_controller.go (1)
NewAccessController(23-28)router-tests/testenv/utils.go (1)
AwaitFunc(32-42)
🔇 Additional comments (6)
router/core/supervisor_instance.go (1)
268-272: Wiring LGTM, but will be a no‑op until config decoding maps the nested block correctly.Once JWKSConfiguration uses yaml:"refresh_unknown_kid" (not inline), these fields will populate as expected.
router-tests/authentication_test.go (5)
1330-1331: LGTM: updated TokenForKID signature useExplicit third-arg usage looks correct here.
1455-1456: LGTM: updated TokenForKID signature useCallsite reflects the new API and intent.
1569-1570: LGTM: updated TokenForKID signature useConsistent with other callsites.
1753-1754: LGTM: updated TokenForKID signature useMatches the revised API in supported algorithms tests.
160-202: No change needed — repository uses Go 1.25, sofor range 5is supported.
All go.mod files declarego 1.25, satisfying the Go 1.22+ requirement.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
router/pkg/config/config.go (1)
484-490: Consider adding validation constraints for rate limiting parameters.The
RefreshUnknownKIDstruct looks well-structured with sensible defaults. However, consider adding validation to ensure that the rate limiting parameters make sense together (e.g.,Burstshould be positive whenEnabledis true,MaxWaitshould be reasonable, etc.).Based on the search results, I can see that Go provides several approaches for struct validation, with the most popular being the
go-playground/validatorpackage. However, looking at the current codebase structure and the need to keep things simple, I'd recommend adding basic validation logic rather than introducing a new dependency.type RefreshUnknownKID struct { - Enabled bool `yaml:"enabled" envDefault:"false"` - MaxWait time.Duration `yaml:"max_wait" envDefault:"2m"` - Interval time.Duration `yaml:"interval" envDefault:"30s"` - Burst int `yaml:"burst" envDefault:"2"` + Enabled bool `yaml:"enabled" envDefault:"false"` + MaxWait time.Duration `yaml:"max_wait" envDefault:"2m" validate:"min=1s,max=10m"` + Interval time.Duration `yaml:"interval" envDefault:"30s" validate:"min=1s,max=5m"` + Burst int `yaml:"burst" envDefault:"2" validate:"min=1,max=100"` +} + +// Validate checks that the RefreshUnknownKID configuration is valid +func (r *RefreshUnknownKID) Validate() error { + if r.Enabled { + if r.Burst < 1 { + return fmt.Errorf("burst must be at least 1 when refresh_unknown_kid is enabled") + } + if r.MaxWait < time.Second { + return fmt.Errorf("max_wait must be at least 1 second") + } + if r.Interval < time.Second { + return fmt.Errorf("interval must be at least 1 second") + } + if r.MaxWait < r.Interval { + return fmt.Errorf("max_wait (%v) should be greater than or equal to interval (%v)", r.MaxWait, r.Interval) + } + } + return nil }Or add this validation method to be called during configuration loading:
// Add to the Config struct or call during LoadConfig func (c *Config) validateJWKSConfiguration() error { for i, jwks := range c.Authentication.JWT.JWKS { if err := jwks.RefreshUnknownKID.Validate(); err != nil { return fmt.Errorf("invalid JWKS[%d] refresh_unknown_kid configuration: %w", i, err) } } return nil }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
router-tests/authentication_test.go(6 hunks)router/pkg/config/config.go(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-17T13:51:50.823Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2216
File: router-tests/authentication_test.go:847-847
Timestamp: 2025-09-17T13:51:50.823Z
Learning: Go 1.22 introduced the ability to range over integers, making `for range n` syntax valid for iterating n times in loops.
Applied to files:
router-tests/authentication_test.go
🧬 Code graph analysis (1)
router-tests/authentication_test.go (5)
router-tests/testenv/testenv.go (3)
Run(107-124)Config(286-342)Environment(1729-1765)router-tests/jwks/jwks.go (1)
NewServer(174-180)router-tests/utils.go (1)
ConfigureAuthWithJwksConfig(59-70)router/pkg/authentication/jwks_token_decoder.go (2)
JWKSConfig(42-54)RefreshUnknownKIDConfig(56-61)router-tests/testenv/utils.go (1)
AwaitFunc(32-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: integration_test (./telemetry)
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: build_push_image
- GitHub Check: build_test
- GitHub Check: build_push_image (nonroot)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (12)
router/pkg/config/config.go (1)
468-473: LGTM! Configuration structure properly implements the schema.The
RefreshUnknownKIDfield is now properly configured with the correct YAML tagyaml:"refresh_unknown_kid"which matches the expected nested structure in the configuration schema. The field placement withinJWKSConfigurationis appropriate and the integration looks good.router-tests/authentication_test.go (11)
11-12: LGTM: Appropriate imports for concurrency testingThe addition of
syncandsync/atomicimports is correct for the new parallel test cases that exercise rate limiting behaviors with concurrent requests.
61-109: Thorough test coverage for burst limit exceeded scenarioThis test effectively validates the rate limiting behavior when the burst limit is exceeded. The test correctly:
- Configures a restrictive burst limit of 1
- Makes two sequential requests with unknown KIDs
- Verifies the second request is delayed by at least 700ms (matching the 1-second interval minus some buffer)
111-160: Validates proper behavior when burst is within limitsThis test correctly demonstrates that when enough time passes between requests (allowing the rate limiter to regenerate tokens), subsequent requests don't experience blocking delays. The 1200ms sleep appropriately exceeds the 1-second interval.
164-213: Excellent test for max wait boundary conditionsThis test validates an important edge case where the rate limiter can predict that waiting wouldn't help (when required wait time exceeds MaxWait). The configuration is well-designed with MaxWait (700ms) being less than the interval (1s), ensuring immediate return.
215-264: Proper validation of wait behavior when MaxWait allows itThis test complements the previous one by showing the opposite scenario where MaxWait is larger than the interval, allowing the request to actually wait and succeed. The timing assertions are appropriate.
268-342: Sophisticated concurrent testing with atomic countersThis test demonstrates advanced testing techniques:
- Uses goroutines to simulate concurrent requests
- Employs atomic counters to track fast responses (< 100ms)
- Validates that exactly one request returns quickly while others wait
- The logic correctly accounts for the accumulated wait time scenario
344-386: Comprehensive test for disabled refresh behaviorThis test ensures that when
RefreshUnknownKIDis disabled (the default), requests don't block and return immediately with unauthorized status. The use oftestenv.AwaitFuncwith multiple iterations provides good coverage.
369-369: Consider consistent loop syntax usageBased on the retrieved learning, the
for range 5syntax is valid in Go 1.22+. However, for consistency across the codebase, consider whether all similar loops should use the same style.
1514-1514: Consistent usage of updated TokenForKID APIAll calls to
TokenForKIDconsistently usefalseas the third parameter, which appears to align with the test requirements. The usage pattern is consistent throughout the file.Also applies to: 1639-1639, 1753-1753, 1936-1936
2836-2842: Helper function properly supports algorithm constraintsThe
toJWKSConfighelper function correctly accepts variadicallowedAlgorithmsparameter and maps it to theAllowedAlgorithmsfield in the JWKSConfig struct, enabling algorithm filtering in tests.
85-85: No action required — the third parameter is intentional and correctTokenForKID is defined at router-tests/jwks/jwks.go:53 as func (s *Server) TokenForKID(kid string, claims map[string]any, useInvalidKID bool) ...; when useInvalidKID==true it selects any provider to sign the token while still setting token.Header[jwkset.HeaderKID]=kid, so calls like TokenForKID("unknown_kid", nil, true) intentionally produce a token signed by a valid key with an unknown KID header.
We right now have the following line in our JWT configuration
This means whenever a valid formed token (does not mean it is valid for auth) is sent to the router, it will try to find the correct JWKs entry based on the key id, but if it cannot find the key id it will on demand re-query the JWKs endpoint OUTSIDE of the refresh cycle within that request, hoping that the key id has been added to the JWKs. However to prevent misuse of this feature, there is a rate limit, so that if multiple requests are sent with an unknown keyid, we don't call the JWKs endpoint asap. However this blocks each request for 5 minutes cumulatively from my testing (because the burst is 1).
Cumulatively meaning,
if we send 6 requests at once
What happens now.
What happens with this PR
Also note on how max wait behaves with 2m and 30 second interval (0.5m = 30s) is the following
Summary by CodeRabbit
New Features
Configuration
Public API
Tests
Checklist