feat: expression support for retry condition#2167
Conversation
WalkthroughAdds expression-driven subgraph retry logic: new RetryContext and RetryExpressionManager, expression-to-ShouldRetry builder, Retry-After-aware transport retries and OnRetry semantics, config/schema retry expression fields, expanded WithSubgraphRetryOptions API, an expr dependency bump, and related tests. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (22)
router/pkg/config/testdata/config_defaults.json (1)
149-151: Default retry expression added — ensure empty/invalid expressions gracefully fall back at runtime.The default expression matches the PR intent. Please confirm the loader/evaluator:
- Treats empty or whitespace-only expressions as “use default”.
- On compile/eval error, logs a clear message including the offending config path (All vs Subgraph) and falls back to the default instead of disabling retries.
You could add a tiny regression around defaults to lock this in. Example (Go test outline): load this defaults file, assert that the resolved expression equals the runtime DefaultRetryExpression and that the builder returns a ShouldRetry func which retries on 500 and timeout errors.
router/pkg/config/testdata/config_full.json (1)
353-355: Clarify semantics of empty expression for subgraph-level override.Setting "Expression": "" for products implies fallback-to-default at runtime. Two suggestions:
- Document in code comments/tests that empty string and missing property behave identically.
- Optionally consider omitting the property (or allowing null) in the sample to make “inherit default” self-explanatory.
To verify behavior, assert in tests that:
- Empty string => uses DefaultRetryExpression
- Non-empty invalid expression => logs + falls back to DefaultRetryExpression
- Valid custom expression (e.g., "statusCode == 503") is honored
router/pkg/config/config.schema.json (1)
3242-3247: Strengthen schema docs with examples and explicit field/method list.Great addition. Concrete examples will help users and tooling. Consider enriching the property with examples and a crisper “available fields/methods” list to mirror the PR description.
Apply this diff inside the "expression" property:
- "expression": { - "type": "string", - "description": "The expression used to determine if a request should be retried. The expression can reference status codes, error messages, and helper functions like IsRetryableStatusCode(), IsConnectionError(), IsHttpReadTimeout(), IsTimeout() (includes HTTP read timeouts). See https://expr-lang.org/ for expression syntax. Note: Mutations are never retried regardless of this expression. EOF errors are always retried at the transport layer regardless of this expression.", - "default": "IsRetryableStatusCode() || IsConnectionError() || IsTimeout()" - } + "expression": { + "type": "string", + "description": "Boolean expr-lang expression deciding whether to retry a subgraph request. Available methods: IsTimeout(), IsConnectionError(), IsConnectionRefused(), IsConnectionReset(), Is5xxError(), IsRetryableStatusCode(), IsHttpReadTimeout(). Available fields: status_code (int), error (string). See https://expr-lang.org/ for syntax. Notes: Mutations are never retried regardless of this expression. EOF errors are always retried at the transport layer regardless of this expression.", + "default": "IsRetryableStatusCode() || IsConnectionError() || IsTimeout()", + "examples": [ + "IsRetryableStatusCode()", + "IsRetryableStatusCode() || IsTimeout() || IsConnectionError()", + "Is5xxError() || IsTimeout() || IsConnectionError()", + "IsTimeout()", + "IsHttpReadTimeout()", + "IsConnectionError()", + "status_code == 503", + "error contains \"timeout\"" + ] + }Also ensure the runtime treats empty string as “inherit default”. If that’s the contract, consider adding a short note like “Empty string inherits the default” to the description.
router/internal/expr/retry_context.go (6)
21-31: Broaden HTTP read-timeout detection to cover common net/http phrasing.Some Go versions and environments surface the header wait timeout as “Client.Timeout exceeded while awaiting headers”. We already lowercase the string, so consider matching that variant too, not only “timeout awaiting response headers”.
Apply this diff:
func (ctx RetryContext) IsHttpReadTimeout() bool { // Only check for HTTP-specific timeout awaiting response headers if ctx.Error != "" { errLower := strings.ToLower(ctx.Error) - return strings.Contains(errLower, "timeout awaiting response headers") + return strings.Contains(errLower, "timeout awaiting response headers") || + strings.Contains(errLower, "client.timeout exceeded while awaiting headers") } return false }
3-10: Import context to robustly detect deadline exceeded errors.To be future-proof and explicit, add context to imports; we’ll use it right below in IsTimeout().
import ( "errors" + "context" "net" "net/http" "os" "strings" "syscall" )
33-58: Consider explicitly handling context.DeadlineExceeded in IsTimeout().os.ErrDeadlineExceeded maps to context.DeadlineExceeded in modern Go, but being explicit improves readability and backward compatibility with older toolchains. No functional change for Go ≥1.20; safer on mixed environments.
func (ctx RetryContext) IsTimeout() bool { // Check for HTTP-specific read timeouts if ctx.IsHttpReadTimeout() { return true } // Check for net package timeout errors using the standard Go method if ctx.originalError != nil { if netErr, ok := ctx.originalError.(net.Error); ok && netErr.Timeout() { return true } // Check for deadline exceeded errors - if errors.Is(ctx.originalError, os.ErrDeadlineExceeded) { + if errors.Is(ctx.originalError, os.ErrDeadlineExceeded) || + errors.Is(ctx.originalError, context.DeadlineExceeded) { return true } // Also check for direct syscall timeout errors not wrapped in net.Error if errors.Is(ctx.originalError, syscall.ETIMEDOUT) { return true } } return false }
60-78: Expand connection error classification (DNS/route/unreachable).Today we only cover refused/reset and a few strings. Consider catching common “network is unreachable” classes and DNS failures via errors.As and syscall checks. This reduces reliance on error message wording.
func (ctx RetryContext) IsConnectionError() bool { // Use existing helpers for specific connection errors if ctx.IsConnectionRefused() || ctx.IsConnectionReset() { return true } + if ctx.originalError != nil { + // DNS resolution errors (e.g., NXDOMAIN, temporary failures) + var dnsErr *net.DNSError + if errors.As(ctx.originalError, &dnsErr) { + return true + } + // Common routing/unreachable classes + if errors.Is(ctx.originalError, syscall.EHOSTUNREACH) || + errors.Is(ctx.originalError, syscall.ENETUNREACH) || + errors.Is(ctx.originalError, syscall.ECONNABORTED) { + return true + } + } + // Fall back to string matching for other connection errors not covered by specific helpers if ctx.Error != "" { errLower := strings.ToLower(ctx.Error) return strings.Contains(errLower, "no such host") || + strings.Contains(errLower, "no route to host") || + strings.Contains(errLower, "network is unreachable") || strings.Contains(errLower, "handshake failure") || strings.Contains(errLower, "handshake timeout") } return false }
21-31: Optional: Go initialism style alias (IsHTTPReadTimeout).External expressions currently use IsHttpReadTimeout(). If you want to align with Go naming without breaking expressions, add a small alias that delegates to the existing method.
func (ctx RetryContext) IsHttpReadTimeout() bool { // ... } +// IsHTTPReadTimeout is an alias for IsHttpReadTimeout to satisfy Go initialism conventions. +// It’s exposed so users may call either in expressions. +func (ctx RetryContext) IsHTTPReadTimeout() bool { + return ctx.IsHttpReadTimeout() +}
101-116: Cross-platform errno (Windows) note.If Windows clients/routers are relevant, consider mapping WSAECONNREFUSED/WSAECONNRESET under build tags in a platform-specific file. Referencing those constants directly here would break non-Windows builds, so only do it with per-OS files.
router/pkg/config/config.go (1)
231-238: Default expression duplication risk between config and core.The envDefault here hardcodes the same default as core.DefaultRetryExpression. If one changes and the other doesn’t, behavior and docs will drift.
Options:
- Minimal: Add a unit test (in core or a small integration test) that asserts the two defaults match.
- Better: Remove envDefault here and rely on BuildRetryFunction("") to inject the default; the config stays empty when not specified. This avoids duplication but would require updating config_defaults.json expectations.
Would you like me to draft the test or a refactor plan?
router/core/supervisor_instance.go (1)
194-203: Consider future per-subgraph overrides for retry expressions.Transport and circuit breaker already support SubgraphMap overrides. If product direction allows, adding per-subgraph retry expression overrides would complete the symmetry with other traffic-shaping knobs.
router/core/retry_expression_builder_test.go (1)
93-114: Add coverage for HTTP read-timeout string matching.Given IsHttpReadTimeout uses string inspection, add a test to ensure “timeout awaiting response headers” (and the common “Client.Timeout exceeded…” variant) triggers retries under the default expression.
Apply this diff to append a new test case:
@@ t.Run("empty expression uses default", func(t *testing.T) { fn, err := BuildRetryFunction("", logger) assert.NoError(t, err) assert.NotNil(t, fn) @@ assert.False(t, fn(err, req, nil)) }) + t.Run("http read timeout strings are treated as retryable by default", func(t *testing.T) { + fn, err := BuildRetryFunction(DefaultRetryExpression, logger) + assert.NoError(t, err) + assert.NotNil(t, fn) + + req, _ := http.NewRequest("GET", "http://example.com", nil) + + // net/http canonical phrasing + err = errors.New("net/http: timeout awaiting response headers") + assert.True(t, fn(err, req, nil)) + + // Common alternate phrasing + err = errors.New("Client.Timeout exceeded while awaiting headers") + assert.True(t, fn(err, req, nil)) + })router/internal/retrytransport/retry_transport.go (2)
52-54: Consider adding safeguards against excessive backoff duration.The backoff mechanism could potentially wait for long durations as
MaxDurationincreases. Consider adding documentation about recommended values or implementing a safety check to prevent overly long sleep durations that could impact request latency.Would you like me to suggest appropriate safeguards or documentation for backoff duration limits?
59-59: Consider refactoring the retry condition for better clarity.The retry condition combines EOF retries with expression-based retries using an OR operation. This could be made clearer by extracting the logic into a separate method or adding a comment explaining the precedence.
- for (isDefaultRetryableError(err) || rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount { + // Retry if: (1) it's an EOF error (always retryable) OR (2) the expression evaluates to true + // AND we haven't exceeded the max retry count + for (isDefaultRetryableError(err) || rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount {router/internal/expr/retry_expression_test.go (8)
128-136: Future-proof the table test loop and tighten failure expectations.
- Capture the range variable to prevent the classic t.Run + range pitfall if these ever become parallelized.
- When a compile error is expected, also assert that the manager is nil.
- for _, tt := range tests { - t.Run(tt.name, func(t *testing.T) { + for _, tt := range tests { + tt := tt // capture range variable for safety if parallelization is introduced + t.Run(tt.name, func(t *testing.T) { manager, err := NewRetryExpressionManager(tt.expression) if tt.expectErr { - assert.Error(t, err) + assert.Error(t, err) + assert.Nil(t, manager) return } require.NoError(t, err) require.NotNil(t, manager)
79-104: Normalize subtest names to match exported helper identifiers.Purely cosmetic but keeps names consistent with the public API and improves grepability.
- name: "is5xxError helper", + name: "Is5xxError helper", ... - name: "is5xxError with non-5xx", + name: "Is5xxError with non-5xx", ... - name: "isConnectionError helper", + name: "IsConnectionError helper", ... - name: "isRetryableStatusCode helper", + name: "IsRetryableStatusCode helper",
103-109: Exercise the full decision table for IsRetryableStatusCode (500, 502, 503, 504, 429) and a negative.Right now only 429 is covered. Add explicit status cases to lock the contract.
{ - name: "isRetryableStatusCode helper", + name: "IsRetryableStatusCode helper", expression: "IsRetryableStatusCode()", ctx: RetryContext{StatusCode: 429}, expected: true, }, + { + name: "IsRetryableStatusCode 500", + expression: "IsRetryableStatusCode()", + ctx: RetryContext{StatusCode: 500}, + expected: true, + }, + { + name: "IsRetryableStatusCode 502", + expression: "IsRetryableStatusCode()", + ctx: RetryContext{StatusCode: 502}, + expected: true, + }, + { + name: "IsRetryableStatusCode 503", + expression: "IsRetryableStatusCode()", + ctx: RetryContext{StatusCode: 503}, + expected: true, + }, + { + name: "IsRetryableStatusCode 504", + expression: "IsRetryableStatusCode()", + ctx: RetryContext{StatusCode: 504}, + expected: true, + }, + { + name: "IsRetryableStatusCode negative (501)", + expression: "IsRetryableStatusCode()", + ctx: RetryContext{StatusCode: 501}, + expected: false, + },
3-14: Add context.DeadlineExceeded coverage (imports).You already cover os.ErrDeadlineExceeded; adding context.DeadlineExceeded ensures both common timeout sentinels are recognized.
import ( + "context" "errors" "fmt" "net" "net/http" "os" "syscall" "testing"
351-400: Add a test for context.DeadlineExceeded in IsTimeout syscall section.This is a frequent timeout surface (e.g., context with deadline on client). Ensuring positive detection keeps behavior consistent with os.ErrDeadlineExceeded.
tests := []struct { name string err error expected bool }{ { name: "ETIMEDOUT detected", err: syscall.ETIMEDOUT, expected: true, }, + { + name: "context.DeadlineExceeded detected", + err: context.DeadlineExceeded, + expected: true, + }, { name: "wrapped ETIMEDOUT detected", err: fmt.Errorf("read error: %w", syscall.ETIMEDOUT), expected: true, },
16-149: Optionally test runtime evaluation errors from ShouldRetry.You cover compile-time errors and empty expressions. Add a runtime failure (e.g., division by zero based on context) to validate error propagation from ShouldRetry.
Add this new test (outside the existing table) near other manager tests:
func TestRetryExpressionManager_RuntimeError(t *testing.T) { expression := "(1 / statusCode) > 0" // compiles, but will divide by zero at runtime when statusCode == 0 manager, err := NewRetryExpressionManager(expression) require.NoError(t, err) require.NotNil(t, manager) ctx := RetryContext{StatusCode: 0} _, runErr := manager.ShouldRetry(ctx) assert.Error(t, runErr) }
16-35: Follow-up: consider constraining expression size with expr.MaxNodes at compile time.Small, but valuable hardening: set a reasonable node budget (and unit-test the rejection path) to prevent user-supplied expressions from becoming pathological.
- Example: add expr.MaxNodes(N) to NewRetryExpressionManager compile options and a test asserting compilation fails for an oversized expression.
Would you like a follow-up PR to add expr.MaxNodes and a unit test that exercises the budget breach?
16-126: Optional: add a tiny property loop for Is5xxError.A quick loop from 500..599 would catch accidental off-by-one regressions (e.g., 600 boundary).
Add alongside existing Is5xxError tests:
t.Run("Is5xxError range 500..599", func(t *testing.T) { manager, err := NewRetryExpressionManager("Is5xxError()") require.NoError(t, err) require.NotNil(t, manager) for code := 500; code <= 599; code++ { t.Run(fmt.Sprintf("code=%d", code), func(t *testing.T) { ctx := RetryContext{StatusCode: code} result, err := manager.ShouldRetry(ctx) assert.NoError(t, err) assert.True(t, result) }) } })
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
router-tests/go.sumis excluded by!**/*.sumrouter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (16)
router-tests/go.mod(1 hunks)router/core/graph_server.go(2 hunks)router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)router/core/router.go(1 hunks)router/core/supervisor_instance.go(1 hunks)router/go.mod(1 hunks)router/internal/expr/retry_context.go(1 hunks)router/internal/expr/retry_expression.go(1 hunks)router/internal/expr/retry_expression_test.go(1 hunks)router/internal/retrytransport/retry_transport.go(4 hunks)router/internal/retrytransport/retry_transport_test.go(12 hunks)router/pkg/config/config.go(1 hunks)router/pkg/config/config.schema.json(1 hunks)router/pkg/config/testdata/config_defaults.json(1 hunks)router/pkg/config/testdata/config_full.json(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T22:13:25.222Z
Learnt from: StarpTech
PR: wundergraph/cosmo#2157
File: router-tests/go.mod:16-16
Timestamp: 2025-08-20T22:13:25.222Z
Learning: github.com/mark3labs/mcp-go v0.38.0 has regressions and should not be used in the wundergraph/cosmo project. v0.36.0 is the stable version that should be used across router-tests and other modules.
Applied to files:
router-tests/go.mod
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/supervisor_instance.go
🧬 Code graph analysis (9)
router/core/retry_expression_builder.go (3)
router/internal/retrytransport/retry_transport.go (1)
ShouldRetryFunc(13-13)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
router/core/retry_expression_builder_test.go (1)
router/core/retry_expression_builder.go (2)
BuildRetryFunction(15-50)DefaultRetryExpression(12-12)
router/core/router.go (1)
router/internal/retrytransport/retry_transport.go (1)
RetryOptions(15-23)
router/internal/expr/retry_expression.go (1)
router/internal/expr/retry_context.go (1)
RetryContext(13-19)
router/internal/expr/retry_context.go (1)
router/pkg/pubsub/datasource/error.go (1)
Error(3-6)
router/core/graph_server.go (2)
router/core/retry_expression_builder.go (1)
BuildRetryFunction(15-50)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(15-23)
router/internal/expr/retry_expression_test.go (2)
router/internal/expr/retry_context.go (2)
RetryContext(13-19)LoadRetryContext(138-152)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)
router/core/supervisor_instance.go (1)
router/pkg/config/config.go (1)
BackoffJitterRetry(231-238)
router/internal/retrytransport/retry_transport_test.go (1)
router/internal/retrytransport/retry_transport.go (2)
RetryHTTPTransport(27-31)RetryOptions(15-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_test
- GitHub Check: build_push_image (nonroot)
- GitHub Check: lint
- GitHub Check: image_scan
- GitHub Check: Analyze (go)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: image_scan (nonroot)
- GitHub Check: integration_test (./events)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (21)
router/go.mod (1)
67-67: Approve expr v1.17.6 upgrade; no breaking evaluator changes detected
- Release notes for v1.17.6 (Aug 12, 2025) show only parser/lexer improvements (#811), struct-tag filtering and “in” behavior (#806), unaddressable‐slice fix (#803), and a VM memory‐leak patch (#813)—no changes to opcode/AST semantics, short-circuiting, type coercion, IsTimeout(), or string-contains behavior (github.com)
- Verified all expr.Compile and expr.Run call sites handle errors appropriately via handleExpressionError or wrapped fmt.Errorf:
- router/internal/expr/resolvers.go (lines 12–14, 23–25, 43–45)
- router/internal/expr/retry_expression.go (lines 28–30, 45–47)
- router/internal/expr/expr_manager.go (lines 41–43)
- No further changes required; merging.
router/pkg/config/testdata/config_full.json (1)
314-316: Top-level “All” expression matches intended default — LGTM.Keeps behavior explicit in the full sample and aligns with defaults.
router/internal/expr/retry_context.go (1)
86-100: Retryable status codes: looks good.The 500/502/503/504/429 set matches common practice. No further action.
router/core/supervisor_instance.go (1)
196-203: Wiring expression: LGTM.Plumbing BackoffJitterRetry.Expression into WithSubgraphRetryOptions is correct and preserves the “empty means default” semantics downstream.
router/internal/retrytransport/retry_transport.go (2)
15-23: LGTM!The addition of the
Expressionfield toRetryOptionsenables expression-based retry configuration while maintaining backward compatibility. The existingShouldRetryfield continues to hold the compiled function for runtime evaluation.
120-130: LGTM!Good implementation of the default retry behavior for EOF errors. The function correctly checks for the "unexpected eof" error which indicates connection issues that should always trigger a retry, regardless of the configured expression.
router/core/retry_expression_builder.go (2)
15-50: LGTM! Well-structured retry expression builder.The implementation properly handles empty expressions by defaulting to
DefaultRetryExpression, ensures mutations are never retried, and includes proper error handling for expression evaluation failures. The error logging is appropriately placed to help debug configuration issues without failing the request.
30-31: No missingisMutationRequest– function exists in the same package
- The function
isMutationRequest(ctx context.Context) boolis implemented in router/core/context.go at lines 638–644.- Since both files reside in the
router/corepackage, no additional import is required.No further action required.
router/core/graph_server.go (3)
1159-1164: LGTM! Proper error handling for retry expression compilation.The code correctly builds the retry function from the expression and handles compilation errors appropriately by returning an error that will prevent the server from starting with an invalid configuration.
1165-1172: LGTM! Clean retry options setup.The
RetryOptionsstruct is properly initialized with all necessary fields, including the compiledShouldRetryfunction and the originalExpressionfor reference/debugging purposes.
1188-1201: Review TransportOptions utilizationBased on the struct definition (router/core/transport.go:354–367) and the graph server initialization (router/core/graph_server.go:1188–1201), all three new fields are indeed declared and populated:
- CircuitBreaker: defined in the struct and set to
s.circuitBreakerManagerin graph_server.go- Logger: defined in the struct and set to
s.loggerin graph_server.go- EnableTraceClient: defined in the struct and set to
enableTraceClientin graph_server.goAdditionally, we confirmed that
opts.Loggeris consumed by the transport layer (router/core/transport.go:384) and that subgraph-circuit-breaker options are exercised in existing circuit-breaker tests. However, I didn’t find an explicit use of:
- opts.CircuitBreaker within the request execution pipeline
- opts.EnableTraceClient to conditionally wrap or inject the trace client
Please review
router/core/transport.go(around the struct literal at lines ~362–384) and ensure that bothCircuitBreakerandEnableTraceClientare actually acted upon when building or executing transports, so that neither flag becomes a silent no-op.router/internal/expr/retry_expression.go (2)
17-36: LGTM! Well-designed expression manager with proper null handling.The constructor properly handles empty expressions by returning
nil, allowing for graceful degradation to default behavior. The expression compilation is correctly configured withRetryContextas the environment and boolean return type enforcement.
38-56: LGTM! Robust evaluation with proper type checking.The
ShouldRetrymethod includes appropriate nil checks, comprehensive error handling, and type assertion to ensure the expression returns a boolean value. The fallback tofalsewhen no expression is configured is a safe default.router/internal/retrytransport/retry_transport_test.go (5)
20-30: LGTM! Good test helper function.The
simpleShouldRetryfunction provides a clear and simple retry logic for testing purposes, making the tests more readable and maintainable.
40-84: LGTM! Comprehensive test for HTTP 5xx retries.The test properly validates that the retry mechanism respects
MaxRetryCountand tracks both retry count and total attempts. The assertions correctly verify that the system retries exactlymaxRetriestimes and makesmaxRetries + 1total attempts.
130-188: LGTM! Excellent test for custom ShouldRetry logic.This test effectively validates that the retry mechanism properly respects the
ShouldRetryfunction's decision to stop retrying when it returnsfalse. The scenario progression from retryable errors to a non-retryable error is well-designed.
218-258: LGTM! Important short-circuit test.This test validates a critical optimization - ensuring that successful responses immediately return without triggering any retries. The assertion that
OnRetryshould not be called is a good verification of this behavior.
448-506: LGTM! Thorough test of OnRetry callback.Excellent test coverage for the
OnRetrycallback, validating that it's called with correct parameters (0-based retry count, error, and response) for each retry attempt. The test properly captures and validates all callback invocations.router/internal/expr/retry_expression_test.go (3)
16-554: Excellent coverage and realistic scenarios — this suite meaningfully de-risks the expression engine.
- Broad table-driven checks, syscall-wrapped errors, mixed-case string matching, and integration-style ShouldRetry assertions are all on point.
- Nice separation of concerns: manager compile semantics, context loader, syscall helpers, and composed expressions are each exercised.
498-554: Nice syscall + status composition tests.These cover the cross-product of helpers and HTTP codes well and verify expression wiring end-to-end.
1-554: Sanity check — defaults & integration verifiedI ran the grep you requested. Evidence shows the default expression, the builder, the mutation guard, EOF handling, and tests are present.
- router/core/retry_expression_builder.go — DefaultRetryExpression = "IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"; BuildRetryFunction falls back to the default when empty and the returned ShouldRetry func early-returns false for mutation requests (mutation guard).
- router/internal/retrytransport/retry_transport.go — transport retry loop uses isDefaultRetryableError(err) || rt.RetryOptions.ShouldRetry(err, req, resp); EOF is handled as always-retryable (comment + strings.Contains(errStr, "unexpected eof")).
- router/core/retry_expression_builder_test.go — unit tests call BuildRetryFunction(DefaultRetryExpression, ...) and explicitly assert EOF is handled at the transport layer (assert.False for "unexpected EOF").
- router/internal/expr/retry_expression_test.go — comprehensive expression tests (status codes, IsTimeout/IsHttpReadTimeout, syscall errors) exercise manager.ShouldRetry.
- router/pkg/config/config.schema.json — documentation/schema includes the same default expression and the explicit note: "Mutations are never retried regardless of this expression. EOF errors are always retried at the transport layer regardless of this expression."
Conclusion: the default expression, builder behavior, mutation guard, EOF transport handling, and tests are implemented and wired as requested. No further changes required from this check.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
router/core/retry_expression_builder.go (3)
28-33: Defensive nil request check to avoid panicShouldRetry dereferences req via req.Context(). If an upstream caller ever passes a nil request (even by mistake), this will panic. Cheap guard recommended.
- return func(err error, req *http.Request, resp *http.Response) bool { - reqContext := getRequestContext(req.Context()) + return func(err error, req *http.Request, resp *http.Response) bool { + // Defensive: ensure we never panic on a nil *http.Request + if req == nil { + return false + } + reqContext := getRequestContext(req.Context())
35-38: Use OperationTypeMutation constant and guard missing operation contextComparing to the string "mutation" is brittle, and Operation() may be nil for some edge flows. Guard and use the typed constant to avoid regressions if the literal ever changes.
- // Never retry mutations, regardless of expression result - if reqContext.Operation().Type() == "mutation" { - return false - } + // Never retry mutations, regardless of expression result + if op := reqContext.Operation(); op == nil || op.Type() == OperationTypeMutation { + return false + }
47-51: Optional: enrich evaluation-error logs with contextIf evaluation fails repeatedly, having quick context helps. Consider adding operation name/type and statusCode/error from the retry context to the log entry.
Example:
- zap.String("operation", reqContext.Operation().Name())
- zap.String("operation_type", string(reqContext.Operation().Type()))
- zap.Int("status_code", ctx.StatusCode)
- zap.String("error", ctx.Error)
router/core/graph_server.go (1)
1159-1172: Keep RetryOptions.Expression in sync with the effective expression (handles empty -> default)You compile a fallback expression when config is empty, but RetryOptions.Expression still carries the original (possibly empty) value. Persist the effective expression for observability/metrics consistency.
- // Build retry options and handle any expression compilation errors - shouldRetryFunc, err := BuildRetryFunction(s.retryOptions.Expression) + // Build retry options and handle any expression compilation errors + effectiveExpr := s.retryOptions.Expression + if effectiveExpr == "" { + effectiveExpr = DefaultRetryExpression + } + shouldRetryFunc, err := BuildRetryFunction(effectiveExpr) if err != nil { return nil, fmt.Errorf("failed to build retry function: %w", err) } - retryOptions := retrytransport.RetryOptions{ + retryOptions := retrytransport.RetryOptions{ Enabled: s.retryOptions.Enabled, MaxRetryCount: s.retryOptions.MaxRetryCount, MaxDuration: s.retryOptions.MaxDuration, Interval: s.retryOptions.Interval, - Expression: s.retryOptions.Expression, + Expression: effectiveExpr, ShouldRetry: shouldRetryFunc, }Optional micro-optimization (no diff necessary): only compile the expression when s.retryOptions.Enabled is true; otherwise, defer the cost. Transport can handle a nil ShouldRetry when retries are disabled.
router/core/retry_expression_builder_test.go (3)
317-363: Add a test for missing request context (nil reqContext) to pin the early-return behaviorThere’s no test that exercises ShouldRetry when the request lacks a bound requestContext. This helps prevent accidental future changes that remove the guard.
Add the following test (outside the current subtests):
func TestBuildRetryFunction_NoRequestContext(t *testing.T) { fn, err := BuildRetryFunction("true") assert.NoError(t, err) assert.NotNil(t, fn) // Intentionally do NOT attach a requestContext req, _ := http.NewRequest("POST", "http://example.com/graphql", nil) resp := &http.Response{StatusCode: 500} // Should short-circuit to false without a request context assert.False(t, fn(nil, req, resp)) }
51-74: Optional: mark subtests parallel and use require for construction errorsSpeed up the suite and fail fast on setup errors.
- Call t.Parallel() at the start of TestBuildRetryFunction and in each t.Run body.
- Use require.NoError for BuildRetryFunction(...) to avoid follow-up nil derefs in the same subtest.
68-73: Optional: use a realistic connection errorTo better reflect IsConnectionError() semantics, consider constructing a net.OpError wrapping syscall.ECONNREFUSED instead of a plain error string. Keeps tests robust if implementation stops matching string contents.
Example:
opErr := &net.OpError{Op: "dial", Err: syscall.ECONNREFUSED} assert.True(t, fn(opErr, req, nil))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
router-tests/go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
router/core/context.go(0 hunks)router/core/graph_server.go(2 hunks)router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)
💤 Files with no reviewable changes (1)
- router/core/context.go
🧰 Additional context used
🧬 Code graph analysis (3)
router/core/retry_expression_builder_test.go (3)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/core/retry_expression_builder.go (2)
BuildRetryFunction(15-58)DefaultRetryExpression(12-12)router/core/context.go (3)
OperationTypeQuery(488-488)OperationTypeMutation(489-489)OperationTypeSubscription(490-490)
router/core/retry_expression_builder.go (4)
router/internal/retrytransport/retry_transport.go (1)
ShouldRetryFunc(13-13)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
router/core/graph_server.go (2)
router/core/retry_expression_builder.go (1)
BuildRetryFunction(15-58)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(15-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: lint
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: Analyze (go)
- GitHub Check: build_push_image
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
🔇 Additional comments (4)
router/core/retry_expression_builder.go (1)
14-26: Good defaulting and compile-time validationUsing DefaultRetryExpression as a fallback and compiling once up front is solid. Clear error wrapping on compile failures is also helpful for ops.
router/core/graph_server.go (1)
1189-1195: LGTM: wiring retry into TransportOptionsRetryOptions plumbing into TransportOptions looks correct and preserves existing pre/post handlers and metrics.
router/core/retry_expression_builder_test.go (2)
49-151: Strong coverage across default/empty/custom expressionsGood breadth: default path, empty expression fallback, compile error, status and error predicates. Clear assertions and readable helpers.
193-217: Mutation guard behavior well coveredValidates that even “true” expressions won’t retry mutations. This preserves the contract regardless of config.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
router-tests/structured_logging_test.go (1)
712-712: Optional: de-duplicate the repeated “no-retry” option.This file repeats core.WithSubgraphRetryOptions(false, 0, 0, 0, "") several times. Consider a small helper var to DRY it up.
Apply at file scope (near imports or top-level consts):
+var noSubgraphRetry = core.WithSubgraphRetryOptions(false, 0, 0, 0, "")And then replace per call site, e.g.:
- core.WithSubgraphRetryOptions(false, 0, 0, 0, ""), + noSubgraphRetry,router-tests/error_handling_test.go (1)
1383-1480: Optional: factor a shared “no-retry” Option for readability.This file also repeats the disabled-retry Option. Consider a file-local var to reduce repetition.
Add near the top of the file:
+var noSubgraphRetry = core.WithSubgraphRetryOptions(false, 0, 0, 0, "")Then replace occurrences within RouterOptions:
- core.WithSubgraphRetryOptions(false, 0, 0, 0, ""), + noSubgraphRetry,router-tests/telemetry/telemetry_test.go (1)
8709-8710: Use explicit default retry expression constant for clarity (nit).Passing an empty string works (retries are disabled here anyway), but using an explicit default improves readability and future-proofs intent if defaults ever change.
- core.WithSubgraphRetryOptions(false, 0, 0, 0, ""), + core.WithSubgraphRetryOptions(false, 0, 0, 0, core.DefaultRetryExpression),If
core.DefaultRetryExpressionisn’t exported in the core package, keep the empty string.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
router-tests/error_handling_test.go(4 hunks)router-tests/panic_test.go(2 hunks)router-tests/structured_logging_test.go(5 hunks)router-tests/telemetry/telemetry_test.go(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router-tests/telemetry/telemetry_test.gorouter-tests/structured_logging_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.
Applied to files:
router-tests/telemetry/telemetry_test.go
🧬 Code graph analysis (4)
router-tests/error_handling_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1762)
router-tests/panic_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1762)
router-tests/telemetry/telemetry_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1762)
router-tests/structured_logging_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1762)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: lint
- GitHub Check: Analyze (go)
🔇 Additional comments (12)
router-tests/panic_test.go (2)
51-51: API arity update applied correctly (expression arg).Passing "" for the new expression while retries are disabled (enabled=false) keeps behavior unchanged. LGTM.
83-83: Consistent update to WithSubgraphRetryOptions.Same note as above; good alignment with the new 5-arg signature.
router-tests/structured_logging_test.go (6)
712-712: Arity bump handled; retries remain disabled.Adding the empty expression argument is correct and preserves the intent of the test (NoRetryClient + disabled subgraph retries).
832-832: Correct propagation of new expression parameter.No change in semantics since enabled=false; good.
964-964: LGTM: expression parameter added while keeping retries off.Matches the updated Option signature.
1100-1100: LGTM: updated WithSubgraphRetryOptions call.Maintains previous behavior of the test.
2214-2214: LGTM: consistent use of the new argument across tests.All call sites here pass the empty expression; consistent with other tests in this suite.
712-712: Verification Passed: AllWithSubgraphRetryOptionsCall Sites Now Use 5 Arguments
Ran the recommended repo-wide verifier script; no instances with fewer than five arguments were found.router-tests/error_handling_test.go (4)
1383-1383: Updated call site matches new signature.Empty expression with retries disabled is correct for this test setup.
1416-1416: LGTM: expression parameter added while keeping retries off.
1448-1448: LGTM: consistent Option usage after API change.
1480-1480: LGTM: correct 5th argument added.Same semantics retained.
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (8)
router/internal/expr/retry_context.go (1)
60-78: Recognize DNS resolution failures via errors.AsRelying solely on substring matching (“no such host”) is brittle. Use errors.As to detect net.DNSError explicitly and treat it as a connection error.
Apply this diff:
func (ctx RetryContext) IsConnectionError() bool { // Use existing helpers for specific connection errors if ctx.IsConnectionRefused() || ctx.IsConnectionReset() { return true } + // Detect DNS resolution failures via type assertion + if ctx.originalError != nil { + var dnsErr *net.DNSError + if errors.As(ctx.originalError, &dnsErr) { + return true + } + } + // Fall back to string matching for other connection errors not covered by specific helpers if ctx.Error != "" { errLower := strings.ToLower(ctx.Error) return strings.Contains(errLower, "no such host") || strings.Contains(errLower, "handshake failure") || strings.Contains(errLower, "handshake timeout") }router/internal/retrytransport/retry_transport.go (4)
60-74: Honor Retry-After for 503 Service Unavailable as wellRFC 7231 allows Retry-After on 503 in addition to 429. Using the header for 503 improves backoff fidelity on server overload.
Apply this diff:
-// shouldUseRetryAfter determines if we should use Retry-After header for 429 responses +// shouldUseRetryAfter determines if we should use Retry-After header for +// 429 (Too Many Requests) and 503 (Service Unavailable) responses. func shouldUseRetryAfter(resp *http.Response) (time.Duration, bool) { - if resp == nil || resp.StatusCode != http.StatusTooManyRequests { + if resp == nil || (resp.StatusCode != http.StatusTooManyRequests && resp.StatusCode != http.StatusServiceUnavailable) { return 0, false }
42-46: Trim whitespace when parsing Retry-After secondsMinor robustness improvement: handle headers with surrounding whitespace.
Apply this diff:
// Try parsing as delay-seconds (integer) - if seconds, err := strconv.Atoi(retryAfter); err == nil && seconds >= 0 { + if seconds, err := strconv.Atoi(strings.TrimSpace(retryAfter)); err == nil && seconds >= 0 { return time.Duration(seconds) * time.Second }
158-160: Clarify comment grammar in drainBodyTighten the wording for clarity.
Apply this diff:
- // When we close the body only will go internally marks the persisted connection as true - // which is important so that it can reuse the connection internally for retrying + // Draining and closing the body marks the connection as reusable, + // which is important so the transport can reuse it for the retry.
172-182: Consider expanding default retryable errors (optional)Only “unexpected EOF” is covered. You may also want to consider other transient network errors (e.g., “server closed idle connection”) to improve resilience. If you prefer keeping defaults minimal, leave as-is and rely on expressions.
router/internal/retrytransport/retry_transport_test.go (3)
520-582: These tests will sleep in real time; consider a Sleeper injection to make them fastTests using Retry-After of 1s (and HTTP-date +1s) will introduce multi-second sleeps. Recommend adding an optional Sleeper func(time.Duration) to RetryOptions (defaulting to time.Sleep) and using it in RoundTrip. Tests can then inject a no-op or scaled sleeper to keep the suite fast and deterministic.
I can send a small patch to:
- add RetryOptions.Sleeper,
- use it instead of time.Sleep in RoundTrip,
- update these tests to inject a no-op sleeper,
so they complete near-instantly. Want me to open that follow-up?
272-280: Remove unused mockRoundTripper typeThis type is not referenced anywhere in the file; keeping both MockTransport and mockRoundTripper is redundant.
Apply this diff to delete it:
-// Mock round tripper for testing -type mockRoundTripper struct { - roundTripFunc func(req *http.Request) (*http.Response, error) -} - -func (m *mockRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) { - return m.roundTripFunc(req) -}
447-458: Rename variable for clarityThe local name unusedLogger is misleading; it is returned and used. Rename to logger.
Apply this diff:
- unusedLogger := zap.New(core) - return &buf, unusedLogger + logger := zap.New(core) + return &buf, logger
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
router/internal/expr/retry_context.go(1 hunks)router/internal/expr/retry_expression_test.go(1 hunks)router/internal/retrytransport/retry_transport.go(5 hunks)router/internal/retrytransport/retry_transport_test.go(12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/internal/expr/retry_expression_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
router/internal/retrytransport/retry_transport_test.go (1)
router/internal/retrytransport/retry_transport.go (2)
RetryHTTPTransport(28-32)RetryOptions(16-24)
router/internal/expr/retry_context.go (1)
router/pkg/pubsub/datasource/error.go (1)
Error(3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: image_scan
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/internal/retrytransport/retry_transport_test.go (1)
880-929: Nice coverage of 429 opt-out behaviorGood negative test ensuring that 429s are not retried when ShouldRetry excludes them, even with Retry-After present. Aligns with the configurable policy intent.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
router/internal/expr/retry_context.go (2)
3-10: Include context.DeadlineExceeded in IsTimeout() and import contextMany network timeouts surface as context.DeadlineExceeded. Treat it as a timeout alongside os.ErrDeadlineExceeded and ETIMEDOUT.
Apply this diff:
import ( + "context" "errors" "net" "net/http" "os" "strings" "syscall" ) @@ // Check for net package timeout errors using the standard Go method if ctx.originalError != nil { if netErr, ok := ctx.originalError.(net.Error); ok && netErr.Timeout() { return true } // Check for deadline exceeded errors if errors.Is(ctx.originalError, os.ErrDeadlineExceeded) { return true } + // Treat context deadline exceeded as a timeout + if errors.Is(ctx.originalError, context.DeadlineExceeded) { + return true + } // Also check for direct syscall timeout errors not wrapped in net.Error if errors.Is(ctx.originalError, syscall.ETIMEDOUT) { return true } }Also applies to: 42-58
87-99: Include 429 (Too Many Requests) in IsRetryableStatusCode to align default behavior and Retry-After handlingPer the PR objectives and tests around Retry-After, 429 should be considered retryable by the helper, so the default “Balanced” expression can honor Retry-After without custom ShouldRetry overrides.
Apply this diff:
-// IsRetryableStatusCode returns true if the HTTP status code is generally -// considered retryable, including 500, 502, 503, and 504. +// IsRetryableStatusCode returns true if the HTTP status code is generally +// considered retryable, including 429, 500, 502, 503, and 504. func (ctx RetryContext) IsRetryableStatusCode() bool { switch ctx.StatusCode { + case http.StatusTooManyRequests, case http.StatusInternalServerError, http.StatusBadGateway, http.StatusServiceUnavailable, http.StatusGatewayTimeout: return true default: return false } }To verify impact and adjust tests if needed, run:
#!/bin/bash # Where do we reference 429 in tests and expressions? rg -n --no-heading -C2 'StatusTooManyRequests|429' . # Where is IsRetryableStatusCode() used in expression defaults? rg -n --no-heading -C3 'IsRetryableStatusCode\(\)' . # Sanity-check tests that explicitly exclude 429 to ensure intent remains correct rg -n --no-heading -C2 'shouldNotRetry429|shouldRetryWith429' router/internal/retrytransport
🧹 Nitpick comments (4)
router/internal/expr/retry_context.go (1)
61-79: Broaden IsConnectionError to cover unreachable host/network conditionsOptional hardening: include common OS-level connectivity failures (EHOSTUNREACH, ENETUNREACH) and their string variants. This improves signal for transient network partitions without overrelying on string checks.
Apply this diff:
func (ctx RetryContext) IsConnectionError() bool { // Use existing helpers for specific connection errors if ctx.IsConnectionRefused() || ctx.IsConnectionReset() { return true } + // OS-level unreachable/network conditions + if ctx.originalError != nil { + if errors.Is(ctx.originalError, syscall.EHOSTUNREACH) || + errors.Is(ctx.originalError, syscall.ENETUNREACH) { + return true + } + } + // Fall back to string matching for other connection errors not covered by specific helpers if ctx.Error != "" { errLower := strings.ToLower(ctx.Error) return strings.Contains(errLower, "no such host") || + strings.Contains(errLower, "no route to host") || + strings.Contains(errLower, "network is unreachable") || + strings.Contains(errLower, "host is unreachable") || strings.Contains(errLower, "handshake failure") || strings.Contains(errLower, "handshake timeout") } return false }router/internal/retrytransport/retry_transport_test.go (3)
511-572: Tests depend on real time.Sleep for Retry-After; add a test seam to avoid slow/flaky timingThe 429 tests use real Retry-After delays (e.g., 1s). Consider injecting a sleeper/clock into RetryHTTPTransport so tests can stub it and assert exact durations without wall-clock waits.
Proposed approach (high level):
- Add an optional Sleeper func(time.Duration) to RetryOptions (default: time.Sleep).
- Use it wherever backoff/delay is applied (including Retry-After).
- In tests, pass a stub sleeper that records durations without sleeping.
Example diffs (snippets; production file changes required):
- type RetryOptions struct { + type RetryOptions struct { Enabled bool MaxRetryCount int Interval time.Duration MaxDuration time.Duration Expression string OnRetry func(count int, req *http.Request, resp *http.Response, err error) ShouldRetry ShouldRetryFunc + Sleeper func(time.Duration) // nil => time.Sleep }- sleeper := time.Sleep + sleeper := time.Sleep + if rt.RetryOptions.Sleeper != nil { + sleeper = rt.RetryOptions.Sleeper + } // ... later - time.Sleep(delay) + sleeper(delay)Then in tests, use:
var slept []time.Duration noopSleeper := func(d time.Duration) { slept = append(slept, d) } RetryOptions{ Sleeper: noopSleeper, ... }This will speed up CI and let you assert the exact delay logic deterministically.
978-1057: Add a ShouldUseRetryAfter test case for "Retry-After: 0"A header of 0 seconds is valid and should mean “retry immediately.” We already cover parseRetryAfterHeader("0"). Add a case to TestShouldUseRetryAfter ensuring duration == 0 and use == true for 429 with header "0".
Apply this diff to extend the table:
{ name: "429 with invalid Retry-After", resp: func() *http.Response { resp := &http.Response{ StatusCode: http.StatusTooManyRequests, Header: make(http.Header), } resp.Header.Set("Retry-After", "invalid") return resp }(), expectedDur: 0, expectedUse: false, }, + { + name: "429 with zero Retry-After seconds", + resp: func() *http.Response { + resp := &http.Response{ + StatusCode: http.StatusTooManyRequests, + Header: make(http.Header), + } + resp.Header.Set("Retry-After", "0") + return resp + }(), + expectedDur: 0, + // Zero is a valid header and should be honored as "no wait" + expectedUse: true, + },
441-449: Nit: variable name “unusedLogger” is misleadingThis logger is returned and used by callers; rename to logger.
- unusedLogger := zap.New(core) - return &buf, unusedLogger + logger := zap.New(core) + return &buf, logger
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
router/core/retry_expression_builder.go(1 hunks)router/internal/expr/retry_context.go(1 hunks)router/internal/retrytransport/retry_transport_test.go(12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/core/retry_expression_builder.go
🧰 Additional context used
🧬 Code graph analysis (2)
router/internal/retrytransport/retry_transport_test.go (1)
router/internal/retrytransport/retry_transport.go (2)
RetryHTTPTransport(28-32)RetryOptions(16-24)
router/internal/expr/retry_context.go (1)
router/pkg/pubsub/datasource/error.go (1)
Error(3-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
- GitHub Check: image_scan
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: integration_test (./telemetry)
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
router/internal/retrytransport/retry_transport.go (2)
109-111: Guard against nil ShouldRetry to avoid panics.RoundTrip dereferences ShouldRetry without a nil check. If wiring forgets to set it (e.g., retries disabled), this panics.
Apply this diff:
- for (isDefaultRetryableError(err) || rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount { + for (isDefaultRetryableError(err) || (rt.RetryOptions.ShouldRetry != nil && rt.RetryOptions.ShouldRetry(err, req, resp))) && retries < rt.RetryOptions.MaxRetryCount {
135-137: Make backoff sleep cancelable via request context.Currently time.Sleep blocks even if the request is canceled, increasing tail latency under client aborts/timeouts.
Apply this diff:
- // Wait for the specified duration - time.Sleep(sleepDuration) + // Wait for the specified duration, but allow context cancellation + if sleepDuration > 0 { + timer := time.NewTimer(sleepDuration) + select { + case <-req.Context().Done(): + if !timer.Stop() { + <-timer.C + } + return resp, req.Context().Err() + case <-timer.C: + } + }
🧹 Nitpick comments (7)
router-tests/panic_test.go (2)
51-51: Make the 7-arg call self-documenting to prevent misordered paramsThe update to the new signature is correct. Given three zeros and two nils, it’s easy to misorder or misread the durations and overrides in future edits. Consider expanding to multi-line with inline comments for clarity.
- core.WithSubgraphRetryOptions(false, 0, 0, 0, "", nil, nil), + core.WithSubgraphRetryOptions( + false, // enabled + 0, // maxRetryCount + 0, // retryMaxDuration + 0, // retryInterval + "", // expression (unused because retries are disabled) + nil, // onRetryFunc + nil, // roundTripOverride + ),
83-83: Repeat readability tweak for the second occurrenceSame reasoning as above; expanding and commenting makes intent explicit and reduces future maintenance risk.
- core.WithSubgraphRetryOptions(false, 0, 0, 0, "", nil, nil), + core.WithSubgraphRetryOptions( + false, // enabled + 0, // maxRetryCount + 0, // retryMaxDuration + 0, // retryInterval + "", // expression (unused because retries are disabled) + nil, // onRetryFunc + nil, // roundTripOverride + ),router-tests/telemetry/telemetry_test.go (1)
8709-8710: Consider a tiny helper to avoid churn if the API changes againThis call shape will likely recur across tests. A small wrapper reads better and centralizes any future signature updates.
Apply this diff in this file:
- core.WithSubgraphRetryOptions(false, 0, 0, 0, "", nil, nil), + withNoSubgraphRetries(),And add this helper (anywhere in this test file or a shared test helper):
func withNoSubgraphRetries() core.Option { return core.WithSubgraphRetryOptions(false, 0, 0, 0, "", nil, nil) }router/core/retry_expression_builder.go (1)
12-12: Expose default expression in docs and config examples.DefaultRetryExpression matches the PR’s decision table “Balanced (default)”. Please mirror this constant in docs/examples so users know what they get when expression is empty.
router/internal/retrytransport/retry_transport.go (1)
180-190: Treat both io.ErrUnexpectedEOF and io.EOF as retryable defaults.PR notes say “EOF errors are always retried,” but the current check only matches “unexpected EOF” by substring. Use errors.Is for both canonical errors.
Apply this diff:
-func isDefaultRetryableError(err error) bool { - if err == nil { - return false - } - - errStr := strings.ToLower(err.Error()) - // EOF errors are always retryable as they indicate connection issues - return strings.Contains(errStr, "unexpected eof") -} +func isDefaultRetryableError(err error) bool { + if err == nil { + return false + } + // EOF errors indicate connection issues; always retry them by default. + return errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) +}Additional change required: add the missing import.
Add this import near the top of the file:
import "errors"router-tests/retry_test.go (2)
133-181: Time-based assertions: consider widening tolerance to reduce flakiness.Backoff and scheduling jitter can vary under load. You already subtract 20–50ms; if CI shows occasional flakes, consider increasing slack or asserting “within a range”.
387-394: Optional: add a mutation case to assert “mutations are never retried.”To lock in the invariant, add a subtest that issues a failing mutation with expression "true" and verifies no retries occur (onRetryCounter == 0, service called once).
Example (add as a new t.Run):
t.Run("mutations are never retried", func(t *testing.T) { t.Parallel() onRetryCounter := atomic.Int32{} retryCounterFunc := CreateRetryCounterFunc(&onRetryCounter) options := core.WithSubgraphRetryOptions(true, 5, 5*time.Second, 200*time.Millisecond, "true", retryCounterFunc, nil) testenv.Run(t, &testenv.Config{ NoRetryClient: true, RouterOptions: []core.Option{ options }, Subgraphs: testenv.SubgraphsConfig{ Employees: testenv.SubgraphConfig{ Middleware: func(_ http.Handler) http.Handler { return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusBadGateway) }) }, }, }, }, func(t *testing.T, xEnv *testenv.Environment) { _, err := xEnv.MakeGraphQLRequest(testenv.GraphQLRequest{ Query: `mutation { doSomething }`, }) require.NoError(t, err) require.Equal(t, 0, int(onRetryCounter.Load())) }) })
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
router-tests/error_handling_test.go(4 hunks)router-tests/panic_test.go(2 hunks)router-tests/retry_test.go(1 hunks)router-tests/structured_logging_test.go(5 hunks)router-tests/telemetry/telemetry_test.go(1 hunks)router/core/graph_server.go(2 hunks)router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)router/core/router.go(1 hunks)router/core/supervisor_instance.go(1 hunks)router/internal/expr/retry_expression_test.go(1 hunks)router/internal/retrytransport/retry_transport.go(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- router/core/graph_server.go
- router/core/supervisor_instance.go
- router/core/retry_expression_builder_test.go
- router/internal/expr/retry_expression_test.go
- router-tests/error_handling_test.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router-tests/telemetry/telemetry_test.gorouter-tests/structured_logging_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.
Applied to files:
router-tests/telemetry/telemetry_test.gorouter-tests/structured_logging_test.go
🧬 Code graph analysis (6)
router-tests/telemetry/telemetry_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1773)
router-tests/retry_test.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Body(92-94)router-tests/testenv/testenv.go (3)
Run(105-122)SubgraphsConfig(365-378)Environment(1727-1763)router/core/router.go (2)
WithSubgraphRetryOptions(1752-1773)Option(172-172)router/pkg/config/config.go (1)
CustomAttribute(43-47)
router-tests/panic_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1773)
router/core/router.go (1)
router/internal/retrytransport/retry_transport.go (2)
OnRetryFunc(15-15)RetryOptions(17-28)
router/core/retry_expression_builder.go (3)
router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-28)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
router-tests/structured_logging_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1752-1773)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: build-router
- GitHub Check: build_push_image
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: build_test
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (5)
router-tests/telemetry/telemetry_test.go (2)
8709-8710: API update to WithSubgraphRetryOptions looks correctThe expanded call matches the new signature and keeps retries disabled for this scenario. Passing 0 for the duration parameters and nil for the optional hooks is acceptable here.
8709-8710: All WithSubgraphRetryOptions calls use the 7-arg signature
Verified via ripgrep and comma-count check that everyWithSubgraphRetryOptions(...)invocation across the tests and core code has exactly six commas (seven arguments), so no instances of the old 4-arg form remain.router-tests/structured_logging_test.go (1)
712-713: All WithSubgraphRetryOptions calls are now using the new 7-arg signatureVerified across the entire repo that every invocation of
WithSubgraphRetryOptionsincludes all seven parameters—no outdated 5-arg or lower-arity calls remain. Passing an empty expression ("") andnilhooks is appropriate for these test scenarios.• No action required on call sites in:
– router-tests/panic_test.go
– router-tests/retry_test.go
– router-tests/error_handling_test.go
– router-tests/structured_logging_test.go
– router-tests/telemetry/telemetry_test.go
– router/core/supervisor_instance.go
– router/core/router.goOptional: you may implement a short-lived compatibility shim (see comment in core/router.go) to ease the transition for external integrators.
router/core/retry_expression_builder.go (1)
35-64: Expression evaluation and mutation guard look solid.
- Never retrying mutations aligns with PR notes.
- Logging on evaluation error and failing closed is reasonable since the transport still retries default EOFs.
No changes needed.
router-tests/retry_test.go (1)
186-217: EOF default retry test looks good.Using RoundTripOverride to inject io.ErrUnexpectedEOF validates the “always retry EOF” path independent of expression. Assertions on retry counts are correct.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
router/internal/retrytransport/retry_transport.go (2)
145-146: Make backoff sleep cancelable via request contextIf the client cancels the request, we should stop sleeping and return promptly. Replace time.Sleep with a context-aware wait.
Apply this diff:
- // Wait for the specified duration - time.Sleep(sleepDuration) + // Wait for the specified duration, but allow context cancellation + if sleepDuration > 0 { + timer := time.NewTimer(sleepDuration) + select { + case <-req.Context().Done(): + if !timer.Stop() { + <-timer.C + } + return resp, req.Context().Err() + case <-timer.C: + } + }
117-121: Fix potential nil dereference when retries are disabled (ShouldRetry can be nil)When retries are disabled, BuildRetryFunction returns nil and ShouldRetry remains nil. The loop dereferences it unconditionally and will panic on first failure response. Guard the call.
Apply this diff:
- for (rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount { + for (rt.RetryOptions.ShouldRetry != nil && rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount {
🧹 Nitpick comments (9)
router/internal/retrytransport/retry_transport.go (3)
115-116: Guard against nil request logger providerIf getRequestLogger is nil or returns nil, later Debug/Error calls will panic. Default to zap.NewNop().
Apply this diff:
- requestLogger := rt.getRequestLogger(req) + requestLogger := zap.NewNop() + if rt.getRequestLogger != nil { + if l := rt.getRequestLogger(req); l != nil { + requestLogger = l + } + }
66-72: Nil-safe logging for malformed Retry-AfterparseRetryAfterHeader logs with logger.Error without verifying logger is non-nil. Add a guard.
Apply this diff:
- if errJoin != nil { - logger.Error("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) - } + if errJoin != nil && logger != nil { + logger.Error("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) + }
141-143: Align OnRetry count semantics with tests (0-based vs 1-based)OnRetry currently receives count starting at 1 (because retries++ happens before the call). The tests expect a 0-based count. Pass retries-1 to the callback.
Apply this diff:
- if rt.RetryOptions.OnRetry != nil { - rt.RetryOptions.OnRetry(retries, req, resp, sleepDuration, err) - } + if rt.RetryOptions.OnRetry != nil { + rt.RetryOptions.OnRetry(retries-1, req, resp, sleepDuration, err) + }If you prefer 1-based semantics, adjust the tests instead. Pick one convention and document it on OnRetryFunc.
router/internal/retrytransport/retry_transport_test.go (1)
348-349: Reduce flakiness in Retry-After/interval assertionsAsserting NotEqual to retryInterval can still occasionally match due to jitter. Prefer range checks, e.g., 0 < sleepDuration < retryInterval+some_tolerance, or explicitly assert shouldUseRetryAfter(...) outcome.
Example:
assert.Greater(t, sleepDuration.Load(), int64(0)) assert.InDelta(t, int64(retryInterval/2), sleepDuration.Load(), int64(retryInterval)) // within a loose windowAlso applies to: 397-398
router-tests/retry_test.go (2)
198-245: Good: validates 429 Retry-After seconds and captures sleepDuration via OnRetryClear, deterministic check using headerRetryIntervalInSeconds. As a tiny enhancement, consider validating that all retries used the same duration (not only the last observed), e.g., accumulate durations into a slice and assert them all equal to the header value.
352-400: Zero Retry-After semantics: clarify expectationWith “Retry-After: 0”, current transport falls back to normal backoff; this test only asserts it’s not equal to the fixed retryInterval. If you intend “immediate retry” on zero, assert sleepDuration == 0 and update transport accordingly. Otherwise, consider a tighter range assertion (e.g., sleepDuration > 0 and != retryInterval).
router/core/retry_expression_builder.go (3)
13-13: Default expression matches the “Balanced” profile; add a doc comment for discoverabilityThis constant aligns with the PR’s decision table. Add a short doc comment so it’s easy for contributors to find in IDE hovers and godoc.
-const DefaultRetryExpression = "IsRetryableStatusCode() || IsConnectionError() || IsTimeout()" +// DefaultRetryExpression is applied when RetryOptions.Expression is empty. +// It corresponds to the "Balanced" profile in the retry decision table. +const DefaultRetryExpression = "IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"
35-47: Guard against nil http.Request to avoid a potential panicIf ShouldRetry is ever called with a nil req (unlikely but possible during error paths), getRequestContext(req.Context()) would panic. A minimal guard is cheap and safe.
- return func(err error, req *http.Request, resp *http.Response) bool { - reqContext := getRequestContext(req.Context()) + return func(err error, req *http.Request, resp *http.Response) bool { + if req == nil { + return false + } + reqContext := getRequestContext(req.Context())Would you confirm the retry transport never calls ShouldRetry with a nil request?
55-65: Avoid potential nil logger dereference on eval errorIf reqContext.logger can ever be nil, the error logging would panic and mask the evaluation failure. Defensive check keeps this path safe.
- if evalErr != nil { - reqContext.logger.Error("Failed to evaluate retry expression", + if evalErr != nil { + if reqContext.logger != nil { + reqContext.logger.Error("Failed to evaluate retry expression", zap.Error(evalErr), zap.String("expression", expression), - ) + ) + } // Disable retries on evaluation error return false }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
router-tests/error_handling_test.go(4 hunks)router-tests/panic_test.go(2 hunks)router-tests/retry_test.go(1 hunks)router-tests/structured_logging_test.go(5 hunks)router-tests/telemetry/telemetry_test.go(1 hunks)router/core/graph_server.go(2 hunks)router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)router/core/router.go(1 hunks)router/core/supervisor_instance.go(1 hunks)router/internal/retrytransport/retry_transport.go(3 hunks)router/internal/retrytransport/retry_transport_test.go(12 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- router-tests/error_handling_test.go
- router/core/router.go
- router/core/retry_expression_builder_test.go
- router-tests/panic_test.go
- router-tests/telemetry/telemetry_test.go
- router/core/supervisor_instance.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router-tests/structured_logging_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.
Applied to files:
router-tests/structured_logging_test.go
🧬 Code graph analysis (4)
router/core/graph_server.go (2)
router/core/retry_expression_builder.go (1)
BuildRetryFunction(16-69)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-28)
router/core/retry_expression_builder.go (4)
router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-28)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
router-tests/retry_test.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Body(92-94)router-tests/testenv/testenv.go (3)
Run(105-122)SubgraphsConfig(365-378)Environment(1727-1763)router/core/router.go (2)
WithSubgraphRetryOptions(1752-1765)Option(172-172)router/pkg/config/config.go (1)
CustomAttribute(43-47)
router/internal/retrytransport/retry_transport_test.go (1)
router/internal/retrytransport/retry_transport.go (2)
RetryHTTPTransport(32-36)RetryOptions(17-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: image_scan (nonroot)
- GitHub Check: Analyze (go)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (5)
router-tests/structured_logging_test.go (1)
712-713: LGTM: test call sites updated to new WithSubgraphRetryOptions signatureAll updated invocations correctly pass the additional expression string and onRetry func (using empty string and nil where not needed). No further changes required here.
Also applies to: 831-832, 963-964, 1099-1100, 2215-2216
router/internal/retrytransport/retry_transport_test.go (1)
513-574: Comprehensive coverage for Retry-After and backoff behavior looks solidThe new tests cover numeric and HTTP-date Retry-After parsing, invalid/negative headers, and mixed 429/non-429 scenarios. They also validate OnRetry and attempt counts. Nicely done.
Also applies to: 576-638, 640-684, 686-751, 753-801, 802-849, 850-935, 937-984, 986-1041, 1043-1142
router-tests/retry_test.go (1)
252-300: Nice bound checks for max duration vs intervalThe test asserts both upper and lower bounds, leaving jitter slack. Good balance between determinism and real-world backoff behavior.
router/core/retry_expression_builder.go (2)
15-21: Early exit when retries are disabled: LGTMReturning (nil, nil) when retries are disabled is clear and prevents unnecessary expression compilation. This also intentionally ignores bad expressions when disabled, which is reasonable.
23-34: Defaulting and compilation path looks solidFalling back to DefaultRetryExpression and surfacing a wrapped compile error keeps failure modes explicit. Nice.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
router/internal/retrytransport/retry_transport.go (3)
116-119: Nil-guard ShouldRetry to avoid panics.RoundTrip dereferences RetryOptions.ShouldRetry without a nil check. If wiring forgets to set it, this will panic. Short-circuit the call when it’s nil.
Apply:
- for (rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount { + for (rt.RetryOptions.ShouldRetry != nil && rt.RetryOptions.ShouldRetry(err, req, resp)) && retries < rt.RetryOptions.MaxRetryCount {
145-147: Make sleep cancelable via request context.Blocking time.Sleep ignores request cancellation and can hold resources unnecessarily.
Apply:
- // Wait for the specified duration - time.Sleep(sleepDuration) + // Wait for the specified duration, but allow context cancellation + if sleepDuration > 0 { + timer := time.NewTimer(sleepDuration) + select { + case <-req.Context().Done(): + if !timer.Stop() { + <-timer.C + } + return resp, req.Context().Err() + case <-timer.C: + } + }
84-90: Fix clamp-to-zero bug when MaxDuration is unset (0).If MaxDuration is 0 (meaning “no cap”), the current logic forces duration to 0 and disables Retry-After handling. Guard the clamp with maxDuration > 0.
Apply:
- if duration > maxDuration { - duration = maxDuration - } + if maxDuration > 0 && duration > maxDuration { + duration = maxDuration + }
🧹 Nitpick comments (9)
router/internal/retrytransport/retry_transport.go (4)
73-91: Honor Retry-After for 503 Service Unavailable as well.Per RFC 7231, Retry-After applies to both 429 and 503. Supporting 503 improves interoperability with common upstreams/CDNs.
Apply:
-func shouldUseRetryAfter(logger *zap.Logger, resp *http.Response, maxDuration time.Duration) (time.Duration, bool) { - if resp == nil || resp.StatusCode != http.StatusTooManyRequests { +func shouldUseRetryAfter(logger *zap.Logger, resp *http.Response, maxDuration time.Duration) (time.Duration, bool) { + if resp == nil || (resp.StatusCode != http.StatusTooManyRequests && resp.StatusCode != http.StatusServiceUnavailable) { return 0, false }And update the debug message at Line 125 to avoid hardcoding “429” (optional).
114-115: Defensive: default to a no-op logger if getRequestLogger returns nil.Avoid potential nil deref in logging paths.
Apply:
- requestLogger := rt.getRequestLogger(req) + requestLogger := rt.getRequestLogger(req) + if requestLogger == nil { + requestLogger = zap.NewNop() + }
117-121: Optional: respect RetryOptions.Enabled to bypass retries entirely.Right now retries run solely based on ShouldRetry. If Enabled=false is a supported config, short-circuit before retry loop.
Apply:
// Retry logic retries := 0 + if !rt.RetryOptions.Enabled { + return resp, err + }
65-71: Right-size log level for malformed Retry-After headers.Parsing failures on third-party headers are frequent and non-fatal. Consider downgrading to Warn/Debug or sampling to reduce noise.
For example:
- logger.Error("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin)) + logger.Warn("Failed to parse Retry-After header", zap.String("retry-after", retryAfter), zap.Error(errJoin))(or wrap with a sampled logger).
router/internal/retrytransport/retry_transport_test.go (5)
551-559: Speed up 429 seconds-based test by capping MaxDuration to milliseconds.This test currently incurs ~2s of real sleep (1s per retry). Reduce runtime by capping transport MaxDuration so the sleep is bounded while still exercising Retry-After logic.
Apply:
- MaxDuration: 10 * time.Second, + MaxDuration: 10 * time.Millisecond,If you also want to assert the actual chosen delay, capture sleepDuration via OnRetry and assert it equals the cap.
615-623: Align transport MaxDuration with test’s local maxDuration to keep it fast and verify capping end-to-end.Right now the handler validates shouldUseRetryAfter with a 500ms cap, but the transport still sleeps up to 1s. Use the same cap in RetryOptions.
Apply:
- MaxDuration: 10 * time.Second, + MaxDuration: maxDuration,Optionally, assert that the OnRetry sleepDuration equals maxDuration.
729-737: Reduce HTTP-date based 429 test duration.Bound transport MaxDuration to a few milliseconds; the header path is still taken, but the sleep is short.
Apply:
- MaxDuration: 10 * time.Second, + MaxDuration: 10 * time.Millisecond,
909-917: Shrink mixed 429/other-errors test runtime.Use millisecond-level MaxDuration instead of 10s to avoid long sleeps for the 429-with-header case.
Apply:
- MaxDuration: 10 * time.Second, + MaxDuration: 10 * time.Millisecond,
513-575: Avoid real-time sleeps in tests by asserting sleepDuration via OnRetry or by injecting a sleeper.These 429 tests currently incur real sleeps (seconds or ~1s) which slow the suite. Two options:
- Minimal change: capture the sleepDuration parameter in OnRetry and assert it equals the expected Retry-After (or the cap). Then reduce MaxDuration so real sleep is tiny.
- Better: inject a Sleeper func into the transport for tests (e.g., in RetryOptions) so tests can bypass sleeping entirely.
Example transport tweak (requires small production change):
type RetryOptions struct { ... OnRetry OnRetryFunc + // Test/DI: optional override for sleeping between retries + Sleep func(ctx context.Context, d time.Duration) error } // in RoundTrip in place of time.Sleep(sleepDuration): - time.Sleep(sleepDuration) + if rt.RetryOptions.Sleep != nil { + if err := rt.RetryOptions.Sleep(req.Context(), sleepDuration); err != nil { + return resp, err + } + } else { + // existing cancelable timer path + }Then in tests set Sleep to a no-op that records d.
Also applies to: 576-639, 686-752, 802-849
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
router/internal/retrytransport/retry_transport.go(3 hunks)router/internal/retrytransport/retry_transport_test.go(12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T17:19:28.144Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/internal/expr/retry_context.go:3-10
Timestamp: 2025-08-26T17:19:28.144Z
Learning: In the Cosmo router codebase, prefer using netErr.Timeout() checks over explicit context.DeadlineExceeded checks for timeout detection in retry logic, as Go's networking stack typically wraps context deadline exceeded errors in proper net.Error implementations.
Applied to files:
router/internal/retrytransport/retry_transport.go
🧬 Code graph analysis (1)
router/internal/retrytransport/retry_transport_test.go (1)
router/internal/retrytransport/retry_transport.go (2)
RetryHTTPTransport(31-35)RetryOptions(17-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: integration_test (./events)
- GitHub Check: Analyze (go)
🔇 Additional comments (4)
router/internal/retrytransport/retry_transport.go (2)
164-182: Body draining looks correct.Draining then closing ensures connection reuse; nil checks and error logs are in place.
184-188: OK: success criteria is conventional 2xx.Guarded by err==nil before invocation; no nil deref risk.
router/internal/retrytransport/retry_transport_test.go (2)
453-511: Good: OnRetry callback contract verified (count is 1-based).The test validates call count and parameters; aligns with transport semantics.
312-375: Body-draining behavior is exercised well.TrackableBody with throwOnRead/Close and log assertions give confidence in connection reuse and error logging.
Also applies to: 377-438
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
router/core/retry_expression_builder_test.go (1)
177-191: Clarify EOF behavior vs. PR note; add targeted test or adjust wordingPR text says “EOF errors are always retried.” Implementation (by design) matches only “unexpected EOF” via string matching; plain io.EOF is not retried. This is intentional per team preference captured in learnings, but it conflicts with the broad “EOF errors” phrasing. Consider either:
- Add a test for io.EOF reflecting current behavior (not retried), and update docs to say “unexpected EOF” is always retried, or
- Expand the default check to include io.EOF (and add tests).
Given the team’s stated preference to keep it simple and wait for feedback, I recommend updating wording/tests for clarity now.
Would you like me to push a follow-up commit that updates the docs/test names to explicitly say “unexpected EOF”?
router/core/retry_expression_builder.go (1)
75-85: Comment says “EOF errors are always retryable” but implementation matches only “unexpected EOF”This is intentional per team guidance (keep it simple; customers can extend via expression). Let’s make the comment exact to avoid confusion with the PR blurb.
Apply this diff:
-// isDefaultRetryableError checks for errors that should always be retryable -// regardless of the configured retry expression +// isDefaultRetryableError checks for errors that should always be retryable +// regardless of the configured retry expression. +// Note: intentionally limited to “unexpected EOF” string match for now (team decision); +// customers can handle other EOF variants via custom expressions. @@ - // EOF errors are always retryable as they indicate connection issues + // Unexpected EOFs are treated as retryable (indicative of truncated/aborted reads) return strings.Contains(errStr, "unexpected eof")
🧹 Nitpick comments (5)
router/core/retry_expression_builder_test.go (2)
84-85: Outdated comment: EOF is handled before expression evaluation, not in the transportThe handling occurs via isDefaultRetryableError inside BuildRetryFunction, prior to expression evaluation.
Apply this diff:
- assert.True(t, fn(errors.New("unexpected EOF"), req, nil)) // EOF is now handled at transport layer, not expression + // EOF is handled prior to expression evaluation via isDefaultRetryableError (not by the expression itself) + assert.True(t, fn(errors.New("unexpected EOF"), req, nil))
316-335: Optional coverage: verify behavior with missing request contextWe already cover the happy path. A small test asserting the function returns false when the request lacks an attached requestContext would guard against regressions.
Example test to append:
func TestShouldRetry_NoRequestContext_ReturnsFalse(t *testing.T) { fn, err := BuildRetryFunction(retrytransport.RetryOptions{ Enabled: true, Expression: "true", }) assert.NoError(t, err) assert.NotNil(t, fn) req, _ := http.NewRequest("POST", "http://example.com/graphql", nil) // no requestContext attached assert.False(t, fn(nil, req, &http.Response{StatusCode: 500})) }router/core/retry_expression_builder.go (3)
33-37: Avoid double-wrapping the compile error messageexpr.NewRetryExpressionManager already formats its error with “failed to compile retry expression: …”. Wrapping again duplicates the phrase.
Apply this diff:
- manager, err := expr.NewRetryExpressionManager(expression) - if err != nil { - return nil, fmt.Errorf("failed to compile retry expression: %w", err) - } + manager, err := expr.NewRetryExpressionManager(expression) + if err != nil { + return nil, err + }
41-46: Defensive nil check for req to prevent a potential panicShouldRetryFunc is an internal callback; guarding against a nil http.Request is cheap and eliminates a narrow edge-case panic.
Apply this diff:
- reqContext := getRequestContext(req.Context()) + if req == nil { + return false + } + reqContext := getRequestContext(req.Context())
47-50: Use EqualFold and the OperationTypeMutation constant for clarityMinor readability: avoid ToLower and compare against the declared constant.
Apply this diff:
- // Never retry mutations, regardless of expression result - if strings.ToLower(reqContext.Operation().Type()) == "mutation" { + // Never retry mutations, regardless of expression result + if strings.EqualFold(reqContext.Operation().Type(), string(OperationTypeMutation)) { return false }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)router/core/router.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/core/router.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T17:30:41.187Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.187Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/core/retry_expression_builder_test.gorouter/core/retry_expression_builder.go
🧬 Code graph analysis (2)
router/core/retry_expression_builder_test.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/core/retry_expression_builder.go (2)
BuildRetryFunction(20-73)DefaultRetryExpression(13-13)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-27)router/core/context.go (3)
OperationTypeQuery(488-488)OperationTypeMutation(489-489)OperationTypeSubscription(490-490)
router/core/retry_expression_builder.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-27)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/core/retry_expression_builder.go (1)
19-25: Contract note: returning a noop func when disabled is fine; document itGiven the test was expecting nil, it’s worth documenting that the disabled path returns a non-nil noop to simplify downstream usage (no nil checks).
If you want, I can add a docstring clarifying this behavior.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
router/core/retry_expression_builder.go (1)
82-85: Comment vs behavior mismatch for EOFs; tiny tweak or clarify commentCurrent code only matches “unexpected eof”, but the docs say “EOF errors are always retried.” If you want to keep the simple string approach (per team preference), consider also matching the common plain “eof” case; otherwise, narrow the comment.
Option A — broaden match (still string-based, minimal change):
- errStr := strings.ToLower(err.Error()) - // EOF errors are always retryable as they indicate connection issues - return strings.Contains(errStr, "unexpected eof") + errStr := strings.ToLower(err.Error()) + // Treat common EOFs as retryable connection issues. + // Keep this simple per team guidance: match exact "eof" and "unexpected eof". + return errStr == "eof" || strings.Contains(errStr, "unexpected eof")Option B — keep behavior, fix comment:
- // EOF errors are always retryable as they indicate connection issues + // "unexpected eof" errors are always retryable as they indicate connection issues
🧹 Nitpick comments (3)
router/core/retry_expression_builder.go (3)
15-17: Type the noop func explicitly to the public aliasMinor clarity/maintainability win; future signature changes will fail fast at declaration.
-var noopRetryFunc = func(err error, req *http.Request, resp *http.Response) bool { +var noopRetryFunc retrytransport.ShouldRetryFunc = func(err error, req *http.Request, resp *http.Response) bool { return false }
41-46: Defensive nil-check for req to avoid a rare panic pathIf an upstream caller ever passes a nil request,
req.Context()will panic. Cheap guard.- reqContext := getRequestContext(req.Context()) - - if reqContext == nil { - return false - } + if req == nil { + return false + } + reqContext := getRequestContext(req.Context()) + if reqContext == nil { + return false + }
48-50: Use EqualFold instead of ToLower for case-insensitive compareAvoids allocs and reads clearer.
- if strings.ToLower(reqContext.Operation().Type()) == "mutation" { + if strings.EqualFold(reqContext.Operation().Type(), "mutation") {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
router/core/retry_expression_builder.go(1 hunks)router/core/retry_expression_builder_test.go(1 hunks)router/pkg/config/config.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- router/pkg/config/config.go
- router/core/retry_expression_builder_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T17:30:41.212Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/core/retry_expression_builder.go
🧬 Code graph analysis (1)
router/core/retry_expression_builder.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-27)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build_test
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router/core/retry_expression_builder.go (2)
19-26: Good early-return guard for disabled retriesClear behavior when retries are off; bad expressions won’t block startup. LGTM.
60-69: Fail-closed on evaluation errors: confirm intent and log volumeReturning false on eval errors is a conservative choice; just confirm this is desired UX vs. “fail-open” for transient eval issues. Also, this log can be hot if an expression keeps failing—consider rate limiting or downgrading after the first hit.
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (1)
router/core/router.go (1)
1755-1776: Preserve backward compatibility: add a compat overloadThe signature change breaks external callers. Add a deprecated 4-arg overload that forwards to the new one.
+// Deprecated: use WithSubgraphRetryOptions with algorithm/expression/hooks. +func WithSubgraphRetryOptionsCompat( + enabled bool, + maxRetryCount int, + retryMaxDuration, retryInterval time.Duration, +) Option { + return WithSubgraphRetryOptions(enabled, backoffJitter, maxRetryCount, retryMaxDuration, retryInterval, "", nil) +}
🧹 Nitpick comments (2)
router/core/router.go (1)
1755-1776: Clarify that onRetryFunc is test-onlyDocument onRetryFunc as a test hook to avoid production misuse.
router/core/retry_builder_test.go (1)
440-472: Add a test for disabled retries with empty/invalid algorithmCovers the new short-circuit behavior in ProcessRetryOptions to prevent regressions.
func TestProcessRetryOptions(t *testing.T) { + t.Run("disabled retries tolerate empty algorithm", func(t *testing.T) { + opts := retrytransport.RetryOptions{ + Enabled: false, + Algorithm: "", + } + got, err := ProcessRetryOptions(opts) + assert.NoError(t, err) + assert.NotNil(t, got) + assert.NotNil(t, got.ShouldRetry) + assert.False(t, got.Enabled) + })
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (12)
router-tests/error_handling_test.go(4 hunks)router-tests/panic_test.go(2 hunks)router-tests/retry_test.go(1 hunks)router-tests/structured_logging_test.go(5 hunks)router-tests/telemetry/telemetry_test.go(1 hunks)router/core/graph_server.go(2 hunks)router/core/retry_builder.go(1 hunks)router/core/retry_builder_test.go(1 hunks)router/core/router.go(1 hunks)router/core/supervisor_instance.go(1 hunks)router/internal/retrytransport/retry_transport.go(3 hunks)router/pkg/config/config.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- router/pkg/config/config.go
- router-tests/retry_test.go
- router-tests/error_handling_test.go
- router/internal/retrytransport/retry_transport.go
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
📚 Learning: 2025-08-26T17:30:41.212Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/core/retry_builder.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/supervisor_instance.gorouter-tests/panic_test.gorouter-tests/structured_logging_test.gorouter-tests/telemetry/telemetry_test.gorouter/core/graph_server.gorouter/core/router.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: In the Cosmo router codebase, JSON schema validation prevents null values in TrafficShapingRules subgraph configurations, making nil checks unnecessary when dereferencing subgraph rule pointers in NewSubgraphTransportOptions.
Applied to files:
router-tests/structured_logging_test.gorouter-tests/telemetry/telemetry_test.go
🧬 Code graph analysis (8)
router/core/retry_builder.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-28)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
router/core/retry_builder_test.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-28)router/core/context.go (3)
OperationTypeQuery(488-488)OperationTypeMutation(489-489)OperationTypeSubscription(490-490)router/core/retry_builder.go (1)
ProcessRetryOptions(23-48)
router/core/supervisor_instance.go (1)
router/pkg/config/config.go (1)
BackoffJitterRetry(231-238)
router-tests/panic_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1755-1776)
router-tests/structured_logging_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1755-1776)
router-tests/telemetry/telemetry_test.go (1)
router/core/router.go (1)
WithSubgraphRetryOptions(1755-1776)
router/core/graph_server.go (2)
router/core/retry_builder.go (1)
ProcessRetryOptions(23-48)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-28)
router/core/router.go (1)
router/internal/retrytransport/retry_transport.go (2)
OnRetryFunc(15-15)RetryOptions(17-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: image_scan (nonroot)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: image_scan
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./telemetry)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (5)
router/core/supervisor_instance.go (1)
196-204: API update correctly wired: Algorithm and Expression are passed to WithSubgraphRetryOptionsThe parameter order matches the updated signature (enabled, algorithm, maxAttempts, maxDuration, interval, expression, onRetry). Passing nil for onRetry is appropriate here.
router-tests/panic_test.go (1)
51-52: Tests updated to new 7-arg retry API — disabling retries explicitlyUsing false, "", 0, 0, 0, "", nil cleanly disables retry in both subtests and aligns with the new signature.
Also applies to: 83-84
router-tests/telemetry/telemetry_test.go (1)
8709-8710: Ensure no calls to WithSubgraphRetryOptions use fewer than seven arguments
Run the following to list any 4–6-arg invocations and update them to the new 7-argument signature:ast-grep --pattern $'WithSubgraphRetryOptions($_, $_, $_, $_)' ast-grep --pattern $'WithSubgraphRetryOptions($_, $_, $_, $_, $_)' ast-grep --pattern $'WithSubgraphRetryOptions($_, $_, $_, $_, $_, $_)'router/core/retry_builder.go (1)
113-116: EOF matching kept intentionally simple (ack)String match for "unexpected eof" aligns with the team’s preference; no change requested.
router/core/graph_server.go (1)
1158-1165: Good: centralizing retry option processing and ensuring ShouldRetry is non-nilUsing ProcessRetryOptions here removes prior nil-callback pitfalls and centralizes validation.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
router/core/retry_builder.go (1)
72-78: Guard against nil req and nil Operation; use constant + EqualFold for mutation checkCurrent code can panic if req is nil or if Operation() returns nil. Also prefer the typed constant and EqualFold.
Apply:
@@ - reqContext := getRequestContext(req.Context()) + if req == nil { + return false + } + reqContext := getRequestContext(req.Context()) @@ - // Never retry mutations, regardless of expression result - if strings.ToLower(reqContext.Operation().Type()) == "mutation" { + // Never retry mutations, regardless of expression result + if op := reqContext.Operation(); op == nil { + return false + } else if strings.EqualFold(op.Type(), string(OperationTypeMutation)) { return false }Also applies to: 79-82
🧹 Nitpick comments (4)
router/core/retry_builder.go (2)
60-63: De-duplicate default expression source"defaultRetryExpression" duplicates config defaults. To avoid drift, centralize this constant (e.g., shared internal package) and reference it from both config and core.
114-117: “EOF errors are always retried”: include io.EOF and wrapped forms?Implementation only matches “unexpected eof”. If the intent (and docs) is to retry all EOF-class errors, handle io.EOF and wrapped errors explicitly.
Option A (minimal, robust):
@@ -func isDefaultRetryableError(err error) bool { +func isDefaultRetryableError(err error) bool { if err == nil { return false } - errStr := strings.ToLower(err.Error()) - // EOF errors are always retryable as they indicate connection issues - return strings.Contains(errStr, "unexpected eof") + // EOF errors are always retryable as they indicate connection issues + if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) { + return true + } + // Keep the string fallback to catch non-wrapped variants + errStr := strings.ToLower(err.Error()) + return strings.Contains(errStr, "unexpected eof") }Add imports:
@@ -import ( +import ( + "errors" "fmt" + "io" "net/http" "strings"If the team intentionally limits to “unexpected eof” only (see learning), please confirm and update docs/tests to reflect that narrower scope.
router/core/retry_builder_test.go (2)
53-53: Add nil request and missing request-context guardsEnsure the retry func is resilient when req is nil or lacks a requestContext. This complements the guard fix in buildRetryFunction.
Append tests:
@@ func TestBuildRetryFunction(t *testing.T) { + t.Run("graceful when request is nil", func(t *testing.T) { + fn, err := buildRetryFunction(retrytransport.RetryOptions{ + Enabled: true, + Expression: defaultRetryExpression, + }) + assert.NoError(t, err) + assert.False(t, fn(nil, nil, &http.Response{StatusCode: 500})) + }) + + t.Run("graceful when request context is missing", func(t *testing.T) { + fn, err := buildRetryFunction(retrytransport.RetryOptions{ + Enabled: true, + Expression: defaultRetryExpression, + }) + assert.NoError(t, err) + req, _ := http.NewRequest("POST", "http://example.com/graphql", nil) // no requestContext attached + assert.False(t, fn(nil, req, &http.Response{StatusCode: 500})) + })
440-481: Add coverage for empty algorithm when retries are enabledProcessRetryOptions currently errors when Enabled is true and Algorithm is empty. Add a test to lock this behavior (and improve error clarity if desired).
@@ func TestProcessRetryOptions(t *testing.T) { t.Run("process valid options", func(t *testing.T) { options := retrytransport.RetryOptions{ Enabled: true, Algorithm: "backoff_jitter", Expression: "statusCode == 500 || IsTimeout() || IsConnectionError()", } response, err := ProcessRetryOptions(options) assert.NoError(t, err) assert.NotSame(t, &options, response) }) + + t.Run("process empty algorithm when enabled returns error", func(t *testing.T) { + _, err := ProcessRetryOptions(retrytransport.RetryOptions{ + Enabled: true, + Algorithm: "", + Expression: "statusCode == 500", + }) + assert.Error(t, err) + assert.ErrorContains(t, err, "unsupported retry algorithm") + }) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
router/core/retry_builder.go(1 hunks)router/core/retry_builder_test.go(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-28T09:59:19.956Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/supervisor_instance.go:196-204
Timestamp: 2025-08-28T09:59:19.956Z
Learning: In router/pkg/config/config.go, the BackoffJitterRetry struct uses envDefault tags to provide default values for Algorithm ("backoff_jitter") and Expression ("IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"), ensuring that even when YAML configs omit these fields, valid defaults are applied automatically, preventing ProcessRetryOptions from rejecting empty values.
Applied to files:
router/core/retry_builder.go
📚 Learning: 2025-08-26T17:30:41.212Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/core/retry_builder.go
📚 Learning: 2025-08-28T09:59:10.337Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/router.go:1755-1776
Timestamp: 2025-08-28T09:59:10.337Z
Learning: In the router retry configuration, the algorithm parameter should not be defaulted when empty - users are expected to explicitly provide the algorithm value rather than relying on implicit defaults.
Applied to files:
router/core/retry_builder.go
🧬 Code graph analysis (2)
router/core/retry_builder_test.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-28)router/core/context.go (3)
OperationTypeQuery(488-488)OperationTypeMutation(489-489)OperationTypeSubscription(490-490)router/core/retry_builder.go (1)
ProcessRetryOptions(23-49)
router/core/retry_builder.go (4)
router/internal/expr/expr.go (2)
Request(65-74)Context(34-38)router/internal/retrytransport/retry_transport.go (2)
RetryOptions(17-28)ShouldRetryFunc(14-14)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)router/internal/expr/retry_context.go (1)
LoadRetryContext(138-152)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: build_test
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router/core/retry_builder.go (1)
23-49: ProcessRetryOptions: validation + copy semantics look solidAlgorithm validation respects the disabled flag, and returning a copy avoids mutating caller-owned options. No issues.
router/core/retry_builder_test.go (1)
53-438: Strong, comprehensive coverageGood matrix across statuses, errors, expression shapes, and op types; compile-time error cases and option processing are covered.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
router/core/router.go (1)
1755-1762: Public API break: add a compat wrapper or a new -WithHooks function; document test-only hookYou expanded WithSubgraphRetryOptions; external callers using the 4-arg form will break. Either:
- keep this 7-arg and add a deprecated 4-arg wrapper, or
- rename this to WithSubgraphRetryOptionsWithHooks and reintroduce the old 4-arg WithSubgraphRetryOptions that forwards.
Also annotate OnRetry as test-only in the doc comment. This mirrors prior feedback.
Proposed shim (minimal, keeps current name; add below this function):
+// Deprecated: use WithSubgraphRetryOptionsWithHooks to pass algorithm, expression or test hooks. +func WithSubgraphRetryOptions( + enabled bool, + maxRetryCount int, + retryMaxDuration, retryInterval time.Duration, +) Option { + return WithSubgraphRetryOptionsWithHooks( + enabled, + backoffJitter, // explicit default to preserve prior behavior + maxRetryCount, + retryMaxDuration, retryInterval, + "", // no expression + nil, // no OnRetry hook + ) +} + +// WithSubgraphRetryOptionsWithHooks configures retry including algorithm, expression and test-only hooks. +func WithSubgraphRetryOptionsWithHooks( + enabled bool, + algorithm string, + maxRetryCount int, + retryMaxDuration, retryInterval time.Duration, + expression string, + onRetryFunc retrytransport.OnRetryFunc, +) Option { + return func(r *Router) { + r.retryOptions = retrytransport.RetryOptions{ + Enabled: enabled, + Algorithm: algorithm, + MaxRetryCount: maxRetryCount, + MaxDuration: retryMaxDuration, + Interval: retryInterval, + Expression: expression, + // Test case overrides + OnRetry: onRetryFunc, + } + } +}Additionally, add a brief doc comment above the live function noting OnRetry is intended for tests only.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (2)
router-tests/go.sumis excluded by!**/*.sumrouter/go.sumis excluded by!**/*.sum
📒 Files selected for processing (4)
router-tests/go.mod(1 hunks)router/core/graph_server.go(2 hunks)router/core/router.go(1 hunks)router/go.mod(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- router/go.mod
- router-tests/go.mod
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-28T10:07:27.637Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router-tests/structured_logging_test.go:712-713
Timestamp: 2025-08-28T10:07:27.637Z
Learning: In the router retry configuration, when retries are disabled (enabled=false in WithSubgraphRetryOptions), the algorithm parameter should not be validated since it won't be used. Empty algorithm strings are acceptable when retries are disabled.
Applied to files:
router/core/router.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
PR: wundergraph/cosmo#2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.
Applied to files:
router/core/router.go
📚 Learning: 2025-08-28T09:59:19.956Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/supervisor_instance.go:196-204
Timestamp: 2025-08-28T09:59:19.956Z
Learning: In router/pkg/config/config.go, the BackoffJitterRetry struct uses envDefault tags to provide default values for Algorithm ("backoff_jitter") and Expression ("IsRetryableStatusCode() || IsConnectionError() || IsTimeout()"), ensuring that even when YAML configs omit these fields, valid defaults are applied automatically, preventing ProcessRetryOptions from rejecting empty values.
Applied to files:
router/core/router.go
📚 Learning: 2025-08-28T09:59:10.337Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/router.go:1755-1776
Timestamp: 2025-08-28T09:59:10.337Z
Learning: In the router retry configuration, the algorithm parameter should not be defaulted when empty - users are expected to explicitly provide the algorithm value rather than relying on implicit defaults.
Applied to files:
router/core/router.go
🧬 Code graph analysis (2)
router/core/graph_server.go (2)
router/core/retry_builder.go (1)
ProcessRetryOptions(23-49)router/internal/retrytransport/retry_transport.go (1)
RetryOptions(17-28)
router/core/router.go (1)
router/internal/retrytransport/retry_transport.go (2)
OnRetryFunc(15-15)RetryOptions(17-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router/core/router.go (1)
1764-1774: Intentional: no implicit defaulting of algorithm when emptyLeaving algorithm as-is (validated later) aligns with the design decision not to silently default the algorithm. Good.
router/core/graph_server.go (1)
1179-1185: Passing processed retry options looks correctUsing the processed options (with compiled expression) in TransportOptions is the right integration point.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
router/internal/expr/retry_expression_test.go (3)
340-375: Add nil error case for IsTimeout.Ensures IsTimeout() is false when no error is present.
{ name: "operation timed out string not detected (no string matching)", err: errors.New("operation timed out"), expected: false, }, + { + name: "nil error", + err: nil, + expected: false, + }, { name: "connection errors not detected by IsTimeout", err: syscall.ECONNREFUSED, expected: false, },
16-16: Speed up: mark top-level tests as parallel-safe.These tests are read-only and independent. Enable t.Parallel() at top level for faster CI. Avoid subtest-level parallelism to keep the table loop simple.
func TestRetryExpressionManager(t *testing.T) { + t.Parallel()func TestRetryExpressionManager_EmptyExpression(t *testing.T) { + t.Parallel()func TestLoadRetryContext(t *testing.T) { + t.Parallel()func TestRetryContext_SyscallErrorDetection(t *testing.T) { + t.Parallel()func TestRetryContext_ImprovedErrorDetection(t *testing.T) { + t.Parallel()func TestRetryExpressionManager_WithSyscallErrors(t *testing.T) { + t.Parallel()Also applies to: 140-140, 146-146, 183-183, 297-297, 493-493
16-137: Optional: DRY up the expression-eval pattern with a small helper.Reduces repetition and makes new cases trivial to add.
func eval(t *testing.T, expression string, ctx RetryContext) bool { t.Helper() m, err := NewRetryExpressionManager(expression) require.NoError(t, err) require.NotNil(t, m) got, err := m.ShouldRetry(ctx) require.NoError(t, err) return got }Use eval() inside TestRetryExpressionManager table loop.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
router/internal/expr/retry_context.go(1 hunks)router/internal/expr/retry_expression_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/internal/expr/retry_context.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-26T17:30:41.212Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2167
File: router/core/retry_expression_builder.go:3-11
Timestamp: 2025-08-26T17:30:41.212Z
Learning: In router/core/retry_expression_builder.go, the team prefers to keep EOF error detection simple with string matching ("unexpected eof") rather than using errors.Is() for io.EOF and io.ErrUnexpectedEOF. They want to wait for customer feedback before making it more robust, since customers can modify retry expressions to handle specific cases.
Applied to files:
router/internal/expr/retry_expression_test.go
🧬 Code graph analysis (1)
router/internal/expr/retry_expression_test.go (2)
router/internal/expr/retry_context.go (2)
RetryContext(13-19)LoadRetryContext(132-146)router/internal/expr/retry_expression.go (1)
NewRetryExpressionManager(17-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Analyze (go)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: build_push_image
- GitHub Check: build_test
- GitHub Check: image_scan (nonroot)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: image_scan
- GitHub Check: build_test
🔇 Additional comments (3)
router/internal/expr/retry_expression_test.go (3)
55-65: Good coverage of HTTP read timeout semantics and selective string matching.Nice job distinguishing the specific “timeout awaiting response headers” case (true) from generic timeout strings (false), which matches the intended stricter matching.
Also applies to: 366-389
183-295: Solid syscall surface coverage and expression integration.Tests for ECONNREFUSED/RESET/ETIMEDOUT across direct, wrapped, and net.OpError are comprehensive; the expression integration tests validate end-to-end behavior.
Also applies to: 493-549
92-97: No change needed: 429 is intentionally not retryable by default
The defaultIsRetryableStatusCodeimplementation deliberately excludes HTTP 429 (Too Many Requests) in the Balanced expression; the existing test expectingfalseis correct and should remain unchanged.Likely an incorrect or invalid review comment.
Summary by CodeRabbit
Example configuration
Retry Strategy Decision Table
IsRetryableStatusCode()IsRetryableStatusCode() || IsTimeout() || IsConnectionError()Is5xxError() || IsTimeout() || IsConnectionError()IsTimeout()IsHttpReadTimeout()IsConnectionError()statusCode == 503error contains "timeout"Available Methods:
IsTimeout(),IsConnectionError(),IsConnectionRefused(),IsConnectionReset(),Is5xxError(),IsRetryableStatusCode(),IsHttpReadTimeout()Available Fields:
statusCode(int),error(string)Default:
IsRetryableStatusCode() || IsConnectionError() || IsTimeout()Note: Mutations are never retried regardless of this expression. EOF errors are always retried.
Checklist