Skip to content

feat(router): block/disable persisted operations#2181

Merged
endigma merged 10 commits intomainfrom
jesse/eng-8016-apq-perf
Sep 3, 2025
Merged

feat(router): block/disable persisted operations#2181
endigma merged 10 commits intomainfrom
jesse/eng-8016-apq-perf

Conversation

@endigma
Copy link
Copy Markdown
Member

@endigma endigma commented Sep 1, 2025

  • fix(router): validate that user provided hashes match query
  • feat(router): refactor PO blocking, allow disabling PO provider
  • tests: update APQ tests to have correct SHAs

Summary by CodeRabbit

  • New Features

    • Added configurable blocking for persisted operations with optional conditions.
    • Stricter validation: requests with both query and hash must match; mismatches return a clear 400 error.
    • APQ behavior improved, including smarter cache/TTL handling.
  • Bug Fixes

    • Guards against conflicting security settings with clear errors and warnings.
  • Documentation

    • Configuration schema updated for clarity; added entries for blocking persisted operations.
  • Chores

    • Debug config: file watching disabled by default.

Checklist

Resolves ENG-8050

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 1, 2025

Walkthrough

Introduces configurable blocking for persisted operations, updates hashing/validation in the GraphQL prehandler, revises APQ handling and TTL renewal, gates storage client setup when persisted operations are disabled, refactors persisted-operation client interfaces, renames/relocates Redis closer package, updates configuration/schema and defaults, and adjusts tests (including new APQ mismatch test and hash updates).

Changes

Cohort / File(s) Summary
Persisted ops client API refactor
router/internal/persistedoperation/client.go, router/internal/persistedoperation/operationstorage/cdn/client.go, router/internal/persistedoperation/operationstorage/fs/client.go, router/internal/persistedoperation/operationstorage/s3/client.go, router/internal/persistedoperation/apq/redis.go, router/internal/persistedoperation/apq/client.go
Replace provider interface to StorageClient (no presence boolean), export concrete Client, adjust constructors and method signatures, add APQEnabled, update Redis APQ to use rediscloser.
Core integration and blocking
router/core/graph_server.go, router/core/graphql_prehandler.go, router/core/operation_blocker.go, router/core/router.go, router/core/operation_processor.go, router/core/ratelimiter.go, router/core/router_config.go
Wire new BlockPersisted options, add hash equality check and persistedQuery population, adjust blocker to evaluate persisted condition, gate provider resolution on Disabled, revise APQ/TTL renewal paths, update types/imports to new client and rediscloser.
Config and schema
router/pkg/config/config.go, router/pkg/config/config.schema.json, router/pkg/config/testdata/config_defaults.json, router/pkg/config/testdata/config_full.json, router/debug.config.yaml
Add security.block_persisted_operations (enabled/condition), tweak descriptions, include defaults in testdata, comment out file watch in debug config.
Redis closer package rename
router/internal/rediscloser/rediscloser.go, router/internal/rediscloser/rediscloser_test.go, router/pkg/pubsub/redis/adapter.go, router/internal/persistedoperation/apq/redis.go, router/core/ratelimiter.go, router/core/router.go
Rename package to rediscloser and update imports/usages across dependents; no functional changes beyond wiring.
Router tests (APQ and GET)
router-tests/automatic_persisted_queries_test.go, router-tests/persisted_operations_over_get_test.go
Add test for mismatched persistedQuery hash vs body and subsequent NotFound; update multiple sha256 hashes across APQ and GET tests to new values.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@endigma endigma changed the title feat: block/disable persisted operations feat(router): block/disable persisted operations Sep 1, 2025
@github-actions github-actions Bot added the router label Sep 1, 2025
@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from 59c12a9 to b254f5a Compare September 1, 2025 15:50
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 1, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-b2f0722257c4457b1664dabe4ebeb899fb086ae4

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 1, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-c4956f411eb930d8cf5143427b8e7131c9a775ca

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
router/pkg/config/config.schema.json (2)

146-150: Clarify “disabled” vs “blocked” to avoid operator confusion.

“Disable persisted operations.” is terse and easy to conflate with the new security.block_persisted_operations. Suggest clarifying that this disables the provider/loading, not request-time acceptance.

-          "description": "Disable persisted operations.",
+          "description": "Disable the persisted-operations provider (loading/serving by hash). This does not block requests that include a persisted hash by itself. To reject such requests at runtime, use security.block_persisted_operations.",

2394-2408: Add an explicit default for enabled (nit).

Many booleans in this schema omit defaults, but adding a default:false here reduces ambiguity.

         "block_persisted_operations": {
           "type": "object",
           "description": "The configuration for blocking persisted operations.",
           "additionalProperties": false,
           "properties": {
             "enabled": {
               "type": "boolean",
+              "default": false,
               "description": "Block persisted operations (sent with operation hash). You can also specify a condition that is evaluated to determine if the persisted operation should be blocked."
             },
router-tests/automatic_persisted_queries_test.go (1)

91-127: Good negative path coverage; consider making the assertion less brittle

The exact string match on the error body is likely to drift. Matching the salient fragment reduces test fragility while keeping intent clear.

-				require.Equal(t, `{"errors":[{"message":"persistedQuery sha256 hash does not match query body"}]}`, res1.Body)
+				require.Contains(t, res1.Body, `"persistedQuery sha256 hash does not match query body"`)

Optional: parse the JSON and assert on errors[0].message to avoid string-shape coupling. I can provide a helper if useful.

router/core/router.go (1)

983-1048: Add visibility for disabled persisted operations and misconfiguration

  • Log when persisted operations are disabled, distinguishing between a configured-but-disabled provider and fully disabled storage.
  • Warn on a non-empty ProviderID in disabled mode to catch misconfiguration early.

Nil-safety is already ensured by the existing persistedOperationClient == nil guard in FetchPersistedOperation and the shutdown checks.

router/core/graphql_prehandler.go (1)

570-579: Hash mismatch check: compare case-insensitively and surface an extension code

Hex hashes may differ in case; also consider returning a stable extension code for clients/metrics.

-	if operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery.HasHash() && operationKit.parsedOperation.Request.Query != "" {
-		if operationKit.parsedOperation.Sha256Hash != operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash {
+	if operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery.HasHash() &&
+		operationKit.parsedOperation.Request.Query != "" {
+		if !strings.EqualFold(
+			operationKit.parsedOperation.Sha256Hash,
+			operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash,
+		) {
 			return &httpGraphqlError{
 				message:    "persistedQuery sha256 hash does not match query body",
 				statusCode: http.StatusBadRequest,
+				extensionCode: "PERSISTED_QUERY_HASH_MISMATCH",
 			}
 		}
 	}

Optional: consider 200 + GraphQL error (e.g., PERSISTED_QUERY_BAD_HASH) if you want full APQ compatibility.

router/core/operation_blocker.go (1)

141-156: Attach an extension code when blocking persisted ops

Today callers see only the message. For better client handling/telemetry, map ErrPersistedOperationBlocked to a stable extension code (e.g., PERSISTED_OPERATION_BLOCKED) when turning it into httpGraphqlError in PreHandler.

Example (outside this hunk, in PreHandler where OperationIsBlocked is handled):

if err := h.operationBlocker.OperationIsBlocked(...); err != nil {
	code := ""
	if errors.Is(err, ErrPersistedOperationBlocked) {
		code = "PERSISTED_OPERATION_BLOCKED"
	}
	return &httpGraphqlError{message: err.Error(), statusCode: http.StatusOK, extensionCode: code}
}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 67c417f and b254f5a.

📒 Files selected for processing (10)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/router.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-21T15:06:36.664Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2067
File: router/pkg/config/config.schema.json:1637-1644
Timestamp: 2025-07-21T15:06:36.664Z
Learning: In the Cosmo router project, when extending JSON schema validation for security-sensitive fields like JWKS secrets, backwards compatibility is maintained by implementing warnings in the Go code rather than hard validation constraints in the schema. This allows existing configurations to continue working while alerting users to potential security issues.

Applied to files:

  • router/pkg/config/config.schema.json
📚 Learning: 2025-07-30T09:29:46.660Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2090
File: router/pkg/config/config.schema.json:0-0
Timestamp: 2025-07-30T09:29:46.660Z
Learning: The "operation_name_trim_limit" configuration property in router/pkg/config/config.schema.json should be placed at the security level as a sibling to complexity_limits, not inside the complexity_limits object.

Applied to files:

  • router/pkg/config/config.schema.json
🧬 Code graph analysis (5)
router/core/graph_server.go (1)
router/core/operation_blocker.go (1)
  • BlockPersistedOptions (49-52)
router-tests/automatic_persisted_queries_test.go (4)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-340)
  • Environment (1727-1763)
  • GraphQLRequest (1903-1911)
router/pkg/config/config.go (3)
  • Config (986-1060)
  • AutomaticPersistedQueriesConfig (853-857)
  • AutomaticPersistedQueriesCacheConfig (836-839)
router/core/router.go (2)
  • Option (172-172)
  • WithGraphApiToken (1626-1630)
router/core/graphql_handler.go (1)
  • NormalizationCacheHeader (39-39)
router/core/router.go (5)
router/internal/persistedoperation/operationstorage/fs/client.go (2)
  • NewClient (25-37)
  • Options (20-22)
router/internal/persistedoperation/client.go (2)
  • NewClient (53-66)
  • Options (37-45)
router/internal/persistedoperation/operationstorage/cdn/client.go (2)
  • NewClient (127-155)
  • Options (22-24)
router/internal/persistedoperation/apq/client.go (2)
  • NewClient (45-65)
  • Options (32-36)
router/internal/persistedoperation/operationstorage/s3/client.go (2)
  • NewClient (36-75)
  • Options (25-33)
router/core/graphql_prehandler.go (1)
router/core/operation_processor.go (1)
  • GraphQLRequestExtensions (188-190)
router/core/operation_blocker.go (1)
router/internal/expr/resolvers.go (1)
  • ResolveBoolExpression (38-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
🔇 Additional comments (13)
router/pkg/config/config.go (1)

410-411: Add block for persisted operations: looks good; confirm semantics vs provider 'disabled'.

The field wiring and envPrefix naming are consistent. Please double-check that docs and schema clearly differentiate:

  • security.block_persisted_operations.enabled → request-time blocking
  • persisted_operations.disabled → disables provider/client

If that’s intended, no code changes needed.

router/pkg/config/config.schema.json (1)

2384-2387: Good refinement of non-persisted wording.

“sent without operation hash” is clearer than “operation ID”. No further action.

router/pkg/config/testdata/config_full.json (1)

658-661: Defaults sample updated: placement and shape are correct.

BlockPersistedOperations matches the Go struct and schema location. LGTM.

router/pkg/config/testdata/config_defaults.json (1)

321-324: Defaults sample updated: matches schema/Go wiring.

Entry is consistent with neighboring security blocks. LGTM.

router-tests/persisted_operations_over_get_test.go (4)

61-62: Updated SHA matches new hashing behavior? Please verify once.

Given the prehandler changes, ensure this SHA256 corresponds to the exact query canonicalization used at runtime.

You can quickly sanity-check using the same hashing function used to generate test vectors (internal helper/script) to avoid future flakiness.


176-177: APQ test vector updated: looks fine.

Hash is updated consistently across requests; flow remains unchanged. LGTM.


185-186: APQ cache hit path stays consistent with new hash.

No issues spotted.


213-214: Mutation-over-GET guard still correct with new hash.

Behavior and expectation remain valid. LGTM.

router-tests/automatic_persisted_queries_test.go (1)

524-526: Updated APQ hashes look consistent

The new sha256 values align across all requests (initial with query, subsequent cache hits). Looks good.

Also applies to: 532-536, 540-546, 553-556

router/core/operation_blocker.go (4)

17-18: New persisted-op error is consistent with existing errors

Naming and usage align with other blocker errors.


24-25: State and VM program fields added correctly

Matches existing pattern for other block types.

Also applies to: 28-29


49-53: Config surface for blocking persisted ops: verify defaults/migration notes

Ensure defaults (disabled) and docs clearly replace any previous “persistedOperationsDisabled” semantics in user config.


112-119: Expression compilation for persisted condition looks good

Error wrapping mirrors other branches.

Comment thread router/core/graph_server.go
Comment thread router/core/graphql_prehandler.go
@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from 1c4e21f to 87816a1 Compare September 1, 2025 16:26
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/graphql_prehandler.go (1)

562-567: Also guard before setting PersistedQuery

Protect the field access with a nil-check.

Apply:

-		if h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled {
+		if h.operationBlocker != nil &&
+			(h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
 			// Set the request hash to the parsed hash, to see if it matches a persisted operation
 			operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery = &GraphQLRequestExtensionsPersistedQuery{
 				Sha256Hash: operationKit.parsedOperation.Sha256Hash,
 			}
 		}
♻️ Duplicate comments (2)
router/core/graphql_prehandler.go (2)

500-501: Nil safety in shouldFetchPersistedOperation

Same nil issue here.

Apply:

-func (h *PreHandler) shouldFetchPersistedOperation(operationKit *OperationKit) bool {
-	return operationKit.parsedOperation.IsPersistedOperation || h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled
-}
+func (h *PreHandler) shouldFetchPersistedOperation(operationKit *OperationKit) bool {
+	if h.operationBlocker == nil {
+		return operationKit.parsedOperation.IsPersistedOperation
+	}
+	return operationKit.parsedOperation.IsPersistedOperation ||
+		h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled
+}

489-491: Avoid nil-deref: guard operationBlocker before field access

h.operationBlocker can be nil; direct field access will panic.

Apply:

-	if !hasPersistedHash && (h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
+	if !hasPersistedHash &&
+		h.operationBlocker != nil &&
+		(h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
 		return true
 	}
🧹 Nitpick comments (4)
router/core/graphql_prehandler.go (1)

704-709: Optional: defensive nil-check around OperationIsBlocked

If h.operationBlocker can be unset, short-circuit to avoid panic.

Apply:

-	if err := h.operationBlocker.OperationIsBlocked(requestContext.logger, requestContext.expressionContext, operationKit.parsedOperation); err != nil {
+	if h.operationBlocker != nil {
+		if err := h.operationBlocker.OperationIsBlocked(requestContext.logger, requestContext.expressionContext, operationKit.parsedOperation); err != nil {
+			return &httpGraphqlError{
+				message:    err.Error(),
+				statusCode: http.StatusOK,
+			}
+		}
+	}
-		return &httpGraphqlError{
-			message:    err.Error(),
-			statusCode: http.StatusOK,
-		}
-	}
router/core/router.go (1)

1006-1071: Provider resolution gated by Disabled flag: solid; add an explicit log when disabled

Helps operators confirm intent at startup.

Apply:

-	if !r.persistedOperationsConfig.Disabled {
+	if !r.persistedOperationsConfig.Disabled {
 		// existing provider resolution...
 	}
+	if r.persistedOperationsConfig.Disabled {
+		r.logger.Info("Persisted operations provider disabled; skipping provider resolution")
+	}
router/core/operation_blocker.go (2)

86-121: Optional: guard against nil exprManager

Be defensive to avoid panics if options are miswired.

Apply:

 func (o *OperationBlocker) compileExpressions(exprManager *expr.Manager) error {
+	if exprManager == nil {
+		return nil
+	}
 	if o.blockMutations.Enabled && o.blockMutations.Condition != "" {

123-139: Minor: keep evaluation log messages symmetric

Consider aligning wording with non-persisted message (“failed to resolve … block expression”) for consistency. Not blocking.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b254f5a and 87816a1.

📒 Files selected for processing (10)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/router.go (2 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • router/pkg/config/config.go
  • router/core/graph_server.go
  • router-tests/automatic_persisted_queries_test.go
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/config/config.schema.json
  • router/pkg/config/testdata/config_full.json
  • router-tests/persisted_operations_over_get_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
router/core/graphql_prehandler.go (1)
router/core/operation_processor.go (1)
  • GraphQLRequestExtensions (188-190)
router/core/operation_blocker.go (1)
router/internal/expr/resolvers.go (1)
  • ResolveBoolExpression (38-54)
router/core/router.go (5)
router/internal/persistedoperation/operationstorage/fs/client.go (2)
  • NewClient (25-37)
  • Options (20-22)
router/internal/persistedoperation/client.go (2)
  • NewClient (53-66)
  • Options (37-45)
router/internal/persistedoperation/operationstorage/cdn/client.go (2)
  • NewClient (127-155)
  • Options (22-24)
router/internal/persistedoperation/apq/client.go (2)
  • NewClient (45-65)
  • Options (32-36)
router/internal/persistedoperation/operationstorage/s3/client.go (2)
  • NewClient (36-75)
  • Options (25-33)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (go)
🔇 Additional comments (7)
router/core/graphql_prehandler.go (2)

552-568: LGTM: early sha256 computation path

The forced/conditional compute and metric tagging look correct.


570-578: Good: strict hash-vs-body validation

400 on mismatch is appropriate; aligns with APQ expectations.

router/core/router.go (3)

472-494: Good guardrails for conflicting security settings

Clear errors for unusable states and warnings for edge configs. Nice.


485-493: Good: safelist vs block_persisted_operations conflict handling

Erroring on unconditional block + safelist avoids footguns.


1102-1120: LGTM: persisted operations client assembled only when needed

Backwards-compat for cache size retained.

router/core/operation_blocker.go (2)

17-18: New error for persisted ops blocking

Clear, exported error improves diagnostics across layers.


24-29: Persisted-operations blocking path looks correct

Expression compilation and runtime evaluation mirror the non-persisted flow; unconditional block when no condition is intentional.

Also applies to: 49-52, 112-119, 141-156

Comment thread router/core/graphql_prehandler.go
Comment thread router-tests/automatic_persisted_queries_test.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
router/core/operation_processor.go (2)

450-455: Minor: redundant Request.Query=="" check

Inside the else branch, Request.Query is already empty. Dropping the extra condition clarifies intent.

-		} else if isAPQ && persistedOperationData == nil && o.parsedOperation.Request.Query == "" {
+		} else if isAPQ && persistedOperationData == nil {

441-441: Defensive nil check for PersistedQuery before dereference

Both SaveOperation calls dereference o.parsedOperation.GraphQLRequestExtensions.PersistedQuery. Callers likely guard this, but a defensive check prevents panics if this method is invoked without the extension.

Suggested guard (outside the changed hunk, near the top of the function):

if o.parsedOperation.GraphQLRequestExtensions.PersistedQuery == nil {
	return false, false, &httpGraphqlError{
		message:    "persistedQuery extension is required",
		statusCode: http.StatusBadRequest,
	}
}

Also applies to: 468-468

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 87816a1 and a52f092.

📒 Files selected for processing (1)
  • router/core/operation_processor.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router/core/operation_processor.go (1)
router/internal/persistedoperation/client.go (2)
  • PersistedOperation (13-16)
  • PersistentOperationNotFoundError (18-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/core/operation_processor.go (1)

446-449: Shadowing isAPQ breaks final return semantics

Using := introduces a new isAPQ scoped to the else block. The final return outside the block uses the named return isAPQ (still false). Assign to the named variable instead.

-		persistedOperationData, isAPQ, err := o.operationProcessor.persistedOperationClient.PersistedOperation(ctx, clientInfo.Name, o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash)
+		persistedOperationData, apq, err := o.operationProcessor.persistedOperationClient.PersistedOperation(ctx, clientInfo.Name, o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash)
+		isAPQ = apq
 		if err != nil {
 			return false, isAPQ, err
-		} else if isAPQ && persistedOperationData == nil && o.parsedOperation.Request.Query == "" {
+		} else if isAPQ && persistedOperationData == nil && o.parsedOperation.Request.Query == "" {

Likely an incorrect or invalid review comment.

Comment thread router/core/operation_processor.go Outdated
Copy link
Copy Markdown
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from a52f092 to 0bc45d7 Compare September 2, 2025 11:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router/core/graphql_prehandler.go (1)

474-494: Guard against nil operationBlocker before dereferencing

Add a nil-check to avoid a potential panic if PreHandler is ever constructed without an OperationBlocker (defensive, low risk).

-    if !hasPersistedHash && (h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
+    if !hasPersistedHash && h.operationBlocker != nil &&
+        (h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
         return true
     }
#!/usr/bin/env bash
# Ensure every NewPreHandler call sets OperationBlocker non-nil
rg -nP 'NewPreHandler\(' -C3 -- router | sed -n '1,200p'
🧹 Nitpick comments (6)
router-tests/persisted_operations_over_get_test.go (2)

185-190: Remove repeated inline hash

The same Find hash appears again; prefer a single const (e.g., apqFindSHA) to DRY the test and reduce copy-paste errors.

-               Extensions: []byte(`{"persistedQuery": {"version": 1, "sha256Hash": "e33580cf6276de9a75fb3b1c4b7580fec2a1c8facd13f3487bf6c7c3f854f7e3"}}`),
+               Extensions: []byte(`{"persistedQuery": {"version": 1, "sha256Hash": "` + apqFindSHA + `"}}`),

Add near the top of the file:

+const apqFindSHA = "e33580cf6276de9a75fb3b1c4b7580fec2a1c8facd13f3487bf6c7c3f854f7e3"

176-186: Verified Find query APQ hash The sha256 hash of the Find query matches the expected value. Consider hoisting this hash into a file-level constant to avoid duplication across tests.

router/core/graphql_prehandler.go (4)

500-501: Defensive nil-check in shouldFetchPersistedOperation

Mirror the guard to prevent nil deref.

-return operationKit.parsedOperation.IsPersistedOperation || h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled
+if h.operationBlocker == nil {
+    return operationKit.parsedOperation.IsPersistedOperation
+}
+return operationKit.parsedOperation.IsPersistedOperation ||
+    h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled

562-567: Set PersistedQuery only when operationBlocker is non-nil

Minor safety guard to match the other changes.

-        if h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled {
+        if h.operationBlocker != nil &&
+            (h.operationBlocker.safelistEnabled || h.operationBlocker.logUnknownOperationsEnabled) {
             // Set the request hash to the parsed hash, to see if it matches a persisted operation
             operationKit.parsedOperation.GraphQLRequestExtensions.PersistedQuery = &GraphQLRequestExtensionsPersistedQuery{
                 Sha256Hash: operationKit.parsedOperation.Sha256Hash,
             }
         }

607-611: Nil-guard around logUnknown/safelist checks

Avoid dereferencing operationBlocker if absent.

-            if h.operationBlocker.logUnknownOperationsEnabled && errors.As(err, &poNotFoundErr) {
+            if h.operationBlocker != nil &&
+                h.operationBlocker.logUnknownOperationsEnabled && errors.As(err, &poNotFoundErr) {
                 requestContext.logger.Warn("Unknown persisted operation found", zap.String("query", operationKit.parsedOperation.Request.Query), zap.String("sha256Hash", poNotFoundErr.Sha256Hash))
                 if h.operationBlocker.safelistEnabled {
                     span.End()
                     return err
                 }

704-709: Defensive guard when invoking OperationIsBlocked

Keep behavior identical, but avoid panics in edge constructions.

-    if err := h.operationBlocker.OperationIsBlocked(requestContext.logger, requestContext.expressionContext, operationKit.parsedOperation); err != nil {
-        return &httpGraphqlError{
-            message:    err.Error(),
-            statusCode: http.StatusOK,
-        }
-    }
+    if h.operationBlocker != nil {
+        if err := h.operationBlocker.OperationIsBlocked(requestContext.logger, requestContext.expressionContext, operationKit.parsedOperation); err != nil {
+            return &httpGraphqlError{
+                message:    err.Error(),
+                statusCode: http.StatusOK,
+            }
+        }
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a52f092 and 0bc45d7.

📒 Files selected for processing (11)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/operation_processor.go (1 hunks)
  • router/core/router.go (2 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • router/pkg/config/testdata/config_full.json
  • router/core/operation_blocker.go
  • router/pkg/config/testdata/config_defaults.json
  • router-tests/automatic_persisted_queries_test.go
  • router/pkg/config/config.schema.json
  • router/core/operation_processor.go
  • router/core/router.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: build_test
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
router-tests/persisted_operations_over_get_test.go (1)

213-214: APQ hash verified Confirmed that the sha256 of the mutation body matches 49a2f7dd56b06f620c7d040dd9d562a1c16eadf7c149be5decdd62cfc92e1b12.

router/pkg/config/config.go (1)

410-410: LGTM: Config surface for blocking persisted operations

New SecurityConfiguration.BlockPersistedOperations is consistent with existing block configs and env prefixing.

router/core/graph_server.go (1)

1374-1377: LGTM: Correctly wiring BlockPersisted into OperationBlockerOptions

The options mirror other blockers and source from securityConfiguration; no issues spotted.

router/core/graphql_prehandler.go (1)

570-578: Good: 400 on mismatched persistedQuery hash vs body

Strict equality check is correct and fails fast; message is clear.

@endigma endigma requested a review from alepane21 as a code owner September 2, 2025 12:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
router/internal/rdcloser/rdcloser.go (1)

26-31: Prevent panics: guard nil options and default the logger.

opts and opts.Logger can be nil, causing NPEs at multiple call sites.

Apply:

 func NewRedisCloser(opts *RedisCloserOptions) (RDCloser, error) {
+ if opts == nil {
+   return nil, fmt.Errorf("nil RedisCloserOptions")
+ }
+ if opts.Logger == nil {
+   opts.Logger = zap.NewNop()
+ }
   if len(opts.URLs) == 0 {
     return nil, fmt.Errorf("no redis URLs provided")
   }
router/internal/persistedoperation/apq/redis.go (2)

32-45: Propagate constructor errors and avoid returning a half-initialized client.

Return nil on error and close any created client to prevent leaks.

Apply:

   rdb, err := rd.NewRedisCloser(&rd.RedisCloserOptions{
     Logger:         opts.Logger,
     URLs:           opts.StorageConfig.URLs,
     ClusterEnabled: opts.StorageConfig.ClusterEnabled,
   })
 
   rclient := &redisClient{
     logger: opts.Logger,
     client: rdb,
     prefix: opts.Prefix,
   }
 
-  return rclient, err
+  if err != nil {
+    if rdb != nil {
+      _ = rdb.Close()
+    }
+    return nil, err
+  }
+  return rclient, nil

32-37: Add Redis password field to config and pass it to RedisCloser

  • In router/pkg/config/config.go, extend RedisStorageProvider with a Password *string yaml:"password,omitempty" env:"STORAGE_PROVIDER_REDIS_PASSWORD"` field.
  • In router/internal/persistedoperation/apq/redis.go, include Password: opts.StorageConfig.Password in the call to rd.NewRedisCloser.
router/internal/persistedoperation/operationstorage/fs/client.go (1)

48-70: Validate sha256Hash (length/hex) and harden filename construction.

Unvalidated sha256Hash could enable path tricks if not guaranteed hex upstream. Validate 64-hex before using it in a filename.

Apply:

 func (c client) persistedOperation(clientName string, sha256Hash string) ([]byte, error) {
+  if len(sha256Hash) != 64 {
+    return nil, fmt.Errorf("invalid sha256Hash length: %d", len(sha256Hash))
+  }
+  if _, err := hex.DecodeString(sha256Hash); err != nil {
+    return nil, fmt.Errorf("invalid sha256Hash (not hex): %w", err)
+  }
   operationName := fmt.Sprintf("%s.json", sha256Hash)
   objectPath := filepath.Join(c.path, c.options.ObjectPathPrefix, operationName)

Add import:

import (
  // ...
  "encoding/hex"
)
router/internal/persistedoperation/client.go (1)

63-96: Tighten APQ gating and provider‐disabled path in PersistedOperation

  • In the if c.providerClient == nil branch, allow APQ fallback only when c.apqClient.Enabled(); otherwise return an explicit error when both provider and APQ are disabled.
  • In the errors.As(err, &poNotFound) clause, include c.apqClient.Enabled() to prevent using APQ when it’s disabled.
  • Confirm whether any router logic depends on the previous (nil, false, nil) return before merging.
if c.providerClient == nil {
-  return nil, c.apqClient != nil, nil
+  if c.apqClient != nil && c.apqClient.Enabled() {
+    return nil, true, nil
+  }
+  return nil, false, fmt.Errorf("persisted operation provider disabled and APQ disabled")
}

content, err := c.providerClient.PersistedOperation(ctx, clientName, sha256Hash)
- if errors.As(err, &poNotFound) && c.apqClient != nil {
+ if errors.As(err, &poNotFound) && c.apqClient != nil && c.apqClient.Enabled() {
   return content, true, nil
 }
🧹 Nitpick comments (16)
router/internal/rdcloser/rdcloser.go (2)

113-120: Bound Redis healthcheck with a timeout to avoid slow startup on unreachable nodes.

Ping(context.Background()) may block per client defaults; give it a small, predictable cap.

Apply:

 import (
   "context"
   "fmt"
   "net/url"
   "strings"
+  "time"

   "github.com/redis/go-redis/v9"
   "go.uber.org/zap"
 )
@@
 func IsFunctioningClient(rdb RDCloser) (bool, error) {
   if rdb == nil {
     return false, nil
   }
 
-  res, err := rdb.Ping(context.Background()).Result()
+  ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+  defer cancel()
+  res, err := rdb.Ping(ctx).Result()
   return err == nil && res == "PONG", err
 }

Also applies to: 3-11


91-111: Harden cluster URL handling: validate secondary addresses.

Warn (and optionally skip) empty hosts or hosts without ports; helps avoid silent misconfigurations.

Apply:

 func addClusterUrlsToQuery(opts *RedisCloserOptions, parsedUrl *url.URL) {
@@
   for _, rawURL := range opts.URLs[1:] {
     secondaryURL, parseErr := url.Parse(rawURL)
     if parseErr != nil {
       opts.Logger.Warn(fmt.Sprintf("Skipping invalid Redis URL %q: %v", rawURL, parseErr))
       continue
     }
 
     // Strip schema, username, and password
     addr := secondaryURL.Host
+    if addr == "" {
+      opts.Logger.Warn("Empty secondary Redis address", zap.String("raw", rawURL))
+      continue
+    }
+    if !strings.Contains(addr, ":") {
+      opts.Logger.Warn("Secondary Redis address missing port", zap.String("address", addr))
+    }
router/internal/rdcloser/rdcloser_test.go (2)

24-38: Close clients in tests to release resources.

Ensure connections are closed even on failures.

Apply:

   cl, err := NewRedisCloser(&RedisCloserOptions{
     Logger: zaptest.NewLogger(t),
     URLs:   []string{fmt.Sprintf("redis://%s", mr.Addr())},
   })
+  t.Cleanup(func() { if cl != nil { _ = cl.Close() } })

@@
   cl, err := NewRedisCloser(&RedisCloserOptions{
     Logger: zaptest.NewLogger(t),
     URLs:   []string{authUrl},
   })
+  t.Cleanup(func() { if cl != nil { _ = cl.Close() } })

Also applies to: 40-56


58-66: Speed up “Unresponsive redis fails” to reduce test flakiness.

Use short client timeouts so CI isn’t held up by long dials.

Apply:

   _, err := NewRedisCloser(&RedisCloserOptions{
     Logger: zaptest.NewLogger(t),
-    URLs:   []string{"redis://localhost:7000"},
+    URLs:   []string{"redis://localhost:7000?dial_timeout=200ms&read_timeout=200ms&write_timeout=200ms"},
   })
router/pkg/pubsub/redis/adapter.go (2)

97-113: Ensure subscription cleanup closes PubSub.

Unsubscribing is good; also close the PubSub to free resources.

Apply:

   cleanup := func() {
     err := sub.PUnsubscribe(ctx, event.Channels...)
     if err != nil {
       log.Error(fmt.Sprintf("error unsubscribing from redis for topics %v", event.Channels), zap.Error(err))
     }
+    if cerr := sub.Close(); cerr != nil {
+      log.Error("error closing redis PubSub", zap.Error(cerr))
+    }
   }

97-105: Guard against use before Startup.

Mirror the Publish check; fail fast if the connection wasn’t initialized.

Apply:

 func (p *ProviderAdapter) Subscribe(ctx context.Context, event SubscriptionEventConfiguration, updater resolve.SubscriptionUpdater) error {
+ if p.conn == nil {
+   return datasource.NewError("redis connection not initialized", nil)
+ }
   log := p.logger.With(
router/internal/persistedoperation/apq/redis.go (1)

55-58: Fix TTL conversion.

Avoid float conversion; use idiomatic duration math and handle non-positive TTLs explicitly.

Apply:

 func (r *redisClient) Set(ctx context.Context, operationHash string, operationBody []byte, ttl int) error {
-  ttlD := time.Duration(float64(ttl)) * time.Second
+  var ttlD time.Duration
+  if ttl > 0 {
+    ttlD = time.Duration(ttl) * time.Second
+  } else {
+    ttlD = 0 // no expiration
+  }
   status := r.client.Set(ctx, r.prefix+operationHash, operationBody, ttlD)
   return status.Err()
 }
router/debug.config.yaml (3)

11-11: Re-enable file watching for debug (if unintentional).

Commenting out watch degrades DX when iterating on debug.config.yaml.

Apply if you want hot-reload:

-    # watch: true
+    watch: true

15-17: APQ block is commented out — make intent explicit.

If APQ should be disabled in debug, prefer an explicit false to avoid ambiguity.

-# automatic_persisted_queries:
-#   enabled: true
+automatic_persisted_queries:
+  enabled: false

18-22: persisted_operations: clarify interplay with security.block_persisted_operations.

Two levers now exist:

  • persisted_operations.disabled → provider init toggle.
  • security.block_persisted_operations.enabled → runtime request blocking.

Consider showcasing both in debug to avoid confusion and needless provider init when blocked at runtime.

Example addition (verify exact schema path/names):

security:
  block_persisted_operations:
    enabled: false
router/internal/persistedoperation/operationstorage/cdn/client.go (3)

105-126: Bound body size; minor header cleanup.

  • Add a size cap via io.LimitReader to avoid unbounded reads.
  • For GET, set Accept instead of Content-Type.

Apply:

-	req.Header.Set("Content-Type", "application/json; charset=UTF-8")
+	req.Header.Set("Accept", "application/json")
 	req.Header.Add("Authorization", "Bearer "+cdn.authenticationToken)
 	req.Header.Set("Accept-Encoding", "gzip")
@@
-	var reader io.Reader = resp.Body
+	var reader io.Reader = resp.Body
@@
-		reader = r
+		reader = r
 	}
 
-	body, err := io.ReadAll(reader)
+	// Cap at 1 MiB; persisted op JSON should be tiny.
+	const maxPOSize = 1 << 20
+	body, err := io.ReadAll(io.LimitReader(reader, maxPOSize))

Also consider using http.MethodGet in NewRequest for consistency.


130-158: Return the interface from NewClient to reduce coupling.

Returning persistedoperation.StorageClient avoids leaking the concrete type and keeps callers unchanged if internals shift.

-func NewClient(endpoint string, token string, opts Options) (*client, error) {
+func NewClient(endpoint string, token string, opts Options) (persistedoperation.StorageClient, error) {
@@
-	return &client{
+	return &client{
 		cdnURL:              u,
 		authenticationToken: token,
 		federatedGraphID:    url.PathEscape(claims.FederatedGraphID),
 		organizationID:      url.PathEscape(claims.OrganizationID),
 		httpClient:          httpclient.NewRetryableHTTPClient(logger),
 		logger:              logger,
 	}, nil
}

If needed, keep the compile-time assertion.


85-101: Return exported sentinel errors for HTTP 401/400.

  • HTTP 401: return a persistedoperation.ErrUnauthorized instead of errors.New("could not authenticate against CDN")
  • HTTP 400: return a persistedoperation.ErrBadRequest instead of errors.New("bad request")
router/internal/persistedoperation/operationstorage/s3/client.go (1)

37-46: Guard nil TraceProvider to avoid a nil deref at startup

If Options.TraceProvider is unset, Tracer() will panic. Default to a no-op provider.

Apply:

 func NewClient(endpoint string, options *Options) (*Client, error) {
 	client := &Client{
 		options: options,
-		tracer: options.TraceProvider.Tracer(
+		tracer: func() trace.Tracer {
+			tp := options.TraceProvider
+			if tp == nil {
+				tp = sdktrace.NewTracerProvider()
+			}
+			return tp.Tracer(
 			"wundergraph/cosmo/router/s3_persisted_operations_client",
 			trace.WithInstrumentationVersion("0.0.1"),
-		),
+			)
+		}(),
 	}
router/core/router.go (1)

1004-1071: Log when a provider is configured but persisted operations are disabled

Helps users understand why their configured provider is ignored.

Apply:

-	if !r.persistedOperationsConfig.Disabled {
+	if !r.persistedOperationsConfig.Disabled {
 		// provider resolution...
 		// ...
-	}
+	} else if r.persistedOperationsConfig.Storage.ProviderID != "" {
+		r.logger.Warn("Persisted operations provider is configured but disabled; provider will be ignored",
+			zap.String("provider_id", r.persistedOperationsConfig.Storage.ProviderID),
+		)
+	}
router/internal/persistedoperation/client.go (1)

98-105: Consider explicit error when SaveOperation is invoked without APQ.

Silent success can mask misconfiguration. Optionally return a clear error when APQ is not enabled.

Apply:

 func (c *Client) SaveOperation(ctx context.Context, clientName, sha256Hash, operationBody string) error {
   if c.apqClient != nil && c.apqClient.Enabled() {
     return c.apqClient.SaveOperation(ctx, clientName, sha256Hash, []byte(operationBody))
   }
-  return nil
+  return fmt.Errorf("APQ is not enabled; SaveOperation is unavailable")
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0bc45d7 and 63c7f18.

📒 Files selected for processing (13)
  • router/core/operation_processor.go (3 hunks)
  • router/core/ratelimiter.go (1 hunks)
  • router/core/router.go (3 hunks)
  • router/core/router_config.go (2 hunks)
  • router/debug.config.yaml (1 hunks)
  • router/internal/persistedoperation/apq/redis.go (1 hunks)
  • router/internal/persistedoperation/client.go (3 hunks)
  • router/internal/persistedoperation/operationstorage/cdn/client.go (4 hunks)
  • router/internal/persistedoperation/operationstorage/fs/client.go (2 hunks)
  • router/internal/persistedoperation/operationstorage/s3/client.go (2 hunks)
  • router/internal/rdcloser/rdcloser.go (1 hunks)
  • router/internal/rdcloser/rdcloser_test.go (1 hunks)
  • router/pkg/pubsub/redis/adapter.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (7)
router/core/router_config.go (1)
router/internal/persistedoperation/client.go (1)
  • Client (42-46)
router/internal/persistedoperation/operationstorage/fs/client.go (1)
router/internal/persistedoperation/client.go (3)
  • Options (32-40)
  • StorageClient (27-30)
  • PersistedOperation (13-16)
router/internal/persistedoperation/operationstorage/cdn/client.go (3)
router/internal/persistedoperation/client.go (4)
  • StorageClient (27-30)
  • PersistedOperation (13-16)
  • NewClient (48-61)
  • Options (32-40)
router/internal/persistedoperation/operationstorage/fs/client.go (2)
  • NewClient (25-37)
  • Options (20-22)
router/internal/persistedoperation/operationstorage/s3/client.go (2)
  • NewClient (38-76)
  • Options (25-33)
router/core/operation_processor.go (2)
router/internal/persistedoperation/client.go (3)
  • Client (42-46)
  • PersistedOperation (13-16)
  • PersistentOperationNotFoundError (18-21)
router/internal/persistedoperation/apq/client.go (2)
  • Client (16-21)
  • PersistedOperation (11-14)
router/internal/persistedoperation/operationstorage/s3/client.go (1)
router/internal/persistedoperation/client.go (5)
  • StorageClient (27-30)
  • Client (42-46)
  • NewClient (48-61)
  • Options (32-40)
  • PersistedOperation (13-16)
router/core/router.go (1)
router/internal/persistedoperation/client.go (3)
  • StorageClient (27-30)
  • NewClient (48-61)
  • Options (32-40)
router/internal/persistedoperation/client.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • PersistedOperation (11-14)
  • Client (16-21)
router/internal/persistedoperation/operationstorage/cache.go (2)
  • OperationsCache (14-19)
  • NewOperationsCache (21-42)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (9)
router/core/ratelimiter.go (1)

13-14: Import path change looks correct.

rd.RDCloser still satisfies the limiter; no functional change.

router/core/router_config.go (1)

11-11: Import path rename to rdcloser looks good.

router/internal/persistedoperation/operationstorage/cdn/client.go (1)

9-12: Interface assertion and added imports LGTM.

Also applies to: 27-27

router/core/operation_processor.go (1)

466-471: TTL renewal on APQ fetch path looks correct

Re-saving the fetched body to renew TTL is sound, and uses the original bytes.

router/core/router.go (1)

472-493: Good guardrails for conflicting security settings

Clear errors for unconditional blocks and warnings for conditional overlaps read well.

router/internal/persistedoperation/operationstorage/fs/client.go (2)

39-46: Receiver and interface assertion look good.

Value receiver is fine (cheap to copy), and the pointer assertion ensures interface compliance.


72-72: LGTM on Close.

No resources to release for FS; no-op is OK.

router/internal/persistedoperation/client.go (2)

106-109: APQEnabled helper: good addition.

Clear and self-explanatory.


110-120: Close: solid cleanup path.

Closes provider, cache, and APQ safely.

Comment thread router/core/operation_processor.go
Comment thread router/core/router_config.go
Comment thread router/internal/persistedoperation/client.go
Comment thread router/internal/persistedoperation/operationstorage/cdn/client.go
Comment thread router/internal/persistedoperation/operationstorage/fs/client.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 63c7f18 and f3ec76e.

📒 Files selected for processing (2)
  • router/core/operation_processor.go (3 hunks)
  • router/internal/persistedoperation/apq/client.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • router/internal/persistedoperation/apq/client.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/operation_processor.go
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/operation_processor.go
🧬 Code graph analysis (1)
router/core/operation_processor.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • Client (17-22)
  • PersistedOperation (12-15)
router/internal/persistedoperation/client.go (3)
  • Client (42-46)
  • PersistedOperation (13-16)
  • PersistentOperationNotFoundError (18-21)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router/core/operation_processor.go (1)

103-103: Pointer client type change verified: all OperationProcessorOptions initializations either omit PersistedOperationClient (nil) or pass a *persistedoperation.Client, and no residual references to the old non-pointer type or interfaces remain; the nil-check in FetchPersistedOperation (line 415) covers the disabled-provider path.

Comment thread router/core/operation_processor.go
Comment thread router/internal/rdcloser/rdcloser.go
@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from 4265796 to f8e6f1e Compare September 3, 2025 09:20
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
router/internal/rediscloser/rediscloser.go (1)

26-31: Guard against nil opts/logger to avoid panics.

NewRedisCloser dereferences opts and opts.Logger without checks; passing a nil logger will panic. Default to zap.NewNop() and validate opts.

 func NewRedisCloser(opts *RedisCloserOptions) (RDCloser, error) {
+	if opts == nil {
+		return nil, fmt.Errorf("nil RedisCloserOptions")
+	}
+	if opts.Logger == nil {
+		opts.Logger = zap.NewNop()
+	}
 	if len(opts.URLs) == 0 {
 		return nil, fmt.Errorf("no redis URLs provided")
 	}
router/pkg/pubsub/redis/adapter.go (2)

97-104: Avoid panic when Subscribe is called before Startup.

p.conn may be nil; mirror the Publish guard to return a typed error.

 func (p *ProviderAdapter) Subscribe(ctx context.Context, event SubscriptionEventConfiguration, updater resolve.SubscriptionUpdater) error {
 	log := p.logger.With(
 		zap.String("provider_id", event.ProviderID),
 		zap.String("method", "subscribe"),
 		zap.Strings("channels", event.Channels),
 	)
+	if p.conn == nil {
+		return datasource.NewError("redis connection not initialized", nil)
+	}
 	sub := p.conn.PSubscribe(ctx, event.Channels...)
 	msgChan := sub.Channel()

118-129: Don’t exit the goroutine on nil messages.

The code logs “skipping” but returns, prematurely ending the subscription loop.

 			case msg, ok := <-msgChan:
 				if !ok {
 					log.Debug("subscription closed, stopping")
 					return
 				}
 				if msg == nil {
 					log.Debug("empty message received on subscription update, skipping")
-					return
+					continue
 				}
♻️ Duplicate comments (3)
router/internal/persistedoperation/client.go (1)

48-53: Handle nil opts and fix error wording.

opts can be nil (panic at Line 49). Also, error message mentions "CDN cache".

Apply:

-func NewClient(opts *Options) (*Client, error) {
-	cacheSize := int64(opts.CacheSize)
+func NewClient(opts *Options) (*Client, error) {
+	if opts == nil {
+		return nil, fmt.Errorf("nil options provided")
+	}
+	cacheSize := int64(opts.CacheSize)
 
 	cache, err := operationstorage.NewOperationsCache(cacheSize)
 	if err != nil {
-		return nil, errors.Join(err, fmt.Errorf("initializing CDN cache"))
+		return nil, errors.Join(err, fmt.Errorf("initializing operations cache"))
 	}
router/core/operation_processor.go (2)

438-447: Avoid err shadowing in the APQ short-circuit path

Use assignment to the named err rather than redeclaration; keeps error handling consistent with the function’s named returns.

-        // If the operation was fetched with APQ, save it again to renew the TTL
-        err := o.operationProcessor.persistedOperationClient.SaveOperation(ctx, clientInfo.Name, o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash, o.parsedOperation.Request.Query)
+        // If the operation was fetched with APQ, save it again to renew the TTL
+        err = o.operationProcessor.persistedOperationClient.SaveOperation(
+            ctx, clientInfo.Name,
+            o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash,
+            o.parsedOperation.Request.Query,
+        )

448-452: Remove inner var err error shadowing

This reintroduces error shadowing inside the else-branch. Reuse the named err instead.

-        var persistedOperationData []byte
-        var err error
+        var persistedOperationData []byte
🧹 Nitpick comments (11)
router/internal/rediscloser/rediscloser.go (4)

13-17: Update the doc comment to reflect the actual interface used.

The comment mentions combining redis.Cmdable and io.Closer, but the type embeds redis.UniversalClient. Adjust for clarity.

-// RDCloser is an interface that combines the redis.Cmdable and io.Closer interfaces, ensuring that we can close the
-// client connection.
+// RDCloser is a thin alias over redis.UniversalClient, ensuring we can issue commands and Close the client.

118-120: Prefer a bounded Ping timeout.

Using context.Background() can make tests/timeouts slow on unreachable hosts. Consider a small timeout or making it configurable in options.

Example (requires importing time):

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
res, err := rdb.Ping(ctx).Result()

86-111: Harden secondary addr handling (edge cases and logging).

  • Skip entries with empty Host.
  • Keep logging sanitized; do not log creds even if provided in secondary URLs.
  • Optionally validate presence of port to avoid silent misconfig.
 	for _, rawURL := range opts.URLs[1:] {
 		secondaryURL, parseErr := url.Parse(rawURL)
 		if parseErr != nil {
 			opts.Logger.Warn(fmt.Sprintf("Skipping invalid Redis URL %q: %v", rawURL, parseErr))
 			continue
 		}
 
-		// Strip schema, username, and password
-		addr := secondaryURL.Host
+		// Strip schema & credentials; only host:port is allowed in addr
+		addr := secondaryURL.Host
+		if addr == "" {
+			opts.Logger.Warn("Skipping secondary Redis URL with empty host", zap.String("url", rawURL))
+			continue
+		}
 		if secondaryURL.User != nil && parsedUrl.User != nil && parsedUrl.User.String() != "" && secondaryURL.User.Username() != parsedUrl.User.Username() {
 			opts.Logger.Warn("Stripping credentials from secondary Redis address", zap.String("address", addr))
 		}

76-81: Close client and return nil on health‐check failure.

Closing the Redis client before returning an error prevents background resource leaks and avoids misuse of an unhealthy client.

-if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
-	return rdb, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
-}
+if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
+	_ = rdb.Close()
+	return nil, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
+}
router/internal/rediscloser/rediscloser_test.go (1)

58-66: Make the “unresponsive redis” test fast and deterministic.

Using localhost:7000 can hang or be occupied on some dev machines. Point to a refused port and set tight timeouts.

-		_, err := NewRedisCloser(&RedisCloserOptions{
-			Logger: zaptest.NewLogger(t),
-			URLs:   []string{"redis://localhost:7000"},
-		})
+		_, err := NewRedisCloser(&RedisCloserOptions{
+			Logger: zaptest.NewLogger(t),
+			// Port 1 refuses immediately; add short timeouts to avoid slow CI
+			URLs: []string{"redis://127.0.0.1:1?dial_timeout=200ms&read_timeout=200ms&write_timeout=200ms"},
+		})
router/pkg/pubsub/redis/adapter.go (1)

106-112: Ensure PubSub is fully closed on shutdown paths.

Unsubscribing patterns is good; also close the PubSub to stop background goroutines.

 	cleanup := func() {
 		err := sub.PUnsubscribe(ctx, event.Channels...)
 		if err != nil {
 			log.Error(fmt.Sprintf("error unsubscribing from redis for topics %v", event.Channels), zap.Error(err))
 		}
+		_ = sub.Close()
 	}
router/internal/persistedoperation/client.go (1)

110-120: Add nil checks for consistency and defensive programming.

While the fields are typically initialized through NewClient, adding nil checks would prevent potential panics if the struct is created outside the constructor.

 func (c *Client) Close() {
 	if c.providerClient != nil {
 		c.providerClient.Close()
 	}
-	if c.cache != nil && c.cache.Cache != nil {
+	if c != nil && c.cache != nil && c.cache.Cache != nil {
 		c.cache.Cache.Close()
 	}
 	if c.apqClient != nil {
 		c.apqClient.Close()
 	}
 }
router/core/operation_processor.go (4)

103-103: Prefer an interface over a concrete client type for DI/testability

Depending on the concrete *persistedoperation.Client here couples the processor to an implementation and makes testing/mocking harder. Consider narrowing this to a minimal interface the processor needs (e.g., APQEnabled(), PersistedOperation(...), SaveOperation(...)).


127-127: Mirror the options change: keep the field typed to an interface

Same concern as in options: store an interface rather than the concrete *persistedoperation.Client to ease substitutes in tests and future providers.


438-446: Confirm isAPQ semantics for the “query present” short-circuit

You set isAPQ = true when the request includes both hash and query. Is “isAPQ” intended to mean “APQ used this request” (even if body provided) or “body came from APQ storage”? This flag feeds TTL and cache behavior downstream; please confirm the intended invariant.


468-472: Fix misleading comment wording

The operation content here was loaded via hash (no body provided), not “passed via body”.

-            // when we have successfully loaded the operation content from the storage,
-            // but it was passed via body instead of hash, we need to mark operation as persisted
-            // to populate persisted operation cache
+            // Successfully loaded the operation content from storage via hash (no body provided).
+            // Mark as persisted to populate the persisted operation cache.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7a58b4d and b7bbf86.

📒 Files selected for processing (23)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/operation_processor.go (4 hunks)
  • router/core/ratelimiter.go (1 hunks)
  • router/core/router.go (3 hunks)
  • router/core/router_config.go (2 hunks)
  • router/debug.config.yaml (1 hunks)
  • router/internal/persistedoperation/apq/client.go (1 hunks)
  • router/internal/persistedoperation/apq/redis.go (1 hunks)
  • router/internal/persistedoperation/client.go (3 hunks)
  • router/internal/persistedoperation/operationstorage/cdn/client.go (4 hunks)
  • router/internal/persistedoperation/operationstorage/fs/client.go (2 hunks)
  • router/internal/persistedoperation/operationstorage/s3/client.go (2 hunks)
  • router/internal/rediscloser/rediscloser.go (1 hunks)
  • router/internal/rediscloser/rediscloser_test.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/pubsub/redis/adapter.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • router/internal/persistedoperation/apq/client.go
🚧 Files skipped from review as they are similar to previous changes (15)
  • router/pkg/config/testdata/config_full.json
  • router/core/ratelimiter.go
  • router/core/graph_server.go
  • router/pkg/config/config.go
  • router/pkg/config/testdata/config_defaults.json
  • router-tests/persisted_operations_over_get_test.go
  • router/internal/persistedoperation/apq/redis.go
  • router/pkg/config/config.schema.json
  • router/debug.config.yaml
  • router/core/operation_blocker.go
  • router/core/graphql_prehandler.go
  • router/core/router_config.go
  • router-tests/automatic_persisted_queries_test.go
  • router/internal/persistedoperation/operationstorage/fs/client.go
  • router/core/router.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/operation_processor.go
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/operation_processor.go
🧬 Code graph analysis (4)
router/core/operation_processor.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • Client (17-22)
  • PersistedOperation (12-15)
router/internal/persistedoperation/client.go (3)
  • Client (42-46)
  • PersistedOperation (13-16)
  • PersistentOperationNotFoundError (18-21)
router/internal/persistedoperation/operationstorage/s3/client.go (1)
router/internal/persistedoperation/client.go (4)
  • StorageClient (27-30)
  • Client (42-46)
  • Options (32-40)
  • PersistedOperation (13-16)
router/internal/persistedoperation/client.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • PersistedOperation (12-15)
  • Client (17-22)
router/internal/persistedoperation/operationstorage/cache.go (2)
  • OperationsCache (14-19)
  • NewOperationsCache (21-42)
router/internal/persistedoperation/operationstorage/cdn/client.go (1)
router/internal/persistedoperation/client.go (4)
  • StorageClient (27-30)
  • PersistedOperation (13-16)
  • NewClient (48-61)
  • Options (32-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: Analyze (go)
🔇 Additional comments (11)
router/internal/rediscloser/rediscloser.go (1)

1-1: Package rename/import shuffle looks good.

No functional changes; aligns with new internal package name.

router/internal/rediscloser/rediscloser_test.go (1)

1-1: LGTM on package/import cleanup.

Consistent with the implementation package rename; no behavior changes.

Also applies to: 5-6

router/pkg/pubsub/redis/adapter.go (1)

10-10: Import path update looks correct.

Alias preserved (rd), no functional changes.

router/internal/persistedoperation/client.go (1)

106-108: LGTM! Good addition of the APQEnabled accessor.

The method provides a clean way for external code to check if APQ is enabled, avoiding direct access to internal fields.

router/internal/persistedoperation/operationstorage/cdn/client.go (3)

27-27: Good addition of compile-time interface check.

The compile-time assertion ensures the CDN client correctly implements the StorageClient interface.


42-49: LGTM! Simplified signature aligns with the new StorageClient interface.

The removal of the boolean flag and simplified return signature makes the API cleaner and more consistent with the new interface design.


130-130: LGTM! Returning concrete type provides better type safety.

Returning *client instead of an interface allows callers to access implementation-specific methods if needed, while still satisfying the interface contract.

router/internal/persistedoperation/operationstorage/s3/client.go (3)

35-35: Good addition of compile-time interface check.

The compile-time assertion ensures the S3 client correctly implements the StorageClient interface, consistent with other storage implementations.


37-38: LGTM! Consistent API changes across storage implementations.

The return type change to *Client aligns with other storage implementations and provides better type safety.


78-85: LGTM! Simplified error handling pattern.

The removal of the boolean flag and simplified return signature makes the API cleaner and more consistent with the new StorageClient interface.

router/core/operation_processor.go (1)

429-436: Only renew APQ TTL with the raw query
Guard all TTL-renewal calls to use o.parsedOperation.Request.Query when non-empty to avoid persisting the normalized representation; if you need to renew TTL without rewriting the body, add a dedicated “touch”/renew-only method to the PersistedOperationClient instead.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
router/internal/persistedoperation/client.go (2)

75-78: Avoid silent “success” when no provider and APQ disabled.

Returning (nil, false, nil) hides a configuration error and forces ambiguous downstream handling. Prefer a consistent NotFound signal.

-	if c.providerClient == nil {
-		// This can happen if we are using APQ client, without any persisted operation client. Otherwise, we should have a provider client and shouldn't reach here.
-		return nil, c.apqClient != nil, nil
-	}
+	if c.providerClient == nil {
+		// No provider configured. If APQ is enabled we allow APQ flow to continue,
+		// otherwise signal a clear not-found for persisted operations.
+		if c.apqClient != nil {
+			return nil, true, nil
+		}
+		return nil, false, &PersistentOperationNotFoundError{
+			ClientName: clientName,
+			Sha256Hash: sha256Hash,
+		}
+	}

71-73: Guard nil Cache in Get/Set

In router/internal/persistedoperation/operationstorage/cache.go, (*OperationsCache).Get and Set unconditionally call methods on c.Cache, which is nil when caching is disabled (cacheSize ≤ 0), causing a panic. Add early checks:

func (c *OperationsCache) Get(…) []byte {
    if c.Cache == nil {
        return nil
    }
    …
}

func (c *OperationsCache) Set(…) {
    if c.Cache == nil {
        return
    }
    …
}
router/internal/persistedoperation/operationstorage/s3/client.go (1)

38-46: Guard nil options and tolerate missing TraceProvider.

Avoid nil deref on options/TraceProvider. Tracer isn’t used here—noop is fine.

-func NewClient(endpoint string, options *Options) (*Client, error) {
-	client := &Client{
-		options: options,
-		tracer: options.TraceProvider.Tracer(
+func NewClient(endpoint string, options *Options) (*Client, error) {
+	if options == nil {
+		return nil, fmt.Errorf("nil options")
+	}
+	client := &Client{
+		options: options,
+	}
+	if options.TraceProvider != nil {
+		client.tracer = options.TraceProvider.Tracer(
 			"wundergraph/cosmo/router/s3_persisted_operations_client",
 			trace.WithInstrumentationVersion("0.0.1"),
-		),
-	}
+		)
+	}
♻️ Duplicate comments (1)
router/internal/persistedoperation/client.go (1)

48-60: Fix nil options panic and error context string.

opts can be nil (panic at Line 49). Also, wrap error with the correct context (“operations cache”).

 func NewClient(opts *Options) (*Client, error) {
-	cacheSize := int64(opts.CacheSize)
+	if opts == nil {
+		opts = &Options{}
+	}
+	cacheSize := int64(opts.CacheSize)
 
 	cache, err := operationstorage.NewOperationsCache(cacheSize)
 	if err != nil {
-		return nil, errors.Join(err, fmt.Errorf("initializing CDN cache"))
+		return nil, errors.Join(err, fmt.Errorf("initializing operations cache"))
 	}
 
 	return &Client{
 		providerClient: opts.ProviderClient,
 		cache:          cache,
 		apqClient:      opts.ApqClient,
 	}, nil
 }
🧹 Nitpick comments (15)
router/debug.config.yaml (1)

11-11: Commenting out file watch in debug config may reduce DX; consider clarifying intent

If disabling schema file hot-reload here is intentional, add a short inline note (e.g., “disable for stability”) or a separate debug profile that keeps it on. Otherwise, re-enable for local iteration.

router/internal/rediscloser/rediscloser.go (5)

13-17: Update the interface comment to match the actual type

The comment mentions Cmdable and io.Closer, but RDCloser embeds redis.UniversalClient now. Suggest aligning the comment to avoid confusion.

Apply this diff:

-// RDCloser is an interface that combines the redis.Cmdable and io.Closer interfaces, ensuring that we can close the
-// client connection.
+// RDCloser is an interface satisfied by redis.UniversalClient, ensuring we can use single/cluster clients and close them.

76-81: Return nil client on connectivity failure

Returning a non-nil client alongside an error can lead to accidental use of a faulty client. Prefer nil on error.

Apply this diff:

- if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
-     return rdb, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
- }
+ if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
+     return nil, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
+ }

118-120: Bound the Ping with a timeout

Avoid potential hangs under network issues by using a short context timeout.

Apply this diff:

-func IsFunctioningClient(rdb RDCloser) (bool, error) {
+func IsFunctioningClient(rdb RDCloser) (bool, error) {
     if rdb == nil {
         return false, nil
     }
 
-    res, err := rdb.Ping(context.Background()).Result()
+    ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
+    defer cancel()
+    res, err := rdb.Ping(ctx).Result()
     return err == nil && res == "PONG", err
 }

91-111: Skip invalid secondary addresses with empty host

When a secondary URL lacks a host (e.g., missing scheme or malformed), addr becomes empty. Skip adding it to avoid creating an invalid query.

Apply this diff:

-    for _, rawURL := range opts.URLs[1:] {
+    for _, rawURL := range opts.URLs[1:] {
         secondaryURL, parseErr := url.Parse(rawURL)
         if parseErr != nil {
             opts.Logger.Warn(fmt.Sprintf("Skipping invalid Redis URL %q: %v", rawURL, parseErr))
             continue
         }
 
         // Strip schema, username, and password
         addr := secondaryURL.Host
+        if addr == "" {
+            opts.Logger.Warn("Skipping secondary Redis address with empty host", zap.String("raw", rawURL))
+            continue
+        }

104-106: Consider failing fast on scheme mismatch

Logging a warning for mixed redis/rediss schemes may hide TLS misconfiguration. Consider returning an error when schemes differ to avoid subtle production issues.

router/internal/rediscloser/rediscloser_test.go (2)

27-37: Close clients to avoid goroutine leaks in tests

Add a cleanup to close the client.

Apply this diff:

 		cl, err := NewRedisCloser(&RedisCloserOptions{
 			Logger: zaptest.NewLogger(t),
 			URLs:   []string{fmt.Sprintf("redis://%s", mr.Addr())},
 		})
 
 		require.NoError(t, err)
 		require.NotNil(t, cl)
+		t.Cleanup(func() { _ = cl.Close() })

45-56: Close clients to avoid goroutine leaks in tests

Same here for the auth case.

Apply this diff:

 		cl, err := NewRedisCloser(&RedisCloserOptions{
 			Logger: zaptest.NewLogger(t),
 			URLs:   []string{authUrl},
 		})
 
 		require.NoError(t, err)
 		require.NotNil(t, cl)
+		t.Cleanup(func() { _ = cl.Close() })
router-tests/persisted_operations_over_get_test.go (1)

59-66: Centralize persisted hash test data to reduce drift

Multiple hardcoded sha256 hashes are updated here. Consider defining named constants or a small map at the top of the file (or a shared test helper) so future hash changes require a single edit and are self-documented (e.g., HashOfFindQueryOverGET, HashOfMutationOverGET).

Also applies to: 76-83, 175-186, 183-191, 213-219

router-tests/automatic_persisted_queries_test.go (4)

91-126: Assert HTTP status for hash/query mismatch

The new test exercises the mismatch path well; also assert that the status is 400 to lock in semantics.

Apply this diff:

 				res1, err := xEnv.MakeGraphQLRequest(testenv.GraphQLRequest{
 					Query:      `{__typename}`,
 					Extensions: []byte(`{"persistedQuery": {"version": 1, "sha256Hash": "85d996c3662d12de4f4abc17ba6f7aa696c1e760c7ed482a8ae64c49a7d68773"}}`),
 					Header:     header,
 				})
 				require.NoError(t, err)
+				require.Equal(t, http.StatusBadRequest, res1.Response.StatusCode)
 				require.Equal(t, `{"errors":[{"message":"persistedQuery sha256 hash does not match query body"}]}`, res1.Body)

128-157: Reduce flakiness in TTL tests

Sleep-based TTL checks can be flaky on slow CI. If feasible, consider injecting a clock or using a slightly larger safety margin (e.g., TTL 2s, sleep 3.5–4s) to lower false negatives.

Also applies to: 160-197, 270-299, 301-347, 359-405, 446-475


20-89: Add an e2e test for blocking persisted operations

Given this PR adds security.block_persisted_operations, add a router-test that:

  • enables PO provider
  • toggles block_persisted_operations.enabled: true (and an optional condition)
  • asserts that persisted operations are rejected while non-persisted/APQ still work.

I can sketch this test using testenv.Config if you want.


63-79: Compute APQ hashes in tests: Replace the hardcoded sha256 values with runtime computation (e.g. using Go’s crypto/sha256) for both the single-query and multi‐operation documents in router-tests/automatic_persisted_queries_test.go (lines 63–79, 125–149, 270–296, 447–465) to prevent accidental divergence.

router/internal/persistedoperation/operationstorage/cdn/client.go (1)

73-76: Nit: let net/http auto-handle gzip to simplify.

If you don’t set Accept-Encoding, Go auto-decompresses; you can drop manual gzip handling.

-	req.Header.Set("Accept-Encoding", "gzip")
+	// Let net/http manage Accept-Encoding and auto-decompression.
-	if resp.Header.Get("Content-Encoding") == "gzip" {
-		r, err := gzip.NewReader(resp.Body)
-		if err != nil {
-			return nil, errors.New("could not create gzip reader. " + err.Error())
-		}
-		defer r.Close()
-		reader = r
-	}
+	// No manual gzip reader needed; http.Client auto-decompresses unless header is set explicitly.

Also applies to: 105-112

router/internal/persistedoperation/operationstorage/s3/client.go (1)

78-85: Pointer receivers to avoid copying the client.

Use *Client receivers for methods; reduces copies and aligns with interface assertion.

-func (c Client) PersistedOperation(ctx context.Context, clientName, sha256Hash string) ([]byte, error) {
+func (c *Client) PersistedOperation(ctx context.Context, clientName, sha256Hash string) ([]byte, error) {
@@
-func (c Client) persistedOperation(ctx context.Context, clientName, sha256Hash string) ([]byte, error) {
+func (c *Client) persistedOperation(ctx context.Context, clientName, sha256Hash string) ([]byte, error) {
@@
-func (c Client) Close() {}
+func (c *Client) Close() {}

Also applies to: 87-107, 109-110

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7a58b4d and b7bbf86.

📒 Files selected for processing (23)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/operation_processor.go (4 hunks)
  • router/core/ratelimiter.go (1 hunks)
  • router/core/router.go (3 hunks)
  • router/core/router_config.go (2 hunks)
  • router/debug.config.yaml (1 hunks)
  • router/internal/persistedoperation/apq/client.go (1 hunks)
  • router/internal/persistedoperation/apq/redis.go (1 hunks)
  • router/internal/persistedoperation/client.go (3 hunks)
  • router/internal/persistedoperation/operationstorage/cdn/client.go (4 hunks)
  • router/internal/persistedoperation/operationstorage/fs/client.go (2 hunks)
  • router/internal/persistedoperation/operationstorage/s3/client.go (2 hunks)
  • router/internal/rediscloser/rediscloser.go (1 hunks)
  • router/internal/rediscloser/rediscloser_test.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/pubsub/redis/adapter.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (12)
  • router/pkg/config/testdata/config_full.json
  • router/core/router_config.go
  • router/internal/persistedoperation/apq/client.go
  • router/pkg/config/testdata/config_defaults.json
  • router/pkg/pubsub/redis/adapter.go
  • router/pkg/config/config.go
  • router/core/graph_server.go
  • router/internal/persistedoperation/operationstorage/fs/client.go
  • router/pkg/config/config.schema.json
  • router/core/graphql_prehandler.go
  • router/internal/persistedoperation/apq/redis.go
  • router/core/operation_processor.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router-tests/persisted_operations_over_get_test.go
  • router-tests/automatic_persisted_queries_test.go
📚 Learning: 2025-09-02T12:52:27.652Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router-tests/persisted_operations_over_get_test.go
  • router-tests/automatic_persisted_queries_test.go
🧬 Code graph analysis (6)
router/core/router.go (1)
router/internal/persistedoperation/client.go (3)
  • StorageClient (27-30)
  • NewClient (48-61)
  • Options (32-40)
router-tests/automatic_persisted_queries_test.go (3)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-340)
  • Environment (1727-1763)
  • GraphQLRequest (1903-1911)
router/core/router.go (2)
  • Option (172-172)
  • WithGraphApiToken (1649-1653)
router/core/graphql_handler.go (1)
  • NormalizationCacheHeader (39-39)
router/core/operation_blocker.go (1)
router/internal/expr/resolvers.go (1)
  • ResolveBoolExpression (38-54)
router/internal/persistedoperation/operationstorage/s3/client.go (1)
router/internal/persistedoperation/client.go (5)
  • StorageClient (27-30)
  • Client (42-46)
  • NewClient (48-61)
  • Options (32-40)
  • PersistedOperation (13-16)
router/internal/persistedoperation/client.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • PersistedOperation (12-15)
  • Client (17-22)
router/internal/persistedoperation/operationstorage/cache.go (2)
  • OperationsCache (14-19)
  • NewOperationsCache (21-42)
router/internal/persistedoperation/operationstorage/cdn/client.go (1)
router/internal/persistedoperation/client.go (3)
  • StorageClient (27-30)
  • PersistedOperation (13-16)
  • Options (32-40)
🔇 Additional comments (10)
router/internal/rediscloser/rediscloser.go (1)

1-1: Package rename looks good

The move to package rediscloser aligns with the new import path usage across the codebase.

router/internal/rediscloser/rediscloser_test.go (1)

1-1: Package rename matches implementation

Tests now align with package rediscloser. Good.

router-tests/automatic_persisted_queries_test.go (1)

521-533: Nice: APQ normalization cache coverage is strong

The new hash and assertions around HIT/MISS for multi-op documents look good and exercise the critical path.

Also applies to: 540-546, 551-555

router/internal/persistedoperation/client.go (1)

106-109: APQEnabled helper: LGTM.

Simple and correct.

router/internal/persistedoperation/operationstorage/cdn/client.go (2)

27-28: Good: compile-time interface assertion.

Ensures API drift is caught at build time.


85-101: Good: unified NotFound mapping.

Translates 404 to PersistentOperationNotFoundError, aligning with the caller’s error handling.

router/core/operation_blocker.go (1)

49-52: Persisted-ops blocking: solid addition; parity with non-persisted path.

The new options and expression flow mirror existing logic. Good defaults (block-all when condition absent).

Please add tests that cover:

  • Enabled + nil condition → blocks persisted ops.
  • Enabled + condition true/false.
  • Disabled → no persisted-specific blocking.

Also applies to: 112-156

router/core/ratelimiter.go (1)

13-14: Import path switch: LGTM.

Alias preserved; public API unchanged.

router/core/router.go (2)

40-40: LGTM: redis import path swap is correct

Alias rd matches NewRedisCloser usage; no API/behavior change.


1004-1071: PersistedOperationClient nil-safety verified
All calls to persistedOperationClient occur within OperationKit.FetchPersistedOperation, which immediately returns if the client is nil, and the shutdown path in router.go guards Close() with a nil check. No additional nil-guards are required.

Comment thread router/core/router.go
Comment thread router/core/router.go
Comment thread router/internal/persistedoperation/operationstorage/s3/client.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/events/nats_events_test.go (1)

1290-1293: Minor: replace magic duration with a named helper/const

Consider a small helper (e.g., testenv.DisableSubscriptionHeartbeats()) or a named const (e.g., noHeartbeatsInterval = time.Hour) to make the intent explicit and reusable across SSE tests.

I can add a tiny helper in testenv and wire it here if you like.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b7bbf86 and 364ed2b.

📒 Files selected for processing (1)
  • router-tests/events/nats_events_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.652Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.
📚 Learning: 2025-08-28T09:18:10.085Z
Learnt from: endigma
PR: wundergraph/cosmo#2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.085Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/events/nats_events_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
router-tests/events/nats_events_test.go (1)

1290-1293: Deterministic SSE: disabling heartbeats here is a good call

Prevents stray heartbeat frames from interleaving with data and makes the filter assertions stable. Looks good.

@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from 364ed2b to b7bbf86 Compare September 3, 2025 16:36
@endigma endigma force-pushed the jesse/eng-8016-apq-perf branch from b7bbf86 to 521b8d8 Compare September 3, 2025 19:22
@endigma endigma merged commit 24fafa7 into main Sep 3, 2025
28 of 29 checks passed
@endigma endigma deleted the jesse/eng-8016-apq-perf branch September 3, 2025 19:36
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/internal/persistedoperation/operationstorage/s3/client.go (1)

37-45: Nil-safety: guard Options and TraceProvider; add sane timeout for IAM metadata.

NewClient dereferences options and options.TraceProvider unconditionally; both can be nil and will panic. Also the IAM HTTP client lacks a timeout.

Apply:

 func NewClient(endpoint string, options *Options) (*Client, error) {
-	client := &Client{
-		options: options,
-		tracer: options.TraceProvider.Tracer(
-			"wundergraph/cosmo/router/s3_persisted_operations_client",
-			trace.WithInstrumentationVersion("0.0.1"),
-		),
-	}
+	if options == nil {
+		options = &Options{}
+	}
+	tp := options.TraceProvider
+	if tp == nil {
+		tp = trace.NewNoopTracerProvider()
+	}
+	client := &Client{
+		options: options,
+		tracer:  tp.Tracer("wundergraph/cosmo/router/s3_persisted_operations_client", trace.WithInstrumentationVersion("0.0.1")),
+	}
@@
-		&credentials.IAM{
-			Client: &http.Client{
-				Transport: http.DefaultTransport,
-			},
-		},
+		&credentials.IAM{
+			Client: &http.Client{
+				Transport: http.DefaultTransport,
+				Timeout:   2 * time.Second,
+			},
+		},

Note: add import for time.

♻️ Duplicate comments (4)
router/internal/persistedoperation/operationstorage/s3/client.go (1)

87-107: Map S3 “not found” to PersistentOperationNotFoundError to enable APQ fallback.

MinIO/AWS return NoSuchKey via Stat/Read; without mapping, callers can’t distinguish 404 and APQ fallback won’t trigger.

Apply:

 func (c Client) persistedOperation(ctx context.Context, clientName, sha256Hash string) ([]byte, error) {
 	objectPath := fmt.Sprintf("%s/%s.json", c.options.ObjectPathPrefix, sha256Hash)
-	reader, err := c.client.GetObject(ctx, c.options.BucketName, objectPath, minio.GetObjectOptions{})
+	obj, err := c.client.GetObject(ctx, c.options.BucketName, objectPath, minio.GetObjectOptions{})
 	if err != nil {
 		return nil, err
 	}
-	defer reader.Close()
+	defer obj.Close()
+
+	// Verify existence and normalize "not found"
+	if _, err := obj.Stat(ctx); err != nil {
+		resp := minio.ToErrorResponse(err)
+		if resp.StatusCode == http.StatusNotFound || resp.Code == "NoSuchKey" {
+			return nil, &persistedoperation.PersistentOperationNotFoundError{
+				ClientName: clientName,
+				Sha256Hash: sha256Hash,
+			}
+		}
+		return nil, err
+	}
 
-	body, err := io.ReadAll(reader)
+	body, err := io.ReadAll(obj)
 	if err != nil {
-		return nil, err
+		resp := minio.ToErrorResponse(err)
+		if resp.StatusCode == http.StatusNotFound || resp.Code == "NoSuchKey" {
+			return nil, &persistedoperation.PersistentOperationNotFoundError{
+				ClientName: clientName,
+				Sha256Hash: sha256Hash,
+			}
+		}
+		return nil, err
 	}
router/internal/persistedoperation/operationstorage/fs/client.go (1)

25-36: Nil-safety and path traversal: guard options and sanitize ObjectPathPrefix.

options may be nil (panic later), and a leading “/” or “..” in ObjectPathPrefix can escape the storage root when joined.

Apply:

 func NewClient(path string, options *Options) (*client, error) {
   absolutePath, err := filepath.Abs(path)
   if err != nil {
     return nil, fmt.Errorf("failed to get absolute storage path: %w", err)
   }
 
-	client := &client{
-		path:    absolutePath,
-		options: options,
-	}
+	if options == nil {
+		options = &Options{}
+	}
+	prefix := filepath.Clean(options.ObjectPathPrefix)
+	if filepath.IsAbs(prefix) || prefix == ".." || strings.HasPrefix(prefix, ".."+string(os.PathSeparator)) {
+		return nil, fmt.Errorf("invalid ObjectPathPrefix %q: must be relative to storage root", options.ObjectPathPrefix)
+	}
+	client := &client{
+		path:    absolutePath,
+		options: &Options{ObjectPathPrefix: prefix},
+	}

Add import:

-import (
+import (
@@
 	"path/filepath"
+	"strings"
 )
router/core/router.go (2)

472-494: Fail fast when safelist is enabled but persisted-ops provider is disabled

Right now this config boots with only a warning elsewhere and results in a non-functional safelist. Add a hard guard to prevent footguns.

Apply this diff near these checks:

@@
   if r.persistedOperationsConfig.Safelist.Enabled && r.securityConfiguration.BlockPersistedOperations.Enabled {
@@
     r.logger.Warn("The security configuration field 'block_persisted_operations' is enabled alongside the persisted operations safelist. Take care to ensure this is intentional. Misconfiguration will result in safelisted queries being blocked.")
   }
+
+  // Safelist requires a persisted operations storage provider; disabling the provider makes safelist unusable.
+  if r.persistedOperationsConfig.Safelist.Enabled && r.persistedOperationsConfig.Disabled {
+    return nil, errors.New("safelist requires a persisted operations storage provider; set persisted_operations.disabled=false or configure a provider")
+  }

1004-1071: Unknown ProviderID silently disables persisted ops when graphApiToken is empty

If an unrecognized persisted-ops Storage.ProviderID is set and graphApiToken is empty, the code falls through without error and pClient stays nil. This is surprising and hard to diagnose.

Minimal, targeted fix: error out whenever ProviderID is non-empty but unrecognized, regardless of token presence. Keep the CDN fallback only when no ProviderID is set.

@@
-    } else if r.graphApiToken != "" {
-      if r.persistedOperationsConfig.Storage.ProviderID != "" {
-        return fmt.Errorf("unknown storage provider id '%s' for persisted operations", r.persistedOperationsConfig.Storage.ProviderID)
-      }
-
-      c, err := cdn.NewClient(r.cdnConfig.URL, r.graphApiToken, cdn.Options{
-        Logger: r.logger,
-      })
-      if err != nil {
-        return err
-      }
-      pClient = c
-
-      r.logger.Debug("Default to Cosmo CDN as persisted operations provider",
-        zap.String("url", r.cdnConfig.URL),
-      )
-    }
+    } else if r.persistedOperationsConfig.Storage.ProviderID != "" {
+      // ProviderID was configured but not resolved above -> always error
+      return fmt.Errorf("unknown storage provider id '%s' for persisted operations", r.persistedOperationsConfig.Storage.ProviderID)
+    } else if r.graphApiToken != "" {
+      // No ProviderID configured: default to Cosmo CDN if token is available
+      c, err := cdn.NewClient(r.cdnConfig.URL, r.graphApiToken, cdn.Options{
+        Logger: r.logger,
+      })
+      if err != nil {
+        return err
+      }
+      pClient = c
+      r.logger.Debug("Default to Cosmo CDN as persisted operations provider",
+        zap.String("url", r.cdnConfig.URL),
+      )
+    }

Optional: emit a clear log when the provider is explicitly disabled so operators aren’t left guessing.

@@
-  if !r.persistedOperationsConfig.Disabled {
+  if !r.persistedOperationsConfig.Disabled {
     // existing provider resolution...
-  }
+  } else {
+    r.logger.Info("Persisted operations storage provider is disabled; skipping provider setup")
+  }
🧹 Nitpick comments (8)
router/internal/rediscloser/rediscloser.go (4)

26-35: Guard against nil options/logger to avoid panics

opts and opts.Logger can be nil and will panic on first use. Default to a no-op logger.

 func NewRedisCloser(opts *RedisCloserOptions) (RDCloser, error) {
+	if opts == nil {
+		return nil, fmt.Errorf("nil RedisCloserOptions")
+	}
+	if opts.Logger == nil {
+		opts.Logger = zap.NewNop()
+	}
 	if len(opts.URLs) == 0 {
 		return nil, fmt.Errorf("no redis URLs provided")
 	}

76-81: Close client and return nil on failure to ping

Avoid returning a half-initialized client when connectivity checks fail; close it and return nil.

-	if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
-		return rdb, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
-	}
+	if isFunctioning, err := IsFunctioningClient(rdb); !isFunctioning {
+		_ = rdb.Close()
+		return nil, fmt.Errorf("failed to create a functioning redis client with the provided URLs: %w", err)
+	}

118-120: Consider a bounded Ping timeout

Ping(context.Background()) may wait on client defaults. Optionally add a PingTimeout in options and use context.WithTimeout here to keep startup snappy and tests fast.


93-101: Use structured logging for invalid secondary URLs

Minor nit: avoid fmt.Sprintf in logs and use fields.

-			opts.Logger.Warn(fmt.Sprintf("Skipping invalid Redis URL %q: %v", rawURL, parseErr))
+			opts.Logger.Warn("Skipping invalid Redis URL",
+				zap.String("url", rawURL), zap.Error(parseErr))
router/internal/rediscloser/rediscloser_test.go (1)

58-66: Deflake and speed up the “unresponsive redis” test

Relying on localhost:7000 may be flaky if something is listening. Also, default timeouts slow the test when unreachable. Point to a nearly guaranteed closed port and set short dial timeout.

-		_, err := NewRedisCloser(&RedisCloserOptions{
-			Logger: zaptest.NewLogger(t),
-			URLs:   []string{"redis://localhost:7000"},
-		})
+		_, err := NewRedisCloser(&RedisCloserOptions{
+			Logger: zaptest.NewLogger(t),
+			URLs:   []string{"redis://127.0.0.1:1?dial_timeout=100ms&read_timeout=100ms"},
+		})

If you adopt returning nil on error in NewRedisCloser, consider also asserting the client is nil here.

router/internal/persistedoperation/operationstorage/s3/client.go (1)

17-17: Remove unused Option type.

type Option is defined but unused in this implementation.

-type Option func(*Client)
router/core/operation_processor.go (2)

100-105: Prefer depending on interfaces in options for testability.

Using *persistedoperation.Client couples OperationProcessor to a concrete type and complicates mocking. Consider accepting a narrow interface here.


438-447: Avoid err shadowing in APQ-with-query path.

Use assignment to the named return ‘err’ to simplify flow.

-		err := o.operationProcessor.persistedOperationClient.SaveOperation(ctx, clientInfo.Name, o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash, o.parsedOperation.Request.Query)
-		if err != nil {
+		if err = o.operationProcessor.persistedOperationClient.SaveOperation(
+			ctx, clientInfo.Name,
+			o.parsedOperation.GraphQLRequestExtensions.PersistedQuery.Sha256Hash,
+			o.parsedOperation.Request.Query,
+		); err != nil {
 			return false, true, err
 		}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 364ed2b and 521b8d8.

📒 Files selected for processing (23)
  • router-tests/automatic_persisted_queries_test.go (4 hunks)
  • router-tests/persisted_operations_over_get_test.go (3 hunks)
  • router/core/graph_server.go (1 hunks)
  • router/core/graphql_prehandler.go (2 hunks)
  • router/core/operation_blocker.go (6 hunks)
  • router/core/operation_processor.go (4 hunks)
  • router/core/ratelimiter.go (1 hunks)
  • router/core/router.go (3 hunks)
  • router/core/router_config.go (2 hunks)
  • router/debug.config.yaml (1 hunks)
  • router/internal/persistedoperation/apq/client.go (1 hunks)
  • router/internal/persistedoperation/apq/redis.go (1 hunks)
  • router/internal/persistedoperation/client.go (3 hunks)
  • router/internal/persistedoperation/operationstorage/cdn/client.go (4 hunks)
  • router/internal/persistedoperation/operationstorage/fs/client.go (2 hunks)
  • router/internal/persistedoperation/operationstorage/s3/client.go (2 hunks)
  • router/internal/rediscloser/rediscloser.go (1 hunks)
  • router/internal/rediscloser/rediscloser_test.go (1 hunks)
  • router/pkg/config/config.go (1 hunks)
  • router/pkg/config/config.schema.json (2 hunks)
  • router/pkg/config/testdata/config_defaults.json (1 hunks)
  • router/pkg/config/testdata/config_full.json (1 hunks)
  • router/pkg/pubsub/redis/adapter.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • router/core/ratelimiter.go
🚧 Files skipped from review as they are similar to previous changes (15)
  • router/pkg/config/config.go
  • router/pkg/config/testdata/config_full.json
  • router/internal/persistedoperation/apq/client.go
  • router/pkg/config/testdata/config_defaults.json
  • router/debug.config.yaml
  • router-tests/persisted_operations_over_get_test.go
  • router/pkg/pubsub/redis/adapter.go
  • router-tests/automatic_persisted_queries_test.go
  • router/core/graphql_prehandler.go
  • router/core/router_config.go
  • router/internal/persistedoperation/apq/redis.go
  • router/internal/persistedoperation/client.go
  • router/internal/persistedoperation/operationstorage/cdn/client.go
  • router/pkg/config/config.schema.json
  • router/core/operation_blocker.go
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graph_server.go
  • router/core/operation_processor.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graph_server.go
  • router/core/operation_processor.go
🧬 Code graph analysis (5)
router/core/graph_server.go (1)
router/core/operation_blocker.go (1)
  • BlockPersistedOptions (49-52)
router/core/router.go (4)
router/internal/persistedoperation/client.go (3)
  • StorageClient (27-30)
  • NewClient (48-61)
  • Options (32-40)
router/internal/persistedoperation/operationstorage/cdn/client.go (2)
  • NewClient (130-158)
  • Options (23-25)
router/internal/persistedoperation/operationstorage/fs/client.go (2)
  • NewClient (25-37)
  • Options (20-22)
router/internal/persistedoperation/operationstorage/s3/client.go (2)
  • NewClient (38-76)
  • Options (25-33)
router/internal/persistedoperation/operationstorage/fs/client.go (2)
router/internal/persistedoperation/client.go (3)
  • Options (32-40)
  • StorageClient (27-30)
  • PersistedOperation (13-16)
router/internal/persistedoperation/operationstorage/s3/client.go (1)
  • Options (25-33)
router/core/operation_processor.go (2)
router/internal/persistedoperation/apq/client.go (2)
  • Client (17-22)
  • PersistedOperation (12-15)
router/internal/persistedoperation/client.go (3)
  • Client (42-46)
  • PersistedOperation (13-16)
  • PersistentOperationNotFoundError (18-21)
router/internal/persistedoperation/operationstorage/s3/client.go (1)
router/internal/persistedoperation/client.go (4)
  • StorageClient (27-30)
  • Client (42-46)
  • Options (32-40)
  • PersistedOperation (13-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build-router
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (11)
router/internal/rediscloser/rediscloser_test.go (1)

3-10: LGTM: import cleanup is correct

Single testing import retained and ordering is fine.

router/internal/persistedoperation/operationstorage/s3/client.go (2)

35-35: Interface conformance check looks good.


78-85: Wrapper signature change is consistent with StorageClient.

router/core/operation_processor.go (4)

124-129: Field type aligns with options; OK.


429-435: TTL renewal uses NormalizedRepresentation; confirm intent.

SaveOperation is invoked with the normalized operation when renewing TTL from cache. If the backing APQ store is expected to persist the original query text, this can desync “hash→body” semantics.

Would you like me to switch this to use the original body when available, or document that the APQ store intentionally holds normalized text for renewals?


451-479: APQ fetch path and error mapping look correct.

The flow sets isAPQ correctly, handles empty-body APQ 404 via PersistentOperationNotFoundError, and renews TTL on success. This relies on storage providers mapping “not found” to PersistentOperationNotFoundError (see S3 review).


482-483: Return tuple is consistent with upstream expectations.

router/internal/persistedoperation/operationstorage/fs/client.go (3)

39-46: Wrapper signature aligns with StorageClient.


48-71: Good: translates os.IsNotExist to PersistentOperationNotFoundError.


72-72: No-op Close is fine.

router/core/router.go (1)

40-41: LGTM: redis closer import move is correct

Alias retained and usage aligns with rd.NewRedisCloser below.

Comment on lines +1374 to +1377
BlockPersisted: BlockPersistedOptions{
Enabled: s.securityConfiguration.BlockPersistedOperations.Enabled,
Condition: s.securityConfiguration.BlockPersistedOperations.Condition,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Wiring for conditional persisted-op blocking looks correct

Passing BlockPersisted from securityConfiguration.BlockPersistedOperations into the blocker aligns with the new config model.

Please verify:

  • The pre-handler invokes the operation blocker before any persisted-operation/APQ storage lookups so blocked requests are short-circuited and a nil PersistedOperationClient is safe.
  • OperationProcessor gracefully handles a nil PersistedOperationClient (no dereference).

You can run:


🏁 Script executed:

#!/usr/bin/env bash
# Ensure blocker is consulted in prehandler before PO/APQ paths
rg -n -C4 'OperationBlocker|ErrPersistedOperationBlocked|BlockPersisted' router/core/graphql_prehandler.go

# Inspect use-sites of PersistedOperationClient for nil-guards
rg -n -C4 'PersistedOperationClient' router/core

Length of output: 3972


🏁 Script executed:

#!/usr/bin/env bash
# Locate the handleOperation method in graphql_prehandler.go
rg -n -C5 'func \(.*\) handleOperation' router/core/graphql_prehandler.go

# If found, dump a larger context around its invocation of OperationBlocker and persisted operations
rg -n -C10 'handleOperation' -A200 router/core/graphql_prehandler.go | sed -n '1,200p'

Length of output: 9419


🏁 Script executed:

#!/usr/bin/env bash
# Locate OperationBlocker definition and its options
rg -n -C5 'type OperationBlocker' -g 'router/core/*'

# Show methods on OperationBlocker to see how blocking is enforced
rg -n -C3 'func \(.*OperationBlocker\)' -g 'router/core/*'

# Search for any call sites to OperationBlocker within PreHandler.handleOperation
rg -n -C5 'operationBlocker' router/core/graphql_prehandler.go

Length of output: 7178


🏁 Script executed:

#!/usr/bin/env bash
# Inspect usage of persistedOperationClient within OperationProcessor methods
rg -n -C5 'persistedOperationClient' router/core/operation_processor.go

Length of output: 3857


Move persisted-op blocking before storage lookup

  • In router/core/graphql_prehandler.go’s handleOperation, h.operationBlocker.OperationIsBlocked is invoked after FetchPersistedOperation (around line 704), so blocked persisted operations still hit the client/cache. Relocate the blocker call to immediately before the shouldFetchPersistedOperation check (around line 594) to short-circuit early.
  • OperationProcessor.FetchPersistedOperation already guards against a nil persistedOperationClient (it returns an httpGraphqlError instead of panic).
🤖 Prompt for AI Agents
In router/core/graph_server.go around lines 1374 to 1377 and specifically in
router/core/graphql_prehandler.go (handleOperation) move the call to
h.operationBlocker.OperationIsBlocked so it runs immediately before the
shouldFetchPersistedOperation check (around line ~594) instead of after
FetchPersistedOperation (~704); this short-circuits blocked persisted operations
before any client/cache storage lookup. Update the control flow to call
OperationIsBlocked with the same inputs and return its error/path as before if
blocked, leaving FetchPersistedOperation unchanged (it already handles a nil
persistedOperationClient).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants