Skip to content

feat: ability to add operation info as trace or telemetry attributes#2252

Merged
SkArchon merged 15 commits intomainfrom
milinda/eng-8211-ability-to-add-sha256-to-traces
Oct 7, 2025
Merged

feat: ability to add operation info as trace or telemetry attributes#2252
SkArchon merged 15 commits intomainfrom
milinda/eng-8211-ability-to-add-sha256-to-traces

Conversation

@SkArchon
Copy link
Copy Markdown
Contributor

@SkArchon SkArchon commented Sep 30, 2025

What happens now

  • Operation name is part of the expression context, but if you try to manually set it to traces, it will be empty, this is as we evaluate the expression BEFORE setting it to the expression context
  • Only Hash / Name / Type is available as attributes

What this PR does

  • Better resolving of expressions for traces / telemetry / metrics - Before we set evaluate expressions as before auth and after auth, now we have more steps, like we set the sha256Hash to traces after it has been computed, etc
  • Add new operation attributes to expressions

Summary by CodeRabbit

  • New Features

    • Added expression-driven telemetry/log attributes for operations: sha256Hash, persistedId, parsingTime, normalizationTime, validationTime, planningTime, and hash.
    • Support blocking mutations based on parsing-time expressions.
    • Improved detection of request.operation.sha256Hash usage for targeted telemetry.
  • Refactor

    • Centralized, bucketed evaluation of attribute expressions for telemetry, metrics, and tracing.
    • Unified telemetry population flow and streamlined operation timing/identifier propagation.
  • Tests

    • Large expansion of structured-logging, telemetry, expression-edge-case, visitor, and bucketization tests.

Checklist

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 30, 2025

Walkthrough

Group expression evaluation into prioritized buckets, centralize telemetry attribute resolution, extend request.operation with sha256/persisted ID and timing fields, add visitors to detect operation-related expression usage, update call sites to use bucketed expressions, and add extensive tests for logging, telemetry, visitors, and blocking by parsingTime.

Changes

Cohort / File(s) Summary
Core expression telemetry refactor
router/core/attribute_expressions.go, router/core/engine_loader_hooks.go, router/core/graphql_prehandler.go, router/core/transport.go
Replace per-attribute maps with bucketed expressions (ProgramWithKey); add AddExprOpts, addExpressions, setTelemetryAttributes; centralize telemetry/tracing/metric expression resolution; populate operation fields and timing directly into expression contexts; update call sites to pass explicit buckets.
Internal expr model & visitors
router/internal/expr/expr.go, router/internal/expr/request_operation_bucket_visitor.go, router/internal/expr/use_request_operation_sha256.go, router/internal/expr/visitor_group.go
Expand Operation (Sha256Hash, PersistedID, Parsing/Normalization/Validation/Planning durations); add AttributeBucket enum and RequestOperationBucketVisitor for bucket classification; add UsesRequestOperationSha256 visitor and wire it into VisitorGroup with an accessor.
Visitor helpers & grouping
router/internal/expr/visitor_group.go, router/internal/expr/visitor_group_test.go
Register new visitor kind for sha256 usage and add IsRequestOperationSha256UsedInExpressions() accessor; add tests validating accessor behavior on compiled expressions.
Expression-use detector tests
router/internal/expr/uses_request_operation_sha256_test.go
Add unit tests for UsesRequestOperationSha256 visitor covering positive, negative, nil, and short-circuit cases.
Bucket classification tests
router/internal/expr/request_operation_bucket_visitor_test.go
Add comprehensive tests mapping many expression forms to expected buckets, verify bucket priority and setBucketIfHigher behavior.
Attribute expressions tests update
router/core/attribute_expressions_test.go
Update tests to assert bucketed storage and new expressionsAttributes signature that accepts an explicit bucket parameter.
Router tests: logging & telemetry
router-tests/structured_logging_test.go, router-tests/telemetry/telemetry_test.go
Extend structured logging tests with many expression/timing cases, tracing sampling, response/subgraph fields; add helpers for span attribute validation and cover dynamic attribute expressions across spans/exporters.
Router tests: blocking
router-tests/block_operations_test.go
Add a blocking test case that uses a parsing-time expression (request.operation.parsingTime.Nanoseconds() > 0) to verify BlockMutations blocks and returns the expected error.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly identifies the new feature of adding operation information to trace and telemetry attributes and directly reflects the primary change introduced by the pull request without unnecessary detail.
✨ Finishing touches
  • 📝 Generate docstrings

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 30, 2025

❌ Internal Query Planner CI checks failed

The Internal Query Planner CI checks failed in the celestial repository, and this is going to stop the merge of this PR.
If you are part of the WunderGraph organization, you can see the PR with more details.

@SkArchon
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 30, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
router/internal/expr/expr.go (1)

81-91: Consider grouping related fields for better readability.

The Operation struct now has 9 fields, but they're interleaved rather than grouped by category. Consider organizing them logically:

  • Identifiers: Name, Type, Hash, Sha256Hash, PersistedId
  • Timings: ParsingTime, NormalizationTime, ValidationTime, PlanningTime

This would improve maintainability as the struct grows.

 type Operation struct {
+	// Identifiers
+	Name        string `expr:"name"`
+	Type        string `expr:"type"`
+	Hash        string `expr:"hash"`
 	Sha256Hash  string `expr:"sha256Hash"`
+	PersistedId string `expr:"persistedId"`
+	// Timings
 	ParsingTime       time.Duration `expr:"parsingTime"`
-	Name              string        `expr:"name"`
-	Type              string        `expr:"type"`
-	PersistedId       string        `expr:"persistedId"`
 	NormalizationTime time.Duration `expr:"normalizationTime"`
-	Hash              string        `expr:"hash"`
 	ValidationTime    time.Duration `expr:"validationTime"`
 	PlanningTime      time.Duration `expr:"planningTime"`
 }
router/internal/expr/request_operation_bucket_visitor.go (4)

57-66: Deduplicate property-name extraction via a small helper.

Cuts repetition and reduces chances of subtle bugs with additional property node types.

@@
+// propNameOf returns identifier/string property names; otherwise "".
+func propNameOf(n ast.Node) string {
+	switch p := n.(type) {
+	case *ast.StringNode:
+		return p.Value
+	case *ast.IdentifierNode:
+		return p.Value
+	default:
+		return ""
+	}
+}
@@
-        {
-            prop := ""
-            switch p := member.Property.(type) {
-            case *ast.StringNode:
-                prop = p.Value
-            case *ast.IdentifierNode:
-                prop = p.Value
-            }
-            if prop == "auth" {
+        {
+            if propNameOf(member.Property) == "auth" {
                 if reqIdent, ok := member.Node.(*ast.IdentifierNode); ok && reqIdent.Value == ExprRequestKey {
                     v.setBucketIfHigher(BucketAuth)
                     // no return; higher-priority matches may exist in other nodes
                 }
             }
         }
@@
-        propName := ""
-        switch p := member.Property.(type) {
-        case *ast.StringNode:
-            propName = p.Value
-        case *ast.IdentifierNode:
-            propName = p.Value
-        default:
-            propName = ""
-        }
-        if propName == "" {
+        prop := propNameOf(member.Property)
+        if prop == "" {
             return
         }
@@
-        opProp := ""
-        switch op := opMember.Property.(type) {
-        case *ast.StringNode:
-            opProp = op.Value
-        case *ast.IdentifierNode:
-            opProp = op.Value
-        }
+        opProp := propNameOf(opMember.Property)
@@
-        switch propName {
+        switch prop {

Also applies to: 76-85, 95-100


108-126: Consider handling unknown request.operation properties explicitly.

A default branch (e.g., leave as BucketDefault or emit trace‑level log) can aid debugging when new props appear.


1-129: Add focused tests for bucketing (table‑driven).

Cover: subgraph.name, request.auth, and each request.operation property (sha256Hash, parsingTime, name, type, persistedId, normalizationTime, hash, validationTime, planningTime). Assert highest bucket wins when multiple occur.

I can scaffold a test that parses expressions with expr, walks the AST with this visitor, and asserts Bucket.


10-22: Follow Go initialism conventions for exported buckets

• Rename in router/internal/expr/request_operation_bucket_visitor.go​:
– consts: BucketSha256→BucketSHA256, BucketPersistedId→BucketPersistedID
– switch-case: v.setBucketIfHigher(BucketSha256/​BucketPersistedId)→v.setBucketIfHigher(BucketSHA256/​BucketPersistedID)

• Update all usages, e.g., in router/core/graphql_prehandler.go at the setTelemetryAttributes calls (expr.BucketSha256/​BucketPersistedId) to use the new names.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 146f4ac and 6d4dab1.

📒 Files selected for processing (14)
  • router-tests/block_operations_test.go (1 hunks)
  • router-tests/structured_logging_test.go (1 hunks)
  • router/core/attribute_expressions.go (3 hunks)
  • router/core/attribute_expressions_test.go (2 hunks)
  • router/core/engine_loader_hooks.go (2 hunks)
  • router/core/graphql_prehandler.go (22 hunks)
  • router/core/transport.go (2 hunks)
  • router/core/websocket.go (4 hunks)
  • router/internal/expr/expr.go (1 hunks)
  • router/internal/expr/request_operation_bucket_visitor.go (1 hunks)
  • router/internal/expr/use_request_operation_sha256.go (1 hunks)
  • router/internal/expr/uses_request_operation_sha256_test.go (1 hunks)
  • router/internal/expr/visitor_group.go (3 hunks)
  • router/internal/expr/visitor_group_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graphql_prehandler.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (45)
router/core/attribute_expressions.go (6)

17-20: LGTM: Clean wrapper struct.

The ProgramWithKey struct appropriately pairs a compiled expression program with its attribute key, enabling the bucket-based architecture.


76-96: LGTM: Bucket-based attribute resolution.

The method correctly resolves attributes for a specific bucket by iterating over ProgramWithKey entries and resolving values via the wrapped Program. The error handling is appropriate.


98-105: LGTM: Clean options struct.

The AddExprOpts struct encapsulates all necessary context for attribute addition, supporting the unified telemetry path.


136-151: LGTM: Unified attribute addition with proper nil checks.

The addExpressions helper correctly:

  • Handles nil expressions early
  • Resolves attributes for the specified bucket
  • Applies attributes via the provided function
  • Optionally updates the span

The error logging is appropriate.


107-134: No changes needed. Nil safety is ensured by the if opts.expressions == nil { return } guard, each addExpressions call correctly handles its own errors without blocking the others, and using a single currSpan for both telemetry and tracing (while omitting it for metrics) is intentional.


52-74: Verified RequestOperationBucketVisitor behavior and error formatting
RequestOperationBucketVisitor correctly maps auth, sha256Hash, parsingTime, name/type, persistedId, normalizationTime, hash, validationTime, planningTime, and subgraph to their respective buckets in the intended priority order, and compilation failures produce clear messages including the attribute key, the expression, and the underlying error.

router/core/attribute_expressions_test.go (1)

87-148: LGTM: Test correctly validates bucket-based expressions.

The test appropriately:

  1. Creates attributes with and without auth access
  2. Verifies they're bucketed correctly (BucketDefault for "attr1", BucketAuth for "attr2")
  3. Resolves attributes from each bucket separately
  4. Validates the resolved values

The use of assert.Condition with inline iteration is clear and effective.

router-tests/block_operations_test.go (1)

126-147: Verified parsingTime population before condition evaluation parsingTime is set immediately after parsing in both HTTP (graphql_prehandler.go:588-589,601-602) and WebSocket handlers (websocket.go:887,891), ensuring the block condition sees a populated parsingTime.

router/internal/expr/visitor_group_test.go (1)

67-110: LGTM! Well-structured test following established patterns.

The test correctly mirrors the existing IsResponseBodyUsedInExpressions pattern and covers the key scenarios:

  • Negative case (non-sha256 operation field access)
  • Dot-chaining access
  • Square-bracket access
  • Mixed access

The parallel execution and assertions are appropriate.

router/internal/expr/uses_request_operation_sha256_test.go (1)

1-113: LGTM! Comprehensive test coverage for the new visitor.

The test suite thoroughly covers:

  • Nil handling
  • All valid access patterns (dot, bracket, mixed)
  • Negative cases (wrong property, wrong root, wrong middle segment)
  • Short-circuit optimization

The test structure is clean and assertion choices are appropriate.

router/internal/expr/visitor_group.go (3)

12-12: LGTM! Constant addition follows established pattern.

The new usesRequestOperationSha256Key constant is correctly added to the visitorKind iota sequence.


26-31: LGTM! Visitor correctly registered.

The UsesRequestOperationSha256 visitor is properly initialized in the globalVisitors map with consistent formatting.


67-73: LGTM! Public accessor follows established pattern.

The new IsRequestOperationSha256UsedInExpressions() method correctly mirrors the existing accessor pattern, including:

  • Nil receiver handling (safe default: true)
  • Type assertion on the visitor
  • Field access to return the boolean flag
router/core/engine_loader_hooks.go (2)

7-7: LGTM! Import correctly added for bucket constant.

The new import is necessary for the expr.BucketSubgraph constant used in the refactored expression handling.


176-205: No changes required for metric expressions
The addExpressions function guards on opts.currSpan != nil before setting span attributes, so omitting currSpan in the metrics-only call is safe and intentional.

router/internal/expr/use_request_operation_sha256.go (1)

7-63: LGTM! Well-structured visitor implementation.

The visitor correctly detects request.operation.sha256Hash references in the AST by:

  • Early-exiting when the node is nil or the flag is already set (performance optimization)
  • Validating the three-level structure: sha256Hash property → operation member → request identifier
  • Handling both string and identifier forms of property access
  • Using type switches defensively to ensure correct node types

The logic is clear, correct, and integrates well with the broader expression evaluation framework introduced in this PR.

router/core/websocket.go (5)

876-879: LGTM! Centralized context retrieval.

The early retrieval of reqCtx and validation ensures it's available for timing updates throughout the function, reducing redundant retrievals and improving clarity.


902-922: LGTM! Comprehensive normalization timing on all paths.

Normalization timing is consistently updated across all four outcomes:

  1. After NormalizeOperation error (line 906)
  2. After NormalizeVariables error (line 914)
  3. After RemapVariables error (line 920)
  4. On success (line 932)

This ensures accurate timing metrics regardless of where normalization fails.


937-953: LGTM! Validation timing captured on both paths.

Validation timing is correctly updated after ValidateQueryComplexity error (line 942) and after Validate error (line 948), as well as on success (line 953).


955-965: LGTM! Planning timing captured consistently.

Planning timing is correctly updated on both error (line 960) and success (line 965) paths.


884-893: Verify ParsingTime initialization on skipParse path.

No skipParse occurrences were found in the codebase—please manually confirm that when parsing is skipped, reqCtx.expressionContext.Request.Operation.ParsingTime is explicitly initialized (e.g., to zero) to prevent downstream assumptions.

router-tests/structured_logging_test.go (7)

1999-2035: LGTM! Test validates sha256Hash with persisted operation.

The test correctly validates that request.operation.sha256Hash expression resolves to the SHA-256 hash of an empty string (e3b0c44...) when a persisted query is used (no body), which aligns with the expected behavior.


2037-2071: LGTM! Test validates sha256Hash without persisted operation.

The test correctly validates that request.operation.sha256Hash computes the hash of the query body when no persisted operation is provided. The expected hash c13e0fa... matches the SHA-256 of query employees { employees { id } }.


2073-2110: LGTM! Test validates persistedId expression.

The test correctly validates that request.operation.persistedId resolves to the persisted query SHA-256 hash (dc67510...) from the extensions.


2112-2139: LGTM! Test validates parsingTime is non-zero.

The test correctly validates that request.operation.parsingTime yields a non-zero duration, confirming that parsing timing is captured and exposed via expressions.


2141-2168: LGTM! Test validates normalizationTime is non-zero.

The test correctly validates that request.operation.normalizationTime yields a non-zero duration, confirming that normalization timing is captured and exposed via expressions.


2170-2197: LGTM! Test validates validationTime is non-zero.

The test correctly validates that request.operation.validationTime yields a non-zero duration, confirming that validation timing is captured and exposed via expressions.


2199-2226: LGTM! Test validates planningTime is non-zero.

The test correctly validates that request.operation.planningTime yields a non-zero duration, confirming that planning timing is captured and exposed via expressions.

router/core/transport.go (2)

138-138: LGTM! Bucket parameter added for expression resolution.

The addition of expr.BucketDefault to expressionsAttributes aligns with the new bucketed expression system introduced in this PR.


490-508: LGTM! Centralized expression resolution improves maintainability.

The refactor replaces inline conditional expression resolution with the unified addExpressions flow. The spanAttrFunc closure cleanly captures attributes, and using BucketSubgraph for both telemetry and tracing expressions in the GRPC context is appropriate.

router/core/graphql_prehandler.go (12)

211-211: LGTM! Initial telemetry attributes set early in request flow.

Setting telemetry attributes with BucketDefault early in the handler ensures base attributes are captured before operation-specific details are available.


358-358: LGTM! Telemetry attributes refreshed after authentication.

Using BucketAuth after authentication allows expressions to access auth-related context for telemetry/tracing.


420-422: LGTM! Conditional sha256 computation based on expression usage.

The new check IsRequestOperationSha256UsedInExpressions() ensures the expensive sha256 computation is performed when expressions reference request.operation.sha256Hash, improving performance when not needed.


505-507: LGTM! Sha256 hash propagated to expression context.

The sha256 hash is correctly assigned to both requestContext.operation.sha256Hash and requestContext.expressionContext.Request.Operation.Sha256Hash, and telemetry attributes are updated with the appropriate BucketSha256.


615-618: LGTM! Operation name and type propagated correctly.

The operation name and type are assigned to the expression context and telemetry attributes are updated with BucketNameOrType. This follows the pattern established for other operation fields.


672-673: LGTM! Persisted ID propagated to expression context.

The persisted operation ID is correctly assigned and telemetry attributes are updated with BucketPersistedId.


858-859: LGTM! Operation hash set after normalization.

The operation hash is assigned to the expression context after successful normalization, and telemetry attributes are updated with BucketHash. The placement after normalization is correct since the hash is computed during that phase.


966-1003: LGTM! Planning time set on both success and error paths.

Planning time is correctly assigned to expressionContext.Request.Operation.PlanningTime on both the error path (lines 988-989) and success path (lines 1002-1003), with telemetry attributes updated using BucketPlanningTime. The planCtx is properly created from req.Context() at line 966.


1169-1184: LGTM! Client context properly updated with individual field assignments.

The setExpressionContextClient function now assigns client name and version to individual fields rather than replacing the entire Client structure. This approach is cleaner and avoids potential issues with uninitialized fields.


576-603: parseCtx usage verified; no changes required.


890-947: Validation timing paths correctly implemented

validationCtx is initialized from req.Context() at line 890, and both query-depth and validation error branches record validationTime before ending the span.


692-779: Normalization timing is recorded consistently on all paths

All three error exits (lines 702–703, 734–735, 778–779) and the successful path (lines 854–856) assign operation.normalizationTime, update expressionContext.Request.Operation.NormalizationTime, and invoke setTelemetryAttributes.

router/internal/expr/request_operation_bucket_visitor.go (3)

37-56: LGTM: traversal/early‑exit logic.

Nil guard and short‑circuit on BucketSubgraph look good; subgraph detection via Identifier/Member is clear.


74-107: LGTM: request.operation. recognition and guarding.*

Parent checks ensure we only bucket true request.operation properties.


25-26: No action required: bucket ordering matches intended schedule

AttributeBucket enum confirms the intended evaluation sequence—auth < sha256 < parsingTime < name/type < persistedId < normalizationTime < hash < validationTime < planningTime < subgraph.

Comment thread router/internal/expr/request_operation_bucket_visitor.go Outdated
@SkArchon SkArchon marked this pull request as ready for review October 1, 2025 17:39
@SkArchon SkArchon marked this pull request as draft October 1, 2025 17:41
@SkArchon SkArchon marked this pull request as ready for review October 1, 2025 20:22
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/core/graphql_prehandler.go (1)

867-869: Set span attributes before ending the span.

You call engineNormalizeSpan.End() and then set attributes on it. Attributes set after End are dropped.

- engineNormalizeSpan.End()
-
- if operationKit.parsedOperation.IsPersistedOperation {
-     engineNormalizeSpan.SetAttributes(otel.WgEnginePersistedOperationCacheHit.Bool(operationKit.parsedOperation.PersistedOperationCacheHit))
- }
+ if operationKit.parsedOperation.IsPersistedOperation {
+     engineNormalizeSpan.SetAttributes(otel.WgEnginePersistedOperationCacheHit.Bool(operationKit.parsedOperation.PersistedOperationCacheHit))
+ }
+ engineNormalizeSpan.End()

Also applies to: 865-866

🧹 Nitpick comments (8)
router-tests/structured_logging_test.go (5)

1999-2001: Rename to reflect scenario (no body present).

The test name says “with persisted hash and body” but the request includes only a persisted hash (no query body). Rename to avoid confusion or include a body if you intend to test the hash-vs-body validation path.

- t.Run("validate request.operation.sha256Hash expression with persisted hash and body", func(t *testing.T) {
+ t.Run("validate request.operation.sha256Hash expression with persisted hash (no body)", func(t *testing.T) {

2135-2137: Prefer direct duration comparisons.

Use require.Greater(t, val, time.Duration(0)) instead of casting to int to avoid needless conversions and potential 32-bit overflow pitfalls.

- val, ok := requestContext["parsing_time_expression"].(time.Duration)
+ val, ok := requestContext["parsing_time_expression"].(time.Duration)
  require.True(t, ok)
- require.Greater(t, int(val), 0)
+ require.Greater(t, val, time.Duration(0))

Apply the same pattern for normalizationTime, validationTime, planningTime, fetchDuration, and connAcquireDuration assertions in these blocks.

Also applies to: 2164-2166, 2193-2195, 2222-2224, 3127-3130, 3162-3165, 3209-3211, 3240-3244, 3277-3279, 3322-3325


3157-3165: Avoid index-based assumptions for subgraph log order.

Relying on requestLog.All()[0]/[1] can flake under concurrency. Select entries by stable fields (e.g., subgraph_name or subgraph_id) instead of array position.

- employeeSubgraphLogs := requestLogAll[0]
+ employeeSubgraphLogs := findBy(requestLogAll, "subgraph_name", "employees")

If helpful, I can provide a small helper to search ObservedLogs by context key.

Also applies to: 3208-3212


2771-2773: Brittle span-count assertions.

Asserting an exact span count (9, 0) is fragile across innocuous instrumentation changes. Prefer asserting sampled flag correctness and that “at least one span exists” (or use >= thresholds).

- require.Len(t, sn, 9)
+ require.GreaterOrEqual(t, len(sn), 1)

And for the non-sampled case, assert zero or near-zero only if guaranteed by the exporter config.

Also applies to: 2812-2814


2843-2846: Compare JSON bodies semantically, not as raw strings.

Raw string equality on JSON can break due to spacing/ordering changes. Parse and assert fields or use require.JSONEq where applicable.

- responseBody := requestContextMap["response_body"].(string)
- require.Equal(t, `{"data":{"employees":[{"id":1},{"id":2},{"id":3},{"id":4},{"id":5},{"id":7},{"id":8},{"id":10},{"id":11},{"id":12}]}}`, responseBody)
+ responseBody := requestContextMap["response_body"].(string)
+ require.JSONEq(t, `{"data":{"employees":[{"id":1},{"id":2},{"id":3},{"id":4},{"id":5},{"id":7},{"id":8},{"id":10},{"id":11},{"id":12}]}}`, responseBody)

Same idea for the other response_body checks in this block.

Also applies to: 2873-2876, 3395-3418

router/internal/expr/request_operation_bucket_visitor.go (1)

36-47: Use constants for keys to prevent drift.

“subgraph”, “auth”, and “operation” are string literals. If constants exist (e.g., ExprSubgraphKey, ExprRequestAuthKey, ExprRequestOperationKey), prefer them to keep tags and code in sync.

- if ident, ok := (*baseNode).(*ast.IdentifierNode); ok {
-     if ident.Value == "subgraph" {
+ if ident, ok := (*baseNode).(*ast.IdentifierNode); ok {
+     if ident.Value == ExprSubgraphKey {
          ...
- if ident, ok := member.Node.(*ast.IdentifierNode); ok && ident.Value == "subgraph" {
+ if ident, ok := member.Node.(*ast.IdentifierNode); ok && ident.Value == ExprSubgraphKey {
          ...
- if prop == "auth" {
+ if prop == ExprRequestAuthKey {
          ...
- if opProp != "operation" {
+ if opProp != ExprRequestOperationKey {

If these constants don’t exist yet, consider adding them next to ExprRequestKey. As per coding guidelines.

Also applies to: 57-63, 70-73

router/core/graphql_prehandler.go (2)

505-508: Avoid unnecessary expression evaluation when bucket unused.

setTelemetryAttributes is invoked many times. Guard with a fast “bucket used?” check to skip work when no expressions depend on that stage.

- setTelemetryAttributes(req.Context(), requestContext, expr.BucketSha256)
+ if h.exprManager != nil && h.exprManager.VisitorManager.IsBucketUsed(expr.BucketSha256) {
+     setTelemetryAttributes(req.Context(), requestContext, expr.BucketSha256)
+ }

Apply similarly for other buckets (Default/Auth/Parsing/Normalization/Hash/Validation/Planning).

Also applies to: 591-604, 703-704, 735-736, 856-860, 909-910, 947-948, 989-990, 1003-1004, 211-212, 358-359


865-869: Optional: also attach normalization cache hit on the router span for correlation.

You set WgNormalizationCacheHit on the span; consider also mirroring it on the root/router span attributes if that’s useful in logs/metrics correlation.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d4dab1 and a78adac.

📒 Files selected for processing (11)
  • router-tests/block_operations_test.go (1 hunks)
  • router-tests/structured_logging_test.go (1 hunks)
  • router-tests/telemetry/telemetry_test.go (3 hunks)
  • router/core/attribute_expressions.go (1 hunks)
  • router/core/attribute_expressions_test.go (3 hunks)
  • router/core/graphql_prehandler.go (22 hunks)
  • router/internal/expr/expr.go (1 hunks)
  • router/internal/expr/request_operation_bucket_visitor.go (1 hunks)
  • router/internal/expr/request_operation_bucket_visitor_test.go (1 hunks)
  • router/internal/expr/use_request_operation_sha256.go (1 hunks)
  • router/internal/expr/uses_request_operation_sha256_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • router/internal/expr/uses_request_operation_sha256_test.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graphql_prehandler.go
🧬 Code graph analysis (8)
router/internal/expr/request_operation_bucket_visitor_test.go (2)
router/internal/expr/request_operation_bucket_visitor.go (13)
  • AttributeBucket (8-8)
  • BucketDefault (11-11)
  • BucketAuth (12-12)
  • BucketSha256 (13-13)
  • BucketParsingTime (14-14)
  • BucketNameOrType (15-15)
  • BucketPersistedID (16-16)
  • BucketNormalizationTime (17-17)
  • BucketHash (18-18)
  • BucketValidationTime (19-19)
  • BucketPlanningTime (20-20)
  • BucketSubgraph (21-21)
  • RequestOperationBucketVisitor (27-29)
router/internal/expr/expr_manager.go (1)
  • CreateNewExprManager (19-23)
router/core/attribute_expressions_test.go (2)
router/internal/expr/request_operation_bucket_visitor.go (3)
  • RequestOperationBucketVisitor (27-29)
  • BucketAuth (12-12)
  • BucketDefault (11-11)
router/internal/expr/expr_manager.go (1)
  • CreateNewExprManager (19-23)
router-tests/block_operations_test.go (2)
router-tests/testenv/testenv.go (4)
  • Run (107-124)
  • Config (286-342)
  • Environment (1729-1765)
  • GraphQLRequest (1905-1913)
router/pkg/config/config.go (3)
  • Config (996-1070)
  • SecurityConfiguration (408-418)
  • BlockOperationConfiguration (403-406)
router/core/attribute_expressions.go (4)
router/internal/expr/request_operation_bucket_visitor.go (2)
  • AttributeBucket (8-8)
  • RequestOperationBucketVisitor (27-29)
router/pkg/config/config.go (1)
  • CustomAttribute (43-47)
router/internal/expr/expr.go (1)
  • Context (35-39)
router/internal/expr/resolvers.go (1)
  • ResolveStringExpression (22-34)
router/internal/expr/request_operation_bucket_visitor.go (1)
router/internal/expr/expr.go (1)
  • ExprRequestKey (31-31)
router-tests/structured_logging_test.go (2)
router-tests/testenv/testenv.go (5)
  • Run (107-124)
  • Config (286-342)
  • LogObservationConfig (388-391)
  • Environment (1729-1765)
  • GraphQLRequest (1905-1913)
router/pkg/config/config.go (3)
  • Config (996-1070)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
router/core/graphql_prehandler.go (2)
router/internal/expr/expr.go (4)
  • Context (35-39)
  • Request (66-75)
  • Operation (81-91)
  • Client (93-97)
router/internal/expr/request_operation_bucket_visitor.go (10)
  • BucketDefault (11-11)
  • BucketAuth (12-12)
  • BucketSha256 (13-13)
  • BucketParsingTime (14-14)
  • BucketNameOrType (15-15)
  • BucketPersistedID (16-16)
  • BucketNormalizationTime (17-17)
  • BucketHash (18-18)
  • BucketValidationTime (19-19)
  • BucketPlanningTime (20-20)
router-tests/telemetry/telemetry_test.go (3)
router-tests/testenv/testenv.go (4)
  • Run (107-124)
  • Config (286-342)
  • Environment (1729-1765)
  • GraphQLRequest (1905-1913)
router/pkg/trace/tracetest/tracetest.go (1)
  • NewInMemoryExporter (11-17)
router/pkg/config/config.go (3)
  • Config (996-1070)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
🔇 Additional comments (20)
router-tests/block_operations_test.go (1)

126-147: LGTM! Clean test for operation-timing expression support.

The test correctly verifies that parsingTime is available in the expression context and can be used for blocking logic. The expression parsingTime.Nanoseconds() > 0 will always evaluate to true for successfully parsed operations, which is the intended behavior to verify the timing attribute is populated and accessible.

Minor note: Good use of require.JSONEq at line 145 for robust JSON comparison.

router/internal/expr/request_operation_bucket_visitor_test.go (4)

16-317: Excellent test coverage for bucket classification.

The test suite comprehensively validates bucket assignment across all priority levels (Default → Subgraph) and covers edge cases including bracket notation, conditionals, and complex expressions. The priority tests correctly verify that higher-priority attributes win when multiple are present.


319-347: LGTM!

Clean helper function for readable test output. All bucket types are covered.


349-366: LGTM!

Good defensive test to catch accidental changes to the bucket priority ordering defined by the iota constants.


368-421: LGTM!

Properly validates the setBucketIfHigher logic across all key scenarios, including the special case where Subgraph (highest priority) always wins.

router/internal/expr/use_request_operation_sha256.go (2)

7-10: LGTM!

Good practice using constants for attribute names to avoid magic strings.


17-68: LGTM!

The visitor correctly detects request.operation.sha256Hash by walking the AST member access chain. Both dot and bracket notation are properly supported, and the early-exit optimization avoids redundant work.

router/core/attribute_expressions.go (6)

4-7: LGTM!

New imports appropriately support the telemetry integration paths introduced in this refactor.


16-19: LGTM!

The ProgramWithKey wrapper cleanly couples compiled expressions with their attribute keys, enabling the new bucket-based storage model.


21-24: LGTM!

The refactored storage using a single bucket-indexed map simplifies the design and aligns with the new bucketed evaluation model.


26-48: LGTM!

The refactored constructor correctly integrates the RequestOperationBucketVisitor to classify expressions during compilation, storing them in the appropriate bucket.


50-70: LGTM!

The updated API correctly accepts a bucket parameter and resolves only expressions associated with that bucket, maintaining proper error handling.


72-125: LGTM!

The new telemetry integration functions provide a clean, unified path for resolving and attaching attributes. The AddExprOpts struct effectively bundles parameters, and addExpressions eliminates duplication across telemetry types.

router/internal/expr/expr.go (1)

81-91: LGTM!

The expanded Operation struct correctly exposes all new attributes (sha256Hash, timing fields, persistedId) with appropriate types and expr tags, aligning with the bucket-based expression model and test coverage.

router/core/attribute_expressions_test.go (2)

13-85: LGTM!

Test correctly updated to use the new RequestOperationBucketVisitor and validates bucket assignment instead of the old HasAuth flag.


87-148: LGTM!

Test correctly validates that expressions are bucketed appropriately (non-auth in BucketDefault, auth in BucketAuth) and that resolution works correctly when called with explicit bucket parameters.

router/internal/expr/request_operation_bucket_visitor.go (1)

31-42: Bucketization logic and priority look solid.

Traversal handles subgraph/auth/operation.* correctly and honors priority via setBucketIfHigher; early exit on BucketSubgraph avoids extra work.

Also applies to: 44-50, 57-63, 70-77, 80-98

router/core/graphql_prehandler.go (3)

519-526: Persisted hash vs. body validation in the correct place.

The sha256 check against the provided body happens early in handleOperation, before APQ logic, matching prior design notes. Good placement.

Based on learnings


305-307: Body capture gated correctly.

request.body.raw is populated only when expressions require it, reducing overhead and PII exposure. Nice.

Also applies to: 274-276


860-866: No SetAttributes-after-End patterns detected—verification complete.

Comment thread router-tests/telemetry/telemetry_test.go
Copy link
Copy Markdown
Contributor

@alepane21 alepane21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router-tests/telemetry/telemetry_test.go (2)

9612-10138: Comprehensive dynamic request.operation attribute tests

Great coverage across sha256Hash, name/type, persistedId, and timing fields. The skip list approach with name+kind is precise, and asserting times > 0 matches the repo preference for these telemetry tests. Consider a small follow‑up to de‑duplicate the repeated pattern via a table‑driven test harness around validateDetectedSpans, but not required.

Based on learnings


11100-11120: Tighten helpers: mark as test helper, drop no‑op, avoid pointer return

Small cleanups improve readability and failure reporting:

  • Add t.Helper() to validateDetectedSpans so require/asserter point to the caller.
  • Remove the no‑op snapshot.SpanKind() statement.
  • Return attribute.Value by value from getAttributeFromKey and pass it directly to validateFunc, avoiding transient pointer usage.

Apply this diff:

-func validateDetectedSpans(t *testing.T, sn []sdktrace.ReadOnlySpan, key string, skipSpans []spanEntry, validateFunc func(value attribute.Value)) int {
-	var detectedSpanCount int
-
-	for _, snapshot := range sn {
-		attributes := snapshot.Attributes()
-		snapshot.SpanKind()
-
-		spanSearchEntry := spanEntry{name: snapshot.Name(), spanKind: snapshot.SpanKind()}
-		value, ok := getAttributeFromKey(attributes, key)
+func validateDetectedSpans(t *testing.T, sn []sdktrace.ReadOnlySpan, key string, skipSpans []spanEntry, validateFunc func(value attribute.Value)) int {
+	t.Helper()
+	var detectedSpanCount int
+
+	for _, snapshot := range sn {
+		attributes := snapshot.Attributes()
+
+		spanSearchEntry := spanEntry{name: snapshot.Name(), spanKind: snapshot.SpanKind()}
+		value, ok := getAttributeFromKey(attributes, key)
 		if slices.Contains(skipSpans, spanSearchEntry) {
 			require.False(t, ok)
 			continue
 		}
 
 		require.True(t, ok)
 
-		validateFunc(*value)
+		validateFunc(value)
 		detectedSpanCount++
 	}
 	return detectedSpanCount
 }
 
-func getAttributeFromKey(attrs []attribute.KeyValue, key string) (*attribute.Value, bool) {
+func getAttributeFromKey(attrs []attribute.KeyValue, key string) (attribute.Value, bool) {
 	for _, attr := range attrs {
 		if string(attr.Key) == key {
-			return &attr.Value, true
+			return attr.Value, true
 		}
 	}
-	return nil, false
+	return attribute.Value{}, false
 }

Also applies to: 11122-11129

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 90d7d6a and 511aeb3.

📒 Files selected for processing (2)
  • router-tests/telemetry/telemetry_test.go (3 hunks)
  • router/core/graphql_prehandler.go (23 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/telemetry/telemetry_test.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the query body before any APQ operations occur.

Applied to files:

  • router/core/graphql_prehandler.go
📚 Learning: 2025-09-02T12:52:27.677Z
Learnt from: endigma
PR: wundergraph/cosmo#2181
File: router/core/operation_processor.go:0-0
Timestamp: 2025-09-02T12:52:27.677Z
Learning: Hash validation for persisted queries with query bodies is performed in router/core/graphql_prehandler.go in the handleOperation function at lines 571-578, not in the APQ processing logic in operation_processor.go. This validates that extensions.persistedQuery.sha256Hash matches the computed query hash before any APQ operations occur. There's also a test case that verifies this behavior.

Applied to files:

  • router/core/graphql_prehandler.go
🔇 Additional comments (10)
router/core/graphql_prehandler.go (8)

217-217: LGTM! Telemetry attribute evaluation at key request stages.

The centralized calls to setTelemetryAttributes at the unauthenticated and authenticated stages align with the PR's objective of evaluating expressions at multiple points in the request lifecycle.

Also applies to: 364-364


583-615: LGTM! Parsing stage telemetry is correctly instrumented.

The parse stage properly:

  • Creates a stage-specific context (parseCtx) for the span
  • Sets ParsingTime in the expression context in both error and success paths
  • Calls setTelemetryAttributes with the stage context and BucketParsingTime

This pattern enables expressions that depend on parsing time to be evaluated as soon as the timing is available.


684-685: LGTM! Persisted operation ID is correctly propagated to expressions.

The persisted operation ID is set in the expression context and BucketPersistedID expressions are evaluated only when a persisted query hash exists, which is the correct behavior.


704-791: LGTM! Normalization stage telemetry is comprehensively instrumented.

The normalization stage correctly:

  • Creates a stage-specific context (normalizeCtx)
  • Sets NormalizationTime in all error paths (lines 714-715, 746-747, 790-791) and the success path (lines 867-868)
  • Sets the operation Hash after normalization completes (line 870)
  • Calls setTelemetryAttributes with the appropriate buckets (BucketNormalizationTime and BucketHash)

All code paths ensure that expressions depending on normalization timing can be evaluated as soon as the timing is available.

Also applies to: 867-871, 886-886


903-903: LGTM! Validation stage telemetry is correctly instrumented.

The validation stage properly:

  • Creates a stage-specific context (validationCtx)
  • Sets ValidationTime in both error paths (lines 921-922, 938-939) and the success path (lines 959-960)
  • Calls setTelemetryAttributes with validationCtx and BucketValidationTime

This ensures expressions that depend on validation time can be evaluated consistently across all code paths.

Also applies to: 921-922, 938-939, 959-960


979-979: LGTM! Planning stage telemetry is correctly instrumented.

The planning stage properly:

  • Creates a stage-specific context (planCtx)
  • Sets PlanningTime in both the error path (lines 1001-1002) and success path (lines 1015-1016)
  • Calls setTelemetryAttributes with planCtx and BucketPlanningTime

This completes the comprehensive telemetry instrumentation across all operation processing stages.

Also applies to: 1001-1002, 1015-1016


426-428: Add nil-safety check for exprManager.

The code calls h.exprManager.VisitorManager.IsRequestOperationSha256UsedInExpressions() without verifying that h.exprManager or h.exprManager.VisitorManager is non-nil. If these are guaranteed to be initialized, this is safe; otherwise, a nil pointer dereference could occur.

Run the following script to check if there are nil checks elsewhere or if initialization is guaranteed:


627-632: Verify that client info doesn't need to be available for BucketNameOrType expressions.

The operation name and type are set (lines 627-628) and BucketNameOrType expressions are evaluated (line 630) before client information is populated via setExpressionContextClient (line 632). If any expressions in BucketNameOrType need access to client name/version, they would see empty values.

Run the following script to check if client fields are used in expressions that might be evaluated at the NameOrType stage:

router-tests/telemetry/telemetry_test.go (2)

45-48: Helper for span filtering looks good

Minimal, clear struct to match skip targets by name and kind. No issues.


10141-10142: Parallelization OK

Using t.Parallel at the suite level here is fine; no changes needed.

@SkArchon SkArchon merged commit 97ab155 into main Oct 7, 2025
51 of 53 checks passed
@SkArchon SkArchon deleted the milinda/eng-8211-ability-to-add-sha256-to-traces branch October 7, 2025 10:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants