feat: add timings per client fetch for GraphQL http#2183
Conversation
WalkthroughAdds per-fetch timing: a context key and atomic counter are created in engine loader hooks, incremented by the HTTP transport per RoundTrip, and OnFinished stores the accumulated value into expr.ClientTrace.FetchDuration (ConnectionAcquireDuration changed to time.Duration). Tests and traceclient adjusted to duration semantics. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes ✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/internal/expr/expr.go (1)
55-60: Bug: wrong map updated while cloning URL query.You’re writing into claims instead of query; this corrupts the clone.
Apply:
- for k, v := range copyCtx.Request.URL.Query { - claims[k] = v - } - copyCtx.Request.URL.Query = query + for k, v := range copyCtx.Request.URL.Query { + query[k] = v + } + copyCtx.Request.URL.Query = query
🧹 Nitpick comments (7)
router/internal/context/keys.go (1)
15-16: New context key looks fine; add units doc and avoid iota surprises.
- Please add a short doc comment clarifying that the accumulated value stored under this key is nanoseconds.
- Consider not inserting new keys mid-iota to avoid unintended renumbering churn; or switch to unexported struct types for keys to eliminate any collision risk across packages.
router/pkg/trace/transport.go (1)
3-9: Alias internal context import to avoid shadowing stdlib context.Importing internal/context as “context” is confusing and risky if this file ever needs the stdlib context. Alias it.
Apply:
-import ( - "github.com/wundergraph/cosmo/router/internal/context" +import ( + rcontext "github.com/wundergraph/cosmo/router/internal/context"And update usage below:
- if value := r.Context().Value(context.FetchTimingKey); value != nil { + if value := r.Context().Value(rcontext.FetchTimingKey); value != nil {router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)
155-165: Measure only the actual Invoke time to match HTTP semantics.Currently the timer starts before plugin-availability checks and metadata work, inflating “fetch” vs. HTTP (which measures just RoundTrip). Start timing immediately before g.cc.Invoke and add after it returns.
Apply:
- startTime := time.Now() - defer func() { - // In case of GRPC Clients there will be multiple invocations - // so adding is required - if value := ctx.Value(rcontext.FetchTimingKey); value != nil { - if fetchTiming, ok := value.(*atomic.Int64); ok { - fetchTiming.Add(int64(time.Since(startTime))) - } - } - }()…and replace the return at the end of Invoke:
- return g.cc.Invoke(ctx, method, args, reply, opts...) + start := time.Now() + err := g.cc.Invoke(ctx, method, args, reply, opts...) + if value := ctx.Value(rcontext.FetchTimingKey); value != nil { + if fetchTiming, ok := value.(*atomic.Int64); ok { + fetchTiming.Add(int64(time.Since(start))) + } + } + return errrouter/internal/expr/expr.go (1)
136-138: Expose a numeric variant to simplify expressions and logging.Using time.Duration in expr contexts can be awkward for arithmetic/formatting. Consider adding a millisecond numeric alongside the Duration.
Apply:
type ClientTrace struct { - FetchDuration time.Duration `expr:"fetchDuration"` + FetchDuration time.Duration `expr:"fetchDuration"` + FetchDurationMs float64 `expr:"fetchDurationMs"` ConnectionAcquireDuration float64 `expr:"connAcquireDuration"` }Then set it where you assign FetchDuration (see engine_loader_hooks.go suggestion).
router/core/engine_loader_hooks.go (3)
97-99: Context-stored atomic is OK; minor clarity nit.This safely escapes to heap. Consider naming it fetchTiming for readability.
- duration := atomic.Int64{} - ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &duration) + fetchTiming := atomic.Int64{} + ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &fetchTiming)
160-165: Set both Duration and a numeric ms helper (if added).You’re storing ns as time.Duration, which is good. If you add FetchDurationMs, populate it here.
if value := ctx.Value(rcontext.FetchTimingKey); value != nil { if fetchTiming, ok := value.(*atomic.Int64); ok { - exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(fetchTiming.Load()) + ns := fetchTiming.Load() + exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(ns) + exprCtx.Subgraph.Request.ClientTrace.FetchDurationMs = float64(ns) / 1e6 } }
202-217: Optional: include fetch_duration in default access log fields.If you want fetch_duration without requiring users to add expressions, append zap.Duration("fetch_duration", exprCtx.Subgraph.Request.ClientTrace.FetchDuration).
fields := []zap.Field{ zap.String("subgraph_name", ds.Name), zap.String("subgraph_id", ds.ID), zap.Int("status", responseInfo.StatusCode), zap.Duration("latency", latency), + zap.Duration("fetch_duration", exprCtx.Subgraph.Request.ClientTrace.FetchDuration), }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
router/core/engine_loader_hooks.go(3 hunks)router/internal/context/keys.go(1 hunks)router/internal/expr/expr.go(2 hunks)router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go(2 hunks)router/pkg/trace/transport.go(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
router/pkg/trace/transport.go (1)
router/internal/context/keys.go (1)
FetchTimingKey(15-15)
router/core/engine_loader_hooks.go (3)
router/internal/context/keys.go (1)
FetchTimingKey(15-15)router/internal/expr/expr.go (3)
Subgraph(141-146)Request(66-75)ClientTrace(135-138)router-tests/testenv/testenv.go (1)
Subgraph(2852-2862)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)
router/internal/context/keys.go (1)
FetchTimingKey(15-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (2)
router/pkg/trace/transport.go (1)
41-49: HTTP fetch-duration instrumentation LGTM.Start/stop placement captures only the RoundTripper latency and safely accumulates into the atomic counter even on errors.
router/internal/expr/expr.go (1)
9-10: Import addition OK.time is needed for the new Duration field.
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
router/internal/expr/expr.go (1)
55-59: Bug: URL query copied into claims; query left emptyClone() writes into claims instead of query, dropping URL params and corrupting claims.
- for k, v := range copyCtx.Request.URL.Query { - claims[k] = v - } + for k, v := range copyCtx.Request.URL.Query { + query[k] = v + }router/internal/traceclient/traceclient.go (1)
107-111: Avoid potential panic: use safe type assertion for subgraph name.A non-string value under CurrentSubgraphContextKey would panic. Prefer ok-assertion.
Apply within this hunk:
- subgraphCtxVal := ctx.Value(rcontext.CurrentSubgraphContextKey{}) - if subgraphCtxVal != nil { - subgraph = subgraphCtxVal.(string) - } + if v := ctx.Value(rcontext.CurrentSubgraphContextKey{}); v != nil { + if s, ok := v.(string); ok { + subgraph = s + } + }router-tests/telemetry/telemetry_test.go (1)
9842-9844: Fix attribute key type: pass attribute.Key, not string.
attribute.Set.Valueexpects anattribute.Key. Passing a plain string won’t compile.- val, ok := atts.Value("custom.subgraph") + val, ok := atts.Value(attribute.Key("custom.subgraph"))
🧹 Nitpick comments (8)
router/demo.config.yaml (1)
45-47: Rename stray key "there"Looks like a leftover. Use a meaningful name or remove it.
- - key: there + - key: conn_acquire_duration_string value_from: expression: "subgraph.request.clientTrace.connAcquireDuration.String()"router/internal/expr/expr.go (1)
136-138: Type change and naming consistency
- ConnectionAcquireDuration switched to time.Duration. This may break user expressions expecting a number. Consider documenting the change and/or adding a computed ms field for compatibility.
- Config refers to fetchDuration; struct exposes dataSourceFetchDuration. Either keep config updated (preferred, already suggested) or optionally add an alias field with expr:"fetchDuration".
Optional alias example:
type ClientTrace struct { - DataSourceFetchDuration time.Duration `expr:"dataSourceFetchDuration"` + DataSourceFetchDuration time.Duration `expr:"dataSourceFetchDuration"` + // Optional alias to match existing config snippets: + FetchDuration time.Duration `expr:"fetchDuration"` ConnectionAcquireDuration time.Duration `expr:"connAcquireDuration"` }router/core/engine_loader_hooks.go (2)
97-99: Allocate atomic counter on heap for clarityThe pointer escapes so this is safe, but new(atomic.Int64) is clearer and avoids confusion about lifetime.
- duration := atomic.Int64{} - ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &duration) + duration := new(atomic.Int64) + ctx = context.WithValue(ctx, rcontext.FetchTimingKey, duration)
160-164: Optionally populate alias if addedIf you add ClientTrace.FetchDuration (expr:"fetchDuration") as an alias, set it here too to keep fields in sync.
if fetchTiming, ok := value.(*atomic.Int64); ok { exprCtx.Subgraph.Request.ClientTrace.DataSourceFetchDuration = time.Duration(fetchTiming.Load()) + // Optional: keep alias in sync if present. + exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(fetchTiming.Load()) }router/internal/traceclient/traceclient.go (1)
126-129: Clarify units in variable name (milliseconds) or add a comment.The metric expects ms; rename for readability or add a unit comment.
- connAcquireTime := float64(duration) / float64(time.Millisecond) - t.connectionMetricStore.MeasureConnectionAcquireDuration(ctx, - connAcquireTime, + // milliseconds + connAcquireMs := float64(duration) / float64(time.Millisecond) + t.connectionMetricStore.MeasureConnectionAcquireDuration(ctx, + connAcquireMs, serverAttributes...)router-tests/telemetry/telemetry_test.go (1)
9858-9908: Avoid brittle count; derive expected attributes from actual Engine - Fetch spans.Hard-coding
2assumes exactly two subgraph fetches. Make the assertion resilient to topology changes by counting Engine-Fetch spans.- var attributesDetected int + var attributesDetected int + var engineFetchCount int @@ - if slices.Contains([]string{"Engine - Fetch"}, sn[i].Name()) { + if slices.Contains([]string{"Engine - Fetch"}, sn[i].Name()) { + engineFetchCount++ for _, attributeEntry := range attributes { @@ - require.Equal(t, 2, attributesDetected) + require.Equal(t, engineFetchCount, attributesDetected)router-tests/structured_logging_test.go (2)
2898-2901: Avoid int cast; compare durations directlyCasting time.Duration to int is unnecessary and can be brittle on 32-bit builds. Compare against a zero duration.
- require.Greater(t, int(dataSourceFetchDuration), 0) + require.Greater(t, dataSourceFetchDuration, time.Duration(0))
2929-2937: Use duration comparison without narrowing conversionsApply the same zero-duration comparison for both subgraph entries.
- require.Greater(t, int(dataSourceFetchDuration1), 0) + require.Greater(t, dataSourceFetchDuration1, time.Duration(0)) @@ - require.Greater(t, int(dataSourceFetchDuration2), 0) + require.Greater(t, dataSourceFetchDuration2, time.Duration(0))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
router-tests/structured_logging_test.go(2 hunks)router-tests/telemetry/telemetry_test.go(2 hunks)router/core/engine_loader_hooks.go(3 hunks)router/demo.config.yaml(1 hunks)router/internal/expr/expr.go(2 hunks)router/internal/traceclient/traceclient.go(3 hunks)router/pkg/trace/transport.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- router/pkg/trace/transport.go
🧰 Additional context used
🧬 Code graph analysis (4)
router/internal/traceclient/traceclient.go (1)
router/internal/expr/expr.go (3)
Subgraph(141-146)Request(66-75)ClientTrace(135-138)
router/core/engine_loader_hooks.go (3)
router/internal/context/keys.go (1)
FetchTimingKey(15-15)router/internal/expr/expr.go (3)
Subgraph(141-146)Request(66-75)ClientTrace(135-138)router-tests/testenv/testenv.go (1)
Subgraph(2852-2862)
router-tests/telemetry/telemetry_test.go (3)
router-tests/testenv/testenv.go (3)
Run(105-122)Config(284-340)Environment(1727-1763)router/pkg/trace/tracetest/tracetest.go (1)
NewInMemoryExporter(11-17)router/pkg/config/config.go (3)
Config(985-1059)CustomAttribute(43-47)CustomDynamicAttribute(36-41)
router-tests/structured_logging_test.go (2)
router-tests/testenv/testenv.go (7)
Run(105-122)Config(284-340)LogObservationConfig(386-389)Environment(1727-1763)GraphQLRequest(1903-1911)SubgraphsConfig(365-378)SubgraphConfig(380-384)router/internal/expr/expr.go (1)
Request(66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (6)
router/demo.config.yaml (1)
31-33: Verify OTLP HTTP endpointDefault OTLP HTTP receiver port is typically 4318. Confirm 4319 is intentional for your demo collector.
router/internal/traceclient/traceclient.go (2)
9-11: Imports look correct and scoped.New imports are used below and keep layering within router/internal. No issues.
117-117: Confirm time.Duration marshalling to float seconds
Theexprpackage’s default marshaller convertstime.Durationfields (likeConnectionAcquireDuration) into seconds (float64) when evaluating expressions (validated by structured logging tests casting tofloat64) and for metrics you explicitly divide bytime.Millisecond(yielding milliseconds). Units remain consistent: logs use seconds as float, metrics store milliseconds. No changes needed.router-tests/telemetry/telemetry_test.go (1)
9796-9801: Using Seconds() on time.Duration is correct.Switching to
connAcquireDuration.Seconds()matches the underlying type change totime.Durationand keeps the expression engine behavior consistent. Looks good.router-tests/structured_logging_test.go (2)
13-13: Import of time is appropriateUsed for time.Duration assertions below. No concerns.
2871-2872: Parallelize the subgraph expression testsGood use of t.Parallel() to keep suite fast.
a2f540b to
0fd2225
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
router-tests/structured_logging_test.go (1)
2898-2901: Type consistency fix acknowledgedAsserting time.Duration (not float64) for data_source_fetch_duration is now consistent with prior tests and the expr contract.
🧹 Nitpick comments (2)
router-tests/structured_logging_test.go (2)
2928-2937: Don’t depend on subgraph log orderingIndex-based assumptions (0=employees, 1=availability) can flake. Select entries by subgraph_name.
Example refactor pattern for these blocks:
var empCtx, availCtx map[string]interface{} for _, e := range requestLogAll { m := e.ContextMap() if name, ok := m["subgraph_name"].(string); ok { switch name { case "employees": empCtx = m case "availability": availCtx = m } } }Then assert using empCtx/availCtx. Apply similarly in the referenced ranges.
Also applies to: 2975-2983, 3042-3051, 3089-3097
2898-2901: Avoid int casts on time.Duration in assertions
Replace allrequire.Greater(t, int(<duration>), 0)calls with direct duration comparisons, e.g.:require.Greater(t, dataSourceFetchDuration, time.Duration(0))Occurrences to update in router-tests/structured_logging_test.go:
- Line 2900
- Line 2931
- Line 2936
- Line 2982
- Line 3014
- Line 3045
- Line 3050
- Line 3096
- Line 3139
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
router-tests/structured_logging_test.go(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (2)
router-tests/testenv/testenv.go (7)
Run(105-122)Config(284-340)LogObservationConfig(386-389)Environment(1727-1763)GraphQLRequest(1903-1911)SubgraphsConfig(365-378)SubgraphConfig(380-384)router/pkg/config/config.go (3)
Config(985-1059)CustomAttribute(43-47)CustomDynamicAttribute(36-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: build-router
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: image_scan (nonroot)
- GitHub Check: image_scan
- GitHub Check: integration_test (./events)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_test
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/structured_logging_test.go (2)
13-13: Import added is appropriatetime is needed for Duration assertions. Looks good.
2871-2872: Parallelization enabled — OKConsistent with surrounding tests and isolation via testenv.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
router-tests/structured_logging_test.go (2)
2898-2901: Avoid int casts when asserting durationsCasting time.Duration to int can overflow on 32-bit platforms and is unnecessary. Compare durations directly against time.Duration(0).
Apply these diffs (repeat pattern for all occurrences shown in the selected ranges):
- require.Greater(t, int(dataSourceFetchDuration), 0) + require.Greater(t, dataSourceFetchDuration, time.Duration(0))- require.Greater(t, int(dataSourceFetchDuration1), 0) + require.Greater(t, dataSourceFetchDuration1, time.Duration(0))- require.Greater(t, int(dataSourceFetchDuration2), 0) + require.Greater(t, dataSourceFetchDuration2, time.Duration(0))- require.Greater(t, int(connAcquireDuration), 0) + require.Greater(t, connAcquireDuration, time.Duration(0))- require.Greater(t, int(connAcquireDuration1), 0) + require.Greater(t, connAcquireDuration1, time.Duration(0))- require.Greater(t, int(connAcquireDuration2), 0) + require.Greater(t, connAcquireDuration2, time.Duration(0))Also applies to: 2930-2932, 2934-2937, 2980-2983, 3011-3015, 3043-3046, 3048-3051, 3094-3097, 3137-3140
2895-2897: Guard against index panics by asserting log count before indexingThese tests index requestLogAll[0]/[1] without confirming length. Add a quick Len check to make failures clearer and avoid panics.
requestLog := xEnv.Observer().FilterMessage("/graphql") requestLogAll := requestLog.All() + require.GreaterOrEqual(t, requestLog.Len(), 1)requestLog := xEnv.Observer().FilterMessage("/graphql") requestLogAll := requestLog.All() + require.GreaterOrEqual(t, requestLog.Len(), 2)requestLog := xEnv.Observer().FilterMessage("/graphql") requestLogAll := requestLog.All() + require.GreaterOrEqual(t, requestLog.Len(), 2)Also applies to: 2926-2927, 2972-2974
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
router-tests/structured_logging_test.go(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (3)
router-tests/testenv/testenv.go (6)
Run(105-122)Config(284-340)LogObservationConfig(386-389)Environment(1727-1763)SubgraphsConfig(365-378)SubgraphConfig(380-384)router/pkg/config/config.go (3)
Config(985-1059)CustomAttribute(43-47)CustomDynamicAttribute(36-41)router/internal/expr/expr.go (1)
Request(66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/structured_logging_test.go (2)
13-13: LGTM: time importRequired for the new duration assertions.
2873-2884: LGTM: correct expression path and snake_case keyUsing subgraph.request.clientTrace.dataSourceFetchDuration and emitting it under data_source_fetch_duration matches the expr tag and existing naming.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (9)
router/internal/expr/expr.go (1)
135-138: Duration types: good change; document units and confirm external compatibility.Switching ConnectionAcquireDuration from float64 to time.Duration and adding FetchDuration improves type safety, but it’s a breaking change for existing expressions/configs expecting numeric types. Please:
- Document that both fields are time.Duration and express units (ns) when serialized.
- Call out the breaking change in the PR description/CHANGELOG.
- Ensure any telemetry/metrics expressions use .Seconds()/.Milliseconds() as needed.
Optionally, consider a temporary alias field (deprecated) if you need smoother upgrade paths.
Would you like me to draft a short upgrade note and grep script to locate downstream usages?
router-tests/structured_logging_test.go (8)
2871-2902: Avoid int cast; compare durations directly and assert subgraph log selection.
- Prefer comparing durations to time.Duration(0) to avoid needless casts.
- To reduce ordering brittleness, assert the selected entry is a subgraph log.
Apply:
- fetchDuration, ok := requestContextMap["fetch_duration"].(time.Duration) - require.True(t, ok) - require.Greater(t, int(fetchDuration), 0) + fetchDuration, ok := requestContextMap["fetch_duration"].(time.Duration) + require.True(t, ok) + require.Greater(t, fetchDuration, time.Duration(0)) + require.Equal(t, "client/subgraph", requestContextMap["log_type"])
2904-2937: Same here: compare durations directly and assert target log is subgraph.- fetchDuration1, ok := employeeSubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) + fetchDuration1, ok := employeeSubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(fetchDuration1), 0) + require.Greater(t, fetchDuration1, time.Duration(0)) + require.Equal(t, "client/subgraph", employeeSubgraphLogs.ContextMap()["log_type"]) - fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) + fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(fetchDuration2), 0) + require.Greater(t, fetchDuration2, time.Duration(0)) + require.Equal(t, "client/subgraph", availabilitySubgraphLogs.ContextMap()["log_type"])
2940-2983: Conditional case: remove int cast in duration comparison.- fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) + fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(fetchDuration2), 0) + require.Greater(t, fetchDuration2, time.Duration(0))
3011-3015: connAcquireDuration: compare time.Duration directly.- connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration) + connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(connAcquireDuration), 0) + require.Greater(t, connAcquireDuration, time.Duration(0))
3043-3046: Same: remove int cast for first subgraph.- connAcquireDuration1, ok := employeeSubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) + connAcquireDuration1, ok := employeeSubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(connAcquireDuration1), 0) + require.Greater(t, connAcquireDuration1, time.Duration(0))
3048-3051: Same: remove int cast for second subgraph.- connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) + connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(connAcquireDuration2), 0) + require.Greater(t, connAcquireDuration2, time.Duration(0))
3094-3097: Conditional case: remove int cast.- connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) + connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(connAcquireDuration2), 0) + require.Greater(t, connAcquireDuration2, time.Duration(0))
3137-3140: Cleanup/unification: compare durations without casting.- connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration) + connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration) require.True(t, ok) - require.Greater(t, int(connAcquireDuration), 0) + require.Greater(t, connAcquireDuration, time.Duration(0))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
router-tests/structured_logging_test.go(6 hunks)router-tests/telemetry/telemetry_test.go(2 hunks)router/core/engine_loader_hooks.go(3 hunks)router/internal/expr/expr.go(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- router/core/engine_loader_hooks.go
- router-tests/telemetry/telemetry_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-03T11:38:45.827Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2183
File: router/internal/traceclient/traceclient.go:115-118
Timestamp: 2025-09-03T11:38:45.827Z
Learning: In the Cosmo codebase, the exprContext obtained from getExprContext in TraceInjectingRoundTripper.processConnectionMetrics is expected to always be non-nil as part of the existing system behavior, so defensive nil checks are not needed.
Applied to files:
router/internal/expr/expr.go
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (3)
router-tests/testenv/testenv.go (5)
Run(105-122)Config(284-340)LogObservationConfig(386-389)Environment(1727-1763)SubgraphsConfig(365-378)router/pkg/config/config.go (3)
Config(985-1059)CustomAttribute(43-47)CustomDynamicAttribute(36-41)router/internal/expr/expr.go (1)
Request(66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: integration_test (./telemetry)
- GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
- GitHub Check: build_push_image (nonroot)
- GitHub Check: build_push_image
- GitHub Check: integration_test (./events)
- GitHub Check: build_test
- GitHub Check: Analyze (go)
🔇 Additional comments (3)
router/internal/expr/expr.go (1)
9-9: LGTM: time import is appropriate.router-tests/structured_logging_test.go (2)
13-13: LGTM: import time for duration assertions.
2871-2985: Naming consistency check: confirm canonical field name in expressions.PR text/examples mention dataSourceFetchDuration, while tests use fetchDuration. Please confirm the intended canonical expr path and align PR text and any docs accordingly.
Would you like a quick repo grep script to verify no stale references remain?
This PR Adds timings per datasource fetch for Graphql (HTTP) this would mean one fetch per subgraph call (or per subgraph log entry)
We didn't add a flag to conditionally since adding a timing adds about 2 microseconds on average. And adding a flag roughly costs about 300 nanoseconds (as per my local machine).
User's can compare how long (roughly, since we still have some go code to go through especially in the case of GraphQL/HTTP) the actual datasource call took with the total latency provided in the subgraph access log.
Looking at
latency - fetchDurationwill give how long non fetch part of the router/engine per subgraph call takes.Config
GRPC Skipped
This PR does not add fetch timings for GRPC datasources, as we do not process timings for logs because "Request" is nil for responseInfo for GRPC requests. The function has
This will be worked on as a separate PR, re factoring
RequestFields.Summary by CodeRabbit
New Features
Refactor
Tests
Checklist