Skip to content

feat: add timings per client fetch for GraphQL http#2183

Merged
SkArchon merged 14 commits intomainfrom
milinda/eng-7817-unusual-timings-in-trace
Sep 3, 2025
Merged

feat: add timings per client fetch for GraphQL http#2183
SkArchon merged 14 commits intomainfrom
milinda/eng-7817-unusual-timings-in-trace

Conversation

@SkArchon
Copy link
Copy Markdown
Contributor

@SkArchon SkArchon commented Sep 2, 2025

This PR Adds timings per datasource fetch for Graphql (HTTP) this would mean one fetch per subgraph call (or per subgraph log entry)

We didn't add a flag to conditionally since adding a timing adds about 2 microseconds on average. And adding a flag roughly costs about 300 nanoseconds (as per my local machine).

User's can compare how long (roughly, since we still have some go code to go through especially in the case of GraphQL/HTTP) the actual datasource call took with the total latency provided in the subgraph access log.

17:15:55 PM INFO requestlogger/subgraphlogger.go:68 /graphql {..., "latency": 0.006611209, ..."fetch_duration": 0.005611209}

Looking at latency - fetchDuration will give how long non fetch part of the router/engine per subgraph call takes.

Config

access_logs:
  subgraphs:
    enabled: true
    fields:
      - key: awesome
        value_from:
          expression: "subgraph.request.clientTrace.fetchDuration"

GRPC Skipped
This PR does not add fetch timings for GRPC datasources, as we do not process timings for logs because "Request" is nil for responseInfo for GRPC requests. The function has

if responseInfo.Request != nil {
  RequestFields(...)

This will be worked on as a separate PR, re factoring RequestFields.

Summary by CodeRabbit

  • New Features

    • Records per-data-source fetch duration and surfaces it in client traces and structured logs (including multi-subgraph and conditional cases).
  • Refactor

    • Duration values standardized to native time units; telemetry attributes now report durations in seconds.
  • Tests

    • Added/updated tests to validate fetch-duration and connection-acquire duration presence, formats, and telemetry across engine fetch spans and multi-subgraph requests.

Checklist

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Sep 2, 2025

Walkthrough

Adds per-fetch timing: a context key and atomic counter are created in engine loader hooks, incremented by the HTTP transport per RoundTrip, and OnFinished stores the accumulated value into expr.ClientTrace.FetchDuration (ConnectionAcquireDuration changed to time.Duration). Tests and traceclient adjusted to duration semantics.

Changes

Cohort / File(s) Summary of changes
Fetch timing plumbing
router/core/engine_loader_hooks.go, router/pkg/trace/transport.go, router/internal/context/keys.go
Adds FetchTimingKey; OnLoad initializes a *atomic.Int64 in context; transport RoundTrip records start time and adds elapsed nanoseconds to that atomic counter; OnFinished reads the atomic and stores the duration into expr.ClientTrace.FetchDuration. Adds sync/atomic, time, and context imports where required.
Expr client trace durations
router/internal/expr/expr.go
Replaces previous numeric conn-acquire field with two time.Duration fields: FetchDuration and ConnectionAcquireDuration (imports time).
Traceclient duration handling
router/internal/traceclient/traceclient.go
ClientTrace.ConnectionAcquireDuration is now stored as time.Duration; code converts duration to milliseconds (float64) when reporting existing metrics to preserve metric values.
Structured logging tests
router-tests/structured_logging_test.go
Adds subtests asserting data_source_fetch_duration presence/type for single, multiple, and conditional subgraph cases; updates assertions to expect time.Duration; adjusts related assertions (conn acquire) to duration semantics and enables parallel execution.
Telemetry tests
router-tests/telemetry/telemetry_test.go
Updates test expressions to use .Seconds() for duration attributes and adds assertions verifying fetch_duration.subgraph on Engine - Fetch spans (parses to positive float).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions Bot added the router label Sep 2, 2025
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Sep 2, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-057d1e5f8bf73672622e686e0961cb7c5b375dbd-nonroot

@SkArchon SkArchon marked this pull request as ready for review September 2, 2025 11:59
@SkArchon SkArchon marked this pull request as draft September 2, 2025 12:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/internal/expr/expr.go (1)

55-60: Bug: wrong map updated while cloning URL query.

You’re writing into claims instead of query; this corrupts the clone.

Apply:

-	for k, v := range copyCtx.Request.URL.Query {
-		claims[k] = v
-	}
-	copyCtx.Request.URL.Query = query
+	for k, v := range copyCtx.Request.URL.Query {
+		query[k] = v
+	}
+	copyCtx.Request.URL.Query = query
🧹 Nitpick comments (7)
router/internal/context/keys.go (1)

15-16: New context key looks fine; add units doc and avoid iota surprises.

  • Please add a short doc comment clarifying that the accumulated value stored under this key is nanoseconds.
  • Consider not inserting new keys mid-iota to avoid unintended renumbering churn; or switch to unexported struct types for keys to eliminate any collision risk across packages.
router/pkg/trace/transport.go (1)

3-9: Alias internal context import to avoid shadowing stdlib context.

Importing internal/context as “context” is confusing and risky if this file ever needs the stdlib context. Alias it.

Apply:

-import (
-	"github.com/wundergraph/cosmo/router/internal/context"
+import (
+	rcontext "github.com/wundergraph/cosmo/router/internal/context"

And update usage below:

-	if value := r.Context().Value(context.FetchTimingKey); value != nil {
+	if value := r.Context().Value(rcontext.FetchTimingKey); value != nil {
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)

155-165: Measure only the actual Invoke time to match HTTP semantics.

Currently the timer starts before plugin-availability checks and metadata work, inflating “fetch” vs. HTTP (which measures just RoundTrip). Start timing immediately before g.cc.Invoke and add after it returns.

Apply:

-	startTime := time.Now()
-	defer func() {
-		// In case of GRPC Clients there will be multiple invocations
-		// so adding is required
-		if value := ctx.Value(rcontext.FetchTimingKey); value != nil {
-			if fetchTiming, ok := value.(*atomic.Int64); ok {
-				fetchTiming.Add(int64(time.Since(startTime)))
-			}
-		}
-	}()

…and replace the return at the end of Invoke:

-	return g.cc.Invoke(ctx, method, args, reply, opts...)
+	start := time.Now()
+	err := g.cc.Invoke(ctx, method, args, reply, opts...)
+	if value := ctx.Value(rcontext.FetchTimingKey); value != nil {
+		if fetchTiming, ok := value.(*atomic.Int64); ok {
+			fetchTiming.Add(int64(time.Since(start)))
+		}
+	}
+	return err
router/internal/expr/expr.go (1)

136-138: Expose a numeric variant to simplify expressions and logging.

Using time.Duration in expr contexts can be awkward for arithmetic/formatting. Consider adding a millisecond numeric alongside the Duration.

Apply:

 type ClientTrace struct {
-	FetchDuration             time.Duration `expr:"fetchDuration"`
+	FetchDuration             time.Duration `expr:"fetchDuration"`
+	FetchDurationMs           float64       `expr:"fetchDurationMs"`
 	ConnectionAcquireDuration float64       `expr:"connAcquireDuration"`
 }

Then set it where you assign FetchDuration (see engine_loader_hooks.go suggestion).

router/core/engine_loader_hooks.go (3)

97-99: Context-stored atomic is OK; minor clarity nit.

This safely escapes to heap. Consider naming it fetchTiming for readability.

-	duration := atomic.Int64{}
-	ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &duration)
+	fetchTiming := atomic.Int64{}
+	ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &fetchTiming)

160-165: Set both Duration and a numeric ms helper (if added).

You’re storing ns as time.Duration, which is good. If you add FetchDurationMs, populate it here.

 	if value := ctx.Value(rcontext.FetchTimingKey); value != nil {
 		if fetchTiming, ok := value.(*atomic.Int64); ok {
-			exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(fetchTiming.Load())
+			ns := fetchTiming.Load()
+			exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(ns)
+			exprCtx.Subgraph.Request.ClientTrace.FetchDurationMs = float64(ns) / 1e6
 		}
 	}

202-217: Optional: include fetch_duration in default access log fields.

If you want fetch_duration without requiring users to add expressions, append zap.Duration("fetch_duration", exprCtx.Subgraph.Request.ClientTrace.FetchDuration).

 		fields := []zap.Field{
 			zap.String("subgraph_name", ds.Name),
 			zap.String("subgraph_id", ds.ID),
 			zap.Int("status", responseInfo.StatusCode),
 			zap.Duration("latency", latency),
+			zap.Duration("fetch_duration", exprCtx.Subgraph.Request.ClientTrace.FetchDuration),
 		}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0f15d1e and 1b15ed7.

📒 Files selected for processing (5)
  • router/core/engine_loader_hooks.go (3 hunks)
  • router/internal/context/keys.go (1 hunks)
  • router/internal/expr/expr.go (2 hunks)
  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (2 hunks)
  • router/pkg/trace/transport.go (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
router/pkg/trace/transport.go (1)
router/internal/context/keys.go (1)
  • FetchTimingKey (15-15)
router/core/engine_loader_hooks.go (3)
router/internal/context/keys.go (1)
  • FetchTimingKey (15-15)
router/internal/expr/expr.go (3)
  • Subgraph (141-146)
  • Request (66-75)
  • ClientTrace (135-138)
router-tests/testenv/testenv.go (1)
  • Subgraph (2852-2862)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)
router/internal/context/keys.go (1)
  • FetchTimingKey (15-15)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (2)
router/pkg/trace/transport.go (1)

41-49: HTTP fetch-duration instrumentation LGTM.

Start/stop placement captures only the RoundTripper latency and safely accumulates into the atomic counter even on errors.

router/internal/expr/expr.go (1)

9-10: Import addition OK.

time is needed for the new Duration field.

Comment thread router/internal/traceclient/traceclient.go
@SkArchon SkArchon marked this pull request as ready for review September 3, 2025 11:29
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
router/internal/expr/expr.go (1)

55-59: Bug: URL query copied into claims; query left empty

Clone() writes into claims instead of query, dropping URL params and corrupting claims.

-	for k, v := range copyCtx.Request.URL.Query {
-		claims[k] = v
-	}
+	for k, v := range copyCtx.Request.URL.Query {
+		query[k] = v
+	}
router/internal/traceclient/traceclient.go (1)

107-111: Avoid potential panic: use safe type assertion for subgraph name.

A non-string value under CurrentSubgraphContextKey would panic. Prefer ok-assertion.

Apply within this hunk:

-	subgraphCtxVal := ctx.Value(rcontext.CurrentSubgraphContextKey{})
-	if subgraphCtxVal != nil {
-		subgraph = subgraphCtxVal.(string)
-	}
+	if v := ctx.Value(rcontext.CurrentSubgraphContextKey{}); v != nil {
+		if s, ok := v.(string); ok {
+			subgraph = s
+		}
+	}
router-tests/telemetry/telemetry_test.go (1)

9842-9844: Fix attribute key type: pass attribute.Key, not string.

attribute.Set.Value expects an attribute.Key. Passing a plain string won’t compile.

-                val, ok := atts.Value("custom.subgraph")
+                val, ok := atts.Value(attribute.Key("custom.subgraph"))
🧹 Nitpick comments (8)
router/demo.config.yaml (1)

45-47: Rename stray key "there"

Looks like a leftover. Use a meaningful name or remove it.

-      - key: there
+      - key: conn_acquire_duration_string
         value_from:
           expression: "subgraph.request.clientTrace.connAcquireDuration.String()"
router/internal/expr/expr.go (1)

136-138: Type change and naming consistency

  • ConnectionAcquireDuration switched to time.Duration. This may break user expressions expecting a number. Consider documenting the change and/or adding a computed ms field for compatibility.
  • Config refers to fetchDuration; struct exposes dataSourceFetchDuration. Either keep config updated (preferred, already suggested) or optionally add an alias field with expr:"fetchDuration".

Optional alias example:

 type ClientTrace struct {
-	DataSourceFetchDuration   time.Duration `expr:"dataSourceFetchDuration"`
+	DataSourceFetchDuration   time.Duration `expr:"dataSourceFetchDuration"`
+	// Optional alias to match existing config snippets:
+	FetchDuration             time.Duration `expr:"fetchDuration"`
 	ConnectionAcquireDuration time.Duration `expr:"connAcquireDuration"`
 }
router/core/engine_loader_hooks.go (2)

97-99: Allocate atomic counter on heap for clarity

The pointer escapes so this is safe, but new(atomic.Int64) is clearer and avoids confusion about lifetime.

-	duration := atomic.Int64{}
-	ctx = context.WithValue(ctx, rcontext.FetchTimingKey, &duration)
+	duration := new(atomic.Int64)
+	ctx = context.WithValue(ctx, rcontext.FetchTimingKey, duration)

160-164: Optionally populate alias if added

If you add ClientTrace.FetchDuration (expr:"fetchDuration") as an alias, set it here too to keep fields in sync.

 		if fetchTiming, ok := value.(*atomic.Int64); ok {
 			exprCtx.Subgraph.Request.ClientTrace.DataSourceFetchDuration = time.Duration(fetchTiming.Load())
+			// Optional: keep alias in sync if present.
+			exprCtx.Subgraph.Request.ClientTrace.FetchDuration = time.Duration(fetchTiming.Load())
 		}
router/internal/traceclient/traceclient.go (1)

126-129: Clarify units in variable name (milliseconds) or add a comment.

The metric expects ms; rename for readability or add a unit comment.

-		connAcquireTime := float64(duration) / float64(time.Millisecond)
-		t.connectionMetricStore.MeasureConnectionAcquireDuration(ctx,
-			connAcquireTime,
+		// milliseconds
+		connAcquireMs := float64(duration) / float64(time.Millisecond)
+		t.connectionMetricStore.MeasureConnectionAcquireDuration(ctx,
+			connAcquireMs,
 			serverAttributes...)
router-tests/telemetry/telemetry_test.go (1)

9858-9908: Avoid brittle count; derive expected attributes from actual Engine - Fetch spans.

Hard-coding 2 assumes exactly two subgraph fetches. Make the assertion resilient to topology changes by counting Engine-Fetch spans.

-               var attributesDetected int
+               var attributesDetected int
+               var engineFetchCount int
@@
-                   if slices.Contains([]string{"Engine - Fetch"}, sn[i].Name()) {
+                   if slices.Contains([]string{"Engine - Fetch"}, sn[i].Name()) {
+                       engineFetchCount++
                        for _, attributeEntry := range attributes {
@@
-               require.Equal(t, 2, attributesDetected)
+               require.Equal(t, engineFetchCount, attributesDetected)
router-tests/structured_logging_test.go (2)

2898-2901: Avoid int cast; compare durations directly

Casting time.Duration to int is unnecessary and can be brittle on 32-bit builds. Compare against a zero duration.

-				require.Greater(t, int(dataSourceFetchDuration), 0)
+				require.Greater(t, dataSourceFetchDuration, time.Duration(0))

2929-2937: Use duration comparison without narrowing conversions

Apply the same zero-duration comparison for both subgraph entries.

-				require.Greater(t, int(dataSourceFetchDuration1), 0)
+				require.Greater(t, dataSourceFetchDuration1, time.Duration(0))
@@
-				require.Greater(t, int(dataSourceFetchDuration2), 0)
+				require.Greater(t, dataSourceFetchDuration2, time.Duration(0))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1b15ed7 and 01d2a52.

📒 Files selected for processing (7)
  • router-tests/structured_logging_test.go (2 hunks)
  • router-tests/telemetry/telemetry_test.go (2 hunks)
  • router/core/engine_loader_hooks.go (3 hunks)
  • router/demo.config.yaml (1 hunks)
  • router/internal/expr/expr.go (2 hunks)
  • router/internal/traceclient/traceclient.go (3 hunks)
  • router/pkg/trace/transport.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/trace/transport.go
🧰 Additional context used
🧬 Code graph analysis (4)
router/internal/traceclient/traceclient.go (1)
router/internal/expr/expr.go (3)
  • Subgraph (141-146)
  • Request (66-75)
  • ClientTrace (135-138)
router/core/engine_loader_hooks.go (3)
router/internal/context/keys.go (1)
  • FetchTimingKey (15-15)
router/internal/expr/expr.go (3)
  • Subgraph (141-146)
  • Request (66-75)
  • ClientTrace (135-138)
router-tests/testenv/testenv.go (1)
  • Subgraph (2852-2862)
router-tests/telemetry/telemetry_test.go (3)
router-tests/testenv/testenv.go (3)
  • Run (105-122)
  • Config (284-340)
  • Environment (1727-1763)
router/pkg/trace/tracetest/tracetest.go (1)
  • NewInMemoryExporter (11-17)
router/pkg/config/config.go (3)
  • Config (985-1059)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
router-tests/structured_logging_test.go (2)
router-tests/testenv/testenv.go (7)
  • Run (105-122)
  • Config (284-340)
  • LogObservationConfig (386-389)
  • Environment (1727-1763)
  • GraphQLRequest (1903-1911)
  • SubgraphsConfig (365-378)
  • SubgraphConfig (380-384)
router/internal/expr/expr.go (1)
  • Request (66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
🔇 Additional comments (6)
router/demo.config.yaml (1)

31-33: Verify OTLP HTTP endpoint

Default OTLP HTTP receiver port is typically 4318. Confirm 4319 is intentional for your demo collector.

router/internal/traceclient/traceclient.go (2)

9-11: Imports look correct and scoped.

New imports are used below and keep layering within router/internal. No issues.


117-117: Confirm time.Duration marshalling to float seconds
The expr package’s default marshaller converts time.Duration fields (like ConnectionAcquireDuration) into seconds (float64) when evaluating expressions (validated by structured logging tests casting to float64) and for metrics you explicitly divide by time.Millisecond (yielding milliseconds). Units remain consistent: logs use seconds as float, metrics store milliseconds. No changes needed.

router-tests/telemetry/telemetry_test.go (1)

9796-9801: Using Seconds() on time.Duration is correct.

Switching to connAcquireDuration.Seconds() matches the underlying type change to time.Duration and keeps the expression engine behavior consistent. Looks good.

router-tests/structured_logging_test.go (2)

13-13: Import of time is appropriate

Used for time.Duration assertions below. No concerns.


2871-2872: Parallelize the subgraph expression tests

Good use of t.Parallel() to keep suite fast.

Comment thread router-tests/structured_logging_test.go
Comment thread router-tests/structured_logging_test.go Outdated
Comment thread router/demo.config.yaml Outdated
Comment thread router/internal/traceclient/traceclient.go
@SkArchon SkArchon force-pushed the milinda/eng-7817-unusual-timings-in-trace branch from a2f540b to 0fd2225 Compare September 3, 2025 12:52
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router-tests/structured_logging_test.go (1)

2898-2901: Type consistency fix acknowledged

Asserting time.Duration (not float64) for data_source_fetch_duration is now consistent with prior tests and the expr contract.

🧹 Nitpick comments (2)
router-tests/structured_logging_test.go (2)

2928-2937: Don’t depend on subgraph log ordering

Index-based assumptions (0=employees, 1=availability) can flake. Select entries by subgraph_name.

Example refactor pattern for these blocks:

var empCtx, availCtx map[string]interface{}
for _, e := range requestLogAll {
	m := e.ContextMap()
	if name, ok := m["subgraph_name"].(string); ok {
		switch name {
		case "employees":
			empCtx = m
		case "availability":
			availCtx = m
		}
	}
}

Then assert using empCtx/availCtx. Apply similarly in the referenced ranges.

Also applies to: 2975-2983, 3042-3051, 3089-3097


2898-2901: Avoid int casts on time.Duration in assertions
Replace all require.Greater(t, int(<duration>), 0) calls with direct duration comparisons, e.g.:

require.Greater(t, dataSourceFetchDuration, time.Duration(0))

Occurrences to update in router-tests/structured_logging_test.go:

  • Line 2900
  • Line 2931
  • Line 2936
  • Line 2982
  • Line 3014
  • Line 3045
  • Line 3050
  • Line 3096
  • Line 3139
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 8971da3 and a2f540b.

📒 Files selected for processing (1)
  • router-tests/structured_logging_test.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (2)
router-tests/testenv/testenv.go (7)
  • Run (105-122)
  • Config (284-340)
  • LogObservationConfig (386-389)
  • Environment (1727-1763)
  • GraphQLRequest (1903-1911)
  • SubgraphsConfig (365-378)
  • SubgraphConfig (380-384)
router/pkg/config/config.go (3)
  • Config (985-1059)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/structured_logging_test.go (2)

13-13: Import added is appropriate

time is needed for Duration assertions. Looks good.


2871-2872: Parallelization enabled — OK

Consistent with surrounding tests and isolation via testenv.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
router-tests/structured_logging_test.go (2)

2898-2901: Avoid int casts when asserting durations

Casting time.Duration to int can overflow on 32-bit platforms and is unnecessary. Compare durations directly against time.Duration(0).

Apply these diffs (repeat pattern for all occurrences shown in the selected ranges):

- require.Greater(t, int(dataSourceFetchDuration), 0)
+ require.Greater(t, dataSourceFetchDuration, time.Duration(0))
- require.Greater(t, int(dataSourceFetchDuration1), 0)
+ require.Greater(t, dataSourceFetchDuration1, time.Duration(0))
- require.Greater(t, int(dataSourceFetchDuration2), 0)
+ require.Greater(t, dataSourceFetchDuration2, time.Duration(0))
- require.Greater(t, int(connAcquireDuration), 0)
+ require.Greater(t, connAcquireDuration, time.Duration(0))
- require.Greater(t, int(connAcquireDuration1), 0)
+ require.Greater(t, connAcquireDuration1, time.Duration(0))
- require.Greater(t, int(connAcquireDuration2), 0)
+ require.Greater(t, connAcquireDuration2, time.Duration(0))

Also applies to: 2930-2932, 2934-2937, 2980-2983, 3011-3015, 3043-3046, 3048-3051, 3094-3097, 3137-3140


2895-2897: Guard against index panics by asserting log count before indexing

These tests index requestLogAll[0]/[1] without confirming length. Add a quick Len check to make failures clearer and avoid panics.

   requestLog := xEnv.Observer().FilterMessage("/graphql")
   requestLogAll := requestLog.All()
+  require.GreaterOrEqual(t, requestLog.Len(), 1)
   requestLog := xEnv.Observer().FilterMessage("/graphql")
   requestLogAll := requestLog.All()
+  require.GreaterOrEqual(t, requestLog.Len(), 2)
   requestLog := xEnv.Observer().FilterMessage("/graphql")
   requestLogAll := requestLog.All()
+  require.GreaterOrEqual(t, requestLog.Len(), 2)

Also applies to: 2926-2927, 2972-2974

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a2f540b and 0fd2225.

📒 Files selected for processing (1)
  • router-tests/structured_logging_test.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (3)
router-tests/testenv/testenv.go (6)
  • Run (105-122)
  • Config (284-340)
  • LogObservationConfig (386-389)
  • Environment (1727-1763)
  • SubgraphsConfig (365-378)
  • SubgraphConfig (380-384)
router/pkg/config/config.go (3)
  • Config (985-1059)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
router/internal/expr/expr.go (1)
  • Request (66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
router-tests/structured_logging_test.go (2)

13-13: LGTM: time import

Required for the new duration assertions.


2873-2884: LGTM: correct expression path and snake_case key

Using subgraph.request.clientTrace.dataSourceFetchDuration and emitting it under data_source_fetch_duration matches the expr tag and existing naming.

Comment thread router/internal/expr/expr.go Outdated
Comment thread router/internal/traceclient/traceclient.go
Comment thread router/core/engine_loader_hooks.go
@SkArchon SkArchon requested a review from StarpTech September 3, 2025 15:14
@SkArchon SkArchon changed the title feat: add timings per datasource fetch feat: add timings per client fetch for GraphQL http Sep 3, 2025
Copy link
Copy Markdown
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (9)
router/internal/expr/expr.go (1)

135-138: Duration types: good change; document units and confirm external compatibility.

Switching ConnectionAcquireDuration from float64 to time.Duration and adding FetchDuration improves type safety, but it’s a breaking change for existing expressions/configs expecting numeric types. Please:

  • Document that both fields are time.Duration and express units (ns) when serialized.
  • Call out the breaking change in the PR description/CHANGELOG.
  • Ensure any telemetry/metrics expressions use .Seconds()/.Milliseconds() as needed.

Optionally, consider a temporary alias field (deprecated) if you need smoother upgrade paths.

Would you like me to draft a short upgrade note and grep script to locate downstream usages?

router-tests/structured_logging_test.go (8)

2871-2902: Avoid int cast; compare durations directly and assert subgraph log selection.

  • Prefer comparing durations to time.Duration(0) to avoid needless casts.
  • To reduce ordering brittleness, assert the selected entry is a subgraph log.

Apply:

-				fetchDuration, ok := requestContextMap["fetch_duration"].(time.Duration)
-				require.True(t, ok)
-				require.Greater(t, int(fetchDuration), 0)
+				fetchDuration, ok := requestContextMap["fetch_duration"].(time.Duration)
+				require.True(t, ok)
+				require.Greater(t, fetchDuration, time.Duration(0))
+				require.Equal(t, "client/subgraph", requestContextMap["log_type"])

2904-2937: Same here: compare durations directly and assert target log is subgraph.

-				fetchDuration1, ok := employeeSubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
+				fetchDuration1, ok := employeeSubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(fetchDuration1), 0)
+				require.Greater(t, fetchDuration1, time.Duration(0))
+				require.Equal(t, "client/subgraph", employeeSubgraphLogs.ContextMap()["log_type"])

-				fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
+				fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(fetchDuration2), 0)
+				require.Greater(t, fetchDuration2, time.Duration(0))
+				require.Equal(t, "client/subgraph", availabilitySubgraphLogs.ContextMap()["log_type"])

2940-2983: Conditional case: remove int cast in duration comparison.

-				fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
+				fetchDuration2, ok := availabilitySubgraphLogs.ContextMap()["fetch_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(fetchDuration2), 0)
+				require.Greater(t, fetchDuration2, time.Duration(0))

3011-3015: connAcquireDuration: compare time.Duration directly.

-				connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration)
+				connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(connAcquireDuration), 0)
+				require.Greater(t, connAcquireDuration, time.Duration(0))

3043-3046: Same: remove int cast for first subgraph.

-				connAcquireDuration1, ok := employeeSubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
+				connAcquireDuration1, ok := employeeSubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(connAcquireDuration1), 0)
+				require.Greater(t, connAcquireDuration1, time.Duration(0))

3048-3051: Same: remove int cast for second subgraph.

-				connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
+				connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(connAcquireDuration2), 0)
+				require.Greater(t, connAcquireDuration2, time.Duration(0))

3094-3097: Conditional case: remove int cast.

-				connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
+				connAcquireDuration2, ok := availabilitySubgraphLogs.ContextMap()["conn_acquire_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(connAcquireDuration2), 0)
+				require.Greater(t, connAcquireDuration2, time.Duration(0))

3137-3140: Cleanup/unification: compare durations without casting.

-				connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration)
+				connAcquireDuration, ok := requestContextMap["conn_acquire_duration"].(time.Duration)
 				require.True(t, ok)
-				require.Greater(t, int(connAcquireDuration), 0)
+				require.Greater(t, connAcquireDuration, time.Duration(0))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0fd2225 and 78bc881.

📒 Files selected for processing (4)
  • router-tests/structured_logging_test.go (6 hunks)
  • router-tests/telemetry/telemetry_test.go (2 hunks)
  • router/core/engine_loader_hooks.go (3 hunks)
  • router/internal/expr/expr.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • router/core/engine_loader_hooks.go
  • router-tests/telemetry/telemetry_test.go
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-03T11:38:45.827Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2183
File: router/internal/traceclient/traceclient.go:115-118
Timestamp: 2025-09-03T11:38:45.827Z
Learning: In the Cosmo codebase, the exprContext obtained from getExprContext in TraceInjectingRoundTripper.processConnectionMetrics is expected to always be non-nil as part of the existing system behavior, so defensive nil checks are not needed.

Applied to files:

  • router/internal/expr/expr.go
🧬 Code graph analysis (1)
router-tests/structured_logging_test.go (3)
router-tests/testenv/testenv.go (5)
  • Run (105-122)
  • Config (284-340)
  • LogObservationConfig (386-389)
  • Environment (1727-1763)
  • SubgraphsConfig (365-378)
router/pkg/config/config.go (3)
  • Config (985-1059)
  • CustomAttribute (43-47)
  • CustomDynamicAttribute (36-41)
router/internal/expr/expr.go (1)
  • Request (66-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router/internal/expr/expr.go (1)

9-9: LGTM: time import is appropriate.

router-tests/structured_logging_test.go (2)

13-13: LGTM: import time for duration assertions.


2871-2985: Naming consistency check: confirm canonical field name in expressions.

PR text/examples mention dataSourceFetchDuration, while tests use fetchDuration. Please confirm the intended canonical expr path and align PR text and any docs accordingly.

Would you like a quick repo grep script to verify no stale references remain?

@SkArchon SkArchon merged commit 7c764de into main Sep 3, 2025
31 checks passed
@SkArchon SkArchon deleted the milinda/eng-7817-unusual-timings-in-trace branch September 3, 2025 15:37
@Noroth Noroth mentioned this pull request Sep 30, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants