Skip to content

feat: add span for each grpc invocation#2158

Merged
SkArchon merged 12 commits intomainfrom
milinda/eng-7728-otel-router-plugins-spans
Aug 27, 2025
Merged

feat: add span for each grpc invocation#2158
SkArchon merged 12 commits intomainfrom
milinda/eng-7728-otel-router-plugins-spans

Conversation

@SkArchon
Copy link
Copy Markdown
Contributor

@SkArchon SkArchon commented Aug 20, 2025

This PR Adds a span with all attributes for each grpc invocation. The following shows the "GRPC Plugin Client - Invoke" spans. These spans also overrides the protocol type to grpc. This is to make it similar to the normal subgrpah spans, which has an additional span when the datasource is a subgraph.

image image

Summary by CodeRabbit

  • New Features
    • Enhanced tracing: per-request and per-invocation spans with enriched attributes, configurable trace attribute getters and tracer plumbing, explicit gRPC protocol tagging, and new GRPC operation protocol constant; new helper to build GRPC span names/attributes.
  • Bug Fixes
    • Surface plugin preparation errors during startup while preserving resiliency.
  • Tests
    • Expanded telemetry tests: parallelized subtests validating multiple spans, span attributes, alias-based invocations, and per-invocation trace behavior.
  • UI
    • Analytics view: show GRPC-specific icon and "Cosmo Connect" note for gRPC spans.

Checklist

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Aug 20, 2025

Walkthrough

Threads per-request OpenTelemetry tracers and trace-attribute getters through connector setup, GRPC plugin constructors, and the GRPC client; adds OperationProtocolGRPC and CreateGRPCTraceGetter; updates tests to assert per-invocation OTEL spans with aliased GraphQL queries; small control-plane and UI mapping for grpc protocol.

Changes

Cohort / File(s) Summary
Tests: router plugin telemetry
router-tests/router_plugin_test.go
Scopes InMemoryExporter per subtest, enables t.Parallel(), updates GraphQL queries to use aliases, increases expected spans/invocations, and adds a parallel subtest validating per-invocation OTEL spans and attributes.
Core wiring & protocol constant
router/core/graph_server.go, router/core/operation_metrics.go, router/core/transport.go
buildGraphMux/setupConnector now accept telemetry/tracing attribute-expression handlers; create per-connector Tracer and GetTraceAttributes; add CreateGRPCTraceGetter; introduce OperationProtocolGRPC = "grpc".
gRPC client instrumentation (common)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go
Adds GRPCTraceAttributeGetter and GRPCPluginClientOpts; NewGRPCPluginClient accepts opts; client stores tracer and getTraceAttributes; Invoke starts OTEL spans and records errors.
Native gRPC plugin path
router/pkg/grpcconnector/grpcplugin/.../grpc_plugin.go
Adds Tracer and GetTraceAttributes to GRPCPluginConfig and plugin, forwards them via grpccommon.GRPCPluginClientOpts into client creation, imports OTEL trace, and surfaces PrepareCommand errors.
OCI gRPC plugin path
router/pkg/grpcconnector/grpcpluginoci/.../grpc_oci_plugin.go
Adds Tracer and GetTraceAttributes to GRPCPluginConfig and plugin; passes tracing options into GRPCPluginClient creation when starting/restarting OCI-based plugins.
Transport helper
router/core/transport.go
Adds CreateGRPCTraceGetter(...) returning a closure that computes span name and OTEL start options combining request telemetry attributes, configured attribute expressions, and forcing protocol to grpc.
Control plane trace mapping
controlplane/src/core/repositories/analytics/TraceRepository.ts
Selects wg.operation.protocol as attrOperationProtocol and maps it into returned span attributes as operationProtocol.
Studio UI: trace rendering
studio/src/components/analytics/trace.tsx
Renders a lightning icon and "Cosmo Connect" tooltip line when span.attributes?.operationProtocol === "grpc"; minor import/layout tweaks.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1120a54 and 9d4d477.

📒 Files selected for processing (1)
  • router/core/transport.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/core/transport.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch milinda/eng-7728-otel-router-plugins-spans

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Aug 20, 2025

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-2f7962c3135618140a80c0b43a025327b8e07739-nonroot

Comment thread router/pkg/grpcconnector/grpcplugin/grpc_plugin.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)

64-82: Guard against nil tracer/getTraceAttributes to avoid panics

If Tracer or GetTraceAttributes is not provided, Invoke will panic (nil interface method call / nil function). Provide safe defaults in the constructor.

Apply this diff:

 func NewGRPCPluginClient(pc *plugin.Client, cc grpc.ClientConnInterface, clientOpts GRPCPluginClientOpts, options ...GRPCPluginClientOption) (*GRPCPluginClient, error) {
@@
-  return &GRPCPluginClient{
-    pc:                 pc,
-    cc:                 cc,
-    config:             config,
-    tracer:             clientOpts.Tracer,
-    getTraceAttributes: clientOpts.GetTraceAttributes,
-  }, nil
+  g := &GRPCPluginClient{
+    pc:                 pc,
+    cc:                 cc,
+    config:             config,
+    tracer:             clientOpts.Tracer,
+    getTraceAttributes: clientOpts.GetTraceAttributes,
+  }
+  if g.tracer == nil {
+    // fall back to a package-scoped tracer if none provided
+    g.tracer = otel.Tracer("cosmo/grpc_plugin_client")
+  }
+  if g.getTraceAttributes == nil {
+    g.getTraceAttributes = func(context.Context) []attribute.KeyValue { return nil }
+  }
+  return g, nil
 }
router/core/graph_server.go (1)

1517-1529: Add tracing instrumentation to RemoteGRPCProvider

Verified that RemoteGRPCProviderConfig only carries Logger and Endpoint, and none of the RPC-setup or invocation paths in router/pkg/grpcconnector/grpcremote/grpc_remote.go create or propagate OpenTelemetry spans. To meet the PR goal of “add span for each gRPC invocation,” the provider must be extended to accept a Tracer and attribute-getter and to wrap every RPC in a span.

• In router/core/graph_server.go (lines 1517–1529), update the instantiation to pass the new tracing fields:

remoteProvider, err := grpcremote.NewRemoteGRPCProvider(grpcremote.RemoteGRPCProviderConfig{
    Logger:              s.logger,
    Endpoint:            sg.RoutingUrl,
    Tracer:              tracer,                   // oteltrace.Tracer
    GetTraceAttributes:  sg.GetTraceAttributes,    // func(context.Context) []attribute.KeyValue
})

• In router/pkg/grpcconnector/grpcremote/grpc_remote.go:
– Extend RemoteGRPCProviderConfig:

type RemoteGRPCProviderConfig struct {
    Logger             *zap.Logger
    Endpoint           string
    Tracer             oteltrace.Tracer
    GetTraceAttributes func(context.Context) []attribute.KeyValue
}

– In NewRemoteGRPCProvider, default nil fields:

if config.Tracer == nil {
    config.Tracer = oteltrace.NewNoopTracerProvider().Tracer("")
}
if config.GetTraceAttributes == nil {
    config.GetTraceAttributes = func(context.Context) []attribute.KeyValue { return nil }
}

– Wrap each RPC dial/invocation in a span:

ctx, span := config.Tracer.Start(ctx, "RemoteGRPCProvider.Start")
span.SetAttributes(config.GetTraceAttributes(ctx)...)
defer span.End()
cc, err := grpc.DialContext(
    ctx,
    config.Endpoint,
    grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor(config.Tracer)),
    grpc.WithTransportCredentials(insecure.NewCredentials()),
)

These changes ensure that every gRPC call from the standalone plugin path is traced and annotated.

🧹 Nitpick comments (4)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)

58-63: Document exported types and define nil-safety contract

Add short doc comments for TraceAttributeGetter and GRPCPluginClientOpts, and state whether Tracer/GetTraceAttributes may be nil. This clarifies expectations for callers and helps prevent misuse.

router/core/graph_server.go (3)

1487-1493: Avoid parameter shadowing and align naming with nearby helpers

The parameter name config shadows the imported package name “config” used elsewhere in this file. Prefer engineConfig for clarity and consistency (see configureSubgraphOverwrites’ engineConfig).

Apply:

-func (s *graphServer) setupConnector(
-	ctx context.Context,
-	config *nodev1.EngineConfiguration,
-	configSubgraphs []*nodev1.Subgraph,
+func (s *graphServer) setupConnector(
+	ctx context.Context,
+	engineConfig *nodev1.EngineConfiguration,
+	configSubgraphs []*nodev1.Subgraph,
 	telemetryAttributeExpressions *attributeExpressions,
 	tracingAttributeExpressions *attributeExpressions,
 ) error {
@@
-	for _, dsConfig := range config.DatasourceConfigurations {
+	for _, dsConfig := range engineConfig.DatasourceConfigurations {

1546-1546: Hard-coded instrumentation version; use the router Version or a shared constant

The tracer uses "0.0.1". Consider using Version (already exported in this package) or a central const to keep it in sync with releases.

- tracer := s.tracerProvider.Tracer("wundergraph/cosmo/router/engine/grpc", oteltrace.WithInstrumentationVersion("0.0.1"))
+ tracer := s.tracerProvider.Tracer("wundergraph/cosmo/router/engine/grpc", oteltrace.WithInstrumentationVersion(Version))

If you prefer a dedicated instrumentation version, extract a const and reference it from all tracer initializations.


1574-1576: Attribute override semantics could cause duplicate keys (minor)

Appending WgOperationProtocol after existing attributes yields duplicate keys. OTel keeps the last value, but to avoid surprises, upsert the attribute in-place.

If you want a small helper:

func upsertAttr(attrs []attribute.KeyValue, k attribute.Key, v string) []attribute.KeyValue {
	for i := range attrs {
		if attrs[i].Key == k {
			attrs[i].Value = attribute.StringValue(v)
			return attrs
		}
	}
	return append(attrs, k.String(v))
}

Then replace the append with:
attrs = upsertAttr(attrs, otel.WgOperationProtocol, OperationProtocolGRPC.String())

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between bf88521 and 5d8da8d.

📒 Files selected for processing (6)
  • router-tests/router_plugin_test.go (4 hunks)
  • router/core/graph_server.go (5 hunks)
  • router/core/operation_metrics.go (1 hunks)
  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (5 hunks)
  • router/pkg/grpcconnector/grpcplugin/grpc_plugin.go (6 hunks)
  • router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
router-tests/router_plugin_test.go (3)
router-tests/testenv/testenv.go (4)
  • Run (105-122)
  • Config (284-340)
  • PluginConfig (342-345)
  • Environment (1727-1763)
router/pkg/trace/tracetest/tracetest.go (1)
  • NewInMemoryExporter (11-17)
router/pkg/otel/attributes.go (3)
  • WgOperationProtocol (15-15)
  • WgOperationType (11-11)
  • WgOperationName (10-10)
router/core/graph_server.go (5)
router/internal/expr/expr.go (2)
  • Context (34-38)
  • Subgraph (139-144)
router/core/context.go (1)
  • Subgraph (34-39)
router/pkg/otel/attributes.go (1)
  • WgOperationProtocol (15-15)
router/core/operation_metrics.go (1)
  • OperationProtocolGRPC (22-22)
router-plugin/config/settings.go (1)
  • StartupConfig (32-35)
router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (1)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (3)
  • TraceAttributeGetter (58-58)
  • NewGRPCPluginClient (64-82)
  • GRPCPluginClientOpts (59-62)
router/pkg/grpcconnector/grpcplugin/grpc_plugin.go (3)
router/pkg/grpcconnector/grpccommon/grpc_startup_config.go (1)
  • GRPCStartupParams (7-10)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (3)
  • TraceAttributeGetter (58-58)
  • NewGRPCPluginClient (64-82)
  • GRPCPluginClientOpts (59-62)
router/pkg/grpcconnector/grpccommon/command_opts.go (1)
  • PrepareCommand (10-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan
  • GitHub Check: build_test
🔇 Additional comments (12)
router/core/operation_metrics.go (1)

20-24: Add grpc operation protocol constant — LGTM

Recognizing gRPC alongside HTTP and WS is consistent with how OperationProtocol is used elsewhere.

router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)

64-64: Signature Update Verified

All internal callers of NewGRPCPluginClient now supply the required GRPCPluginClientOpts argument—no two-argument invocations remain in the codebase. No further changes are needed.

router-tests/router_plugin_test.go (2)

101-137: Telemetry spans count update — LGTM

The expected snapshot count adjustment (8 → 9) for “query projects simple” matches the added invocation span.


98-115: Ignore global tracer provider race concerns

The test environment does not configure a process‐global tracer provider. Instead, CreateTestEnv sets up tracing via core.WithTracing(c), passing the in-memory exporter into the router’s local rtrace.Config, and never calls otel.SetTracerProvider globally.

• In router-tests/testenv/testenv.go lines 1462–1484, TraceExporter is injected into the router options via core.WithTracing(c), with no global provider override.
• A repository‐wide search for SetTracerProvider shows calls only in the standalone tracing and metric packages—not in the testenv or router core—confirming no global tracer is set in tests.

You can safely run these subtests in parallel without cross-contamination of exporters.

Likely an incorrect or invalid review comment.

router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (2)

23-30: Wiring tracer and trace-attributes into OCI plugin config — LGTM

Config now carries Tracer/GetTraceAttributes; good for per-invocation spans.


146-151: Propagate tracing options into plugin client (first start) — LGTM; relies on client-side nil-safety

Given GRPCPluginClient now has safe defaults, this path is robust even if config fields are unset.

router/pkg/grpcconnector/grpcplugin/grpc_plugin.go (4)

21-28: Add Tracer/GetTraceAttributes to GRPCPluginConfig — LGTM

This aligns the native plugin path with the OCI path.


40-45: Store tracer and attribute getter on plugin — LGTM

Ensures the values are available during client construction.


109-113: Good: handle PrepareCommand error

Previously ignored, now returned with context. Nice catch.


146-149: Propagate tracing options into client creation — LGTM; consider nil-safety outside ctor

Client-side defaults will prevent panics; if you ever set p.client directly in tests/utilities, ensure Invoke guards stay in place (see grpc_plugin_client.go suggestion).

router/core/graph_server.go (2)

1587-1593: OCI plugin config now receives Tracer and GetTraceAttributes — LGTM

This wires tracing into OCI-based plugins and ensures per-invocation attributes are available. Matches PR intent.


1609-1615: Native plugin config now receives Tracer and GetTraceAttributes — LGTM

Parity with OCI path is good; keeps the tracing behavior consistent across plugin deployment modes.

Comment thread router-tests/router_plugin_test.go
Comment thread router/core/graph_server.go Outdated
Comment thread router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (3)

84-101: Avoid busy-spin while waiting; poll on a ticker instead of a tight loop

waitForPluginToBeActive spins in a tight loop when the plugin is absent (g.pc == nil) because isPluginActive returns (false, nil) without sleeping in that case. This can burn CPU until timeout.

Apply this diff to poll on a ticker and remove the busy loop:

 func (g *GRPCPluginClient) waitForPluginToBeActive() error {
-	timeout := time.After(g.config.ReconnectTimeout)
-	for {
-		select {
-		case <-timeout:
-			return errors.New("plugin was not active in time")
-		default:
-			isActive, err := g.isPluginActive()
-			if err != nil {
-				return err
-			}
-
-			if isActive {
-				return nil
-			}
-		}
-	}
+	timeout := time.After(g.config.ReconnectTimeout)
+	ticker := time.NewTicker(g.config.PingInterval)
+	defer ticker.Stop()
+	for {
+		select {
+		case <-timeout:
+			return errors.New("plugin was not active in time")
+		case <-ticker.C:
+			isActive, err := g.isPluginActive()
+			if err != nil {
+				return err
+			}
+			if isActive {
+				return nil
+			}
+		}
+	}
 }

106-123: Release read-lock before network calls and sleeps to avoid lock-holds

isPluginActive holds the RLock across pc.Client(), clientProtocol.Ping(), and even time.Sleep on failure. This unnecessarily blocks SetClients (which needs the write lock) and can amplify startup latency.

Apply this diff to snapshot the pointer under lock and perform calls/sleeps without holding the lock:

 func (g *GRPCPluginClient) isPluginActive() (bool, error) {
-	g.mu.RLock()
-	defer g.mu.RUnlock()
-	if g.pc == nil {
-		return false, nil
-	}
-
-	clientProtocol, err := g.pc.Client()
+	g.mu.RLock()
+	pc := g.pc
+	g.mu.RUnlock()
+
+	if pc == nil {
+		// Avoid busy-spin when plugin is not yet set
+		time.Sleep(g.config.PingInterval)
+		return false, nil
+	}
+
+	clientProtocol, err := pc.Client()
 	if err != nil {
 		return false, err
 	}
 
 	if err := clientProtocol.Ping(); err != nil {
-		time.Sleep(g.config.PingInterval)
+		// Sleep outside of locks to avoid blocking writers
+		time.Sleep(g.config.PingInterval)
 		return false, nil
 	}
 
 	return true, nil
 }

189-204: Guard Close() with a write lock and close the client conn to avoid races and leaks

Close mutates g.pc and g.cc without taking the mutex, which can race with Invoke/isPluginActive/SetClients. Also, g.cc isn’t closed here, which can leak connections.

Apply this diff:

 func (g *GRPCPluginClient) Close() error {
-	if g.pc == nil {
-		return nil
-	}
-
-	if g.pc.Exited() || g.isClosed.Load() {
-		return nil
-	}
-
-	g.isClosed.Store(true)
-
-	g.pc.Kill()
-	g.cc = nil
-
-	return nil
+	g.mu.Lock()
+	defer g.mu.Unlock()
+
+	if g.pc == nil || g.isClosed.Load() || g.pc.Exited() {
+		return nil
+	}
+
+	g.isClosed.Store(true)
+	g.pc.Kill()
+
+	if g.cc != nil {
+		if closer, ok := g.cc.(io.Closer); ok {
+			_ = closer.Close()
+		}
+		g.cc = nil
+	}
+	return nil
 }
♻️ Duplicate comments (1)
router-tests/router_plugin_test.go (1)

161-177: Deflake: don’t rely on snapshot positions; filter spans by name and relax attribute count

Indexing snapshots[5] and [6] is brittle and previously caused mistakes in assertions. Filter by span.Name() and assert attributes on both. Also, avoid asserting exact attribute count to keep tests resilient to additive changes.

Apply this diff:

-				snapshots := exporter.GetSpans().Snapshots()
-				require.Len(t, snapshots, 10)
-
-				span1 := snapshots[5]
-				require.Equal(t, "GRPC Plugin Client - Invoke", span1.Name())
-				require.Contains(t, span1.Attributes(), otel.WgOperationProtocol.String("grpc"))
-				require.Contains(t, span1.Attributes(), otel.WgOperationType.String("query"))
-				require.Contains(t, span1.Attributes(), otel.WgOperationName.String("projects"))
-				require.Len(t, span1.Attributes(), 10)
-
-				span2 := snapshots[6]
-				require.Equal(t, "GRPC Plugin Client - Invoke", span2.Name())
-				require.Contains(t, span2.Attributes(), otel.WgOperationProtocol.String("grpc"))
-				require.Contains(t, span2.Attributes(), otel.WgOperationType.String("query"))
-				require.Contains(t, span2.Attributes(), otel.WgOperationName.String("projects"))
-				require.Len(t, span2.Attributes(), 10)
+				snapshots := exporter.GetSpans().Snapshots()
+				require.GreaterOrEqual(t, len(snapshots), 10)
+
+				// Find invocation spans by name to avoid relying on ordering
+				invocations := make([]int, 0, len(snapshots))
+				for i, sn := range snapshots {
+					if sn.Name() == "GRPC Plugin Client - Invoke" {
+						invocations = append(invocations, i)
+					}
+				}
+				require.Len(t, invocations, 2, "expected exactly two gRPC invocation spans")
+
+				span1 := snapshots[invocations[0]]
+				require.Equal(t, "GRPC Plugin Client - Invoke", span1.Name())
+				require.Contains(t, span1.Attributes(), otel.WgOperationProtocol.String("grpc"))
+				require.Contains(t, span1.Attributes(), otel.WgOperationType.String("query"))
+				require.Contains(t, span1.Attributes(), otel.WgOperationName.String("projects"))
+				require.GreaterOrEqual(t, len(span1.Attributes()), 6)
+
+				span2 := snapshots[invocations[1]]
+				require.Equal(t, "GRPC Plugin Client - Invoke", span2.Name())
+				require.Contains(t, span2.Attributes(), otel.WgOperationProtocol.String("grpc"))
+				require.Contains(t, span2.Attributes(), otel.WgOperationType.String("query"))
+				require.Contains(t, span2.Attributes(), otel.WgOperationName.String("projects"))
+				require.GreaterOrEqual(t, len(span2.Attributes()), 6)
🧹 Nitpick comments (6)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (4)

149-153: Record and surface errors from the underlying gRPC invocation

Currently, errors returned by g.cc.Invoke are not recorded on the span, which reduces observability for the main failure path of this method. Record the error (and optionally set span status) before returning.

Apply this diff:

@@
-	ctx, span := g.tracer.Start(ctx,
+	ctx, span := g.tracer.Start(ctx,
 		"GRPC Plugin Client - Invoke",
 		trace.WithAttributes(g.getTraceAttributes(ctx)...))
 	defer span.End()
@@
-	return g.cc.Invoke(ctx, method, args, reply, opts...)
+	err := g.cc.Invoke(ctx, method, args, reply, opts...)
+	if err != nil {
+		span.RecordError(err)
+		// Optionally set status (requires an import alias to avoid clashing with grpc/codes):
+		// span.SetStatus(otelcodes.Error, err.Error())
+	}
+	return err

If you opt to set status, add the import:

 import (
@@
-	"google.golang.org/grpc/codes"
+	"google.golang.org/grpc/codes"
+	otelcodes "go.opentelemetry.io/otel/codes"
 )

Also applies to: 173-174


169-172: Don’t clobber existing outgoing metadata; merge before injecting context

Creating a fresh metadata.MD loses any existing outgoing metadata on ctx. Merge existing MD first, then inject trace context.

Apply this diff:

-	md := make(metadata.MD)
-	otel.GetTextMapPropagator().Inject(ctx, metadataCarrier{md})
-	ctx = metadata.NewOutgoingContext(ctx, md)
+	// Preserve existing outgoing metadata (if any)
+	existingMD, _ := metadata.FromOutgoingContext(ctx)
+	md := existingMD.Copy()
+	otel.GetTextMapPropagator().Inject(ctx, metadataCarrier{md})
+	ctx = metadata.NewOutgoingContext(ctx, md)

161-164: Record and return the same error instance for consistency

You record errors with a new errors.New("plugin is not active") but return a different status.Error. Record the same error you return for consistent traces/logs.

Apply this diff:

-	if g.isClosed.Load() {
-		span.RecordError(errors.New("plugin is not active"))
-		return status.Error(codes.Unavailable, "plugin is not active")
-	}
+	if g.isClosed.Load() {
+		err := status.Error(codes.Unavailable, "plugin is not active")
+		span.RecordError(err)
+		return err
+	}

64-64: Optional: make wait cancellable by request context

Consider changing waitForPluginToBeActive to accept a context and honor ctx.Done() so canceled requests don’t block up to ReconnectTimeout. This improves tail latency under client cancellation.

If you’d like, I can propose a minimal ctx-aware version of waitForPluginToBeActive and its call site changes.

router-tests/router_plugin_test.go (2)

124-126: Avoid brittle exact span-count assertions

As instrumentation evolves, exact counts tend to fluctuate. Prefer lower bounds or targeted presence checks.

Apply this diff:

-				require.Len(t, snapshots, 9)
+				require.GreaterOrEqual(t, len(snapshots), 9)

134-136: Align comment with assertion

The comment mentions “twice” but the assertion expects 1. Fix the comment for clarity.

Apply this diff:

-				// Normal http spans would have query sample twice
-				require.Equal(t, queryNameInstances, 1)
+				// We expect exactly one span named "query sample" in this path
+				require.Equal(t, queryNameInstances, 1)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5d8da8d and 0c0e032.

📒 Files selected for processing (2)
  • router-tests/router_plugin_test.go (4 hunks)
  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.713Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.
📚 Learning: 2025-08-20T21:05:58.713Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.713Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.

Applied to files:

  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go
🧬 Code Graph Analysis (1)
router-tests/router_plugin_test.go (3)
router-tests/testenv/testenv.go (3)
  • Run (105-122)
  • Config (284-340)
  • Environment (1727-1763)
router/pkg/trace/tracetest/tracetest.go (1)
  • NewInMemoryExporter (11-17)
router/pkg/otel/attributes.go (3)
  • WgOperationProtocol (15-15)
  • WgOperationType (11-11)
  • WgOperationName (10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (3)

58-64: LGTM: clean tracer/attributes injection via GRPCPluginClientOpts

Introducing TraceAttributeGetter and GRPCPluginClientOpts to thread tracing concerns through construction is a good design choice. It cleanly separates transport logic from observability. Based on the retrieved learning that tracer and getTraceAttributes are always initialized at construction, no extra nil-guards are necessary here.

Also applies to: 75-81


149-153: LGTM: span name and attributes align with PR intent

Starting a span named "GRPC Plugin Client - Invoke" and attaching request-scoped attributes via getTraceAttributes(ctx) is exactly what we want to make per-invocation telemetry visible.


64-64: All callers supply non-nil tracer and attribute getter

I’ve verified that every invocation of NewGRPCPluginClient includes both Tracer and GetTraceAttributes in the GRPCPluginClientOpts literal:

  • In router/pkg/grpcconnector/grpcplugin/grpc_plugin.go at lines 146–149, p.tracer and p.getTraceAttributes are passed.
  • In router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go at lines 148–151, the same fields are provided.

Since tracer and getTraceAttributes on the client structs are always initialized (and never nil) per our constructor guarantees, no further nil-checks or guards are needed here.

router-tests/router_plugin_test.go (1)

104-116: LGTM: isolate exporter per subtest to avoid cross-talk

Creating a fresh InMemoryExporter for each parallel subtest keeps traces scoped and prevents inter-test contamination. Good choice.

Comment thread router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go Outdated
Comment thread router/core/graph_server.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (2)

168-171: Do not clobber existing outgoing gRPC metadata during propagation

Injecting into a fresh metadata.MD and then replacing the context can drop headers previously set upstream. Merge with existing MD before injecting.

Apply this diff:

-	md := make(metadata.MD)
-	otel.GetTextMapPropagator().Inject(ctx, metadataCarrier{md})
-	ctx = metadata.NewOutgoingContext(ctx, md)
+	// Preserve existing outgoing metadata and inject OTEL context into it
+	existingMD, _ := metadata.FromOutgoingContext(ctx)
+	md := existingMD.Copy()
+	otel.GetTextMapPropagator().Inject(ctx, metadataCarrier{md})
+	ctx = metadata.NewOutgoingContext(ctx, md)

172-173: Record Invoke() transport errors on the span

If the RPC itself fails, we currently return the error but don’t annotate the span. Recording it significantly improves debuggability.

Apply this diff:

-	return g.cc.Invoke(ctx, method, args, reply, opts...)
+	err := g.cc.Invoke(ctx, method, args, reply, opts...)
+	if err != nil {
+		span.RecordError(err)
+		// optionally: span.SetStatus(otelcodes.Error, err.Error())
+	}
+	return err

If you also set status, remember the import alias noted above to avoid clashing with grpc/codes.

router/core/transport.go (1)

467-505: Fix: avoid returning a span option built on a pooled slice (use-after-return risk)

traceAttrs is acquired from a pool and released via defer, but you return otrace.WithAttributes(traceAttrs...). If WithAttributes captures the slice and Start applies it after this function returns, the deferred Release may recycle the backing array before it’s used. Build attributes on a fresh slice (or clone) that you don’t return to the pool.

Apply this diff:

 func CreateGRPCTraceGetter(
 	telemetryAttributeExpressions *attributeExpressions,
 	tracingAttributeExpressions *attributeExpressions,
 ) func(context.Context) (string, otrace.SpanStartEventOption) {
 	traceFunc := func(ctx context.Context) (string, otrace.SpanStartEventOption) {
 		reqCtx := getRequestContext(ctx)
 		if reqCtx == nil {
 			return "GRPC Plugin Client - Invoke", otrace.WithAttributes()
 		}
 
-		traceAttrs := *reqCtx.telemetry.AcquireAttributes()
-		defer reqCtx.telemetry.ReleaseAttributes(&traceAttrs)
-		traceAttrs = append(traceAttrs, reqCtx.telemetry.traceAttrs...)
+		// Build on a fresh slice to avoid lifetime issues with pooled buffers.
+		traceAttrs := make([]attribute.KeyValue, 0, len(reqCtx.telemetry.traceAttrs)+8)
+		traceAttrs = append(traceAttrs, reqCtx.telemetry.traceAttrs...)
 
 		if telemetryAttributeExpressions != nil {
 			telemetryValues, err := telemetryAttributeExpressions.expressionsAttributesWithSubgraph(&reqCtx.expressionContext)
 			if err != nil {
 				reqCtx.Logger().Warn("failed to resolve grpc plugin expression for telemetry", zap.Error(err))
 			}
 			traceAttrs = append(traceAttrs, telemetryValues...)
 		}
 
 		if tracingAttributeExpressions != nil {
 			tracingValues, err := tracingAttributeExpressions.expressionsAttributesWithSubgraph(&reqCtx.expressionContext)
 			if err != nil {
 				reqCtx.Logger().Warn("failed to resolve grpc plugin expression for tracing", zap.Error(err))
 			}
 			traceAttrs = append(traceAttrs, tracingValues...)
 		}
 
 		// Override http operation protocol with grpc
 		traceAttrs = append(traceAttrs, otel.WgOperationProtocol.String(OperationProtocolGRPC.String()))
 
 		spanName := SpanNameFormatter("", reqCtx.request)
 		return spanName, otrace.WithAttributes(traceAttrs...)
 	}
 	return traceFunc
 }

If you want to keep using the pool for perf, build into the pooled slice, then return a copy:
safe := append([]attribute.KeyValue(nil), traceAttrs...) before releasing.

♻️ Duplicate comments (1)
router/core/graph_server.go (1)

1546-1552: Tracer creation and GRPC trace getter align naming with HTTP spans

Using the shared SpanNameFormatter ensures “query <operation_name>” style names, addressing prior feedback to keep naming consistent across protocols.

🧹 Nitpick comments (2)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (1)

153-163: Record the status/error consistently on the span

Great that errors are recorded when the plugin is inactive or the wait fails. For consistency, prefer recording the same status error you return and consider setting the span status.

Apply this diff:

-	if g.isClosed.Load() {
-		span.RecordError(errors.New("plugin is not active"))
-		return status.Error(codes.Unavailable, "plugin is not active")
-	}
+	if g.isClosed.Load() {
+		err := status.Error(codes.Unavailable, "plugin is not active")
+		span.RecordError(err)
+		// optionally: span.SetStatus(otelcodes.Error, err.Error())
+		return err
+	}

To set span status, add the import alias in this file’s imports:

otelcodes "go.opentelemetry.io/otel/codes"
router/core/graph_server.go (1)

1546-1546: Minor: extract instrumentation version to a constant

Hardcoding "0.0.1" in multiple places can drift. Consider a shared const or deriving it from your build/version.

For example:

// near other tracing config
const grpcInstrVersion = "0.0.1"

// usage:
tracer := s.tracerProvider.Tracer("wundergraph/cosmo/router/engine/grpc", oteltrace.WithInstrumentationVersion(grpcInstrVersion))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0c0e032 and e22cb78.

📒 Files selected for processing (5)
  • router/core/graph_server.go (5 hunks)
  • router/core/transport.go (2 hunks)
  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (5 hunks)
  • router/pkg/grpcconnector/grpcplugin/grpc_plugin.go (6 hunks)
  • router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/pkg/grpcconnector/grpcplugin/grpc_plugin.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.731Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.
📚 Learning: 2025-08-20T21:05:58.731Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.731Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.

Applied to files:

  • router/core/transport.go
  • router/core/graph_server.go
  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go
  • router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go
🧬 Code graph analysis (3)
router/core/transport.go (3)
router/internal/expr/expr.go (1)
  • Context (34-38)
router/pkg/otel/attributes.go (1)
  • WgOperationProtocol (15-15)
router/core/operation_metrics.go (1)
  • OperationProtocolGRPC (22-22)
router/core/graph_server.go (2)
router/core/transport.go (1)
  • CreateGRPCTraceGetter (467-504)
router-plugin/config/settings.go (1)
  • StartupConfig (32-35)
router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (2)
router/pkg/grpcconnector/grpccommon/grpc_startup_config.go (1)
  • GRPCStartupParams (7-10)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (3)
  • GRPCTraceAttributeGetter (57-57)
  • NewGRPCPluginClient (64-82)
  • GRPCPluginClientOpts (59-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_test
🔇 Additional comments (5)
router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go (2)

32-34: Good wiring for tracer and trace-attrs getter

Fields and opts are cleanly threaded through the constructor; keeping this on the client avoids scattering tracing concerns. Not adding nil-guards aligns with the learning that these are always initialized.

Also applies to: 57-64, 75-81


149-152: Span creation is correct and localized around each Invoke

Starting the span with a per-request name/attributes from the getter is the right abstraction and keeps GRPC spans consistent with HTTP spans.

router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go (1)

23-30: Tracing fields are cleanly plumbed through OCI plugin creation

Config/struct fields and client wiring look good; this keeps the client self-contained. No extra nil-guards needed per our earlier learning about initialization guarantees.

Also applies to: 52-54, 81-83, 148-151

router/core/graph_server.go (2)

1148-1151: Good: connector now receives telemetry/tracing expression resolvers

Passing telemetryAttExpressions and tracingAttExpressions into setupConnector is the right point to centralize per-request attribute evaluation.


1560-1567: Consistent propagation of tracer and trace getter to both OCI and native plugins

Both plugin paths receive identical tracing configuration; this prevents observability gaps.

Also applies to: 1582-1589

@SkArchon SkArchon requested a review from StarpTech August 25, 2025 13:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
router/core/transport.go (6)

467-467: Add a doc comment for the exported helper.

Exported functions should have a leading comment for discoverability and linting.

+// CreateGRPCTraceGetter returns a closure that builds the span name and attributes
+// for gRPC plugin invocations, overriding the operation protocol to "grpc".
 func CreateGRPCTraceGetter(

472-476: Ensure protocol override is preserved in the nil-context fallback.

If reqCtx is unexpectedly nil, we currently drop the protocol override. Keep wg.operation.protocol="grpc" even in the fallback.

-        if reqCtx == nil {
-            return "GRPC Plugin Client - Invoke", otrace.WithAttributes()
-        }
+        if reqCtx == nil {
+            return "GRPC Plugin Client - Invoke",
+                otrace.WithAttributes(otel.WgOperationProtocol.String(OperationProtocolGRPC.String()))
+        }

477-484: Avoid extra allocation: build on the pooled slice directly.

You acquire a pooled slice (traceAttrs) but then allocate a separate attrs slice and copy. Build attrs directly on the pooled buffer to reduce allocations.

-        traceAttrs := *reqCtx.telemetry.AcquireAttributes()
-        defer reqCtx.telemetry.ReleaseAttributes(&traceAttrs)
-
-        attrs := make([]attribute.KeyValue, 0, len(reqCtx.telemetry.traceAttrs))
-
-        attrs = append(attrs, traceAttrs...)
-        attrs = append(attrs, reqCtx.telemetry.traceAttrs...)
+        attrs := *reqCtx.telemetry.AcquireAttributes()
+        defer reqCtx.telemetry.ReleaseAttributes(&attrs)
+        // attrs is an empty, pooled slice; append all attributes to it.
+        attrs = append(attrs, reqCtx.telemetry.traceAttrs...)

501-503: Verify component attribute: EngineTransportAttribute may mislabel gRPC plugin spans.

Using wg.component.name="engine-transport" on plugin spans can confuse attribution. If there’s a plugin-specific component attribute (e.g., "grpc-plugin-client"), prefer that; otherwise consider omitting the component attribute here and rely on wg.operation.protocol="grpc".

If you decide to omit it for now:

-        // Override http operation protocol with grpc
-        attrs = append(attrs, otel.EngineTransportAttribute, otel.WgOperationProtocol.String(OperationProtocolGRPC.String()))
+        // Override HTTP operation protocol with grpc
+        attrs = append(attrs, otel.WgOperationProtocol.String(OperationProtocolGRPC.String()))

If there is (or you add) a plugin component attribute:

-        attrs = append(attrs, otel.EngineTransportAttribute, otel.WgOperationProtocol.String(OperationProtocolGRPC.String()))
+        attrs = append(attrs,
+            otel.WgComponentName.String("grpc-plugin-client"),
+            otel.WgOperationProtocol.String(OperationProtocolGRPC.String()),
+        )

501-501: Nit: capitalize HTTP in the comment.

-        // Override http operation protocol with grpc
+        // Override HTTP operation protocol with grpc

504-505: Span naming: consider a fixed plugin span name for clearer timelines.

PR intent mentions “GRPC Plugin Client - Invoke” spans. Today, spanName uses the GraphQL operation formatter, which can make plugin spans visually indistinguishable from upstream HTTP spans. Consider a fixed name and rely on attributes to convey operation details.

-        spanName := SpanNameFormatter("", reqCtx.request)
+        spanName := "GRPC Plugin Client - Invoke"

If the current operation-style naming is intentional to mirror subgraph HTTP spans, feel free to ignore.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d1a455c and 0fef589.

📒 Files selected for processing (1)
  • router/core/transport.go (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-20T21:05:58.731Z
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.731Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.

Applied to files:

  • router/core/transport.go
🧬 Code graph analysis (1)
router/core/transport.go (2)
router/pkg/otel/attributes.go (2)
  • EngineTransportAttribute (89-89)
  • WgOperationProtocol (15-15)
router/core/operation_metrics.go (1)
  • OperationProtocolGRPC (22-22)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
router/core/transport.go (2)

15-16: Import move is fine.

Reordering the internal/circuit import within the block has no functional impact. Looks good.


467-508: SpanKind Missing in gRPC Plugin Spans
The gRPC-plugin client is currently starting spans with only WithAttributes, so they default to SpanKindInternal. To mirror the HTTP subgraph’s Client spans, you must pass WithSpanKind(SpanKindClient) when invoking tracer.Start.

Please update the call sites in the plugin clients to include the client span kind, for example:

// before
ctx, span := g.tracer.Start(ctx, spanName, traceAttributes)

// after
ctx, span := g.tracer.Start(
    ctx,
    spanName,
    traceAttributes,
    otrace.WithSpanKind(otrace.SpanKindClient),
)
defer span.End()

Locations to fix:

  • router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:150
  • router/pkg/grpcconnector/grpcpluginoci/grpc_oci_plugin.go:149
⛔ Skipped due to learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.731Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
router-tests/router_plugin_test.go (1)

160-176: Deflake: find invocation spans by name instead of relying on slice indexes; assert attribute subset, not exact count

Indexing snapshots[5] and snapshots[6] is brittle and order-dependent. Also, asserting exactly 11 attributes risks flakiness with future telemetry additions. Given the PR introduces “GRPC Plugin Client - Invoke” spans, filter by that name and assert a stable subset of attributes.

Apply this diff:

-                require.Len(t, snapshots, 10)
+                // Avoid brittle exact count; assert a sensible lower bound
+                assert.GreaterOrEqual(t, len(snapshots), 6)

-                span1 := snapshots[5]
-                require.Equal(t, "query projects", span1.Name())
-                require.Contains(t, span1.Attributes(), otel.WgOperationProtocol.String("grpc"))
-                require.Contains(t, span1.Attributes(), otel.WgOperationType.String("query"))
-                require.Contains(t, span1.Attributes(), otel.WgOperationName.String("projects"))
-                require.Len(t, span1.Attributes(), 11)
-
-                span2 := snapshots[6]
-                require.Equal(t, "query projects", span2.Name())
-                require.Contains(t, span2.Attributes(), otel.WgOperationProtocol.String("grpc"))
-                require.Contains(t, span2.Attributes(), otel.WgOperationType.String("query"))
-                require.Contains(t, span2.Attributes(), otel.WgOperationName.String("projects"))
-                require.Len(t, span2.Attributes(), 11)
+                // Find invocation spans by name to avoid relying on ordering
+                invocationIdx := make([]int, 0, len(snapshots))
+                for i, sn := range snapshots {
+                    if sn.Name() == "GRPC Plugin Client - Invoke" {
+                        invocationIdx = append(invocationIdx, i)
+                    }
+                }
+                require.Len(t, invocationIdx, 2, "expected exactly two gRPC invocation spans")
+
+                span1 := snapshots[invocationIdx[0]]
+                require.Equal(t, "GRPC Plugin Client - Invoke", span1.Name())
+                require.Contains(t, span1.Attributes(), otel.WgOperationProtocol.String("grpc"))
+                require.Contains(t, span1.Attributes(), otel.WgOperationType.String("query"))
+                require.Contains(t, span1.Attributes(), otel.WgOperationName.String("projects"))
+                // Keep attribute-count assertion flexible to avoid flakiness on future changes
+                assert.GreaterOrEqual(t, len(span1.Attributes()), 6)
+
+                span2 := snapshots[invocationIdx[1]]
+                require.Equal(t, "GRPC Plugin Client - Invoke", span2.Name())
+                require.Contains(t, span2.Attributes(), otel.WgOperationProtocol.String("grpc"))
+                require.Contains(t, span2.Attributes(), otel.WgOperationType.String("query"))
+                require.Contains(t, span2.Attributes(), otel.WgOperationName.String("projects"))
+                assert.GreaterOrEqual(t, len(span2.Attributes()), 6)
🧹 Nitpick comments (2)
router-tests/router_plugin_test.go (2)

124-126: Deflake by avoiding exact span-count assertion (10)

Asserting an exact total span count tends to be brittle as instrumentation evolves. Prefer asserting a floor or checking for the specific spans you care about.

Apply this diff:

-                require.Len(t, snapshots, 10)
+                // Avoid brittle exact count; assert a sensible lower bound
+                assert.GreaterOrEqual(t, len(snapshots), 8, "unexpectedly low span count")

127-135: Fix Equal arg order; also assert two gRPC invocation spans exist

Minor nit: Testify’s Equal takes expected before actual. Additionally, assert that exactly two GRPC invocation spans were produced (one per alias), which ties the test more closely to the PR objective.

Apply this diff:

-                require.Equal(t, queryNameInstances, 3)
+                require.Equal(t, 3, queryNameInstances)

+                // Also assert that per-invocation spans exist (2 invocations via aliases)
+                invocationCount := 0
+                for _, sn := range snapshots {
+                    if sn.Name() == "GRPC Plugin Client - Invoke" {
+                        invocationCount++
+                    }
+                }
+                require.Equal(t, 2, invocationCount, "expected two gRPC invocation spans")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0fef589 and dd473fb.

📒 Files selected for processing (1)
  • router-tests/router_plugin_test.go (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: SkArchon
PR: wundergraph/cosmo#2158
File: router/pkg/grpcconnector/grpccommon/grpc_plugin_client.go:149-153
Timestamp: 2025-08-20T21:05:58.731Z
Learning: In the Cosmo codebase, the GRPCPluginClient struct fields `tracer` and `getTraceAttributes` are always properly initialized during construction and will never be nil, so defensive nil checks are not needed.
🧬 Code graph analysis (1)
router-tests/router_plugin_test.go (3)
router-tests/testenv/testenv.go (5)
  • Run (105-122)
  • Config (284-340)
  • PluginConfig (342-345)
  • Environment (1727-1763)
  • GraphQLRequest (1903-1911)
router/pkg/trace/tracetest/tracetest.go (1)
  • NewInMemoryExporter (11-17)
router/pkg/otel/attributes.go (3)
  • WgOperationProtocol (15-15)
  • WgOperationType (11-11)
  • WgOperationName (10-10)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build-router
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: build_push_image
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: image_scan
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
🔇 Additional comments (4)
router-tests/router_plugin_test.go (4)

5-5: LGTM: otel import for span attribute assertions

Importing router/pkg/otel is appropriate for validating span attributes in the tests.


99-103: LGTM: parallelized test suites

Enabling t.Parallel() at the suite and subtest level is fine here; testenv appears isolated per subtest.


104-114: LGTM: per-subtest InMemoryExporter avoids cross-test contamination

Creating a fresh tracetest.NewInMemoryExporter(t) per subtest and passing it via Config.TraceExporter is the right move to keep spans scoped.


141-152: LGTM: independent exporter and environment for the second subtest

Good isolation for the “verify each invocation having its span” scenario.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
controlplane/src/core/repositories/analytics/TraceRepository.ts (3)

64-67: Parameterize identifiers in the ClickHouse query (optional hardening + perf)

Although pre-existing, the template string injects organizationID, federatedGraphId, spanID, and traceID directly into SQL. If the ClickHouse client supports named parameters, switch to placeholders to reduce injection risk and improve plan caching.

Apply this diff (adapt to your client’s parameter API if it differs):

-        AND SpanAttributes['wg.organization.id'] = '${organizationID}'
-        AND SpanAttributes['wg.federated_graph.id'] = '${federatedGraphId}'
-        AND spanId = '${spanID}'
+        AND SpanAttributes['wg.organization.id'] = {organizationID:String}
+        AND SpanAttributes['wg.federated_graph.id'] = {federatedGraphId:String}
+        AND spanId = {spanID:String}
@@
-    '${traceID}' AS trace_id,
+    {traceID:String} AS trace_id,
@@
-    const results = await this.client.queryPromise(query);
+    const params = { traceID, spanID, organizationID, federatedGraphId };
+    const results = await this.client.queryPromise(query, params);

If parameters aren’t supported, minimally escape these values or centralize query construction through a helper that handles proper quoting.

Also applies to: 76-78, 79-79, 97-98


115-143: Optional: document/normalize nullability for attributes (keep behavior consistent)

Some attributes (including the new operationProtocol) may be absent for certain spans. If downstream consumers assume strings, consider normalizing to undefined (current implicit behavior) or documenting expected nullability in the Span attributes contract. Avoid defaulting here unless you do so for all similar attributes.


15-54: Minor: keep columns grouped for maintainability

Nit: consider grouping related wg.operation.* attributes together (you already added protocol near the others) to reduce churn/merge conflicts over time. No action required.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between dd473fb and c5077b1.

📒 Files selected for processing (1)
  • controlplane/src/core/repositories/analytics/TraceRepository.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: build_test
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (1)
controlplane/src/core/repositories/analytics/TraceRepository.ts (1)

139-143: I’ve added a script to confirm where and how Span is imported in TraceRepository.ts. Let’s check the output to locate the type definition and verify whether attributes.operationProtocol is declared on the imported Span.

Comment thread controlplane/src/core/repositories/analytics/TraceRepository.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
studio/src/components/analytics/trace.tsx (1)

101-103: Fix potential runtime TypeError when httpStatusCode is missing

The pattern span.attributes?.httpStatusCode.startsWith("4") can throw if attributes exists but httpStatusCode is undefined/null. The optional chain currently guards only attributes, not the nested property. Use ?. before startsWith.

Apply this minimal fix:

- !!span.attributes?.httpStatusCode.startsWith("4")
+ !!span.attributes?.httpStatusCode?.startsWith("4")

Repeat the same change in Lines 115 and 140.

Optionally, consider treating 5xx as errors too:

- !!span.attributes?.httpStatusCode?.startsWith("4")
+ !!(span.attributes?.httpStatusCode?.startsWith("4") ||
+     span.attributes?.httpStatusCode?.startsWith("5"))

Also applies to: 112-117, 139-141

🧹 Nitpick comments (3)
studio/src/components/analytics/trace.tsx (3)

19-19: Prefer Heroicons BoltIcon to avoid pulling react-icons just for one glyph

We’re already using Heroicons in this component. Using react-icons for a single icon can increase bundle size and adds another dependency surface. Suggest switching to Heroicons’ BoltIcon for consistency and smaller bundles.

Apply this diff:

- import { BsFillLightningChargeFill } from "react-icons/bs";
+ import { BoltIcon } from "@heroicons/react/24/solid";

And here:

- {span.attributes?.operationProtocol === "grpc" && (
-   <BsFillLightningChargeFill className="h-[14px] w-[14px] flex-shrink-0 text-primary" />
- )}
+ {span.attributes?.operationProtocol?.toLowerCase() === "grpc" && (
+   <BoltIcon className="h-4 w-4 flex-shrink-0 text-primary" />
+ )}

Also applies to: 211-216


211-216: Make the gRPC protocol check resilient to undefined and case differences

Use optional chaining for operationProtocol and compare lower-cased values in both places to avoid fragile string checks.

Apply this diff:

- {span.attributes?.operationProtocol === "grpc" && (
+ {span.attributes?.operationProtocol?.toLowerCase() === "grpc" && (
    ...
  )}

And in the tooltip:

- {span.attributes?.operationProtocol === "grpc" && (
+ {span.attributes?.operationProtocol?.toLowerCase() === "grpc" && (
    <div className="text-xs text-secondary">
      <span>Cosmo Connect</span>{" "}
    </div>
  )}

Also applies to: 234-241


234-241: Tooltip copy is fine; consider exposing protocol explicitly if helpful

“Cosmo Connect” is brand-oriented; if users benefit from protocol clarity, consider adding “(gRPC)” next to it. Optional only.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c5077b1 and 1120a54.

📒 Files selected for processing (1)
  • studio/src/components/analytics/trace.tsx (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
  • GitHub Check: build-router
  • GitHub Check: build_push_image
  • GitHub Check: build_push_image
  • GitHub Check: build_test
  • GitHub Check: build_test
  • GitHub Check: image_scan
  • GitHub Check: build_push_image
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (2)
studio/src/components/analytics/trace.tsx (2)

1-1: Good addition: centralizes docs URLs via docsBaseURL

This import aligns the link below with the shared constants. No issues.


17-17: Import looks good

clsx is used extensively below; import is correct. No further action.

Copy link
Copy Markdown
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SkArchon SkArchon merged commit 4f0383f into main Aug 27, 2025
42 of 44 checks passed
@SkArchon SkArchon deleted the milinda/eng-7728-otel-router-plugins-spans branch August 27, 2025 09:06
@Noroth Noroth mentioned this pull request Sep 30, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants