Skip to content

chore: give hint of configuration on otel invalid utf8 errors#2551

Merged
SkArchon merged 12 commits intomainfrom
milinda/log-otel-errors-with-addition
Mar 5, 2026
Merged

chore: give hint of configuration on otel invalid utf8 errors#2551
SkArchon merged 12 commits intomainfrom
milinda/log-otel-errors-with-addition

Conversation

@SkArchon
Copy link
Copy Markdown
Contributor

@SkArchon SkArchon commented Feb 24, 2026

Right now we have options to sanitize span attributes, however this is off by default. Users right now will get the error message of "string field contains invalid UTF-8" when non utf8 data is set into a span anywhere. However users might not know that a workaround exists (right now for span attributes), without logging the error as is, if the error is related to non utf8 data we will log the error with a hint.

Summary by CodeRabbit

  • New Features

    • Added environment variable support and a grouped env-prefix option for configuring the metrics OTLP log exporter (enablement and include/exclude metric lists).
  • Bug Fixes

    • Improved telemetry error handling and logging for export failures involving invalid UTF-8, emitting a clear hint to enable UTF-8 sanitization.
  • Tests

    • Added test coverage that simulates invalid UTF-8 export errors to verify sanitization hints are logged.

Checklist

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7511a706-15d3-4389-b644-51573f3544bc

📥 Commits

Reviewing files that changed from the base of the PR and between 63f138f and 9b2440d.

📒 Files selected for processing (1)
  • router-tests/telemetry/attribute_processor_test.go

Walkthrough

Adds error-detection helpers for OTLP tracing invalid-UTF8 errors, wires the handler into the meter error handler, exposes additional environment bindings for metrics log exporter fields, and adds a test that simulates an invalid-UTF8 export error to verify the sanitization hint log.

Changes

Cohort / File(s) Summary
Error handling utilities
router/pkg/trace/errors.go
New file adding an invalidUTF8Error interface, hasInvalidUTF8Error(err error) bool to inspect error chains, and errHandler(config *ProviderConfig) func(err error) that logs a sanitization hint when invalid UTF-8 is detected and logs a generic OTLP error otherwise.
Meter integration
router/pkg/trace/meter.go
Replaces the previous inline otel error handler with errHandler(config) via otel.SetErrorHandler, centralizing OTLP error handling.
Configuration tags
router/pkg/config/config.go
Adds environment variable bindings to metrics log exporter fields (env:"ENABLED", env:"EXCLUDE_METRICS", env:"INCLUDE_METRICS") and adds envPrefix:"METRICS_OTLP_LOG_EXPORTER_" to MetricsOTLP.LogExporter.
Tests
router-tests/telemetry/attribute_processor_test.go
Adds a test scenario using an invalidUTF8Exporter and errInvalidUTF8 to simulate an export error containing invalid UTF-8 and asserts the presence of the sanitization hint log message. Also adds supporting helper types/imports for the test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a configuration hint to help users resolve invalid UTF-8 errors in OpenTelemetry, which aligns with the new error handling and logging logic across the modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 24, 2026

Router-nonroot image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-a88c349479c05ad9d4b2b07c161929682ad2ae06-nonroot

@SkArchon SkArchon changed the title chore: give hint of configuration on utf8 errors chore: give hint of configuration on otel invalid utf8 errors Feb 24, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.29%. Comparing base (fad6b1c) to head (9b2440d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
router/pkg/trace/errors.go 84.61% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2551      +/-   ##
==========================================
+ Coverage   62.81%   63.29%   +0.47%     
==========================================
  Files         243      244       +1     
  Lines       25746    25757      +11     
==========================================
+ Hits        16173    16303     +130     
+ Misses       8202     8087     -115     
+ Partials     1371     1367       -4     
Files with missing lines Coverage Δ
router/pkg/config/config.go 80.51% <ø> (ø)
router/pkg/trace/meter.go 42.85% <100.00%> (+1.53%) ⬆️
router/pkg/trace/errors.go 84.61% <84.61%> (ø)

... and 14 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
router/pkg/trace/meter.go (1)

218-222: ⚠️ Potential issue | 🟡 Minor

otel.SetErrorHandler should respect the same test-mode guard as otel.SetTracerProvider.

The comment at Line 217 says "Don't set it globally when we use the router in tests," and SetTracerProvider is correctly guarded by if config.MemoryExporter == nil. But SetErrorHandler on Line 222 is unconditional and always mutates global state, even in test mode.

In parallel test runs every call to NewTracerProvider races to replace the one global handler; the last writer wins. The new test at attribute_processor_test.go:191 specifically relies on errors from its invalidUTF8Exporter reaching its own observer — if another parallel test overwrites the handler between setup and span export the assertion can fail non-deterministically.

🛡️ Proposed fix: apply the same guard
-	otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	if config.MemoryExporter == nil {
+		otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config)))
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/meter.go` around lines 218 - 222, The global error handler
is being set unconditionally causing test-time races; guard the
otel.SetErrorHandler call the same way otel.SetTracerProvider is guarded: only
call otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.
🧹 Nitpick comments (2)
router/pkg/trace/errors.go (1)

8-22: invalidUTF8Error interface approach is sound — worth documenting the coupling.

The interface{ InvalidUTF8() bool } check matches the detection pattern the protobuf team themselves recommend (per the public issue tracker). However, since errInvalidUTF8 lives in google.golang.org/protobuf/internal/impl, there's no stability guarantee — the type or method could be removed in a future protobuf release, causing hasInvalidUTF8Error to silently return false without any compile-time signal. The degradation is graceful (errors still log as plain "otel error"), but a brief comment noting this dependency would help future maintainers.

errors.As with a *invalidUTF8Error (pointer-to-interface) target is valid; the Go stdlib explicitly supports passing a pointer to any interface type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router/pkg/trace/errors.go` around lines 8 - 22, Add a short comment above
the invalidUTF8Error type and hasInvalidUTF8Error function that documents the
coupling to google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state
that this check relies on the presence of an internal type exposing
InvalidUTF8(), that the internal package is not guaranteed stable and could be
removed or changed in future protobuf releases, and that in such cases
hasInvalidUTF8Error will gracefully return false; reference the symbols
invalidUTF8Error and hasInvalidUTF8Error and mention that errors.As is
intentionally used with an interface target to detect that condition.
router-tests/telemetry/attribute_processor_test.go (1)

206-212: Redundant double log filtering.

FilterMessageSnippet("sanitize_utf8").All() is called once inside require.Eventually (where the result is discarded) and then again on Lines 211–212. After require.Eventually exits successfully the log entries are guaranteed to exist; the second call is unnecessary.

♻️ Proposed refactor
-		require.Eventually(t, func() bool {
-			logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-			return len(logs) > 0
-		}, 5*time.Second, 100*time.Millisecond)
-
-		logs := xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
-		require.NotEmpty(t, logs)
+		var logs []observer.LoggedEntry
+		require.Eventually(t, func() bool {
+			logs = xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
+			return len(logs) > 0
+		}, 5*time.Second, 100*time.Millisecond)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@router-tests/telemetry/attribute_processor_test.go` around lines 206 - 212,
The test is calling xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All()
twice (once inside require.Eventually and again after), which is redundant;
declare a local variable logs before the require.Eventually call, assign to logs
inside the closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Line 213: The test calls require.Equal with the actual value first and the
expected string second, which reverses Testify's expected ordering and produces
confusing failure messages; change the call to require.Equal(t, "otel error:
traces export: string field contains invalid UTF-8: Enable
'telemetry.tracing.sanitize_utf8.enabled' in your config to sanitize invalid
UTF-8 attributes.", logs[0].Message) so the literal expectation is the
second-to-last argument and logs[0].Message is the actual (last) argument;
update the invocation that references require.Equal and logs[0].Message
accordingly.

---

Outside diff comments:
In `@router/pkg/trace/meter.go`:
- Around line 218-222: The global error handler is being set unconditionally
causing test-time races; guard the otel.SetErrorHandler call the same way
otel.SetTracerProvider is guarded: only call
otel.SetErrorHandler(otel.ErrorHandlerFunc(errHandler(config))) when
config.MemoryExporter == nil (i.e., inside the same if block that sets the
tracer provider) so NewTracerProvider does not mutate global state during tests
and the errHandler/config.MemoryExporter logic remains consistent.

---

Nitpick comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Around line 206-212: The test is calling
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All() twice (once inside
require.Eventually and again after), which is redundant; declare a local
variable logs before the require.Eventually call, assign to logs inside the
closure used by require.Eventually (e.g., logs =
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(); return len(logs) >
0), and then after require.Eventually use require.NotEmpty(t, logs) so the log
slice is only retrieved once; references: require.Eventually,
xEnv.Observer().FilterMessageSnippet("sanitize_utf8").All(), require.NotEmpty.

In `@router/pkg/trace/errors.go`:
- Around line 8-22: Add a short comment above the invalidUTF8Error type and
hasInvalidUTF8Error function that documents the coupling to
google.golang.org/protobuf/internal/impl (errInvalidUTF8) — state that this
check relies on the presence of an internal type exposing InvalidUTF8(), that
the internal package is not guaranteed stable and could be removed or changed in
future protobuf releases, and that in such cases hasInvalidUTF8Error will
gracefully return false; reference the symbols invalidUTF8Error and
hasInvalidUTF8Error and mention that errors.As is intentionally used with an
interface target to detect that condition.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 153432d and eb1a06e.

📒 Files selected for processing (3)
  • router-tests/telemetry/attribute_processor_test.go
  • router/pkg/trace/errors.go
  • router/pkg/trace/meter.go

Comment thread router-tests/telemetry/attribute_processor_test.go Outdated
Comment thread router/pkg/trace/errors.go
@endigma
Copy link
Copy Markdown
Member

endigma commented Feb 24, 2026

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

@StarpTech
Copy link
Copy Markdown
Contributor

Is there a downside to turning it on by default? Seems like if it inevitably just makes errors we should do this implicitly.

@SkArchon is validating the overhead. If negligible, I'd also suggest enabling it by default.

@SkArchon
Copy link
Copy Markdown
Contributor Author

SkArchon commented Feb 25, 2026

It takes about 14.70089313 microseconds per request (with no custom metric configuration, more string attributes would mean this increases), lets assume the average is about 140ns per attribute (this is only applied to string attributes). While fast, since this is on the hot path and as only one customer has reported it, I would prefer having the log message, that way when folks do see the log message they are almost likely to go and enable this.

Copy link
Copy Markdown
Contributor

@StarpTech StarpTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@SkArchon SkArchon merged commit 177a1c0 into main Mar 5, 2026
32 checks passed
@SkArchon SkArchon deleted the milinda/log-otel-errors-with-addition branch March 5, 2026 21:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants