feat: sanitize utf8 strings with binary in spans#2485
Conversation
WalkthroughAdds a composable OpenTelemetry attribute processor (transformers for redaction, IP anonymization, and UTF‑8 sanitization), wires it into tracer initialization and the test environment, replaces the legacy redact package, and ensures span status descriptions are UTF‑8 sanitized before being recorded. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
Comment |
Router-nonroot image scan passed✅ No security vulnerabilities found in image: |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2485 +/- ##
==========================================
- Coverage 62.32% 62.21% -0.11%
==========================================
Files 239 241 +2
Lines 25456 25499 +43
==========================================
- Hits 15865 15864 -1
- Misses 8241 8297 +56
+ Partials 1350 1338 -12
🚀 New features to boost your workflow:
|
|
Please create a proper PR description that explains the situation and solution. |
…-error-traces-export-invalid-utf-8-in-string-field
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@router-tests/telemetry/attribute_processor_test.go`:
- Around line 149-156: The test currently asserts attributes by indexing the
first snapshot (using exporter.GetSpans().Snapshots() and sn[0].Attributes()),
which is order-dependent and flaky; change the assertions to search across all
snapshots and assert that any span contains the expected attribute (e.g.,
iterate over sn := exporter.GetSpans().Snapshots() and check each snapshot's
Attributes() for sanitizedAttr/unsanitizedAttr using attrKey and sanitizedValue)
or add a small helper (spanHasAttribute) to encapsulate the search; apply the
same change to the other occurrences that reference sn[0] for attribute
assertions (the similar checks around the other test assertions).
- Around line 296-324: The test "IPAnonymization redacts http.client_ip
attribute" currently only checks redaction when http.client_ip exists and can
silently pass if the attribute is missing; update the check to assert that
http.client_ip is present at least once: when iterating over
exporter.GetSpans().Snapshots() and span.Attributes(), track a counter/flag for
occurrences of attribute.Key("http.client_ip"), assert the count is > 0 with
require.Greater/True, and continue to assert each found attr.Value.AsString()
equals "[REDACTED]"; apply the same presence-assertion change to the other test
block (lines 327-357) that verifies hashing so both tests fail if the attribute
is absent.
In `@router/pkg/trace/attributeprocessor/redact.go`:
- Around line 24-52: RedactKeys can leave rFunc nil for unsupported
IPAnonymizationMethod, causing a panic when used; ensure you guard against that
by assigning a safe default (e.g., set rFunc to the Redact behavior before the
switch or add a switch default case that sets rFunc to return "[REDACTED]") or
by returning a no-op transformer when the method is unrecognized; update the
RedactKeys function (symbols: RedactKeys, rFunc, IPAnonymizationMethod, Hash,
Redact, attribute.KeyValue) so rFunc is always non-nil (or short-circuit and
return kv.Value,false) to avoid runtime nil dereference.
In `@router/pkg/trace/attributeprocessor/sanitizeutf8.go`:
- Around line 16-40: SanitizeUTF8 currently dereferences config and runs even
when config is nil or disabled; update SanitizeUTF8 to short‑circuit when config
== nil or config.Enabled == false by returning an identity AttributeTransformer
that returns (kv.Value, false) for all inputs, and only access
config.LogSanitizations after that guard; also only create a nop logger when you
actually need logging (i.e., config.LogSanitizations == true and logger == nil)
to avoid nil derefs and unintended sanitization.
In `@router/pkg/trace/meter.go`:
- Around line 149-162: The IP anonymization branch can pass an empty/unknown
Method into attributeprocessor.RedactKeys which may cause a nil-function panic;
before calling attributeprocessor.RedactKeys when config.IPAnonymization.Enabled
is true, validate config.IPAnonymization.Method and default it to the safe
redact option (e.g., the attributeprocessor redact method constant) if
empty/unknown, then call attributeprocessor.RedactKeys(SensitiveAttributes,
validatedMethod); keep the rest of the transformer-building logic (including
SensitiveAttributes, SanitizeUTF8 and NewAttributeProcessorOption) unchanged.
…valid-utf-8-in-string-field
…-error-traces-export-invalid-utf-8-in-string-field
There was a problem hiding this comment.
🧹 Nitpick comments (5)
router/pkg/trace/attributeprocessor/sanitizeutf8.go (1)
11-14: Note: Type duplication withconfig.SanitizeUTF8Config.This struct duplicates
config.SanitizeUTF8Configfromrouter/pkg/config/config.go. While this separation may be intentional (config layer vs. runtime layer), consider whether a shared type or type alias could reduce maintenance burden.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/sanitizeutf8.go` around lines 11 - 14, The SanitizeUTF8Config struct in sanitizeutf8.go duplicates config.SanitizeUTF8Config; replace the local definition by reusing the config package type (either by importing router/pkg/config and referencing config.SanitizeUTF8Config or by creating a type alias like type SanitizeUTF8Config = config.SanitizeUTF8Config) and update all uses in this package (references to SanitizeUTF8Config, constructors or function signatures) to the single shared type so maintenance stays centralized.router/pkg/trace/attributeprocessor/redact.go (1)
11-23: Note: Type duplication withrouter/core/router.go.The
IPAnonymizationMethodtype and constants (Hash,Redact) are also defined inrouter/core/router.goper the relevant code snippets. Consider consolidating these definitions to avoid drift and ensure consistency.#!/bin/bash # Verify the duplication and check for any differences rg -n "IPAnonymizationMethod" --type=go -C2🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/redact.go` around lines 11 - 23, Duplicate type/const definitions for IPAnonymizationMethod (and its values Hash, Redact) and IPAnonymizationConfig exist; remove the duplicate by centralizing the type/const in a single package and have the other code import and use that central definition. Specifically, keep one canonical definition of IPAnonymizationMethod, IPAnonymizationConfig, Hash and Redact (where they already exist), delete the duplicate declarations from the other file (remove the block containing IPAnonymizationMethod, IPAnonymizationConfig, Hash, Redact in redact.go) and update any references to use the centralized symbol; if necessary, create or use an existing shared package export for these symbols so both consumers compile against the single source of truth.router/pkg/trace/attributeprocessor/sanitizeutf8_test.go (1)
132-174: Benchmarks mutate attributes in-place, so later iterations don’t exercise sanitization.
Consider resetting/copying the base attributes each iteration to keep results representative.♻️ Proposed benchmark fix (reset attributes each iteration)
- s := sanitizeSpan{attrs: attrs} - ac := NewAttributeProcessor(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)) - ctx := context.Background() + scratch := make([]attribute.KeyValue, len(attrs)) + ac := NewAttributeProcessor(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)) + ctx := context.Background() return func(b *testing.B) { b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { - ac.OnStart(ctx, s) - ac.OnEnd(s) + copy(scratch, attrs) + s := sanitizeSpan{attrs: scratch} + ac.OnStart(ctx, s) + ac.OnEnd(s) } }- s := sanitizeSpan{attrs: attrs} - ac := NewAttributeProcessor(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)) - ctx := context.Background() + scratch := make([]attribute.KeyValue, len(attrs)) + ac := NewAttributeProcessor(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)) + ctx := context.Background() b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { - ac.OnStart(ctx, s) - ac.OnEnd(s) + copy(scratch, attrs) + s := sanitizeSpan{attrs: scratch} + ac.OnStart(ctx, s) + ac.OnEnd(s) }Also applies to: 177-203
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go` around lines 132 - 174, The benchmark currently mutates the shared attrs slice (created in benchSanitizeUTF8OnEnd) so later iterations skip real work; modify the returned closure from benchSanitizeUTF8OnEnd (used by BenchmarkSanitizeUTF8OnEnd) to reset or copy the base attributes each iteration before calling ac.OnStart/ac.OnEnd: e.g., keep an immutable template slice or recreate sanitizeSpan with a fresh attrs slice inside the loop, or perform a per-iteration deep copy of attrs into a new sanitizeSpan so sanitization runs every iteration; update both benchSanitizeUTF8OnEnd and the analogous bench function around lines 177-203.router/pkg/trace/attributeprocessor/redact_test.go (1)
233-290: Benchmarks reuse mutated attributes, skewing redaction timing.
Reset/copy the base attributes each iteration to keep the benchmark honest.♻️ Proposed benchmark fix (reset attributes each iteration)
return func(b *testing.B) { transformer, err := RedactKeys(keys, method) if err != nil { b.Fatal(err) } - s := rwSpan{attrs: attrs} + scratch := make([]attribute.KeyValue, len(attrs)) ac := NewAttributeProcessor(transformer) ctx := context.Background() b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { - ac.OnStart(ctx, s) - ac.OnEnd(s) + copy(scratch, attrs) + s := rwSpan{attrs: scratch} + ac.OnStart(ctx, s) + ac.OnEnd(s) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/redact_test.go` around lines 233 - 290, The benchmark benchRedactOnEnd reuses and mutates the same attrs slice across iterations which skews timing; fix it by creating a fresh copy of the base attributes for each iteration (e.g., clone attrs into a new slice and construct a new rwSpan with that slice) before calling ac.OnStart and ac.OnEnd inside the for i := 0; i < b.N loop so each iteration operates on unmodified data; keep the transformer creation (RedactKeys) and processor (NewAttributeProcessor) outside the loop to avoid measuring setup time.router/pkg/trace/attributeprocessor/attributes_test.go (1)
185-245: Benchmark reuses mutated attributes, so later iterations don’t exercise transformations.
Consider resetting/copying the base attributes per-iteration.♻️ Proposed benchmark fix (reset attributes each iteration)
- s := benchSpan{attrs: attrs} + scratch := make([]attribute.KeyValue, len(attrs)) redactTransformer, err := RedactKeys(keys, Redact) if err != nil { panic(err) } ac := NewAttributeProcessor( redactTransformer, SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil), ) ctx := context.Background() return func(b *testing.B) { b.ReportAllocs() b.ResetTimer() for i := 0; i < b.N; i++ { - ac.OnStart(ctx, s) - ac.OnEnd(s) + copy(scratch, attrs) + s := benchSpan{attrs: scratch} + ac.OnStart(ctx, s) + ac.OnEnd(s) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/attributes_test.go` around lines 185 - 245, The benchmark reuses and mutates the same attrs slice across iterations so later runs don't exercise transformations; inside benchCombinedTransformers (the returned func) copy or reset the original attrs into s.attrs each iteration before calling ac.OnStart/ac.OnEnd (e.g., keep a baseline slice created outside the returned func and shallow-copy its elements into s.attrs inside the loop) so each iteration gets fresh attributes to transform; ensure you copy the attribute.KeyValue slice (and values if you mutate them) rather than reusing the same slice referenced by benchSpan, and then run ac.OnStart(ctx, s) / ac.OnEnd(s) as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@router-tests/telemetry/attribute_processor_test.go`:
- Around line 57-60: The test currently checks sn[0].Attributes() for the
sanitized attribute (sanitizedAttr), which assumes span order; instead iterate
over the snapshot slice sn and assert that at least one span's Attributes()
contains attribute.String(attrKey, sanitizedValue). Update the assertions in the
cases referencing sn[0] (including the other instances around lines noted) to
loop through sn, break on first match, and fail the test if no span contains the
expected attribute.
---
Nitpick comments:
In `@router/pkg/trace/attributeprocessor/attributes_test.go`:
- Around line 185-245: The benchmark reuses and mutates the same attrs slice
across iterations so later runs don't exercise transformations; inside
benchCombinedTransformers (the returned func) copy or reset the original attrs
into s.attrs each iteration before calling ac.OnStart/ac.OnEnd (e.g., keep a
baseline slice created outside the returned func and shallow-copy its elements
into s.attrs inside the loop) so each iteration gets fresh attributes to
transform; ensure you copy the attribute.KeyValue slice (and values if you
mutate them) rather than reusing the same slice referenced by benchSpan, and
then run ac.OnStart(ctx, s) / ac.OnEnd(s) as before.
In `@router/pkg/trace/attributeprocessor/redact_test.go`:
- Around line 233-290: The benchmark benchRedactOnEnd reuses and mutates the
same attrs slice across iterations which skews timing; fix it by creating a
fresh copy of the base attributes for each iteration (e.g., clone attrs into a
new slice and construct a new rwSpan with that slice) before calling ac.OnStart
and ac.OnEnd inside the for i := 0; i < b.N loop so each iteration operates on
unmodified data; keep the transformer creation (RedactKeys) and processor
(NewAttributeProcessor) outside the loop to avoid measuring setup time.
In `@router/pkg/trace/attributeprocessor/redact.go`:
- Around line 11-23: Duplicate type/const definitions for IPAnonymizationMethod
(and its values Hash, Redact) and IPAnonymizationConfig exist; remove the
duplicate by centralizing the type/const in a single package and have the other
code import and use that central definition. Specifically, keep one canonical
definition of IPAnonymizationMethod, IPAnonymizationConfig, Hash and Redact
(where they already exist), delete the duplicate declarations from the other
file (remove the block containing IPAnonymizationMethod, IPAnonymizationConfig,
Hash, Redact in redact.go) and update any references to use the centralized
symbol; if necessary, create or use an existing shared package export for these
symbols so both consumers compile against the single source of truth.
In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go`:
- Around line 132-174: The benchmark currently mutates the shared attrs slice
(created in benchSanitizeUTF8OnEnd) so later iterations skip real work; modify
the returned closure from benchSanitizeUTF8OnEnd (used by
BenchmarkSanitizeUTF8OnEnd) to reset or copy the base attributes each iteration
before calling ac.OnStart/ac.OnEnd: e.g., keep an immutable template slice or
recreate sanitizeSpan with a fresh attrs slice inside the loop, or perform a
per-iteration deep copy of attrs into a new sanitizeSpan so sanitization runs
every iteration; update both benchSanitizeUTF8OnEnd and the analogous bench
function around lines 177-203.
In `@router/pkg/trace/attributeprocessor/sanitizeutf8.go`:
- Around line 11-14: The SanitizeUTF8Config struct in sanitizeutf8.go duplicates
config.SanitizeUTF8Config; replace the local definition by reusing the config
package type (either by importing router/pkg/config and referencing
config.SanitizeUTF8Config or by creating a type alias like type
SanitizeUTF8Config = config.SanitizeUTF8Config) and update all uses in this
package (references to SanitizeUTF8Config, constructors or function signatures)
to the single shared type so maintenance stays centralized.
|
@endigma let me know if all good |
endigma
left a comment
There was a problem hiding this comment.
looks good, modernize nit: the benchmarks should use b.Loop now
There was a problem hiding this comment.
♻️ Duplicate comments (1)
router/pkg/trace/attributeprocessor/sanitizeutf8_test.go (1)
197-201: Sametesting.B.Loop()compatibility check applies here.This loop has the same version dependency as the previous benchmark.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go` around lines 197 - 201, The benchmark uses testing.B.Loop(), which is only available in newer Go versions; replace the Loop() usage with the standard for i := 0; i < b.N; i++ pattern so the benchmark remains compatible across Go versions — update the loop that calls ac.OnStart(ctx, s) and ac.OnEnd(s) to run inside for i := 0; i < b.N; i++ while keeping b.ReportAllocs() and the same ctx, s, and ac calls.
🧹 Nitpick comments (2)
router/pkg/trace/attributeprocessor/attributes_test.go (1)
38-49: Unused test helper function.
testAttributesAfterCreationis defined but not used anywhere in this test file. Consider removing it if it's not needed, or add a test case that exercises the "set attributes after creation" scenario to ensure that path is also covered.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/attributes_test.go` around lines 38 - 49, The helper function testAttributesAfterCreation is unused; either remove this function entirely or add a test that calls testAttributesAfterCreation to exercise setting attributes after span creation and assert the recorded attributes via attrRecorder (it uses trace.NewTracerProvider with opt, trace.WithSpanProcessor(r), tracer := tp.Tracer("testAttributes"), tracer.Start and s.SetAttributes); choose one of these fixes so the file no longer contains dead code or so the "set attributes after creation" path is covered by a test.router/pkg/trace/attributeprocessor/sanitizeutf8_test.go (1)
13-119: Strengthen sanitization tests by asserting removal of invalid values.
require.Containsalone would pass even if the sanitizer keeps the original invalid attribute (or duplicates it). Addrequire.NotContainsto ensure the invalid value is removed/replaced.Proposed tightening of assertions
@@ attributes := testAttributes(t.Context(), NewAttributeProcessorOption(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)), invalidStr) require.Contains(t, attributes, expected) + require.NotContains(t, attributes, invalidStr) @@ attributes := testAttributes(t.Context(), NewAttributeProcessorOption(SanitizeUTF8(&SanitizeUTF8Config{Enabled: true}, nil)), mixedStr) require.Contains(t, attributes, expected) + require.NotContains(t, attributes, mixedStr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go` around lines 13 - 119, The tests currently only use require.Contains which would still pass if the sanitizer left the original invalid attribute; update the relevant subtests in TestSanitizeUTF8 to also assert the invalid original is removed by adding require.NotContains checks: for the "invalid UTF-8 bytes" case assert NotContains(attributes, invalidStr); for "mixed valid and invalid UTF-8" assert NotContains(attributes, mixedStr); for "multiple separate invalid sequences" assert NotContains(attributes, mixedStr) (the mixedBytes/mixedStr variable there); keep the "without sanitizer" case as-is to assert the original remains. Use the existing variable names (invalidStr, mixedStr, etc.) so the assertions locate the exact original attribute to confirm replacement.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go`:
- Around line 197-201: The benchmark uses testing.B.Loop(), which is only
available in newer Go versions; replace the Loop() usage with the standard for i
:= 0; i < b.N; i++ pattern so the benchmark remains compatible across Go
versions — update the loop that calls ac.OnStart(ctx, s) and ac.OnEnd(s) to run
inside for i := 0; i < b.N; i++ while keeping b.ReportAllocs() and the same ctx,
s, and ac calls.
---
Nitpick comments:
In `@router/pkg/trace/attributeprocessor/attributes_test.go`:
- Around line 38-49: The helper function testAttributesAfterCreation is unused;
either remove this function entirely or add a test that calls
testAttributesAfterCreation to exercise setting attributes after span creation
and assert the recorded attributes via attrRecorder (it uses
trace.NewTracerProvider with opt, trace.WithSpanProcessor(r), tracer :=
tp.Tracer("testAttributes"), tracer.Start and s.SetAttributes); choose one of
these fixes so the file no longer contains dead code or so the "set attributes
after creation" path is covered by a test.
In `@router/pkg/trace/attributeprocessor/sanitizeutf8_test.go`:
- Around line 13-119: The tests currently only use require.Contains which would
still pass if the sanitizer left the original invalid attribute; update the
relevant subtests in TestSanitizeUTF8 to also assert the invalid original is
removed by adding require.NotContains checks: for the "invalid UTF-8 bytes" case
assert NotContains(attributes, invalidStr); for "mixed valid and invalid UTF-8"
assert NotContains(attributes, mixedStr); for "multiple separate invalid
sequences" assert NotContains(attributes, mixedStr) (the mixedBytes/mixedStr
variable there); keep the "without sanitizer" case as-is to assert the original
remains. Use the existing variable names (invalidStr, mixedStr, etc.) so the
assertions locate the exact original attribute to confirm replacement.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
router/pkg/trace/attributeprocessor/attributes_test.gorouter/pkg/trace/attributeprocessor/sanitizeutf8_test.go
…valid-utf-8-in-string-field
There was a problem hiding this comment.
🧹 Nitpick comments (1)
router/core/router.go (1)
829-833: Type conversion assumes matching enum values.The cast from
core.IPAnonymizationMethodtoattributeprocessor.IPAnonymizationMethodat line 831 relies on both types using identical underlying string values ("hash", "redact"). This is acceptable but creates an implicit coupling between the two packages.Consider adding a compile-time assertion or a brief comment noting this dependency to prevent future divergence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@router/core/router.go` around lines 829 - 833, The code currently casts core.IPAnonymizationMethod to attributeprocessor.IPAnonymizationMethod (used when constructing IPAnonymization) which implicitly assumes both enum string values stay identical; replace the direct cast with a small explicit converter function (e.g., convertIPAnonymizationMethod(core.IPAnonymizationMethod) attributeprocessor.IPAnonymizationMethod) that switches on known cases ("hash", "redact") and returns a safe default for unknown values, and add a short comment near the IPAnonymization / IPAnonymizationMethod usage documenting the coupling to prevent silent divergence between the two packages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@router/core/router.go`:
- Around line 829-833: The code currently casts core.IPAnonymizationMethod to
attributeprocessor.IPAnonymizationMethod (used when constructing
IPAnonymization) which implicitly assumes both enum string values stay
identical; replace the direct cast with a small explicit converter function
(e.g., convertIPAnonymizationMethod(core.IPAnonymizationMethod)
attributeprocessor.IPAnonymizationMethod) that switches on known cases ("hash",
"redact") and returns a safe default for unknown values, and add a short comment
near the IPAnonymization / IPAnonymizationMethod usage documenting the coupling
to prevent silent divergence between the two packages.
…valid-utf-8-in-string-field
Sometimes spans can contain bad strings which has non utf8 characters, when they are serialized into protobuf for transport to collectors, protobuf throws an error because it cannot serialize non utf8 characters.
This means that users will loose these spans, in order to handle this we need to sanitize strings. We do this two ways.
For statuses and names, we simply by default opt to sanitize the string. However as there are a lot of attributes, we allow the user to specify if they want to sanitize the span. In order to do this we refactor the current
OnEndfunction which has redact and add refactorredactto its own transformer and addsanitizeutf8as another transformer.Summary by CodeRabbit
New Features
Configuration
Tests
Checklist