node: default OpenTelemetry SampleRatio to 1.0#34948

Merged
jrhea merged 2 commits into ethereum:master from barnabasbusa:bbusa/fix-sample-ratio
May 13, 2026

Conversation

@barnabasbusa
Member

Summary

The --rpc.telemetry.sample-ratio flag declares Value: 1.0 and geth --help advertises (default: 1). In practice, however, omitting the flag produces a sample ratio of 0, causing sdktrace.TraceIDRatioBased(0) to drop 100% of spans. Users who enable --rpc.telemetry see the "OpenTelemetry trace export enabled" log line and a clean startup, but no traces ever leave the process.
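
To illustrate why a ratio of 0 means that nothing is exported, here is a minimal sketch of ratio-based trace-ID sampling. It is a simplified stand-in, not the OpenTelemetry SDK code: ratioSampler, newRatioSampler, shouldSample and countSampled are hypothetical names, and the threshold scheme is an assumption about how such samplers typically work.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// ratioSampler is a hypothetical, simplified ratio sampler.
// It is NOT the OpenTelemetry SDK implementation.
type ratioSampler struct {
	upperBound uint64 // trace IDs whose 63-bit value is below this are kept
}

// newRatioSampler maps a ratio in [0, 1] onto a threshold in a 63-bit space.
func newRatioSampler(ratio float64) ratioSampler {
	switch {
	case ratio <= 0:
		return ratioSampler{upperBound: 0} // no value is below 0: drop everything
	case ratio >= 1:
		return ratioSampler{upperBound: 1 << 63} // every 63-bit value is below 2^63
	default:
		return ratioSampler{upperBound: uint64(ratio * (1 << 63))}
	}
}

// shouldSample derives a 63-bit value from the low 8 bytes of the trace ID
// and keeps the trace only if that value falls below the threshold.
func (s ratioSampler) shouldSample(traceID [16]byte) bool {
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < s.upperBound
}

// countSampled feeds n synthetic trace IDs through a sampler and counts
// how many would be kept.
func countSampled(ratio float64, n int) int {
	s := newRatioSampler(ratio)
	kept := 0
	for i := 0; i < n; i++ {
		var id [16]byte
		// Multiply by a large odd constant to spread the IDs over the space.
		binary.BigEndian.PutUint64(id[8:], uint64(i)*0x9E3779B97F4A7C15)
		if s.shouldSample(id) {
			kept++
		}
	}
	return kept
}

func main() {
	for _, ratio := range []float64{0, 0.5, 1} {
		fmt.Printf("ratio=%v kept=%d of 1000\n", ratio, countSampled(ratio, 1000))
	}
}
```

With a threshold of 0 no trace ID can qualify, which matches the symptom described above: a clean startup and zero exported spans.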

The root cause is the interaction between two pieces of code:

  1. cmd/utils/flags.go:setOpenTelemetry (added in cmd/utils: guard SampleRatio flag with IsSet check #34062) only copies the flag value when ctx.IsSet(...) returns true:

    if ctx.IsSet(RPCTelemetrySampleRatioFlag.Name) {
        tcfg.SampleRatio = ctx.Float64(RPCTelemetrySampleRatioFlag.Name)
    }

    That is the right pattern for "don't clobber a config-file value with the CLI default," but it implies that something else must initialise the field when neither source sets it.

  2. node/defaults.go:DefaultConfig never initialises OpenTelemetry.SampleRatio, leaving it at the float64 zero value.

The result for the common CLI-only user (no TOML config) is SampleRatio = 0 → every span is silently dropped, despite the documented default of 1.
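
The interaction can be reproduced in isolation. The sketch below uses hypothetical stand-ins (cliFlags, openTelemetryConfig, applyFlags) rather than the real geth or CLI-library types; only the IsSet guard mirrors the snippet quoted above.

```go
package main

import "fmt"

// openTelemetryConfig is a hypothetical stand-in for the node's telemetry config.
type openTelemetryConfig struct {
	SampleRatio float64
}

// cliFlags is a hypothetical stand-in for a parsed CLI context. The map holds
// only flags the user passed explicitly; declaredDefault mirrors the default
// that the flag definition advertises in --help.
type cliFlags struct {
	set             map[string]float64
	declaredDefault float64
}

// IsSet reports whether the user passed the flag explicitly.
func (c cliFlags) IsSet(name string) bool {
	_, ok := c.set[name]
	return ok
}

// Float64 returns the explicit value, or the declared default when unset.
func (c cliFlags) Float64(name string) float64 {
	if v, ok := c.set[name]; ok {
		return v
	}
	return c.declaredDefault
}

const sampleRatioFlag = "rpc.telemetry.sample-ratio"

// applyFlags copies the flag into the config only when it was set explicitly,
// so a config-file value is not clobbered by the CLI default.
func applyFlags(ctx cliFlags, cfg *openTelemetryConfig) {
	if ctx.IsSet(sampleRatioFlag) {
		cfg.SampleRatio = ctx.Float64(sampleRatioFlag)
	}
}

func main() {
	ctx := cliFlags{set: map[string]float64{}, declaredDefault: 1.0}

	var cfg openTelemetryConfig // base config that never seeds SampleRatio
	applyFlags(ctx, &cfg)

	fmt.Println("advertised default:", ctx.Float64(sampleRatioFlag))
	fmt.Println("effective ratio:", cfg.SampleRatio)
}
```

The advertised default and the effective value diverge because the guard skips the copy and nothing else fills the field.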

Change

Seed OpenTelemetry: OpenTelemetryConfig{SampleRatio: 1.0} in node.DefaultConfig so the documented default matches runtime behavior and the ctx.IsSet guard in setOpenTelemetry continues to do what it was designed to do.
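
A minimal sketch of the intended layering once the default is seeded, assuming the usual precedence of default, then config file, then explicit CLI flag. The names defaultConfig, resolve and ptr are hypothetical and exist only for this illustration; nil stands for "this source did not set the value".

```go
package main

import "fmt"

// openTelemetryConfig is a hypothetical stand-in for the node's telemetry config.
type openTelemetryConfig struct {
	SampleRatio float64
}

// defaultConfig returns a config with the documented default already seeded.
func defaultConfig() openTelemetryConfig {
	return openTelemetryConfig{SampleRatio: 1.0}
}

// resolve layers the sources in order: seeded default, optional config-file
// value, optional explicitly set CLI value. A nil pointer means "not set".
func resolve(fileValue, cliValue *float64) float64 {
	cfg := defaultConfig()
	if fileValue != nil {
		cfg.SampleRatio = *fileValue
	}
	if cliValue != nil { // equivalent to the ctx.IsSet guard
		cfg.SampleRatio = *cliValue
	}
	return cfg.SampleRatio
}

// ptr is a small helper for building optional values.
func ptr(v float64) *float64 { return &v }

func main() {
	fmt.Println("CLI only, flag omitted:", resolve(nil, nil))
	fmt.Println("config file only:", resolve(ptr(0.25), nil))
	fmt.Println("config file plus explicit flag:", resolve(ptr(0.25), ptr(0.5)))
}
```

The guard keeps its purpose (a config-file value survives an omitted flag), while the seeded default covers the case where neither source sets the field.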

Test plan

  • Build geth from this branch and run with --rpc.telemetry --rpc.telemetry.endpoint=grpc://collector:4317 (no --rpc.telemetry.sample-ratio flag). Confirmed an OTLP gRPC connection is established and spans arrive at an OpenTelemetry collector.
  • Re-tested with http://...:4318 to confirm both transports work without the explicit flag.
  • Reproduced the broken behavior on plain master (no flag → no spans), and confirmed the fix restores expected behavior.
  • Unit-test coverage in cmd/utils for the default-value path (not included; happy to add if reviewers prefer).

The --rpc.telemetry.sample-ratio CLI flag declares a default of 1.0,
but cmd/utils.setOpenTelemetry only copies the flag value into the
node config when ctx.IsSet returns true. Without a corresponding
default on node.DefaultConfig.OpenTelemetry.SampleRatio, omitting the
flag leaves the runtime ratio at the float64 zero value, which causes
sdktrace.TraceIDRatioBased(0) to drop 100% of spans. Users see
"OpenTelemetry trace export enabled" but no traces ever leave geth.
Seed the default in DefaultConfig so the documented behavior matches
runtime behavior.
@barnabasbusa barnabasbusa requested a review from fjl as a code owner May 12, 2026 12:56
@jwasinger jwasinger force-pushed the bbusa/fix-sample-ratio branch from 2daeb58 to e1f7bae Compare May 12, 2026 13:25
@fjl
Contributor

fjl commented May 12, 2026

Please remove the tests. Also please use node.DefaultConfig.OpenTelemetry.SampleRatio to initialize the flag default value. This is what we usually do.
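
A sketch of the pattern being requested, where the flag's advertised default is read from the default config so the two cannot drift apart. The types below (nodeConfig, openTelemetryConfig, float64Flag) are hypothetical stand-ins, not the real geth or CLI-library definitions.

```go
package main

import "fmt"

// openTelemetryConfig and nodeConfig are hypothetical stand-ins for the
// node package's configuration types.
type openTelemetryConfig struct {
	SampleRatio float64
}

type nodeConfig struct {
	OpenTelemetry openTelemetryConfig
}

// defaultConfig is the single source of truth for default values.
var defaultConfig = nodeConfig{
	OpenTelemetry: openTelemetryConfig{SampleRatio: 1.0},
}

// float64Flag is a hypothetical stand-in for a CLI flag definition.
type float64Flag struct {
	Name  string
	Value float64 // default shown in --help
}

// sampleRatioFlag takes its default from defaultConfig instead of repeating
// the literal 1.0 in a second place.
var sampleRatioFlag = float64Flag{
	Name:  "rpc.telemetry.sample-ratio",
	Value: defaultConfig.OpenTelemetry.SampleRatio,
}

func main() {
	fmt.Printf("--%s (default: %v)\n", sampleRatioFlag.Name, sampleRatioFlag.Value)
}
```

With this arrangement, changing the default in one place updates both the help text and the runtime value.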

@barnabasbusa
Member Author

Removed the tests

@jrhea jrhea added this to the 1.17.4 milestone May 13, 2026
@jrhea jrhea merged commit da34eb5 into ethereum:master May 13, 2026
9 checks passed