docs(collector): update internal telemetry RPC metrics #9554
dol wants to merge 1 commit into open-telemetry:main from
Align the internal telemetry docs with the RPC metrics emitted by newer Collector builds that use the updated Go semantic conventions. Rename the documented RPC duration metrics to `rpc.client.call.duration` and `rpc.server.call.duration`, remove the deprecated `*_per_rpc` entries from the detailed metrics list, and note that older Collector releases may still expose the legacy `rpc.client.duration` and `rpc.server.duration` metric names.
Branch updated from `37feaa1` to `0c56862`.
| Metric | Description | Type |
| --- | --- | --- |
| `rpc.client.call.duration` | Measures the duration of outbound remote procedure calls (RPC). | Histogram |
| `rpc.client.request.size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `rpc.client.response.size` | Measures the size of RPC response messages (uncompressed). | Histogram |
| `rpc.server.call.duration` | Measures the duration of inbound remote procedure calls (RPC). | Histogram |
| `rpc.server.request.size` | Measures the size of RPC request messages (uncompressed). | Histogram |
| `rpc.server.response.size` | Measures the size of RPC response messages (uncompressed). | Histogram |
AFAIK, only the `.call.` metrics are available in the latest versions of the Collector and semconv: https://opentelemetry.io/docs/specs/semconv/rpc/rpc-metrics/
Suggested change:

| Metric | Description | Type |
| --- | --- | --- |
| `rpc.client.call.duration` | Measures the duration of outbound remote procedure calls (RPC). | Histogram |
| `rpc.server.call.duration` | Measures the duration of inbound remote procedure calls (RPC). | Histogram |
Indeed, I've just confirmed experimentally that:
- in 0.147.0, the RPC metrics changed to the ones currently in the PR;
- in 0.148.0, the size metrics were disabled.
I'm not sure why that happened in two steps. Is there any replacement for the size metrics?
I dug into this with AI's help, and the short answer is no, there is no replacement. The Semantic Conventions SIG deprecated the rpc size metrics because they had "ambiguous definitions and inconsistent implementation".
So we'll need to update this page with a little more guidance.
Thanks for pointing this out, @odubajDT!
> [!NOTE]
>
> The `http*` and `rpc*` metrics are not covered by the maturity levels below
> since they are not under the Collector SIG's control.
>
> RPC metric names are version-dependent. For instance, Collector releases
> prior to 0.147.0 exposed `rpc.client.duration` and `rpc.server.duration`
> instead of `rpc.client.call.duration` and `rpc.server.call.duration`.
>
> The `otelcol_processor_batch_` metrics are unique to the `batchprocessor`.
>
> The `otelcol_receiver_`, `otelcol_scraper_`, `otelcol_processor_`, and
> `otelcol_exporter_` metrics come from their respective `helper` packages. As
> such, some components not using those packages might not emit them.
Since the RPC section was getting a little long, I suggest we eliminate the callout and create a new section about metric ownership. Very open to feedback on this idea.
#### Ownership of emitted metrics

Some metrics are not owned by the Collector SIG, and some are limited to certain components.

**`http*` and `rpc*` metrics**

These metrics are not under the Collector SIG's control and, as such, are not covered by the maturity levels below.

**`rpc*` metrics**

The Collector's internal RPC metrics come from the upstream
[`otelgrpc`](https://github.com/open-telemetry/opentelemetry-go-contrib/tree/main/instrumentation/google.golang.org/grpc/otelgrpc)
instrumentation, which tracks the [OpenTelemetry RPC semantic conventions](/docs/specs/semconv/rpc/rpc-metrics/). The set of RPC
metrics emitted by the Collector has changed across releases:

| Collector version | Emitted RPC metrics |
| --- | --- |
| v0.146.x and earlier | `rpc.client.duration`, `rpc.server.duration`, `rpc.*.request.size`, `rpc.*.response.size`, `rpc.*.requests_per_rpc`, `rpc.*.responses_per_rpc` |
| v0.147.0 | `rpc.client.call.duration`, `rpc.server.call.duration`, `rpc.*.request.size`, `rpc.*.response.size` (the `*_per_rpc` metrics are deprecated and no longer emitted) |
| v0.148.0 and later | `rpc.client.call.duration`, `rpc.server.call.duration` only |

RPC size metrics are not emitted by
Collector v0.148.0 or later. The [RPC semantic conventions v1.40.0](https://github.com/open-telemetry/semantic-conventions/releases/tag/v1.40.0)
deprecated them due to ambiguous definitions and inconsistent implementation.

**`otelcol_processor_batch_*` metrics**

These metrics are unique to the `batchprocessor`.

**`helper` package metrics**

The `otelcol_receiver_`, `otelcol_scraper_`, `otelcol_processor_`, and `otelcol_exporter_` metrics come from their respective `helper` packages. As such, some components not using those packages might not emit them.
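As a rough illustration of how the version table in the suggestion above might be used, for example when checking which RPC metric names a dashboard should expect from a given Collector, here is a minimal Go sketch. The function `rpcMetricsForVersion` is hypothetical and not part of any Collector package. It assumes versions of the form `v0.MINOR.PATCH`, assumes patch releases of 0.147 behave like v0.147.0, and expands each `rpc.*` wildcard from the table into its `client` and `server` variants.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rpcMetricsForVersion is a hypothetical helper that returns the RPC metric
// names listed in the version table for a Collector version like "v0.147.0".
// It only inspects the minor number of 0.x releases, which is all the table
// distinguishes; patch releases are assumed to match their .0 release.
func rpcMetricsForVersion(version string) ([]string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 || parts[0] != "0" {
		return nil, fmt.Errorf("unsupported Collector version %q", version)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return nil, fmt.Errorf("invalid minor version in %q: %w", version, err)
	}

	callDurations := []string{"rpc.client.call.duration", "rpc.server.call.duration"}
	sizes := []string{
		"rpc.client.request.size", "rpc.server.request.size",
		"rpc.client.response.size", "rpc.server.response.size",
	}

	switch {
	case minor <= 146:
		// Legacy duration names, plus the size and *_per_rpc metrics.
		return append([]string{
			"rpc.client.duration", "rpc.server.duration",
			"rpc.client.requests_per_rpc", "rpc.server.requests_per_rpc",
			"rpc.client.responses_per_rpc", "rpc.server.responses_per_rpc",
		}, sizes...), nil
	case minor == 147:
		// Renamed durations; size metrics still emitted, *_per_rpc gone.
		return append(append([]string{}, callDurations...), sizes...), nil
	default:
		// v0.148.0 and later: only the call duration metrics remain.
		return callDurations, nil
	}
}

func main() {
	for _, v := range []string{"v0.146.0", "v0.147.0", "v0.148.0"} {
		names, err := rpcMetricsForVersion(v)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(v, strings.Join(names, ", "))
	}
}
```

Keeping the table as data in one function should make it straightforward to extend if a later release changes the RPC metric set again.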
This PR updates the Collector internal telemetry docs to match the RPC metrics emitted by newer Collector builds.
It makes three concrete documentation corrections in `content/en/docs/collector/internal-telemetry.md`:

- renames `rpc.client.duration`/`rpc.server.duration` to `rpc.client.call.duration`/`rpc.server.call.duration`
- removes the `rpc.*.requests_per_rpc` and `rpc.*.responses_per_rpc` entries from the detailed metrics table
- adds a note that older Collector releases may still expose the legacy `rpc.client.duration` and `rpc.server.duration` names

The reason for this change is a dependency chain across the semantic conventions, the generated Go semconv packages, the gRPC instrumentation used by the Collector, and finally the Collector docs.
The upstream source of truth is the semantic-conventions repo:
- The `requests_per_rpc` and `responses_per_rpc` metrics were deprecated and moved out of the active RPC metric model, from `model/rpc/metrics.yaml` into `model/rpc/deprecated/metrics-deprecated.yaml`.
- `rpc.client.duration` and `rpc.server.duration` were renamed to `rpc.client.call.duration` and `rpc.server.call.duration`, and the duration unit changed from milliseconds to seconds.

Those semantic-convention changes then flowed into the generated Go semconv packages:
- With "Generate semconv/v1.38.0", the generated Go RPC semconv package no longer includes `rpc.server.responses_per_rpc` and the corresponding `*_per_rpc` helpers.
- With "Generate semconv/v1.39.0", the generated Go RPC semconv package switches from `rpc.server.duration` to `rpc.server.call.duration` and from `rpc.client.duration` to `rpc.client.call.duration`.

That matters for the Collector because its gRPC internal telemetry is not hand-authored under Collector-specific metric names. Collector core wires the gRPC server and client through `otelgrpc` in `config/configgrpc/configgrpc.go`, and `otelgrpc` constructs its instruments from the generated `rpcconv` helpers. In current Go module versions, that means:

- `otelgrpc.NewServerHandler(...)` uses `rpcconv.NewServerCallDuration(...)`
- `otelgrpc.NewClientHandler(...)` uses `rpcconv.NewClientCallDuration(...)`

So once Collector core upgraded its OpenTelemetry Go dependencies, the emitted internal RPC duration metrics changed as a consequence of that dependency update, even though the Collector's own gRPC instrumentation wiring stayed conceptually the same.
This is the important part of the reasoning for the documentation change:
1. The Collector docs list the internal RPC metrics under `detailed` verbosity.
2. The semantic conventions deprecated the `*_per_rpc` metrics and renamed the duration metrics.
3. `opentelemetry-go` regenerated its semconv packages from those newer semantic conventions.
4. `otelgrpc` uses those regenerated semconv helpers to define the actual instrument names.
5. The Collector uses `otelgrpc` for internal gRPC client/server instrumentation.
6. Newer Collector builds therefore emit the `*.call.duration` names and no longer expose the deprecated `*_per_rpc` metrics through the current Go-based internal RPC instrumentation path.

That is why the previous documentation became stale:
- it listed `rpc.client.duration` and `rpc.server.duration` as if they were the current names
- it listed `rpc.*.requests_per_rpc` and `rpc.*.responses_per_rpc` as if they were still part of the current generated Go instrumentation surface

This PR updates the page to reflect the current state while still noting that older Collector releases may show the legacy duration names. I kept that compatibility note because users may be comparing different Collector versions, and the page should make the version boundary understandable instead of presenting the current names as timeless.
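For anyone comparing duration data across that version boundary, the sketch below maps the legacy duration names to the `*.call.duration` names and rescales values from milliseconds to seconds, following the semconv unit change described earlier. `normalizeDuration` and `legacyToCurrent` are hypothetical helpers, not a Collector or OpenTelemetry API. The sketch assumes metric names in their OpenTelemetry form (not as renamed by a Prometheus exporter) and handles single duration values only.

```go
package main

import "fmt"

// legacyToCurrent maps the legacy RPC duration metric names to the names
// emitted by Collector v0.147.0 and later.
var legacyToCurrent = map[string]string{
	"rpc.client.duration": "rpc.client.call.duration",
	"rpc.server.duration": "rpc.server.call.duration",
}

// normalizeDuration is a hypothetical helper that converts one duration value
// to the current metric name and unit. Legacy values are assumed to be in
// milliseconds and are divided by 1000; current values are already in seconds
// and pass through unchanged. The boolean is false for non-duration metrics.
func normalizeDuration(name string, value float64) (string, float64, bool) {
	if current, ok := legacyToCurrent[name]; ok {
		return current, value / 1000, true
	}
	for _, current := range legacyToCurrent {
		if name == current {
			return name, value, true
		}
	}
	return "", 0, false
}

func main() {
	name, seconds, ok := normalizeDuration("rpc.server.duration", 250)
	fmt.Println(name, seconds, ok) // rpc.server.call.duration 0.25 true
}
```

Rescaling an already-aggregated histogram would also require dividing its bucket boundaries by 1000; that step is left out to keep the sketch small.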
Validation:
- `npm run check`: reports `check:i18n` drift in many unrelated translated files; this is not caused by this doc change

Relevant upstream references:
Repo commit for this docs change:
docs(collector): update internal telemetry RPC metrics

Footnotes

Yes, I can answer maintainer questions about the content of this PR, without using AI.