|
| 1 | +# Export SpanContext.IsRemote in OTLP |
| 2 | + |
| 3 | +Update OTLP to indicate whether a span's parent is remote. |
| 4 | + |
| 5 | +## Motivation |
| 6 | + |
| 7 | +It is sometimes useful to post-process or visualise only entry-point spans: spans which either have no parent (trace roots), or which have a remote parent. |
| 8 | +For example, the Elastic APM solution highlights entry-point spans (Elastic APM refers to these as "transactions") and surfaces these as top-level operations |
| 9 | +in its user interface. |
| 10 | + |
| 11 | +The goal is to identify the spans which represent a request that is entering a service, or originating within a service, without having to first assemble the |
| 12 | +complete distributed trace as a DAG (Directed Acyclic Graph). It is trivially possible to identify trace roots, but it is not possible to identify spans with |
| 13 | +remote parents. |
| 14 | + |
| 15 | +Here is a contrived example distributed trace, with a border added to the entry-point spans: |
| 16 | + |
| 17 | +```mermaid |
| 18 | +graph TD |
| 19 | + subgraph comments_service |
| 20 | + POST_comments(POST /comment) |
| 21 | + POST_comments --> comments_send(comments send) |
| 22 | + end |
| 23 | +
|
| 24 | + subgraph auth_service |
| 25 | + POST_comments --> POST_auth(POST /auth) |
| 26 | + POST_auth --> LDAP |
| 27 | + end |
| 28 | +
|
| 29 | + subgraph user_details_service |
| 30 | + POST_comments --> GET_user_details(GET /user_details) |
| 31 | + GET_user_details --> SELECT_users(SELECT FROM users) |
| 32 | + end |
| 33 | +
|
| 34 | + subgraph comments_inserter |
| 35 | + comments_send --> comments_receive(comments receive) |
| 36 | + comments_receive --> comments_process(comments process) |
| 37 | + comments_process --> INSERT_comments(INSERT INTO comments) |
| 38 | + end |
| 39 | +
|
| 40 | + style POST_comments stroke-width:4 |
| 41 | + style POST_auth stroke-width:4 |
| 42 | + style GET_user_details stroke-width:4 |
| 43 | + style comments_receive stroke-width:4 |
| 44 | +``` |
| 45 | + |
| 46 | +## Explanation |
| 47 | + |
| 48 | +The OTLP encoding for spans has a boolean `parent_span_is_remote` field for identifying whether a span's parent is remote or not. |
| 49 | +All OpenTelemetry SDKs populate this field, and backends may use it to identify a span as being an entry-point span. |
| 50 | +A span can be considered an entry-point span if it has no parent (`parent_span_id` is empty), or if `parent_span_is_remote` is true. |
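
The rule above can be sketched as a small predicate. This is only an illustration: the `span` type and its field names are hypothetical stand-ins for the two OTLP properties mentioned in the text, not the generated protobuf or collector types.

```go
package main

import "fmt"

// span is a hypothetical, simplified stand-in for an OTLP span.
// Only the two properties needed for the entry-point check are modelled.
type span struct {
	name               string
	parentSpanID       []byte // empty for trace roots
	parentSpanIsRemote bool   // mirrors the proposed parent_span_is_remote field
}

// isEntryPoint reports whether s is a trace root (no parent)
// or has a parent that lives in another process.
func isEntryPoint(s span) bool {
	return len(s.parentSpanID) == 0 || s.parentSpanIsRemote
}

func main() {
	spans := []span{
		{name: "POST /comment"},
		{name: "POST /auth", parentSpanID: []byte{1}, parentSpanIsRemote: true},
		{name: "LDAP", parentSpanID: []byte{2}},
	}
	for _, s := range spans {
		fmt.Printf("%s entry-point=%t\n", s.name, isEntryPoint(s))
	}
}
```

The check needs only the span itself, which is the point of the proposal: no other span in the trace has to be looked up.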
| 51 | + |
| 52 | +## Internal details |
| 53 | + |
| 54 | +The first part would be to update the trace protobuf, adding a `bool parent_span_is_remote` field to the |
| 55 | +[`Span` message](https://github.com/open-telemetry/opentelemetry-proto/blob/b43e9b18b76abf3ee040164b55b9c355217151f3/opentelemetry/proto/trace/v1/trace.proto#L84). |
| 56 | + |
| 57 | +[`SpanContext.IsRemote`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#isremote) identifies whether span context has been propagated from a remote parent. |
| 58 | +The OTLP exporter in each SDK would need to be updated to record this in the new `parent_span_is_remote` field. |
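
A rough sketch of the exporter-side change might look like the following. All types here (`spanContext`, `readOnlySpan`, `protoSpan`) and the `toProto` function are simplified, hypothetical stand-ins rather than any SDK's real exporter code; the only idea taken from the text is that the parent's `IsRemote` value is copied into the new field.

```go
package main

import "fmt"

// spanContext is a hypothetical stand-in for the API's SpanContext.
type spanContext struct {
	spanID [8]byte
	remote bool
}

// IsValid reports whether the context carries a non-zero span ID.
func (sc spanContext) IsValid() bool { return sc.spanID != [8]byte{} }

// IsRemote reports whether the context was propagated from another process.
func (sc spanContext) IsRemote() bool { return sc.remote }

// readOnlySpan is a hypothetical view of a finished SDK span.
type readOnlySpan struct {
	name   string
	parent spanContext
}

// protoSpan is a hypothetical stand-in for the OTLP Span message.
type protoSpan struct {
	Name               string
	ParentSpanID       []byte
	ParentSpanIsRemote bool
}

// toProto converts an SDK span, recording the parent's remoteness
// only when the span actually has a parent.
func toProto(s readOnlySpan) protoSpan {
	out := protoSpan{Name: s.name}
	if s.parent.IsValid() {
		out.ParentSpanID = append([]byte(nil), s.parent.spanID[:]...)
		out.ParentSpanIsRemote = s.parent.IsRemote()
	}
	return out
}

func main() {
	s := readOnlySpan{
		name:   "GET /user_details",
		parent: spanContext{spanID: [8]byte{1}, remote: true},
	}
	fmt.Printf("%+v\n", toProto(s))
}
```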
| 59 | + |
| 60 | +For backwards compatibility with older OTLP versions, the protobuf field should be `nullable` (`true`, `false`, or unspecified) |
| 61 | +and the opentelemetry-collector protogen code should provide an API that enables backend exporters to identify whether the field is set. |
| 62 | + |
| 63 | +```go |
| 64 | +package pdata |
| 65 | + |
| 66 | +// ParentSpanIsRemote indicates whether ms's parent span is remote, if known. |
| 67 | +// If the parent span remoteness property is known then the "ok" result will be true, |
| 68 | +// and false otherwise. |
| 69 | +func (ms Span) ParentSpanIsRemote() (remote bool, ok bool) |
| 70 | +``` |
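
To illustrate how a backend exporter could consume such an API, here is a minimal sketch. The `Span` type below is a hypothetical stand-in (the real `pdata.Span` wraps generated protobuf code), and the pointer-based method body and the `classify` helper are assumptions made purely for illustration; only the method signature comes from the proposal above.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for pdata.Span. A nil pointer models
// a field left unset by an older OTLP sender.
type Span struct {
	parentSpanID       []byte
	parentSpanIsRemote *bool
}

// ParentSpanIsRemote follows the proposed signature: ok is false when
// the sender did not specify the field.
func (ms Span) ParentSpanIsRemote() (remote bool, ok bool) {
	if ms.parentSpanIsRemote == nil {
		return false, false
	}
	return *ms.parentSpanIsRemote, true
}

// classify labels a span, falling back to "unknown" so that a backend
// can apply its own heuristics for data from older senders.
func classify(ms Span) string {
	if len(ms.parentSpanID) == 0 {
		return "entry-point" // trace root
	}
	remote, ok := ms.ParentSpanIsRemote()
	switch {
	case !ok:
		return "unknown"
	case remote:
		return "entry-point"
	default:
		return "internal"
	}
}

// boolPtr returns a pointer to b, for building example spans.
func boolPtr(b bool) *bool { return &b }

func main() {
	fmt.Println(classify(Span{}))
	fmt.Println(classify(Span{parentSpanID: []byte{1}, parentSpanIsRemote: boolPtr(true)}))
	fmt.Println(classify(Span{parentSpanID: []byte{1}, parentSpanIsRemote: boolPtr(false)}))
	fmt.Println(classify(Span{parentSpanID: []byte{1}}))
}
```

Returning a separate `ok` result keeps "not remote" distinct from "not reported", which is what the backwards-compatibility requirement asks for.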
| 71 | + |
| 72 | +## Trade-offs and mitigations |
| 73 | + |
| 74 | +None identified. |
| 75 | + |
| 76 | +## Prior art and alternatives |
| 77 | + |
| 78 | +### Alternative 1: include entry-point span ID in other spans |
| 79 | + |
| 80 | +As an alternative to identifying whether the parent span is remote, we could instead encode and propagate the ID of the entry-point span in all non-entry-point spans. |
| 81 | +Entry-point spans could then be identified by the absence of this field. |
| 82 | + |
| 83 | +The entry-point span ID would be captured when starting a span with a remote parent, and propagated through `SpanContext`. We would introduce a new `entry_span_id` field to |
| 84 | +the `Span` protobuf message definition, and set it in OTLP exporters. |
| 85 | + |
| 86 | +This was originally [proposed in OpenCensus](https://github.com/census-instrumentation/opencensus-specs/issues/229) with no resolution. |
| 87 | + |
| 88 | +The drawbacks of this alternative are: |
| 89 | + |
| 90 | +- `SpanContext` would need to be extended to include the entry-point span ID; SDKs would need to be updated to capture and propagate it |
| 91 | +- The additional protobuf field would be an additional 8 bytes, vs 1 byte for the boolean field |
| 92 | + |
| 93 | +The main benefit of this approach is that it additionally enables backends to group spans by their process subgraph. |
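
As a sketch of that grouping, assuming a hypothetical `entrySpanID` value is available on every non-entry-point span (string IDs are used here only for readability):

```go
package main

import "fmt"

// span is a hypothetical, simplified span carrying the entry_span_id
// described in this alternative. entrySpanID is empty on entry-point spans.
type span struct {
	id          string
	entrySpanID string
	name        string
}

// groupByEntryPoint groups span names by the ID of the entry-point span
// of their process subgraph. Entry-point spans are keyed by their own ID.
func groupByEntryPoint(spans []span) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range spans {
		key := s.entrySpanID
		if key == "" {
			key = s.id
		}
		groups[key] = append(groups[key], s.name)
	}
	return groups
}

func main() {
	spans := []span{
		{id: "a1", name: "POST /auth"},
		{id: "a2", entrySpanID: "a1", name: "LDAP"},
		{id: "b1", name: "GET /user_details"},
		{id: "b2", entrySpanID: "b1", name: "SELECT FROM users"},
	}
	for entry, names := range groupByEntryPoint(spans) {
		fmt.Println(entry, names)
	}
}
```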
| 94 | + |
| 95 | +### Alternative 2: introduce a semantic convention attribute to identify entry-point spans |
| 96 | + |
| 97 | +As an alternative to adding a new field to spans, a new semantic convention attribute could be added to only entry-point spans. |
| 98 | + |
| 99 | +This approach would avoid increasing the memory footprint of all spans, but would have a greater memory footprint for entry-point spans. |
| 100 | +The benefit of this approach would therefore depend on the ratio of entry-point to internal spans, and may even be more expensive. |
| 101 | + |
| 102 | +### Alternative 3: extend SpanKind values |
| 103 | + |
| 104 | +Another alternative is to extend the SpanKind values to unambiguously define when a CONSUMER span has a remote parent or a local parent (e.g. with the message polling use case). |
| 105 | + |
| 106 | +For example, a new SpanKind (e.g. `AMBIENT_CONSUMER`) could be introduced with a clear `no` for the `Remote-Incoming` property, while `REMOTE_CONSUMER` would have a clear `yes`. The downside of this approach is that it is a breaking change to the semantics of `CONSUMER` spans. |
| 107 | + |
| 108 | +## Open questions |
| 109 | + |
| 110 | +### Relation between `parent_span_is_remote` and `SpanKind` |
| 111 | + |
| 112 | +The specification for `SpanKind` describes the following: |
| 113 | + |
| 114 | +``` |
| 115 | +The first property described by SpanKind reflects whether the Span is a "logical" remote child or parent ... |
| 116 | +``` |
| 117 | + |
| 118 | +However, the specification stays ambiguous for the `CONSUMER` span kind with respect to the "logical" remote parent property. |
| 119 | +Nevertheless, the proposed `parent_span_is_remote` field has some overlap with that `SpanKind` property. |
| 120 | +The specification would require some clarification of `SpanKind` and its relation to `parent_span_is_remote`. |
| 121 | + |
| 122 | +## Future possibilities |
| 123 | + |
| 124 | +No other future changes identified. |