Commit 6ccfdce

Export SpanContext.IsRemote in OTLP (open-telemetry#182)
* Introduce remote-parent OTEP
1 parent 83b053f commit 6ccfdce

1 file changed

+124
-0
lines changed

oteps/0182-otlp-remote-parent.md

+124
@@ -0,0 +1,124 @@
# Export SpanContext.IsRemote in OTLP

Update OTLP to indicate whether a span's parent is remote.

## Motivation

It is sometimes useful to post-process or visualise only entry-point spans: spans which either have no parent (trace roots), or which have a remote parent.
For example, the Elastic APM solution highlights entry-point spans (Elastic APM refers to these as "transactions") and surfaces these as top-level operations
in its user interface.

The goal is to identify the spans which represent a request that is entering a service, or originating within a service, without having to first assemble the
complete distributed trace as a DAG (Directed Acyclic Graph). It is trivially possible to identify trace roots, but it is not possible to identify spans with
remote parents.

Here is a contrived example distributed trace, with a border added to the entry-point spans:

```mermaid
graph TD
  subgraph comments_service
    POST_comments(POST /comment)
    POST_comments --> comments_send(comments send)
  end

  subgraph auth_service
    POST_comments --> POST_auth(POST /auth)
    POST_auth --> LDAP
  end

  subgraph user_details_service
    POST_comments --> GET_user_details(GET /user_details)
    GET_user_details --> SELECT_users(SELECT FROM users)
  end

  subgraph comments_inserter
    comments_send --> comments_receive(comments receive)
    comments_receive --> comments_process(comments process)
    comments_process --> INSERT_comments(INSERT INTO comments)
  end

  style POST_comments stroke-width:4
  style POST_auth stroke-width:4
  style GET_user_details stroke-width:4
  style comments_receive stroke-width:4
```

## Explanation

The OTLP encoding for spans has a boolean `parent_span_is_remote` field for identifying whether a span's parent is remote or not.
All OpenTelemetry SDKs populate this field, and backends may use it to identify a span as being an entry-point span.
A span can be considered an entry-point span if it has no parent (`parent_span_id` is empty), or if `parent_span_is_remote` is true.

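The rule above can be sketched as a small predicate. This is a minimal sketch, not SDK or collector code: the `Span` struct and the `IsEntryPoint` function are hypothetical stand-ins for the OTLP `Span` message and a backend helper, and the sample spans are borrowed from the contrived trace in the Motivation section.

```go
package main

import "fmt"

// Span is a hypothetical, simplified stand-in for the OTLP Span message.
type Span struct {
	Name               string
	ParentSpanID       []byte // empty for trace roots
	ParentSpanIsRemote bool   // the field proposed in this OTEP
}

// IsEntryPoint reports whether s is a trace root or has a remote parent.
func IsEntryPoint(s Span) bool {
	return len(s.ParentSpanID) == 0 || s.ParentSpanIsRemote
}

func main() {
	spans := []Span{
		{Name: "POST /comment"},                                                 // trace root
		{Name: "comments send", ParentSpanID: []byte{1}},                        // local parent
		{Name: "POST /auth", ParentSpanID: []byte{1}, ParentSpanIsRemote: true}, // remote parent
	}
	for _, s := range spans {
		fmt.Printf("%s entry-point=%t\n", s.Name, IsEntryPoint(s))
	}
}
```

Note that the predicate needs only the span itself, which is the point of the proposal: no other spans of the trace have to be available.
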
## Internal details

The first step would be to update the trace protobuf, adding a `bool parent_span_is_remote` field to the
[`Span` message](https://github.com/open-telemetry/opentelemetry-proto/blob/b43e9b18b76abf3ee040164b55b9c355217151f3/opentelemetry/proto/trace/v1/trace.proto#L84).

[`SpanContext.IsRemote`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#isremote) identifies whether span context has been propagated from a remote parent.
The OTLP exporter in each SDK would need to be updated to record this in the new `parent_span_is_remote` field.

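A minimal sketch of the exporter-side mapping follows. The `SpanContext` and `otlpSpan` types are simplified, hypothetical stand-ins rather than any SDK's real types; only the idea of copying the parent's `IsRemote` flag into `parent_span_is_remote` comes from the text above. Leaving the field unset for root spans is an assumption of this sketch, not something the proposal specifies.

```go
package main

import "fmt"

// SpanContext is a hypothetical stand-in for an SDK span context.
type SpanContext struct {
	SpanID [8]byte
	Remote bool
}

// IsValid reports whether the context carries a non-zero span ID.
func (sc SpanContext) IsValid() bool { return sc.SpanID != [8]byte{} }

// IsRemote mirrors SpanContext.IsRemote from the specification.
func (sc SpanContext) IsRemote() bool { return sc.Remote }

// otlpSpan is a hypothetical stand-in for the OTLP Span message.
// A nil ParentSpanIsRemote models "unspecified".
type otlpSpan struct {
	ParentSpanID       []byte
	ParentSpanIsRemote *bool
}

// exportParent copies the parent's span ID and remoteness into the wire span.
func exportParent(parent SpanContext) otlpSpan {
	var out otlpSpan
	if !parent.IsValid() {
		return out // root span: no parent, field left unset (assumption)
	}
	out.ParentSpanID = append([]byte(nil), parent.SpanID[:]...)
	remote := parent.IsRemote()
	out.ParentSpanIsRemote = &remote
	return out
}

func main() {
	remoteParent := SpanContext{SpanID: [8]byte{1}, Remote: true}
	s := exportParent(remoteParent)
	fmt.Println(len(s.ParentSpanID), *s.ParentSpanIsRemote)
}
```
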
For backwards compatibility with older OTLP versions, the protobuf field should be `nullable` (`true`, `false`, or unspecified)
and the opentelemetry-collector protogen code should provide an API that enables backend exporters to identify whether the field is set.

```go
package pdata

// ParentSpanIsRemote indicates whether ms's parent span is remote, if known.
// If the parent span remoteness property is known then the "ok" result will be true,
// and false otherwise.
func (ms Span) ParentSpanIsRemote() (remote bool, ok bool)
```

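To illustrate how a backend exporter might consume such an accessor, here is a self-contained sketch. The `Span` struct below is a hypothetical stand-in for the generated collector type (it is not the real `pdata.Span`), and the `EntryPoint` type and `classify` function are illustrative names only. The sketch assumes the nullable field is modelled as a `*bool`.

```go
package main

import "fmt"

// Span is a hypothetical stand-in for the generated span type.
type Span struct {
	parentSpanID       []byte
	parentSpanIsRemote *bool // nil when the producer did not set the field
}

// ParentSpanIsRemote returns the parent's remoteness and whether it is known.
func (ms Span) ParentSpanIsRemote() (remote bool, ok bool) {
	if ms.parentSpanIsRemote == nil {
		return false, false
	}
	return *ms.parentSpanIsRemote, true
}

// EntryPoint is a three-valued answer to "is this an entry-point span?".
type EntryPoint int

const (
	EntryUnknown EntryPoint = iota // data from a producer that predates the field
	EntryYes
	EntryNo
)

// classify applies the entry-point rule, falling back to "unknown" when
// the span has a parent but its remoteness was not recorded.
func classify(ms Span) EntryPoint {
	if len(ms.parentSpanID) == 0 {
		return EntryYes // trace root
	}
	remote, ok := ms.ParentSpanIsRemote()
	switch {
	case !ok:
		return EntryUnknown
	case remote:
		return EntryYes
	default:
		return EntryNo
	}
}

func main() {
	remote := true
	fmt.Println(classify(Span{parentSpanID: []byte{1}, parentSpanIsRemote: &remote}) == EntryYes)
	fmt.Println(classify(Span{parentSpanID: []byte{1}}) == EntryUnknown)
}
```

Returning an explicit "unknown" lets a backend keep whatever heuristic it used before the field existed for data from older producers, instead of silently treating missing values as `false`.
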
## Trade-offs and mitigations

None identified.

## Prior art and alternatives

### Alternative 1: include entry-point span ID in other spans

As an alternative to identifying whether the parent span is remote, we could instead encode and propagate the ID of the entry-point span in all non-entry-point spans.
Entry-point spans could then be identified by the absence of this field.

The entry-point span ID would be captured when starting a span with a remote parent, and propagated through `SpanContext`. We would introduce a new `entry_span_id` field to
the `Span` protobuf message definition, and set it in OTLP exporters.

This was originally [proposed in OpenCensus](https://github.com/census-instrumentation/opencensus-specs/issues/229) with no resolution.

The drawbacks of this alternative are:

- `SpanContext` would need to be extended to include the entry-point span ID; SDKs would need to be updated to capture and propagate it
- The additional protobuf field would be an additional 8 bytes, vs 1 byte for the boolean field

The main benefit of this approach is that it additionally enables backends to group spans by their process subgraph.

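As a rough illustration of that grouping, the sketch below buckets spans by a hypothetical `EntrySpanID` field. The `Span` struct and the `groupByEntry` function are assumptions made for this example, and IDs are shown as short strings for readability. Under this alternative an entry-point span carries no `entry_span_id`, so the sketch files it under its own span ID.

```go
package main

import "fmt"

// Span is a hypothetical, simplified span carrying the entry_span_id
// field described in this alternative.
type Span struct {
	SpanID      string
	EntrySpanID string // empty on entry-point spans
	Name        string
}

// groupByEntry buckets span names by the entry-point span they belong to.
func groupByEntry(spans []Span) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range spans {
		key := s.EntrySpanID
		if key == "" {
			key = s.SpanID // entry-point spans anchor their own group
		}
		groups[key] = append(groups[key], s.Name)
	}
	return groups
}

func main() {
	spans := []Span{
		{SpanID: "a1", Name: "POST /auth"},
		{SpanID: "a2", EntrySpanID: "a1", Name: "LDAP"},
		{SpanID: "b1", Name: "comments receive"},
		{SpanID: "b2", EntrySpanID: "b1", Name: "comments process"},
		{SpanID: "b3", EntrySpanID: "b1", Name: "INSERT INTO comments"},
	}
	for entry, names := range groupByEntry(spans) {
		fmt.Println(entry, names)
	}
}
```
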
### Alternative 2: introduce a semantic convention attribute to identify entry-point spans

As an alternative to adding a new field to spans, a new semantic convention attribute could be added to entry-point spans only.

This approach would avoid increasing the memory footprint of all spans, but would have a greater memory footprint for entry-point spans.
The benefit of this approach would therefore depend on the ratio of entry-point to internal spans, and it could even be more expensive overall.

### Alternative 3: extend SpanKind values

Another alternative is to extend the SpanKind values to unambiguously define when a CONSUMER span has a remote parent or a local parent (e.g. with the message polling use case).

For example, we could introduce a new SpanKind (e.g. `AMBIENT_CONSUMER`) with a clear `no` on the `Remote-Incoming` property of the SpanKind, alongside a `REMOTE_CONSUMER` kind with a clear `yes` on that property. The downside of this approach is that it is a breaking change to the semantics of `CONSUMER` spans.

## Open questions

### Relation between `parent_span_is_remote` and `SpanKind`

The specification for `SpanKind` describes the following:

```
The first property described by SpanKind reflects whether the Span is a "logical" remote child or parent ...
```

However, the specification remains ambiguous for the `CONSUMER` span kind with respect to the property of the "logical" remote parent.
Nevertheless, the proposed field `parent_span_is_remote` has some overlap with that `SpanKind` property.
The specification would require some clarification on the `SpanKind` and its relation to `parent_span_is_remote`.

## Future possibilities

No other future changes identified.
