Update experimental C/C++ implementation for process context OTEP with memfd + in-place modification#34
It's not actually the previous context that failed, it's the context that's _being built_.
By using a mmap from memfd, the `OTEL_CTX` name will show up in /proc/pid/maps even if the naming of the mapping using `prctl` fails, thus making finding the mapping more efficient.
As we now have memfd naming as a fallback, we don't need to do probing (looking at mappings with a given size + flags with no name) to find the process context. This allows the following simplifications:
* The mapping size is no longer fixed
* The mapping is no longer set to read-only
* (Probing code gets dropped)
In-place updates are now made by using a zeroed
`otel_process_ctx_published_at_ns` field as a marker for a reader that
an update is in progress (because otherwise we'd need the published_at +
size + payload writes to be atomic).
This means the same mapping stays in place, and thus a reader that
already has found the mapping doesn't need to re-parse proc to
keep reading from it.
Here's an example of this in action:
```
# `example_ctx` before updating...
$ sudo ./otel_process_ctx_dump.sh 53992
Found OTEL context for PID 53992
Start address: 7286ccd62000
00000000 4f 54 45 4c 5f 43 54 58 02 00 00 00 50 01 00 00 |OTEL_CTX....P...|
00000010 b0 c5 c5 98 89 4e 82 18 a0 72 8e 82 6d 5f 00 00 |.....N...r..m_..|
00000020
Parsed struct:
otel_process_ctx_signature : "OTEL_CTX"
otel_process_ctx_version : 2
otel_process_payload_size : 336
otel_process_ctx_published_at_ns : 1766060356763239856 (2025-12-18 12:19:16 GMT)
otel_process_payload : 0x00005f6d828e72a0
Payload dump (336 bytes):
...
attributes {
key: "service.instance.id"
value {
string_value: "123d8444-2c7e-46e3-89f6-6217880f7123"
}
}
attributes {
key: "service.name"
value {
string_value: "my-service"
}
}
# `example_ctx` after updating...
$ sudo ./otel_process_ctx_dump.sh 53992
Found OTEL context for PID 53992
Start address: 7286ccd62000 # <-- Mapping still at same address!
00000000 4f 54 45 4c 5f 43 54 58 02 00 00 00 5b 01 00 00 |OTEL_CTX....[...|
00000010 41 54 fc 7e 8e 4e 82 18 00 7e 8e 82 6d 5f 00 00 |AT.~.N...~..m_..|
00000020
Parsed struct:
otel_process_ctx_signature : "OTEL_CTX"
otel_process_ctx_version : 2
otel_process_payload_size : 347 # <-- Updated!
otel_process_ctx_published_at_ns : 1766060377805444161 (2025-12-18 12:19:37 GMT) # <-- Updated!
otel_process_payload : 0x00005f6d828e7e00 # <-- Updated!
Payload dump (347 bytes):
...
attributes {
key: "service.instance.id"
value {
string_value: "456d8444-2c7e-46e3-89f6-6217880f7456"
}
}
attributes {
key: "service.name"
value {
string_value: "my-service-updated"
}
}
```
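As a rough illustration of the in-place update described above (a zeroed `otel_process_ctx_published_at_ns` acting as the "update in progress" marker), here is a minimal sketch. The struct name `example_ctx_header` and the helper `example_update_in_place` are hypothetical and not taken from the PR; the field names and the 32-byte layout are assumed from the dump output, and the memory-ordering choices are one plausible option rather than what the implementation necessarily does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

// Hypothetical header type; field names follow the dump output above.
typedef struct {
  char otel_process_ctx_signature[8];                 // "OTEL_CTX"
  uint32_t otel_process_ctx_version;
  uint32_t otel_process_payload_size;
  _Atomic uint64_t otel_process_ctx_published_at_ns;  // 0 == update in progress
  const char *otel_process_payload;
} example_ctx_header;

// Assumption: 64-bit target, so the header matches the 0x20 bytes in the dump.
static_assert(sizeof(example_ctx_header) == 32, "unexpected header layout");

static uint64_t example_now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);
  return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

// Hypothetical helper: swaps in a new payload without remapping anything.
static void example_update_in_place(example_ctx_header *header,
                                    const char *new_payload,
                                    uint32_t new_size) {
  // Step 1: mark "update in progress" so a reader knows to retry.
  atomic_store_explicit(&header->otel_process_ctx_published_at_ns, 0,
                        memory_order_relaxed);
  // Keep the size/payload writes below from being reordered before the marker.
  atomic_thread_fence(memory_order_seq_cst);

  // Step 2: size and payload do not need to be written atomically, because
  // readers discard anything they see while the timestamp is zero or changed.
  header->otel_process_payload_size = new_size;
  header->otel_process_payload = new_payload;

  // Step 3: publish. The release store makes the writes above visible first.
  atomic_store_explicit(&header->otel_process_ctx_published_at_ns,
                        example_now_ns(), memory_order_release);
}
```

Only after step 3 would the caller free the old payload, which is what lets a reader that re-checks the timestamp notice that its copy may be stale.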
Small comment improvements Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
}

// Step: Update bookkeeping
free(published_state.payload); // This was still pointing to the old payload
How is it possible to free this memory? How are you sure that the readers of the previous payload have finished reading?
Because readers are expected to be reading from outside the process, they will either read:
- The old payload
- Garbage/bits of the payload
- Nothing, will get an error
As per the "Reading protocol" in open-telemetry/opentelemetry-specification#4719, after reading the payload bytes, readers need to re-check the published_at_ns.
That is, let's say the reader spotted published_at_ns == 1 and the old payload. When they go to re-read the published_at_ns, they MUST observe published_at_ns > 1 because the new timestamp gets installed before we free the old payload. So they'll be able to detect "oops whatever I got is not correct, there was a concurrent update, try again".
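To make the re-check concrete, here is a minimal sketch of an out-of-process reader. It is not the spec's "Reading protocol" verbatim: the names `example_ctx_header`, `example_read_fn`, `example_local_read`, and `example_read_snapshot` are hypothetical, the header layout is assumed from the dump earlier in this PR, and remote memory access is abstracted behind a callback (a real reader would back it with something like `process_vm_readv`).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Reader-side copy of the header; the payload is just a remote address here.
typedef struct {
  char signature[8];
  uint32_t version;
  uint32_t payload_size;
  uint64_t published_at_ns;
  uint64_t payload_addr;
} example_ctx_header;

// Hypothetical callback: copies len bytes from the target process, and
// returns false (instead of crashing) if the memory is not readable.
typedef bool (*example_read_fn)(uint64_t remote_addr, void *dst, size_t len);

// Stand-in for demonstrations inside a single process only.
static bool example_local_read(uint64_t remote_addr, void *dst, size_t len) {
  memcpy(dst, (const void *)(uintptr_t)remote_addr, len);
  return true;
}

// Returns true only if the payload copied into out is known to be consistent.
// On false, the caller should retry later.
static bool example_read_snapshot(example_read_fn read_remote,
                                  uint64_t header_addr, char *out,
                                  size_t out_cap, uint32_t *out_size) {
  example_ctx_header before, after;

  if (!read_remote(header_addr, &before, sizeof before)) return false;
  if (memcmp(before.signature, "OTEL_CTX", 8) != 0) return false;
  if (before.published_at_ns == 0) return false;  // update in progress
  if (before.payload_size > out_cap) return false;

  // This may yield the old payload, garbage, or an error if it was freed.
  if (!read_remote(before.payload_addr, out, before.payload_size)) return false;

  // Re-check: any change means a concurrent update happened, so discard.
  if (!read_remote(header_addr, &after, sizeof after)) return false;
  if (after.published_at_ns != before.published_at_ns) return false;

  *out_size = before.payload_size;
  return true;
}
```

The key point is that the payload bytes are only trusted after the second header read confirms the timestamp is unchanged.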
For readers inside the process (as is this test code) this is documented in the header:
Thread-safety: This function assumes there is no concurrent mutation of the process context.
In practice, to avoid this happening in-process a user of this library could wrap a lock around every operation, thus making sure there's no concurrency between updates and other operations (e.g. reading). (We could provide such a feature out of the box too -- feedback welcome?)
So I'm not familiar enough with what readers outside the process can do, or what is harmful to them. I'm only thinking that since free is called, there is in theory nothing stopping the allocator from unmapping the page that the memory was on, and how will an external reader react to the page disappearing while it is reading from it?
There are a few different APIs for reading remote process memory -- process_vm_readv, ptrace, or reading /proc/<pid>/mem.
When using these APIs, trying to read pages that disappear/are not there anymore will get you an error as the return value for that function. In particular you won't get a SEGFAULT.
(process_vm_readv is actually quite neat since you can use it from inside the process too -- for instance we could update the reader in otel_process_ctx.c if we wanted to make sure it could never segfault even in the face of concurrency in the update; the downside is that it can be disabled/blocked on some container setups so a regular user process can't always rely on it being there)
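A small sketch of the `process_vm_readv` behaviour mentioned above: the wrapper `example_safe_read` is a hypothetical helper (not part of `otel_process_ctx.c`), and it simply forwards to the real Linux syscall wrapper so that an unreadable address surfaces as a return value instead of a crash.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: reads len bytes at remote_addr in process pid.
// Returns the number of bytes read, or -1 with errno set. A page that has
// been unmapped shows up as an error here, not as a SIGSEGV in the reader.
static ssize_t example_safe_read(pid_t pid, const void *remote_addr,
                                 void *dst, size_t len) {
  struct iovec local = {.iov_base = dst, .iov_len = len};
  struct iovec remote = {.iov_base = (void *)remote_addr, .iov_len = len};
  return process_vm_readv(pid, &local, 1, &remote, 1, 0);
}
```

Passing `getpid()` as the pid reads from the calling process itself. Where the call is permitted, reading an address that is no longer mapped should fail with `EFAULT`; where it is blocked (for example by a container's seccomp policy), every call fails, typically with `EPERM` or `ENOSYS`, so callers need a fallback.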
Nice. Then I'm all good with the approach.
This makes the Java prototype match the current version of open-telemetry/sig-profiling#34 as well as the upstream spec.
This PR updates the experimental C/C++ implementation for the "Process Context OTEP" introduced in #23 and being proposed in open-telemetry/opentelemetry-specification#4719 with the latest updates from the upstream discussion -- open-telemetry/opentelemetry-specification@3caecfb .
Specifically:
* Instead of always using an anonymous mapping, try first to create a `memfd` and create a mapping from the `memfd`.
  If due to security restrictions `memfd` is not available, fall back to an anonymous mapping instead.
* Remove probing as a fallback for when naming a mapping fails.
  Because the name of a `memfd` also shows up in `/proc/<pid>/maps`, we expect that having `memfd` naming as a fallback for when `prctl` is not available is enough.
* Drop 2-page size and read-only permissions on the header memory pages.
  These were intended to support the "probing as a fallback for naming failure", so they are no longer needed.
* Introduce in-place updates to process context.
  This allows efficient updates. In particular, it makes it easier for the reader to detect updates and avoids reparsing `/proc/<pid>/maps` for updates.
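The memfd-first strategy from the list above could look roughly like the sketch below. The function name `example_create_ctx_mapping` is hypothetical, and the mapping flags (`MAP_PRIVATE`, read-write) and the `MFD_CLOEXEC` flag are assumptions for illustration; the actual implementation may choose differently.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

// Older kernel headers may not define these; values are from linux/prctl.h.
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

// Hypothetical helper: returns a writable mapping of the given size, or NULL.
static void *example_create_ctx_mapping(size_t size) {
  void *mapping = MAP_FAILED;

  // Preferred path: a memfd, whose name appears in /proc/<pid>/maps.
  int fd = memfd_create("OTEL_CTX", MFD_CLOEXEC);
  if (fd >= 0) {
    if (ftruncate(fd, (off_t)size) == 0) {
      mapping = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    }
    close(fd);  // The mapping stays valid after the descriptor is closed.
  }

  // Fallback: memfd unavailable (e.g. security restrictions), go anonymous.
  if (mapping == MAP_FAILED) {
    mapping = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) return NULL;
  }

  // Best-effort naming via prctl; failure is ignored on purpose, since the
  // memfd name (when available) already identifies the mapping.
  (void)prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)mapping,
              (unsigned long)size, (unsigned long)"OTEL_CTX");
  return mapping;
}
```

With this shape, the mapping is left unnamed only when both `memfd_create` and the `prctl` naming call are unavailable.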