Update experimental C/C++ implementation for process context OTEP with memfd + in-place modification #34

Merged
felixge merged 9 commits into open-telemetry:main from ivoanjo:ivoanjo/memfd-process-ctx
Jan 7, 2026

@ivoanjo ivoanjo commented Dec 18, 2025

This PR updates the experimental C/C++ implementation for the "Process Context OTEP" introduced in #23 and being proposed in open-telemetry/opentelemetry-specification#4719 with the latest updates from the upstream discussion -- open-telemetry/opentelemetry-specification@3caecfb .

Specifically:

  • Instead of always using an anonymous mapping, try first to create a memfd and create a mapping from the memfd.

    If due to security restrictions memfd is not available, fall back to an anonymous mapping instead.

  • Remove probing as a fallback for when naming a mapping fails.

    Because the name of a memfd also shows up in /proc/<pid>/maps, we expect that having memfd naming as a fallback for when prctl is not available is enough.

  • Drop 2-page size and read-only permissions on the header memory pages.

    These were intended to support the "probing as a fallback for naming failure", so they are no longer needed.

  • Introduce in-place updates to process context.

    This allows efficient updates. In particular, it makes it easier for the reader to detect updates and avoids reparsing /proc/<pid>/maps for updates.

It's not actually the previous context that failed, it's the context
that's _being built_.

By using a mmap from memfd, the `OTEL_CTX` name will show up in
/proc/pid/maps even if the naming of the mapping using `prctl` fails,
thus making finding the mapping more efficient.
As we now have memfd naming as a fallback, we don't need to do
probing (looking at mappings with a given size + flags with no name) to
find the process context.
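
As a rough illustration of the memfd-first strategy described above, here is a minimal sketch, not the code from this PR. The helper name `sketch_map_ctx`, the use of `MAP_PRIVATE`, and the choice to close the file descriptor right after mapping are all assumptions made for the example. The `prctl` naming step is best-effort and its failure is ignored, since a memfd-backed mapping already carries its name in `/proc/<pid>/maps`.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for memfd_create
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <unistd.h>

// Older headers may lack these constants (values are from the Linux UAPI headers).
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

// Hypothetical helper: returns a zeroed, writable mapping of `size` bytes,
// or NULL if neither the memfd path nor the anonymous fallback worked.
static void *sketch_map_ctx(size_t size) {
  assert(size > 0);
  void *mapping = MAP_FAILED;

  // First choice: a memfd, whose name is visible in /proc/<pid>/maps.
  int fd = memfd_create("OTEL_CTX", MFD_CLOEXEC);
  if (fd >= 0) {
    if (ftruncate(fd, (off_t) size) == 0) {
      mapping = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    }
    close(fd); // the mapping, if created, keeps the memfd alive
  }

  // Fallback: memfd unavailable (e.g. security restrictions), use an anonymous mapping.
  if (mapping == MAP_FAILED) {
    mapping = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  }
  if (mapping == MAP_FAILED) return NULL;

  // Best-effort naming; failure is deliberately ignored in this sketch.
  (void) prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long) mapping,
               (unsigned long) size, (unsigned long) "OTEL_CTX");
  return mapping;
}
```

A caller would write the header into the returned memory and later release it with `munmap(mapping, size)`.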

This allows the following simplifications:
* The mapping size is no longer fixed
* The mapping is no longer set to read-only
* (Probing code gets dropped)

In-place updates are now made by using a zeroed
`otel_process_ctx_published_at_ns` field as a marker for a reader that
an update is in progress (because otherwise we'd need the published_at +
size + payload writes to be atomic).

This means the same mapping stays in place, and thus a reader that
already has found the mapping doesn't need to re-parse proc to
keep reading from it.
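
The writer side of the in-place update can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the struct `sketch_ctx_header` mirrors the 32-byte layout visible in the dump on a 64-bit target, the function name `sketch_update_in_place` is hypothetical, and the caller is assumed to supply a non-zero timestamp (for instance from `clock_gettime(CLOCK_REALTIME, ...)`) that differs from the previous one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

// Assumed layout, modelled on the fields shown in the dump output.
typedef struct {
  char signature[8];                // "OTEL_CTX"
  uint32_t version;
  uint32_t payload_size;
  _Atomic uint64_t published_at_ns; // 0 means "update in progress"
  const char *payload;
} sketch_ctx_header;

// Hypothetical helper: swaps in a new payload without remapping.
// Returns the old payload pointer so the caller can free it afterwards.
static const char *sketch_update_in_place(sketch_ctx_header *header,
                                          const char *new_payload,
                                          uint32_t new_size,
                                          uint64_t now_ns) {
  assert(now_ns != 0); // zero is reserved as the in-progress marker
  const char *old_payload = header->payload;

  // Step 1: tell readers an update is in progress.
  atomic_store(&header->published_at_ns, 0);
  // Keep the plain writes below from being reordered before the marker.
  atomic_thread_fence(memory_order_seq_cst);

  // Step 2: size and pointer need not be written atomically as a pair,
  // because readers discard anything observed while the marker is zero.
  header->payload_size = new_size;
  header->payload = new_payload;

  // Step 3: publish. The store has at least release semantics, so the
  // writes from step 2 are visible before the new timestamp.
  atomic_store(&header->published_at_ns, now_ns);

  // Only now is it reasonable for the caller to free the old payload.
  return old_payload;
}
```

Returning the old pointer instead of freeing it inside the helper is a choice made for this example; it keeps the "publish first, free afterwards" ordering explicit at the call site.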

Here's an example of this in action:

```
 # `example_ctx` before updating...
$ sudo ./otel_process_ctx_dump.sh 53992
Found OTEL context for PID 53992
Start address: 7286ccd62000
00000000  4f 54 45 4c 5f 43 54 58  02 00 00 00 50 01 00 00  |OTEL_CTX....P...|
00000010  b0 c5 c5 98 89 4e 82 18  a0 72 8e 82 6d 5f 00 00  |.....N...r..m_..|
00000020
Parsed struct:
  otel_process_ctx_signature       : "OTEL_CTX"
  otel_process_ctx_version         : 2
  otel_process_payload_size        : 336
  otel_process_ctx_published_at_ns : 1766060356763239856 (2025-12-18 12:19:16 GMT)
  otel_process_payload             : 0x00005f6d828e72a0
Payload dump (336 bytes):
...
attributes {
  key: "service.instance.id"
  value {
    string_value: "123d8444-2c7e-46e3-89f6-6217880f7123"
  }
}
attributes {
  key: "service.name"
  value {
    string_value: "my-service"
  }
}

 # `example_ctx` after updating...
$ sudo ./otel_process_ctx_dump.sh 53992
Found OTEL context for PID 53992
Start address: 7286ccd62000 # <-- Mapping still at same address!
00000000  4f 54 45 4c 5f 43 54 58  02 00 00 00 5b 01 00 00  |OTEL_CTX....[...|
00000010  41 54 fc 7e 8e 4e 82 18  00 7e 8e 82 6d 5f 00 00  |AT.~.N...~..m_..|
00000020
Parsed struct:
  otel_process_ctx_signature       : "OTEL_CTX"
  otel_process_ctx_version         : 2
  otel_process_payload_size        : 347 # <-- Updated!
  otel_process_ctx_published_at_ns : 1766060377805444161 (2025-12-18 12:19:37 GMT) # <-- Updated!
  otel_process_payload             : 0x00005f6d828e7e00 # <-- Updated!
Payload dump (347 bytes):
...
attributes {
  key: "service.instance.id"
  value {
    string_value: "456d8444-2c7e-46e3-89f6-6217880f7456"
  }
}
attributes {
  key: "service.name"
  value {
    string_value: "my-service-updated"
  }
}
```
@christos68k christos68k left a comment

Thanks!

Small comment improvements

Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
@felixge felixge merged commit 74257b1 into open-telemetry:main Jan 7, 2026
1 check passed
}

// Step: Update bookkeeping
free(published_state.payload); // This was still pointing to the old payload
How is it possible to free this memory? How are you sure that the readers of the previous payload have finished reading?

Contributor Author

> How is it possible to free this memory? How are you sure that the readers of the previous payload have finished reading?

Because readers are expected to be reading from outside the process, they will either read:

  • The old payload
  • Garbage/bits of the payload
  • Nothing, will get an error

As per the "Reading protocol" in open-telemetry/opentelemetry-specification#4719, after reading the payload bytes, readers need to re-check the published_at_ns.

That is, let's say the reader spotted published_at_ns == 1 and the old payload. When they go to re-read the published_at_ns, they MUST observe published_at_ns > 1 because the new timestamp gets installed before we free the old payload. So they'll be able to detect "oops whatever I got is not correct, there was a concurrent update, try again".
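
The re-check described above can be sketched as a retry loop. This is an illustrative reading of the protocol, not code from the spec or from this PR: the names `sketch_read_fn`, `sketch_remote_header`, `sketch_read_ctx`, the `SKETCH_MAX_PAYLOAD` bound, and the attempt limit are all hypothetical. The `sketch_read_local` callback is an in-process stand-in used only so the example runs on its own; a real out-of-process reader would plug in something based on `process_vm_readv` or `/proc/<pid>/mem`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_PAYLOAD (1u << 20) // arbitrary sanity bound for this example

// Hypothetical callback: copy `len` bytes at `addr` in the target into `dst`.
typedef bool (*sketch_read_fn)(uint64_t addr, void *dst, size_t len, void *user);

typedef struct {
  char signature[8];
  uint32_t version;
  uint32_t payload_size;
  uint64_t published_at_ns;
  uint64_t payload; // address in the target process
} sketch_remote_header;

// In-process stand-in for a remote read (demonstration only).
static bool sketch_read_local(uint64_t addr, void *dst, size_t len, void *user) {
  (void) user;
  memcpy(dst, (const void *) (uintptr_t) addr, len);
  return true;
}

// Returns a malloc'd copy of the payload, or NULL if no consistent
// snapshot could be obtained within `max_attempts`.
static void *sketch_read_ctx(sketch_read_fn read_fn, void *user, uint64_t header_addr,
                             uint32_t *size_out, int max_attempts) {
  assert(read_fn != NULL && size_out != NULL);
  for (int attempt = 0; attempt < max_attempts; attempt++) {
    sketch_remote_header before, after;
    if (!read_fn(header_addr, &before, sizeof before, user)) return NULL;
    if (memcmp(before.signature, "OTEL_CTX", 8) != 0) return NULL;
    if (before.published_at_ns == 0) continue;               // update in progress
    if (before.payload_size > SKETCH_MAX_PAYLOAD) continue;  // likely torn read

    void *copy = malloc((size_t) before.payload_size + 1);
    if (copy == NULL) return NULL;

    // A failed payload read (e.g. freed page) or a changed timestamp
    // both mean "concurrent update, try again".
    bool consistent = read_fn(before.payload, copy, before.payload_size, user) &&
                      read_fn(header_addr, &after, sizeof after, user) &&
                      after.published_at_ns == before.published_at_ns;
    if (consistent) {
      *size_out = before.payload_size;
      return copy;
    }
    free(copy);
  }
  return NULL;
}
```

The caller owns the returned buffer and is expected to `free` it.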

For readers inside the process (as is this test code) this is documented in the header:

> Thread-safety: This function assumes there is no concurrent mutation of the process context.

In practice, to avoid this happening in-process a user of this library could wrap a lock around every operation, thus making sure there's no concurrency between updates and other operations (e.g. reading). (We could provide such a feature out of the box too -- feedback welcome?)


So I'm not familiar enough with what readers outside the process can do, or what is harmful to them. I'm only thinking that since free is called, there is in theory nothing stopping the allocator from unmapping the page that the memory was on, and how will an external reader react to the page disappearing while it is reading from it?

Contributor Author

@ivoanjo ivoanjo Jan 19, 2026

There are a few different APIs for reading remote process memory -- process_vm_readv, ptrace, or reading /proc/<pid>/mem.

When using these APIs, trying to read pages that disappear/are not there anymore will get you an error as the return value for that function. In particular you won't get a SEGFAULT.

(process_vm_readv is actually quite neat since you can use it from inside the process too -- for instance we could update the reader in otel_process_ctx.c if we wanted to make sure it could never segfault even in the face of concurrency in the update; the downside is that it can be disabled/blocked on some container setups so a regular user process can't always rely on it being there)
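
To make the "error instead of a crash" behaviour concrete, here is a small sketch built on `process_vm_readv`. The helper names `sketch_read_remote` and `sketch_demo_vanished_page` are hypothetical, and treating a partial read as a failure is a simplification chosen for this example. As noted above, the syscall can be blocked in some container setups, in which case every call in this sketch simply reports failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // needed for process_vm_readv
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: copies `len` bytes from `remote_addr` in process `pid`.
// Returns 0 on success, -1 with errno set on failure. A missing page shows
// up as a failed call, not as a signal delivered to the caller.
static int sketch_read_remote(pid_t pid, uintptr_t remote_addr, void *dst, size_t len) {
  assert(dst != NULL);
  struct iovec local = { .iov_base = dst, .iov_len = len };
  struct iovec remote = { .iov_base = (void *) remote_addr, .iov_len = len };
  ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
  if (n < 0) return -1;
  if ((size_t) n != len) { // partial read: treated as a failure in this sketch
    errno = EFAULT;
    return -1;
  }
  return 0;
}

// Demonstration: map a page, unmap it, then read it through the helper,
// targeting our own pid. Returns the helper's result, which is expected
// to be -1 because the page no longer exists.
static int sketch_demo_vanished_page(void) {
  size_t page = (size_t) sysconf(_SC_PAGESIZE);
  void *p = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;
  if (munmap(p, page) != 0) return -1;
  char buf[16];
  return sketch_read_remote(getpid(), (uintptr_t) p, buf, sizeof buf);
}
```

Because the helper works on the caller's own pid as well, the same function could back an in-process reader that needs to tolerate concurrent updates.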


Nice. Then I'm all good with the approach.

ivoanjo added a commit to ivoanjo/proc-level-demo that referenced this pull request Feb 3, 2026
This makes the Java prototype match the current version of
open-telemetry/sig-profiling#34 as well as
the upstream spec.