Skip to content

OTEP: correlating OBI traces to profiles#4855

Closed
mmat11 wants to merge 3 commits into
open-telemetry:mainfrom
mmat11:matt/profile-trace-obi-otep
Closed

OTEP: correlating OBI traces to profiles#4855
mmat11 wants to merge 3 commits into
open-telemetry:mainfrom
mmat11:matt/profile-trace-obi-otep

Conversation

@mmat11
Copy link
Copy Markdown

@mmat11 mmat11 commented Jan 27, 2026

Changes

This OTEP introduces a standard communication channel and a specification for correlating profiles to opentelemetry-ebpf-instrumentation (OBI) traces.

There is a PoC implementation at open-telemetry/opentelemetry-ebpf-instrumentation#1184 and coralogix/opentelemetry-ebpf-profiler#1

  • Related issues #
  • Related OTEP(s) #212
  • Links to the prototypes (when adding or changing features)
  • CHANGELOG.md file updated for non-trivial changes
    • For trivial changes, include [chore] in the PR title to skip the changelog check
  • Spec compliance matrix updated if necessary

Opening as Draft for submission to the relevant SIGs first

Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
@mmat11 mmat11 force-pushed the matt/profile-trace-obi-otep branch from d88793d to 5c20f4e Compare January 27, 2026 15:50
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
@mmat11 mmat11 marked this pull request as ready for review January 29, 2026 23:45
@mmat11 mmat11 requested review from a team as code owners January 29, 2026 23:45
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md Outdated
Comment thread oteps/profiles/4855-profiles-obi-correlation.md
Signed-off-by: Mattia Meleleo <mattia.meleleo@coralogix.com>
Copy link
Copy Markdown
Contributor

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 This is quite nimble and makes a lot of sense!

@@ -0,0 +1,55 @@
# Correlating Profiles to OBI Traces
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to provide additional context (e.g. links if context exists somewhere else) for correlation in general:

  • why OBI traces are targeted here and not the SDK traces?
  • how/if OBI traces are correlated to SDK traces?
  • how SDK traces are correlated to profiles

Also, what happens when application is getting instrumented and which mechanisms would win?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why OBI traces are targeted here and not the SDK traces?

there are other proposals which targets SDK traces specifically, see: #4719 and https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

how/if OBI traces are correlated to SDK traces?

I don't think they are, they have different ways of doing the same thing (context propagation) and if a trace exists, it will be ideally inherited both by OBI and SDKs

how SDK traces are correlated to profiles

this OTEP is OBI specific, for SDKs I believe the work/proposal is located here: https://docs.google.com/document/d/1eatbHpEXXhWZEPrXZpfR58-5RIx-81mUgF69Zpn3Rz4/edit?tab=t.0#heading=h.fvztn3xtjxxm

Also, what happens when application is getting instrumented and which mechanisms would win?

I think it should be something like:

trace_ctx = try_sdk()
if !trace_ctx:
    trace_ctx = try_obi()

or viceversa; if both sources have a trace context and it differs, perhaps it's a bug

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 I'm one of the authors of the gdoc mentioned -- we plan to turn it into an OTEP soon. The TL;DR of why we need OBI specifics is that due to the way OBI works it's awkward to use the same mechanism we're planning for regular SDKs and vice-versa.

See also #4719 (comment) for more on this discussion.

Copy link
Copy Markdown
Member

@christos68k christos68k Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova We discussed the priority/fidelity (who wins?) issue at today's Profiling SIG (also previously raised by me in the proposed implementation here)

Given that this OTEP focuses on a specific technical solution (fast data exchange between two OTel components), should we attempt to answer the priority/fidelity question here or somewhere more general? In today's SIG, we discussed that the same clash can be present in the rest of OTel, e.g. if one has multiple layers of instrumentation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having more context in the OTEP is helpful and increases chances of people reviewing it.
It's important to understand the implications of this otep on general correlation between obi, sdk, and profiling.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important to understand the implications of this otep on general correlation between obi, sdk, and profiling.

In my opinion it belongs in a more general document as it would dictate the correlation/priority/etc. of generic readers (eg. the profiler) with writers (eg. OBI, SDKs). Let me know if should I add (or clarify) something OBI specific

Comment on lines +52 to +53
u8 trace_id[16];
u8 span_id[8];
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we don't type these as

Suggested change
u8 trace_id[16];
u8 span_id[8];
u64 trace_id[2];
u64 span_id;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. These are opaque 16/8-byte identifiers, not integers. The canonical form (including traceparent) is defined over raw bytes, and we manipulate them as bytes in eBPF and in Go ([16]byte / [8]byte). Using u8[16] / u8[8] keeps the in-memory layout identical to the wire format and the userspace representation.

If we type them as u64 / u64[2], we introduce endianness concerns. We'd have to define hi/lo ordering and a canonical byte order, and consistently convert at every kernel <-> userspace boundary and during header encoding. That's unnecessary complexity and an easy source of subtle cross-arch bugs, especially since the protocol is byte-ordered, not integer-ordered.

We don't perform arithmetic on these values, so there's no real benefit to representing them as integers. As a side note, keeping them as byte arrays also avoids type-punning/strict-aliasing issues from casting between u64* and byte buffers.

} traces_ctx_v1 SEC(".maps");
```

- **Key:** `(u64)pid_tgid`
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Go where goroutines are multiplexed on OS threads? Will this hold a goroutine id instead?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The map will still be indexed by pid_tgid. OBI hooks the Go runtime (see: https://github.com/open-telemetry/opentelemetry-ebpf-instrumentation/blob/main/bpf/gotracer/go_runtime.c#L197) and keeps track of the goroutine statuses, whenever a goroutine starts doing work, this map is then updated to keep the context for the "currently executing work" goroutine.

Something similar happens for other languages with different async primitives

@github-actions
Copy link
Copy Markdown

This PR was marked stale. It will be closed in 14 days without additional activity.

@github-actions github-actions Bot added the Stale label Mar 28, 2026
@mmat11
Copy link
Copy Markdown
Author

mmat11 commented Mar 30, 2026

As discussed in the SIG spec, this will be moved to OBI repository

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants