reporter: compute profile duration using the time during which the profiler was collecting samples #1129
Conversation
reporter: compute profile duration using the time during which the profiler was collecting samples

previously, we were reporting the profile duration based on sampled events for specific resources. this means that idle resources could end up with much smaller profile durations that would not reflect the actual utilization, compared to busy resources. more generally, using the sample timestamps to compute the duration leads to very small durations for idle hosts, which would be inaccurate if the profiler is running for much longer. this change proposes setting the duration to the time during which the profiler was running / collecting samples rather than the sample timestamps.
```go
*traceEventsPtr = newEvents
collectionEndTime := time.Now()
collectionStartTime := r.collectionStartTime
r.collectionStartTime = collectionEndTime
```
Since Sample timestamps are asynchronously created in the kernel, and this part of the reporter isn't synchronized with the tracer subsystem in any way, do we have a race here where it's possible for Samples with timestamps < collectionStartTime to appear in a profile which would violate the protocol?
Good point, thanks for catching this. I'll work on a patch to update the start time to the minimum between collectionStartTime and the smallest timestamp in samples. This way the invariant should always be true.
Let me know if you would prefer fixing this in another way 🙏
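For illustration, here's a minimal standalone sketch of that clamping idea (names are illustrative, not the actual reporter fields):

```go
package reporter

import "time"

type sample struct {
	timestamp time.Time
}

// profileStartTime clamps the profile start to the earliest sample
// timestamp, so that every sample in the batch falls within the reported
// profile window even if it was created in the kernel before
// collectionStartTime was snapshotted.
func profileStartTime(collectionStartTime time.Time, samples []sample) time.Time {
	start := collectionStartTime
	for _, s := range samples {
		if s.timestamp.Before(start) {
			start = s.timestamp
		}
	}
	return start
}
```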
Thanks! Looking deeper into this, I think this will not be sufficient as there's still the possibility that a trace with timestamp < tmin, where tmin is the minimum timestamp in a reporter batch, will show up in the next batch.
However, I'm not sure if this is a protocol violation: We don't violate the invariant of having all Sample timestamps fall within the Profile duration, but there's a possibility that Profiles will have start times that go back in time (from one to the next). Even if we haven't made it explicit, it's probably best not to violate the implicit contract that reported Profiles from the same profiler are strictly linear in time (though clock changes can mess with that, and we do periodically sync with the realtime clock even if our base timestamps are monotonic).
To better understand this issue (that was pre-existing, you didn't introduce it with this PR) see my comment here.
In short, the tracer doesn't sort trace events by timestamp and also doesn't batch, so there are no ordering guarantees (either between traces or between subsequent batches) as events are sent from tracer to processmanager and to the reporter. That opens a window where a trace event can show up with a timestamp that's smaller than the minimum timestamp of the previously processed 'batch'. We've had to deal with this issue when cleaning up process metadata (e.g. for a process that exited) while still allowing for in-flight traces to be symbolized.
We came up with the ProcessedUntil mechanism that lets the rest of the agent know that trace events up to a particular time have been processed. This allows us to make decisions that depend on ordering without enforcing explicit ordering of events. The price we pay is a slight increase in processing latency (that does not compromise trace fidelity, as trace timestamps stay intact).
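For readers without that context, here's a rough standalone sketch of the watermark idea (types and names are hypothetical, not the actual tracer/processmanager API):

```go
package tracer

import "sync/atomic"

// processedUntil is the kernel timestamp (ktime, in nanoseconds) up to
// which all trace events are known to have been handed off downstream.
var processedUntil atomic.Int64

// advance moves the watermark forward; it never moves it backwards, so
// the watermark stays monotonic even if notifications race each other.
func advance(ktime int64) {
	for {
		cur := processedUntil.Load()
		if ktime <= cur || processedUntil.CompareAndSwap(cur, ktime) {
			return
		}
	}
}

// safeToExpire reports whether state tied to deadlineKTime (e.g. metadata
// of an exited process) can be cleaned up: once the watermark has passed
// it, no in-flight trace event can still need it.
func safeToExpire(deadlineKTime int64) bool {
	return deadlineKTime <= processedUntil.Load()
}
```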
I think the same mechanism can be used to fix this issue in the reporter but it'll probably require some changes. So the options I see for now:
1. We review and merge this PR and I open an issue with follow-up work.
2. We amend this PR to fix the issue in one go (using ProcessedUntil or another mechanism).
Let me know what you prefer.
EDIT: Thinking about it some more, I would go with 1. for now unless you feel strongly about 2.
@christos68k Thanks a lot for the context and thorough review 🙇 !!
I thought about this a bit, and my understanding is that we have the following constraints:
1. All sample timestamps >= profile start time.
2. No dropped samples.
3. Profile duration is based on the wall clock time (i.e. close to the report interval).
4. Monotonic profile start times.
The fundamental issue is that it's always theoretically possible to receive samples asynchronously from per-CPU buffers, so in the following scenario:
- At wall-clock T1, we send Profile A containing samples with timestamps [t3, t5, t7]. To satisfy "samples >= start", we set the start to t3.
- At wall-clock T2, we're preparing Profile B. A late sample with timestamp t2 arrives (it was buffered in a different CPU's buffer).
We're faced with the following options:
- Drop the late sample: violates "no dropped samples" (2.)
- Wait for late samples and include all samples in a single profile: violates "wall-clock duration" (3.)
- Include t2 in Profile B: either violates (1.) if we don't adjust the start time, or violates (4.) if we do.
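To make that tension concrete, here's a tiny illustrative sketch of the decision point for a late sample (names are made up):

```go
package reporter

import "time"

// handleLateSample shows the trade-off for a sample older than the
// pending profile's start; whichever branch we take, one constraint
// has to give.
func handleLateSample(ts, profileStart time.Time) time.Time {
	if ts.Before(profileStart) {
		// a) drop the sample     -> violates (2) "no dropped samples"
		// b) move the start back -> violates (4) monotonic start times
		// c) defer to the next profile -> it is still older than that
		//    profile's start, so the problem only moves.
		return ts // option b: keep (1) and (2), treat (4) as best-effort
	}
	return profileStart
}
```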
So ultimately there is some tension between all these constraints and we need to treat at least one of them as best-effort.
I've hacked together a tentative implementation here that uses ProcessedUntil in a best-effort fashion to reduce the likelihood of violating (4.) (so keeping the same tradeoffs as this current PR).
My thinking was that constraints (1), (2), and (3) are protocol/correctness requirements, while (4) is more of a "nice to have" for downstream consumers. But maybe we could accept dropped samples if we think the number of samples dropped would be low. In practice, I'd expect late-arriving samples to be rare since ProcessedUntil typically advances within one poll interval (~250ms), so the window for "too late" samples is small.
The implementation above creates a "pending" report and defers sending until ProcessedUntil confirms that samples up to that kernel time have been delivered. This gives late-arriving samples (from per-CPU reordering) a chance to be included in the pending report.
A couple of notes (a condensed sketch of the approach follows this list):
- This required dealing with what happens if ProcessedUntil doesn't arrive before the next report interval. The implementation above sends the report anyway as a timeout fallback.
- The time passed to ProcessedUntil isn't strictly guaranteed to increase monotonically (due to per-CPU ring buffer reordering). I used a monotonically increasing maxProcessedUntilKTime watermark to work around this.
- I've chosen to adjust the start time of the pending report in case we receive a sample that is older than the start time of the pending report, to ensure constraints (1.) and (2.) are met, but we could also drop that sample if we want to relax constraint (2.) as opposed to (4.)
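Here's the promised condensed sketch of that pending-report flow (hypothetical names and shapes, not the code from the linked branch):

```go
package reporter

// pendingReport is a report held back until the ProcessedUntil watermark
// confirms that all events up to its end have been delivered, or until a
// timeout fires.
type pendingReport struct {
	startKTime int64   // may be adjusted backwards by late samples
	endKTime   int64   // end of the collection window
	samples    []int64 // sample ktimes, stands in for real sample data
}

type reporter struct {
	pending                *pendingReport
	maxProcessedUntilKTime int64 // monotonic watermark, see note above
}

// onProcessedUntil keeps the watermark monotonic even if the tracer's
// notifications arrive slightly out of order across per-CPU buffers.
func (r *reporter) onProcessedUntil(ktime int64) {
	if ktime > r.maxProcessedUntilKTime {
		r.maxProcessedUntilKTime = ktime
	}
	r.maybeFlush(false)
}

// onSample adds a sample to the pending report, moving the start back if
// the sample predates it (keeps constraints (1.) and (2.)).
func (r *reporter) onSample(ktime int64) {
	if r.pending == nil {
		return
	}
	if ktime < r.pending.startKTime {
		r.pending.startKTime = ktime
	}
	r.pending.samples = append(r.pending.samples, ktime)
}

// maybeFlush sends the pending report once the watermark has passed its
// end, or unconditionally on timeout so a stalled watermark cannot delay
// reporting past the next interval.
func (r *reporter) maybeFlush(timeout bool) {
	if r.pending == nil {
		return
	}
	if timeout || r.maxProcessedUntilKTime >= r.pending.endKTime {
		send(r.pending)
		r.pending = nil
	}
}

func send(p *pendingReport) {} // stand-in for building and emitting the profile
```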
All in all, it looks like this question has non-trivial design / implementation consequences, so we should probably align first on which constraint we want to relax, and what high level implementation / design we're willing to consider.
Based on that, I'd personally also vote for merging this PR as is, and opening an issue to discuss this further 🙌
As you wrote, I think we can maintain 1, 2 and 3 (with 1 and 2 being hard guarantees).
Regarding 4, there are two sources of potential discontinuities:
- In the default configuration we periodically sync with the realtime clock (Sample timestamps are monotonic in the kernel but we add a delta to convert them to UnixTime, and that delta is not calculated once for the life of the agent process, but periodically; see the sketch after this list). This means that system clock changes can result in Profile start time discontinuities (expected to be rare). However, this is a configuration option (clock-sync-interval) and the user may decide to switch it off, in which case Sample timestamps will be strictly monotonic for the life of the agent process.
- Per-CPU event reordering: I think we should be able to deal with this by slightly relaxing 3. What I had in mind is something similar to what you described (have the reporter buffer trace events and calculate reported batches based on ProcessedUntil). IIRC the current logic guarantees that the ktime passed to ProcessedUntil is monotonically increasing, but I need to revisit and swap a lot of that context back in.
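To illustrate the first point, here's a minimal sketch of that ktime-to-UnixTime conversion (illustrative only; the agent's actual implementation differs in detail):

```go
package clocksync

import "time"

// delta converts monotonic kernel timestamps to Unix time:
// unixNanos = ktime + delta. It is recomputed periodically (every
// clock-sync-interval), so a system clock change shifts it and can
// introduce discontinuities in converted timestamps.
var delta int64

// syncClocks recomputes the delta from a monotonic "now" read alongside
// the realtime clock.
func syncClocks(monotonicNowNanos int64) {
	delta = time.Now().UnixNano() - monotonicNowNanos
}

// toUnixNanos converts a kernel (monotonic) timestamp to Unix time.
func toUnixNanos(ktime int64) int64 {
	return ktime + delta
}
```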
In the meantime, I'll merge this PR as discussed! Thanks!
EDIT: Opened #1142 to track this issue
this should ensure that the profile start time is always earlier than the earliest sample, even when accounting for buffered samples. also add tests to validate the adjustment behaviour.
Force-pushed from 8a9c988 to fb57048
reporter: compute profile duration using the time during which the profiler was collecting samples (open-telemetry#1129)
previously, we were reporting the profile duration based on sampled events for specific resources. this means that idle resources could end up with much smaller profile durations that would not reflect the actual utilization, compared to busy resources.
more generally, using the sample timestamps to compute the duration could lead to very small durations for mostly idle hosts, which would be inaccurate if the profiler is running for much longer.
this change proposes setting the duration to the time during which the profiler was running / collecting samples rather than the sample timestamps.