Profiles: simplify profile stack trace representation#645

Closed
felixge wants to merge 1 commit into open-telemetry:main from felixge:simpler-stack-trace-v2

Conversation

@felixge
Member

@felixge felixge commented Apr 17, 2025

Changes

  • Introduce a first-class Stack message type and lookup table.
  • Replace location index range based stack trace encoding on Sample with a single stack_index reference.
  • Remove the location_indices lookup table.
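
For illustration only, here is a minimal Go sketch of the proposed shape: a deduplicated stack lookup table referenced by a single `stack_index` per sample. The struct and field names are hypothetical stand-ins, not the actual `profiles.proto` definitions.

```go
package main

import "fmt"

// Stack mirrors the proposed Stack message: a stack trace as a list of
// location table indices, leaf first. (Hypothetical names.)
type Stack struct {
	LocationIndices []int32
}

// Sample references its stack via a single table index, replacing the
// old location start/length range encoding.
type Sample struct {
	StackIndex int32
}

// buildStackTable deduplicates stack traces into a lookup table and
// returns one Sample per input trace referencing its table entry.
func buildStackTable(traces [][]int32) ([]Stack, []Sample) {
	var table []Stack
	index := map[string]int32{} // trace fingerprint -> table index
	samples := make([]Sample, 0, len(traces))
	for _, t := range traces {
		key := fmt.Sprint(t)
		i, ok := index[key]
		if !ok {
			i = int32(len(table))
			table = append(table, Stack{LocationIndices: t})
			index[key] = i
		}
		samples = append(samples, Sample{StackIndex: i})
	}
	return table, samples
}

func main() {
	// Three samples, two distinct stacks: each stack is stored once.
	table, samples := buildStackTable([][]int32{
		{3, 2, 1, 0}, // d;c;b;a (leaf first)
		{3, 2, 1, 0},
		{4, 2, 1, 0}, // e;c;b;a
	})
	fmt.Println(len(table), samples[0].StackIndex, samples[1].StackIndex, samples[2].StackIndex)
	// prints: 2 0 0 1
}
```

The point of the sketch is that repeated stack traces cost one varint per sample, while the trace itself is stored exactly once in the table.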

Motivation

tl;dr: Let's KISS. The stack trace encoding proposed in this PR is competitive with more advanced stack trace encoding formats, but much simpler to implement. See the table below or @christos68k's benchmark for details.

| Encoding | Uncompressed | vs Dedup | Compressed | vs Dedup |
| --- | ---: | ---: | ---: | ---: |
| ExportDedup (current) | 688709 | 0.0% | 277955 | 0.0% |
| ExportStacks (this PR) | 703704 | +2.8% | 277713 | -0.1% |
| ExportArrays (PR #651) | 602920 | -12.5% | 305235 | +9.8% |

The current stack trace encoding is somewhat complex because it tries to optimize for CPU profiling stack traces, where it's relatively common for stack traces to share prefixes.

For example, a program that looks like this ...

```go
func a() { b() }
func b() { c() }

func c() {
	for {
		d()
		e()
	}
}
```

... will produce a CPU profile with stack traces like the ones shown below. The numbers indicate samples taken at different locations (program counters) within the same function:

```
a;b;c;d1
a;b;c;d2
a;b;c;d3
...
a;b;c;e1
a;b;c;e2
a;b;c;e3
...
a;b;c
```

This makes it attractive to optimize away the prefix repetition (a;b;c) in the stack trace dictionary table. But the current format doesn't quite succeed at this, as it only lets us avoid storing the last stack trace (a;b;c) in the table. This probably explains why it doesn't end up performing very differently from the encoding proposed in this PR.

The alternative double array proposal from @aalexand achieves over 12% reduction in profile size before compression, because it can exploit this pattern effectively. However, for some reason the resulting data patterns end up being much harder to compress, and the encoding loses by almost 10% after compression has been applied.

Last but not least, not all profiling data benefits from prefix encoding as much as CPU profiles do. For example, memory allocation and especially Off-CPU stack traces have significantly less prefix repetition. We don't have a good benchmark for this, but I hope this argument is easy to follow regardless.

Given the above, I'm proposing to choose the most trivial stack trace encoding format, as outlined in this PR, since it seems to hit a sweet spot between complexity and efficiency. The main task for reviewers is to validate @christos68k's benchmarks and confirm they're okay with this direction.

It seems like we had rough consensus around this change in previous SIG meetings.

Comment thread opentelemetry/proto/profiles/v1development/profiles.proto
Member

@florianl florianl left a comment

Mixed profile reports, e.g. combined on- and off-CPU profiles, will especially benefit from this approach.

Comment thread opentelemetry/proto/profiles/v1development/profiles.proto Outdated
aalexand added a commit to aalexand/opentelemetry-proto that referenced this pull request May 6, 2025
In open-telemetry#645 a simplified stack trace representation is proposed. A
discussion there asks for a more efficient way to represent stacks that
slightly differ at the leaf. A specialized solution proposed there would
put the leaf location as a separate field. That seems partial and
potentially error-prone.

This PR proposes an alternative stack trace representation, shaped as
a tree in the form of two arrays. See the code for details and an
example.

The advantage of this approach is that stacks with the same prefix are
encoded more compactly. Also, since we use only two arrays for the
encoding, the in-memory representation is also more efficient in terms
of the number of allocations required. The main disadvantage is that
this approach is more complex semantically and for this reason can be
more error-prone.
@reyang reyang changed the title Simplify profile stack trace representation Profiles: simplify profile stack trace representation May 21, 2025
@christos68k
Member

See open-telemetry/opentelemetry-ebpf-profiler#524 for benchmarks that also cover this proposal.

- Introduce a first-class Stack message type and lookup table.
- Replace location index range based stack trace encoding on Sample with
  a single stack_index reference.
- Remove the location_indices lookup table.

The primary motivation is laying the groundwork for [timestamp based
profiling][timestamp proposal], where the same stack trace needs to be
referenced much more frequently compared to aggregation based on low
cardinality attributes.

Timestamp based profiling is also expected to be used with the
upcoming [Off-CPU profiling][off-cpu pr] feature in the eBPF profiler.
Off-CPU stack traces have a different distribution compared to CPU
samples. In particular, stack traces are much more repetitive because
they only occur at call sites such as syscalls. For the same reason it
is also uncommon to see a stack trace that is a root-prefix of a
previously observed stack trace.

We might need to revisit the [previous benchmarks][benchmarks]
to confirm these claims.

The secondary motivation is simplicity. Arguably the proposed change
will make it easier to write exporters, processors, and receivers.

It seems like we had rough consensus around this change in previous SIG
meetings, and it seems like a good incremental step to make progress on
the timestamp proposal.

[timestamp proposal]: open-telemetry#594
[off-cpu pr]: open-telemetry/opentelemetry-ebpf-profiler#196
[benchmarks]: https://docs.google.com/spreadsheets/d/1Q-6MlegV8xLYdz5WD5iPxQU2tsfodX1-CDV1WeGzyQ0/edit?gid=2069300294#gid=2069300294
@felixge felixge force-pushed the simpler-stack-trace-v2 branch from 52ad266 to d972082 Compare July 10, 2025 14:30
@felixge felixge marked this pull request as ready for review July 24, 2025 14:50
@felixge felixge requested review from a team July 24, 2025 14:50
@felixge
Member Author

felixge commented Jul 24, 2025

This PR has been rebased and the description has been updated. It's ready for another round of review.

bool has_inline_frames = 9;
}

// A Stack represents a stack trace as a list of locations. The first location
Member

We should also add this to the helper ASCII graph at the beginning of the file and update it accordingly.

Member

Opened felixge#2

@tigrannajaryan
Member

Please resolve comments so that I can merge.

bool has_inline_frames = 9;
}

// A Stack represents a stack trace as a list of locations. The first location
Member

nit: "The first location is the leaf frame" - would it make sense to put this in the field comment rather than message?

Member

I added this here.

repeated AttributeUnit attribute_units = 7;

// Stacks referenced by samples via Sample.stack_index.
repeated Stack stack_table = 8;
Member

Maybe group stack_table with location_table and friends and keep attribute_units last since it's somewhat special in semantics?

Member

I guess with #672 it's less special.

@tigrannajaryan
Member

We are planning to release 1.8.0. Do you need this merged before that?

@christos68k
Member

> We are planning to release 1.8.0. Do you need this merged before that?

Before if possible, otherwise we'll need to trigger another release very soon.

@tigrannajaryan
Member

> > We are planning to release 1.8.0. Do you need this merged before that?
>
> Before if possible, otherwise we'll need to trigger another release very soon.

OK, in that case please address / close open comments and resolve the merge conflicts so that this PR can be merged. We can wait a day or so, but if it is going to take much longer then it will have to wait until the release after that.

@christos68k
Member

> > > We are planning to release 1.8.0. Do you need this merged before that?
> >
> > Before if possible, otherwise we'll need to trigger another release very soon.
>
> OK, in that case please address / close open comments and resolve the merge conflicts so that this PR can be merged. We can wait a day or so but if it is going to take much longer then it will have to wait until the release after that.

@felixge will probably be busy until the end of this week, so I'm guessing we won't be able to resolve conflicts and merge #645. But I've opened #708, which is essentially #645 with updates, and hopefully we can get that in before 1.8.0 is out.

@tigrannajaryan
Member

> > > > We are planning to release 1.8.0. Do you need this merged before that?
> > >
> > > Before if possible, otherwise we'll need to trigger another release very soon.
> >
> > OK, in that case please address / close open comments and resolve the merge conflicts so that this PR can be merged. We can wait a day or so but if it is going to take much longer then it will have to wait until the release after that.
>
> @felixge will probably be busy until the end of this week so I'm guessing we won't be able to resolve conflicts and merge #645, but I've opened #708 which is essentially #645 with updates and hopefully we can get that in before 1.8.0 is out.

If you have a particular reason this change has to go together with some previous ones, we can wait a few more days. If it doesn't need to be released together, then we can release 1.8.0 now and do 1.9.0 immediately after your change is merged. Releases are not hard and we don't time them to any particular cadence; we can release when we want.

@christos68k
Member

> If you have a particular reason this change has to go together with some previous ones, we can wait a few more days. If it doesn't need to be released together, then we can release 1.8.0 now and do 1.9.0 immediately after your change is merged. Releases are not hard and we don't time them to any particular cadence; we can release when we want.

There's no special reason other than speeding up our roadmap (and updating the profiling agent to include all recent protocol changes). Being able to do another 1.9.0 release soon is great, so I see no reason to delay 1.8.0.

@pellared
Member

@christos68k, should this PR be closed given #708 is merged?

@christos68k
Member

Closing, @tigrannajaryan has merged #708 which contains the changes in this PR.
