Profiles: optimize profile stack trace representation.#651
Profiles: optimize profile stack trace representation.#651aalexand wants to merge 6 commits intoopen-telemetry:mainfrom
Conversation
In open-telemetry#645 a simplified stack trace representation is proposed. A discussion there asks for a more efficient way to represent stacks that slightly differ at the leaf. A specialized solution proposed there would put the leaf location as a separate field. That seems partial and potentially error-prone. This PR proposes an alternative stack trace representation, shaped as a tree in the form of two arrays. See the code for details and an example. The advantage of this approach is that stacks with the same prefix are encoded more compactly. Also, since we use only two arrays for the encoding, the in-memory representation is also more efficient in terms of the number of allocations required. The main disadvantage is that this approach is more complex semantically and for this reason can be more error-prone.
| // location_table = ["", "main", "foo", "bar", "baz1", "baz2"] | ||
| // | ||
| // And the stack tree arrays would be: | ||
| // (stack index) = 0 1 2 3 4 5 6 |
There was a problem hiding this comment.
Great you found the time to come up with an implementation!
Just some comments
- In the example there are 4 stack traces, but you encode 6 (index 0 ignored) because you also encode partial stacks line
mainandmain -> foo. I don't think this is required because these partial stacks can easily be constructed from the complete stacks if needed. - Add a step-by-step example to reconstruct one of the stacks.
There was a problem hiding this comment.
All those nodes are needed since they serve as parents to the child nodes, actually.
There was a problem hiding this comment.
Thanks for the clarification!
What if we use 0 as the root index instead of -1?
And additionally, use 0 in stack_location_index for non-leaf frames, which a) potentially improves compression and b) marks complete stacks?
// (index) = 0 1 2 3 4 5 6
// stack_parent_array = [ 0, 0, 1, 2, 3, 3, 2]
// stack_location_index = [ 0, 0, 0, 3, 4, 5, 5]
When constructing a stack, start at a leaf frame (index > 0) and stop at index 0?
Example for index 6
stack_location_index[6] is 5 --> "baz2"
parent_parent_array[6] is 2 --> "foo"
parent_parent_array[2] is 1 --> "main"
parent_parent_array[1] is 0 --> "" (root), stop
There was a problem hiding this comment.
Yes, there could be a convention that the node at index 0 is the nil node, instead of using -1 to indicate that. I don't have strong preference.
We can't use 0 for non-leaf frames' location ID. In your decoding example, for the nodes 2 and 1 you need to know that they correspond to the foo and main locations. If you store the zero as the code location for them then this information is lost.
There was a problem hiding this comment.
I might got it wrong. Can you provide a step-by-step example (in the proto file comment) to to properly reconstruct the stacks?
There was a problem hiding this comment.
I realized there was a mistake in the example, I corrected it.
There was a problem hiding this comment.
Here is a decoding example in Go:
package main
import (
"fmt"
)
// Stack tree encoding these stacks (the number is the stack index):
//
// main|0 -> foo|1 -> bar|2
// main|0 -> foo|1 -> bar|2 -> baz1|3
// main|0 -> foo|1 -> bar|2 -> baz2|4
// main|0 -> foo|1 -> baz2|5
var locationTable = []string{"", "main", "foo", "bar", "baz1", "baz2"}
var stackParentArray = []int{-1, 0, 1, 2, 2, 1}
var stackLocationIndex = []int{1, 2, 3, 4, 5, 5}
func main() {
stackIdx := 4
fmt.Printf("stackIdx=%d:", stackIdx)
for stackIdx != -1 {
fmt.Printf(" %s", locationTable[stackLocationIndex[stackIdx]])
stackIdx = stackParentArray[stackIdx]
}
fmt.Println()
}This outputs:
stackIdx=4: baz2 bar foo main
There was a problem hiding this comment.
It took me a few reads (and this thread helped a lot), I like this approach. Thanks for putting this together!
There was a problem hiding this comment.
It also took me a few reads. Here are some suggestions for making the idea more clear in the comments, feel free to merge @aalexand .
https://github.com/aalexand/opentelemetry-proto/pull/1/files
|
In theory, one more disadvantage of this approach is that the implementation would have to check for cycles when dealing with the parent links. In practice I think we can just require |
|
Any benchmarking available to indicate how much of a space saving we can expect from this extra complexity? |
No, but we should benchmark both this and #645. |
|
Thanks for writing this down. Looking forward to discussing it this Thursday. |
profiles: improve stack tree description
Based on internal feedback, putting the location and stack indexed lists into the same table is very confusing because they are really two separate tables. Make this clear by splitting them up into two.
|
@aalexand maybe you can also merge aalexand#2 ? I shared this PR with some people in my team, and they still struggled to follow the idea because they got confused by the location data being in the same table as the stack data ... which was indeed confusing because the values at the same index don't belong together across those two tables. |
profiles: split the stack tree ASCII table in the comment into two
Good point - merged. |
|
Here is an example implementation (buildArrays and buildStacks): rockdaboot#1 |
|
Generic implementation to encode the arrays from (generic) stacks and decode from arrays back into stacks: https://github.com/rockdaboot/go-trie |
| // Index into ProfilesData.{stack_parent_array, stack_location_index} stack | ||
| // table, representing the stack tree node of the sample. | ||
| int32 stack_index = 1; | ||
| // The type and unit of each value is defined by the corresponding |
There was a problem hiding this comment.
Sorry, I may have been unclear. There are no plans to add ability to specify different attributes to timestamps within a sample. All timestamps within a sample correspond to the same stack and attributes. What I meant is that with a single field fo referencing the stack the sizeof(Sample) is smaller.
|
See open-telemetry/opentelemetry-ebpf-profiler#524 for benchmarks that also cover this proposal. |
|
Closing since we agree to move on with #645. |
In #645 a simplified stack trace representation is proposed. A discussion there asks for a more efficient way to represent stacks that slightly differ at the leaf. A specialized solution proposed there would put the leaf location as a separate field. That seems partial and potentially error-prone.
This PR proposes an alternative stack trace representation, shaped as a tree in the form of two arrays. See the code for details and an example.
The advantage of this approach is that stacks with the same prefix are encoded more compactly. Also, since we use only two arrays for the encoding, the in-memory representation is also more efficient in terms of the number of allocations required. The main disadvantage is that this approach is more complex semantically and for this reason can be more error-prone.