Increase frame buffer to max 1024 frames per trace#908
Increase frame buffer to max 1024 frames per trace#908dalehamel wants to merge 1 commit intoopen-telemetry:mainfrom
Conversation
|
+1 this is quite valuable to improving the Ruby profiling out-of-the-box usability. Our experience at Datadog for our Ruby profiler matches @dalehamel's experience of Ruby apps consistently having deeper stacks than most other languages/runtimes. This is why the Datadog Ruby profiler defaults to 400 frames, and I know of customers that needed to increase this further. |
| #define MAX_FRAME_UNWINDS 128 | ||
| // 1024 is the maximum power of 2 we can fit in a single perf event, and | ||
| // in a single per CPU array entry | ||
| #define MAX_FRAME_UNWINDS 1024 |
There was a problem hiding this comment.
Not every environment does run ruby application and so will not directly benefit from this change. As with this change, eight times more memory is required for every trace that is reported from kernel to user space, did you measure the impact of this change for various workloads?
There was a problem hiding this comment.
The link you shared is for race sanitizer to workaround exactly that fact(only for race sanitizer) that we do not send the whole trace to userspace.
We should not be sending empty frames
There was a problem hiding this comment.
Sorry that I mixed up the race sanitizer work around. As this was just resently changed, it was the first that dig pop into my head.
Another component that will grow with this increase are the eBPF maps per_cpu_records.
How close are we getting to unwind 1024 frames? Without further changes, we might be limited:
| Unwinder | Unwinder limit | Unwinder limit * tail call limit |
|---|---|---|
| native | NATIVE_FRAMES_PER_PROGRAM = 4 | 4 * 32 = 128 |
| perl | PERL_FRAMES_PER_PROGRAM = 12 | 12 * 32 = 384 |
| php | FRAMES_PER_WALK_PHP_STACK = 19 | 19 * 32 = 608 |
| python | FRAMES_PER_WALK_PYTHON_STACK = 12 | 12 * 32 = 384 |
| ruby | FRAMES_PER_WALK_RUBY_STACK = 27 | 27 * 32 = 864 |
| v8 | V8_FRAMES_PER_PROGRAM = 8 | 8 * 32 = 256 |
| dotnet | DOTNET_FRAMES_PER_PROGRAM = 6 | 6 * 12 = 192 |
With further jumps between the different unwinders, the actual number might be lower.
There was a problem hiding this comment.
@florianl yeah so just about everything is potentially truncated right now.
Another component that will grow with this increase are the eBPF maps per_cpu_records.
I picked 1024 because It is the largest number that still fits into a single perf event, which is the limit for the per cpu record. Any higher power of 2 won't fit. There is a discussion about how to potentially go higher than that, as the per cpu record is currently an array of size of 1 but it doesn't necessarily need to be.
It is actually half of the size of other ruby profilers (stackprof and vernier allow a buffer of 2048, believe it or not - and rbspy is theoretically unlimited). My PR #907 raises the number of frames per tail call to 32, so the max theoretical number for ruby would actually be ~ 32 * 32 = 1024.
I don't believe we will really be wasting any memory here as profilers that don't write this many frames won't really allocate it. The per CPU array entry size does increase to something like 24Kb, but this isn't much of an increase on the system over all even if you have lots of processors. The benefit of being able to have untruncated stacks I think is worth it.
On the golang side, we should only end up actually transferring as much data as was actually written, so if memory increases it should be because we legitimately are sending more (useful) data to userspace.
|
I think this should be done together when fixing #940. There we can just set the variable length |
Yeah, I agree. I think it is a better solution overall from the sound of it |
Can we then close this PR? |
Most ruby / rails apps have deep stacks. As part of my efforts to make the otel BPF profiler work with ruby, I'm requesting to increase the maximum number of frames per trace to 1024, which fixes #760.
This is particularly needed for ruby, as it is a very common to have wrapper functions in frameworks like rails, such request as middlewares which all implement "env.call". When we have an intermixing of ruby and native frames, this becomes even more problematic.
This was originally part of #907 but I'm pulling it out to a separate PR at @florianl 's request so it can be discussed separately, and the memory implications for other interpreters / users can be considered.
Per @fabled 's thoughts in #760 though, I'm hoping this change is reasonable: