Profiler Event Struct && BPF Ring Buffer for profiler implementation #228
Hey guys, love the stuff you're cooking here. I've always wanted to learn eBPF, so thanks for documenting all your issues here.
This PR aims to tackle these two issues:
When reading through the requirements of the second issue, I realized that you want one ring buffer per CPU. BPF ring buffers are designed to be shared across all CPUs. What is the reason for wanting isolated ring buffers per CPU?
According to the BPF ring buffer docs referenced below, per-CPU ring buffers would give up two major benefits that a single shared ring buffer brings to the table:
- Memory footprint - BPF ringbuf memory usage scales better as the number of CPUs grows, because going from 16 to 32 CPUs doesn’t necessarily require twice as big a buffer to accommodate more load.
- Event ordering - per-CPU ring buffers force the consumer to re-order related events that are scattered across buffers/CPUs, which makes the application reading those events more complex. For example, fork(), exec(), and exit() can happen in very rapid succession on different CPUs for short-lived processes, because the kernel scheduler migrates them from one CPU to another.
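To make the event-ordering point concrete, here is a minimal plain-C sketch of the extra work a consumer would take on with per-CPU buffers. It is user-space illustration only, not BPF code and not the code in this PR. `struct event`, `struct cpu_queue`, and `merge_per_cpu` are hypothetical names, and the sketch assumes each per-CPU queue is already ordered by timestamp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event record; field names are illustrative only. */
struct event {
    uint64_t ts_ns; /* timestamp taken when the event was produced */
    uint32_t cpu;   /* CPU that produced the event */
    uint32_t kind;  /* e.g. 0 = fork, 1 = exec, 2 = exit */
};

/* One per-CPU queue, assumed to be ordered by ts_ns within that CPU. */
struct cpu_queue {
    const struct event *events;
    size_t len;
    size_t pos; /* index of the next unread event */
};

/*
 * Merge n per-CPU queues into out[] in global timestamp order.
 * Returns the number of events written (at most out_cap).
 * With a single shared buffer the consumer would not need this step.
 */
size_t merge_per_cpu(struct cpu_queue *queues, size_t n,
                     struct event *out, size_t out_cap)
{
    size_t written = 0;

    while (written < out_cap) {
        size_t best = n; /* n means "no queue has unread events" */

        /* Pick the queue whose next unread event has the smallest timestamp. */
        for (size_t i = 0; i < n; i++) {
            if (queues[i].pos >= queues[i].len)
                continue;
            if (best == n ||
                queues[i].events[queues[i].pos].ts_ns <
                    queues[best].events[queues[best].pos].ts_ns)
                best = i;
        }
        if (best == n)
            break; /* all queues drained */

        out[written++] = queues[best].events[queues[best].pos++];
    }
    return written;
}
```

Even this simple merge only works on events that have already arrived. A live consumer would also have to decide how long to wait before concluding that no other CPU still holds an earlier event.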
Let me know if I'm wrong here, and feel free to correct me! I would love to try and tackle the second issue, but before I do that, I'd like to get some clarity on whether you want to go per-CPU or with a single buffer. Thank you!
Reference:
https://nakryiko.com/posts/bpf-ringbuf/
https://www.kernel.org/doc/html/latest/bpf/ringbuf.html