ebpf: remove UNROLL from bounded loops #1171

Merged: fabled merged 3 commits into open-telemetry:main from florianl:bounded-loops on Feb 17, 2026

Conversation

florianl (Member) commented Feb 13, 2026

Inspired by a discussion on Slack started by @gnurizen.

Removing UNROLL from bounded loops became possible with torvalds/linux@2589726, which is in Linux kernels 5.3 and newer.

$ ./bloat-o-meter tracer.ebpf.amd64.orig tracer.ebpf.amd64
function                                             old     new   delta
perf_unwind_python                                  5855    1534   -4321
kprobe_unwind_python                                5855    1534   -4321
perf_unwind_hotspot                                 7607    2017   -5590
kprobe_unwind_hotspot                               7607    2017   -5590
perf_unwind_perl                                    7324    1618   -5706
kprobe_unwind_perl                                  7324    1618   -5706
perf_unwind_php                                     6852     966   -5886
kprobe_unwind_php                                   6852     966   -5886
perf_unwind_dotnet10                                7393    1487   -5906
kprobe_unwind_dotnet10                              7393    1487   -5906
perf_unwind_dotnet                                  7531    1554   -5977
kprobe_unwind_dotnet                                7531    1554   -5977
perf_unwind_v8                                      7906    1674   -6232
kprobe_unwind_v8                                    7906    1674   -6232
perf_unwind_native                                  7810    1531   -6279
kprobe_unwind_native                                7810    1531   -6279
perf_unwind_beam                                    7681    1005   -6676
kprobe_unwind_beam                                  7681    1005   -6676
perf_unwind_ruby                                   14229    1284  -12945
kprobe_unwind_ruby                                 14229    1284  -12945
------------------------------------------------------------------------------
(add/remove: 930/7552 grow/shrink: 0/20 up/down: 0/-131036)Total: -131036 insns
$ ./bloat-o-meter tracer.ebpf.arm64.orig tracer.ebpf.arm64
function                                             old     new   delta
perf_unwind_python                                  5855    1534   -4321
kprobe_unwind_python                                5855    1534   -4321
perf_unwind_hotspot                                 6484    1821   -4663
kprobe_unwind_hotspot                               6484    1821   -4663
perf_unwind_perl                                    7324    1618   -5706
kprobe_unwind_perl                                  7324    1618   -5706
perf_unwind_php                                     6854     978   -5876
kprobe_unwind_php                                   6854     978   -5876
perf_unwind_dotnet10                                7393    1487   -5906
kprobe_unwind_dotnet10                              7393    1487   -5906
perf_unwind_dotnet                                  7531    1554   -5977
kprobe_unwind_dotnet                                7531    1554   -5977
perf_unwind_native                                  7632    1579   -6053
kprobe_unwind_native                                7632    1579   -6053
perf_unwind_v8                                      7970    1684   -6286
kprobe_unwind_v8                                    7970    1684   -6286
perf_unwind_beam                                    7681    1005   -6676
kprobe_unwind_beam                                  7681    1005   -6676
perf_unwind_ruby                                   14229    1284  -12945
kprobe_unwind_ruby                                 14229    1284  -12945
------------------------------------------------------------------------------
(add/remove: 904/7398 grow/shrink: 0/20 up/down: 0/-128818)Total: -128818 insns
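
As a sanity check on the amd64 table above, the per-function deltas can be recomputed from the old and new columns and summed. This is a minimal sketch in plain C and not part of the PR: the struct, array, and function names are made up for illustration, and only the numbers are copied from the output above. Each program appears as a perf_ and a kprobe_ variant with identical sizes, so every row is counted twice.

```c
#include <stddef.h>

/* One row per unwinder; old/new instruction counts are copied from the
 * amd64 bloat-o-meter output. All identifiers here are illustrative. */
struct insn_row {
    const char *unwinder;
    long old_insns;
    long new_insns;
};

const struct insn_row amd64_rows[] = {
    {"python",   5855, 1534}, {"hotspot", 7607, 2017},
    {"perl",     7324, 1618}, {"php",     6852,  966},
    {"dotnet10", 7393, 1487}, {"dotnet",  7531, 1554},
    {"v8",       7906, 1674}, {"native",  7810, 1531},
    {"beam",     7681, 1005}, {"ruby",   14229, 1284},
};

/* Sum (new - old) over all rows, counting each row `variants` times
 * (2 here: one perf_ and one kprobe_ entry per unwinder). */
long total_delta(const struct insn_row *rows, size_t n, long variants)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += variants * (rows[i].new_insns - rows[i].old_insns);
    return total;
}
```

Calling total_delta(amd64_rows, 10, 2) gives -131036, which matches the total reported by the tool. The ruby row alone accounts for -12945 per variant.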

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
@florianl florianl requested review from a team as code owners February 13, 2026 10:36
  // https://github.com/ruby/ruby/blob/v3_4_7/vm_insnhelper.c#L769
- UNROLL for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check)
- {
+ for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check) {
Contributor

I think these should be fine, especially considering the integration tests pass. The delta in instruction counts for ruby is phenomenal (and kind of funny haha)
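
For readers unfamiliar with the pattern, the hunk under review drops the unroll hint from a loop whose trip count is a compile-time constant. The following is a minimal sketch in plain userspace C, not the actual tracer code: the value of MAX_EP_CHECKS, the ep_node type, and both function names are assumptions made for illustration. Both variants compute the same result; the only difference is whether the compiler is asked to replicate the loop body.

```c
#include <stddef.h>

/* Unroll hint in the form quoted later in this thread. Only clang is
 * assumed to understand this pragma spelling, so it is compiled out
 * for other compilers. */
#if defined(__clang__)
#define UNROLL _Pragma("unroll 256")
#else
#define UNROLL
#endif

#define MAX_EP_CHECKS 20 /* assumed value, for illustration only */

/* Hypothetical chain of environment pointers. */
struct ep_node {
    const struct ep_node *prev;
    int is_local;
};

/* Before: the loop carries the unroll hint.
 * Returns the number of steps walked to reach a local node, or -1. */
int find_local_unrolled(const struct ep_node *ep)
{
    int ep_check;
    UNROLL for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check)
    {
        if (ep == NULL)
            return -1;
        if (ep->is_local)
            return ep_check;
        ep = ep->prev;
    }
    return -1; /* bound reached without finding a local node */
}

/* After: same loop, no hint. The constant bound alone limits the walk. */
int find_local_bounded(const struct ep_node *ep)
{
    int ep_check;
    for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check) {
        if (ep == NULL)
            return -1;
        if (ep->is_local)
            return ep_check;
        ep = ep->prev;
    }
    return -1; /* bound reached without finding a local node */
}
```

Because the bound is a constant, even a cyclic chain terminates after MAX_EP_CHECKS steps in either variant.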

gnurizen (Contributor) left a comment

Sweet thanks!

Comment thread support/ebpf/dotnet_tracer.ebpf.c Outdated
Comment thread support/ebpf/v8_tracer.ebpf.c Outdated
fabled (Contributor) commented Feb 14, 2026

All of our loops are bounded; otherwise they could not be unrolled. So if we go this way, I think we can remove the UNROLL macro entirely.

fabled (Contributor) commented Feb 14, 2026

The only reason to keep UNROLL is as a potential performance optimization. If the JIT does not unroll loops, there is a small performance cost from the counter decrement and jump. Perhaps clang already unrolls small loops fully, and larger loops partially by unrolling a few iterations.

If the JIT/compiler is smart enough, UNROLL can go away. Otherwise it can stay for selected loops where it has a meaningful performance impact. In those cases it might also be better to insert a hand-written pragma that specifies the number of iterations to unroll.
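
As a sketch of the hand-written pragma idea: clang accepts an explicit count in its unroll pragma, which asks for the loop body to be replicated that many times instead of fully unrolling the loop. The macro name, the MAX_WORDS bound, and the function below are hypothetical and written as plain userspace C. Whether such a hint measurably helps any loop in the tracer is not established in this thread and would need profiling.

```c
#include <stddef.h>
#include <stdint.h>

/* Partial-unroll hint with an explicit count. Only the clang spelling
 * is used here; other compilers get an empty macro. */
#if defined(__clang__)
#define UNROLL_BY_4 _Pragma("unroll 4")
#else
#define UNROLL_BY_4
#endif

#define MAX_WORDS 64 /* assumed compile-time upper bound */

/* XOR together up to MAX_WORDS words. The hint only affects how the
 * loop is compiled, never the result. */
uint32_t xor_words(const uint32_t *words, size_t n)
{
    uint32_t acc = 0;
    if (n > MAX_WORDS)
        n = MAX_WORDS; /* keep the trip count bounded */
    UNROLL_BY_4 for (size_t i = 0; i < n; i++)
        acc ^= words[i];
    return acc;
}
```

Keeping the hint behind a per-loop macro like this makes it easy to apply it only where a measurement justifies it.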

fabled (Contributor) commented Feb 14, 2026

I suppose the other case is if the verifier is unable to verify large loops that are not unrolled. But it seems that at least some of the main unwinder loops can be handled.

Would this allow a combined iteration count (unwinds per program) across all unwinders?

If yes, does it work for larger numbers? We could increase the number of frames we can unwind for several programs this way.

Another option to investigate is whether a single program could unwind all interpreter types, avoiding the tail call completely. I suppose this could be difficult for the verifier to check.

Just throwing out some questions to determine whether this helps with other future strategies.
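
To make the last two questions concrete, here is a rough sketch of a single bounded loop that shares one frame budget across interpreter types and dispatches per frame with a switch instead of a tail call. It is plain userspace C for illustration only: the enum, the stub next_kind function, and the MAX_TOTAL_FRAMES value are all hypothetical, and nothing here shows that the verifier would accept an equivalent eBPF program.

```c
/* Hypothetical frame kinds; FRAME_DONE marks the end of the stack. */
enum frame_kind { FRAME_NATIVE, FRAME_PYTHON, FRAME_RUBY, FRAME_DONE };

#define MAX_TOTAL_FRAMES 128 /* hypothetical budget shared by all kinds */

/* Stub: a real unwinder would read target memory. Here the "stack" is
 * just an array of kinds, which must end with FRAME_DONE or hold at
 * least MAX_TOTAL_FRAMES entries. */
enum frame_kind next_kind(const enum frame_kind *stack, int depth)
{
    return stack[depth];
}

/* Walk the stack with one loop and one budget. counts[] is indexed by
 * frame kind. Returns the number of frames unwound. */
int unwind_all(const enum frame_kind *stack, int counts[3])
{
    int depth;
    for (depth = 0; depth < MAX_TOTAL_FRAMES; ++depth) {
        switch (next_kind(stack, depth)) {
        case FRAME_NATIVE:
            counts[FRAME_NATIVE]++; /* native unwind step would go here */
            break;
        case FRAME_PYTHON:
            counts[FRAME_PYTHON]++; /* Python unwind step would go here */
            break;
        case FRAME_RUBY:
            counts[FRAME_RUBY]++;   /* Ruby unwind step would go here */
            break;
        case FRAME_DONE:
            return depth;           /* stack exhausted before the budget */
        }
    }
    return depth; /* budget exhausted */
}
```

In this shape the budget is spent by whichever interpreter types actually appear on the stack, rather than being fixed per program.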

Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
florianl (Member, Author) commented

Initially I did not remove UNROLL from the two remaining loops (see aea40ff), as I thought they would run into the "up to 256" rule defined by #define UNROLL _Pragma("unroll 256"). After more experiments I learned that I was wrong, so I have removed all remaining uses of UNROLL.

@fabled fabled merged commit cc33538 into open-telemetry:main Feb 17, 2026
32 checks passed
5 participants