ebpf: remove UNROLL from bounded loops#1171
Conversation
Inspired by a discussion on Slack started by @gnurizen. Removing UNROLL from bounded loops started to become possible with torvalds/linux@2589726 in Linux kernels 5.3 and newer. Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
| // https://github.com/ruby/ruby/blob/v3_4_7/vm_insnhelper.c#L769 | ||
| UNROLL for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check) | ||
| { | ||
| for (ep_check = 0; ep_check < MAX_EP_CHECKS; ++ep_check) { |
There was a problem hiding this comment.
I think these should be fine, especially considering the integration tests pass. The delta in instruction counts for ruby is phenomenal (and kind of funny haha)
|
All of our loops are bounded. Otherwise they cannot be unrolled. So if going this way I think we can remove the whole UNROLL macro. |
|
Only reason to leave UNROLL is potential performance optimization. If the JIT does not unroll loops, there is a small performance impact by the counter decrement/jump. Perhaps clang already unrolls small loops fully, and bigger loops partly by unrolling a few iterations. If the JIT/compiler is smart enough the UNROLL can go away. Otherwise it can stay for the selected loops where it has meaningful performance impact. Also in those cases it might be better to insert hand written pragma to specify the number of iterations to unroll. |
|
I suppose the other case is if verifier is unable to verify large loops that are not unrolled. But seems that at least some of the main unwinder loops can be handled. Would this allow doing a combined iteration count (unwinds per program) for all unwinder? If yes, does it work for larger numbers? We could increase the number of frames we can unwind for several programs this way. Another option to investigate would be if we can do one program to unwind all interpreter types to avoid the tail call completely? I suppose doing this could be difficult for the verifier to check. Just throwing some questions to determine if this helps with other future strategies. |
Signed-off-by: Florian Lehner <florian.lehner@elastic.co>
d035779 to
aea40ff
Compare
|
I didn't remove the UNROLL in the two remaining loops (see aea40ff) in the first place, as I thought it would hit the up to 256 rule as defined by |
Inspired by a discussion on Slack started by @gnurizen.
Removing UNROLL from bounded loops started to become possible with torvalds/linux@2589726 in Linux kernels 5.3 and newer.