Skip to content

tracer/ruby: Optimize ep check to simplify work for verifier, increase limit#1100

Merged
christos68k merged 5 commits into
open-telemetry:mainfrom
Shopify:optimize-ep-check-upstream
Feb 3, 2026
Merged

tracer/ruby: Optimize ep check to simplify work for verifier, increase limit#1100
christos68k merged 5 commits into
open-telemetry:mainfrom
Shopify:optimize-ep-check-upstream

Conversation

@dalehamel
Copy link
Copy Markdown
Contributor

What

Optimizes the inner loop used when walking from the cfp->ep until we hit a local env, when checking for a method entry.

I also realized we had a regression on MAX_EP_CHECKS, on our branch I had raised it to 10 but in the #907 it was only set to 6. I addressed that issue, as we have some cases (ruby graphql libraries in rails) that seem to potentially have up to 10 levels of indirection to get to the IMEMO_MENT entry. It will now walk up to 10 environment pointers before giving up (and erroring), instead of just 6.

Why

Even though it passed the verifier in CI, when I rebased after merging and tried in production, I started having verifier errors. It works on all CI amd64 checks, works locally on aarch64 with linux 6.8, but fails with amd64 on linux 6.6.

It seems that when I switched from goto's to a simpler UNROLL while addressing the feedback in #907 (see 953cb67) , I didn't optimally structure the loop.

Without these changes, it will error1 out if there is a lot of indirection to find a local method entry, such as in ruby graphql libraries and rails with a lot of meta-programming.

How

I tried to push all code that didn't need to be in the unroll to either before or after it. The unroll will now only check:

  • If it is local (base case break the recursion / stop the loop)
  • read the imemo from rbasic flags
  • and always exit if it is a method entry
  • break the loop if it is local and an svar

All other probe reads and cfunc checks are moved out of the loop explicitly.

The MAX_EP_CHECKS macro is bumped to 10

Footnotes

  1. perhaps bailing on unwinding the whole stack is too severe, we can probably unwind the other frames and if we hit max ep checks, return a dummy frame for userpace to symbolize as something like <unknown ruby_local_ep_missing>

// is M*N for instruction checks, so be extra sensitive about additions there.
// If we get ERR_RUBY_READ_CME_MAX_EP regularly, we may need to raise it.
#define MAX_EP_CHECKS 6
#define MAX_EP_CHECKS 10
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in the future if we do have issues with the verifier, this tunable and the one above for how many frames to walk can adjust this. However, I'd rather leave this at 10 as it should handle most cases, including more pathological workloads.

#define IMEMO_MENT 6

// https://github.com/ruby/ruby/blob/36809a8d0c7ab67ff0919b331db926529a3e98a9/vm_core.h#L1375
#define GC_GUARDED_PTR_REF_MASK 0x03
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed i had a magic constant below, named it and doc'd it here.

{
// On every iteration except the first, get the ep from specval only if
// it is non-local.
if (ep_check > 0) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This weird check isn't necessary as we just do this processing before the loop starts now.

// to pass the kernel verifier.
// https://github.com/ruby/ruby/blob/v3_4_7/vm_insnhelper.c#L743
if (me_or_cref == 0)
continue;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This continue statement was a headache. Now it just uses a branch and in the case it is 0, we advance to the next ep as that is the next code in the loop

return ERR_RUBY_READ_RBASIC_FLAGS;
}
// If the env is local, check if imemo is svar and if so, dereference it
if ((u64)vm_env.flags & VM_ENV_FLAG_LOCAL) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out this doesn't need to run in the loop, as we only care about the base-case. an "svar" is only allowed once we have determined we are local, so we run that check here after the loop is done. The additional "SVAR" check lets is potentially dereference an imemo entry from the svar, and then we check that imemo in the next branch below.

Copy link
Copy Markdown
Member

@christos68k christos68k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some documentation nits

Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
Comment thread support/ebpf/ruby_tracer.ebpf.c
Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
Comment thread support/ebpf/ruby_tracer.ebpf.c
Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
@dalehamel
Copy link
Copy Markdown
Contributor Author

Some documentation nits

Cheers, applied them all thanks! Were any of these from a linter setting? Perhaps we could update .clang-format if so.

dalehamel and others added 3 commits February 2, 2026 15:34
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
@dalehamel dalehamel force-pushed the optimize-ep-check-upstream branch from c0abaa8 to 2462f24 Compare February 2, 2026 20:36
Copy link
Copy Markdown
Member

@florianl florianl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment.

Comment thread support/ebpf/ruby_tracer.ebpf.c Outdated
dalehamel and others added 2 commits February 3, 2026 05:27
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
@dalehamel
Copy link
Copy Markdown
Contributor Author

Just a minor comment.

Thanks, applied

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants