ruby: Handle Ruby JIT PC with JIT frame type#1102
Conversation
|
Opening this one as a draft, it should probably have coredump tests as well. While it is independent from #1101 , it should probably land after it as it is lower a bit priority. If the approach looks good, I'll add some coredump tests. |
|
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
|
I managed to add support for better jit unwinding support when frame pointers are present, I am thinking i will probably submit both together - this is more stub stuff |
7120077 to
e4d6d9a
Compare
korniltsev-grafanista
left a comment
There was a problem hiding this comment.
I've just tested the change locally and it worked well on 3.4.9 and 3.3.10
I left some comments, but the general idea works and unlocks the profiling with jit enabled
hey @korniltsev-grafanista i have an updated version of this branch Shopify#26 but i haven't updated this yet as i'm waiting for two other PRs to land first ( #1226 and #1227 ) Thanks for your comments, when the other PRs land trying to upstream the JIT support is my next priority. For now i can say that this branch is at least tested in production and works quite well for our cases so far but certainly can use some review and hardening. |
|
I took a look at Shopify#26 and it looks good to me. I would still prefer to only handle the whole jit area once and forget about it, than updating maps everytime something is recompiled, but it should work anyways. |
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
- Restore findJITRegion() from base (open-telemetry#1102) which was incorrectly removed during rebase - PR #26 now reuses the shared function instead of inline code - Remove redundant inline first-pass JIT detection (now handled by findJITRegion) - Detach cleanup is inherited from the updated base (open-telemetry#1102)
94930a6 to
5831ed4
Compare
|
@florianl I added new coredump tests for ruby with yjit enabled, can you please add the modules to the store |
|
@korniltsev-grafanista please take another look, i've added unit tests for the memory region detection and addressed your comments. I've also added coredump tests. I'm planning on submitting a second PR once this lands which will add proper frame pointer detection to keep the native state in sync, which would be needed for unwinding zjit correctly. But for now that's lower priority, this allows the ruby unwinder to work with either jit enabled at all. |
5831ed4 to
e151da4
Compare
| "Range#each+0 in <cfunc>:0", | ||
| "block in <main>+0 in /Users/dalehamel/src/github.com/Shopify/otel-ebpf-pr1102/tools/coredump/testsources/ruby/loop.rb:33", | ||
| "Kernel#loop+0 in <internal:kernel>:168", | ||
| "<main>+0 in /Users/dalehamel/src/github.com/Shopify/otel-ebpf-pr1102/tools/coredump/testsources/ruby/loop.rb:32" |
There was a problem hiding this comment.
Note that because we lack the ability to continue unwinding when there are no frame pointers, the 'base frame' appears as the base ruby frame, not actual C / native frame that is the entrypoint to the ruby interpreter process. This is something that my follow-up PR will address. but for now it is expected behaviour that once jit is detected, we don't resume native unwinding.
And even with the follow-up, it isn't guaranteed we can - ruby needs to be running with a jit-perf mode that enables the frame pointers. So this is generally an expected and (I think) acceptable tradeoff for being able to profile with jit enabled.
0c685ac to
edbf402
Compare
|
thanks for the review and feedback @florianl I applied your suggestions in one commit (per the new guidance to ensure you get credit), and fixed them up in a second to fix the build and address the remaining comments. |
|
|
||
| for _, prefix := range prefixes { | ||
| if isNew { | ||
| if err := ebpf.UpdatePidInterpreterMapping(pid, prefix, |
There was a problem hiding this comment.
My only tiny suggestion remains the same is to consider taking advantage of the fact that we know the full size of the yjit area (48/128/x mib) and we can just find the first mapping and assume the whole area belongs to the ruby interpreter without figuring out which subset of it has already been occupied/ garbage collected by the jit. This would simplify go code, there will be less map updates/deletitions every time something is recompiled, maps will be smaller.
| // Heuristic fallback: first anonymous executable mapping by address. | ||
| // Mappings from /proc/pid/maps are sorted by address, so the first | ||
| // match is the lowest address. | ||
| if !heuristicFound && m.IsExecutable() && m.IsAnonymous() { |
There was a problem hiding this comment.
Why does the label detection extends start/end jit area, but the heuristic sets it to the first found?
There was a problem hiding this comment.
There's also a bit of disconnection with the interpreter mappings: we UpdatePidInterpreterMapping for all mappings, but only set jit start/end for the first one. Is this intentional?
There was a problem hiding this comment.
should be addresesd, please take another look
| wantFound: true, | ||
| }, | ||
| { | ||
| name: "heuristic fallback - first anonymous executable mapping", |
There was a problem hiding this comment.
Taking only the first mapping may be not correct.
I did not test, but looks like it would not catch at least this case.
7f17d99b9000-7f17d9b18000 r-xp 00000000 00:00 0
7f17d9b18000-7f17d9c31000 r-xp 00000000 00:00 0
7f17d9c31000-7f17e19b9000 ---p 00000000 00:00 0
Both of this mappings belong to the jit area. And there may be many more cases. There may be more mappings later. Furthermore once full jit area is occupied some of the pages may be grabage collected, so there may be holes in it.
There was a problem hiding this comment.
I don't understand why we reject the second mapping in the heuristic fallback - first anonymous executable mapping testcase.
execAnon(0x7f0000100000, 0x4000),
execAnon(0x7f0000200000, 0x8000),
There was a problem hiding this comment.
Thanks, i believe i addressed this
… cleanup - Extract JIT region detection into standalone findJITRegion() function that handles both prctl-labeled mappings (spanning full reserved area including holes) and heuristic fallback (first anonymous executable mapping) - Add comprehensive unit tests for JIT region detection: no mappings, file-backed only, labeled single/multiple, heuristic fallback, precedence - Fix Detach() to clean up PidInterpreterMapping prefixes (previously only deleted proc data, leaking eBPF map entries) - Remove dead code: m.Vaddr < jitMapping.Vaddr branch that could never be true since /proc/pid/maps is sorted by address
- Fix JIT region check to use >= for lower bound (half-open interval [start, end)) matching V8 and HotSpot conventions - Reuse map lookup value to avoid double hashing of RawMapping struct key - Wrap CalculatePrefixList error for better debugging - Log stale prefix deletion errors instead of silently discarding - Move prctl(PR_SET_VMA) named anonymous mapping support from process/process.go (was in PR #36 but is needed here for findJITRegion labeled detection) - Remove stale 'assume all anonymous' comment - Rebuild BPF blobs for >= fix
- Add 'Register LPM prefixes' comment to SynchronizeMappings loop - Extract in_jit bool variable from inline if condition for clarity - Improve JIT detection comments (read_ruby_frame and walk_ruby_stack) - Fix whitespace: add blank line after FRAMES_PER_WALK_RUBY_STACK define, normalize MAX_EP_CHECKS spacing - Rebuild BPF blobs
Ruby 3.4.7 with --yjit on arm64. Verifies that: - JIT code region is detected from anonymous executable mappings - A JIT frame is pushed for the YJIT-compiled code - Ruby VM frames (is_prime, sum_of_primes, loop) are properly unwound after the JIT frame On main, this coredump fails with native_no_pid_page_mapping because the profiler doesn't know how to handle the anonymous JIT mapping. Moduledata tar for CI upload: ~/coredump-uploads/ruby-yjit-moduledata.tar
Ruby 3.4.7 with --yjit on x86_64. Same verification as arm64 test: - JIT code region detected from anonymous executable mappings - JIT frame pushed for YJIT-compiled code - Full Ruby VM stack unwound (is_prime, sum_of_primes, loop) Moduledata tar: ~/coredump-uploads/ruby-yjit-amd64-moduledata.tar
On systems with CONFIG_ANON_VMA_NAME=y, Ruby labels its JIT memory region via prctl(PR_SET_VMA), giving it a path like '[anon:Ruby:rb_yjit_reserve_addr_space]'. Without this fix, IsFileBacked() returns true for these mappings (non-empty Path), causing IsAnonymous() to return false. This prevents the LPM prefix registration loop in SynchronizeMappings from registering JIT executable pages for eBPF dispatch. Add IsPrctlNamed() helper and exclude prctl-named mappings from IsFileBacked(), so they are correctly classified as anonymous.
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
Adapt SynchronizeMappings to work with uint32 values instead of *uint32 pointers: since generations are no longer shared via pointer, both mappings and their prefixes must be updated independently each cycle. Use an isNew flag to only call UpdatePidInterpreterMapping for newly discovered mappings.
Address florianl's concern about unlimited growth: entries are pruned each SynchronizeMappings call via generation tracking, and the map size is bounded by the number of executable anonymous mappings per process.
Only push the JIT frame when trace->num_frames == 0 (leaf position), restoring the original behavior. Without this guard, when native frames were already unwound before the Ruby unwinder runs, the JIT frame was pushed deep in the stack instead of at the leaf, causing it to appear as the root frame in flamegraphs.
This reverts commit f88aaa3.
walk_ruby_stack is re-entered via tail calls to process more frames. On re-entry, in_jit was recomputed from record->state.pc which hasn't changed (non-FP path), causing the JIT frame to be pushed again on every tail call. Guard with !jit_detected so the JIT frame is only pushed once on the first entry.
ecf5e25 to
d03e376
Compare
|
@korniltsev-grafanista @florianl sorry for the delay in updating this, i had to debug and fix some errors related to the otel protocol change and then got distracted with other tasks. Finally got around to rebasing and QA'ing this in prod today after addressing the feedback above, and it's still looking good functionality-wise. |
korniltsev-grafanista
left a comment
There was a problem hiding this comment.
- My main suggestion remain regarding holes: JIT area may have multiple
r-xpentries withrw-por---pbetween them. It looks like in the current revisionjit_endmay point in the middle of JIT area while there are more JIT mappings beyondjit_end - nit: This is not a big deal, but I think we can simplify the code and logic here while also saving a number of syscalls and some memory. Currently, SynchronizeMapping updates the mappings every time something is recompiled or YJIT is GCed. It seems like we could avoid that by treating the entire YJIT code region, for example 128 MiB or whatever value is passed to YJIT, as a single mapping. Since we already know the start address and size, we can compute jit_start and jit_end once and update the eBPF interpreter mappings map with one large range, even if there are multiple smaller mappings inside it. That would mean we only need to update the BPF interpreter mapping once per process. It should reduce eBPF map updates, reduce map size, avoid the extra map in the Go code, prevent uncontrolled growth of mappings, and remove the possibility of jit_start / jit_end getting out of sync with the mappings. The Go-side logic would also be simpler. Unless I am missing something here?
| Path: "[anon:Ruby:rb_yjit_reserve_addr_space]", | ||
| } | ||
| } | ||
| fileBacked := func(vaddr, length uint64, path string) process.RawMapping { |
There was a problem hiding this comment.
According to
// Needed for JIT mappings (Hotspot, V8, BEAM, etc.)
interpreterNeeded := m.IsExecutable() && m.IsAnonymous()
// Needed by .NET to retrieve PE assembly mappings
interpreterNeeded = interpreterNeeded || strings.HasSuffix(m.Path, ".dll")I believe interpreters do not receive file backed mappings or PROT_NONE mappings and therefore these tests exercise something that is not happening in real world. I suggest we only pass m.IsExecutable() && m.IsAnonymous() pages in the test to make it closer to what's happening in real life.
Co-authored-by: Tolyan Korniltsev <anatoly.korniltsev@grafana.com>
What
Detect JIT frames produced by ruby's yjit / zjit to enable the ruby interpreter to still work if jit is in use.
Why
Fixes #937
yjit is a substantial improvement to performance, but currently breaks the ruby interpreter as it will never run, since the PC for the interpreter is never called - the whole point of JIT is to avoid the interpreter after all.
How
We use the
SynchronizeMappingshook to detect the ruby JIT address range, and if the PC is in this range we will push a dummy frame to indicate that the leaf is some JIT code.In the future, we can support the linux jit interface to symbolize these, but for now we just push a dummy frame.
Since Ruby's jit works by just running native code replacing the ISEQ of the leaf CME, we can just switch to unwinding the ruby stack once we detect a JIT frame.
However, since we cannot be guaranteed base pointers are available (and even if they are, we don't seem to be able to further unwind correctly with the native unwinder), we can't switch back to native unwinding once we have detected a JIT frame on the stack. This means that if JIT is enabled, we don't get the "interleaved" native and ruby stacks anymore, but we do still get the native frames on the edge of the stack which are probably the most interesting:
In a follow-up, PR, i'll add more thorough support for continuing to unwind when we have frame pointers enabled, and better support for zjit which is a method-based jit.