interpreter/ruby: add Ruby 4.0.1 support for stack unwinding#1123
@dalehamel care to review?
Ruby 4.0 was released December 2025 with significant internal changes:
- New ZJIT compiler replaces some YJIT fields in rb_iseq_constant_body
- New lvar_states field added to rb_iseq_constant_body
- Complete redesign of rb_ractor_sync with Port-based API
- Removed receiving_mutex and barrier_wait_cond from rb_ractor_struct

Changes:
- Update version gate to allow Ruby 4.0.x (up to 4.1.0 exclusive)
- Add iseq_constant_body offsets for Ruby 4.0+ (size: 352 bytes)
- Add rb_ractor_struct.running_ec offset for Ruby 4.0+
  - amd64: 0x150 (reduced from 0x180 due to struct changes)
  - arm64: 0x160 (reduced from 0x190 due to struct changes)

Tested successfully against Ruby 4.0.1 running Rails 8 with Puma, producing complete stack traces with function names, file paths, and line numbers.
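For readers unfamiliar with the version gate mentioned in the commit message, here is a minimal sketch of a half-open version range check. The helper names (`rubyVersion`, `isSupported`) and the lower bound are assumptions made for illustration; only the exclusive upper bound of 4.1.0 comes from the commit message, and the profiler's real gate may be structured differently.

```go
package main

import "fmt"

// rubyVersion packs major.minor.patch into one comparable integer.
// Hypothetical helper for illustration, not the profiler's actual API.
func rubyVersion(major, minor, patch uint32) uint32 {
	return major<<16 | minor<<8 | patch
}

var (
	// Assumption: placeholder lower bound, not taken from this PR.
	minSupported = rubyVersion(2, 5, 0)
	// From the commit message: Ruby 4.0.x is allowed, 4.1.0 is excluded.
	maxSupported = rubyVersion(4, 1, 0)
)

// isSupported reports whether the version lies in [minSupported, maxSupported).
func isSupported(major, minor, patch uint32) bool {
	v := rubyVersion(major, minor, patch)
	return v >= minSupported && v < maxSupported
}

func main() {
	fmt.Println("4.0.1 supported:", isSupported(4, 0, 1))
	fmt.Println("4.1.0 supported:", isSupported(4, 1, 0))
}
```

Keeping the upper bound exclusive means a future 4.1 release is rejected until its struct offsets have been checked, rather than being unwound with 4.0 offsets.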
@liad-miggo Can you restore the coredump test, and send the datafiles? The
Add support for Ruby 4.0.1 interpreter stack unwinding with correct struct offsets determined via GDB analysis:
- rb_iseq_constant_body: local_iseq=176, size=304
- rb_ractor_struct.running_ec: amd64=0x138, arm64=0x148

Ruby 4.0+ has different struct layouts due to ZJIT replacing YJIT and internal API changes like the Port-based Ractor API.

Includes coredump test cases for both arm64 and amd64 architectures that verify proper Ruby stack trace extraction with source locations.
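As an illustration of the per-architecture `running_ec` offsets listed in this commit message, the following sketch selects the offset by architecture name. The function `runningECOffset` is hypothetical; the two values are the ones quoted above for Ruby 4.0.1, and the way the profiler actually stores them may differ.

```go
package main

import (
	"fmt"
	"runtime"
)

// runningECOffset returns the rb_ractor_struct.running_ec offset quoted in
// the commit message for Ruby 4.0.1. Hypothetical helper for illustration.
func runningECOffset(goarch string) (uint, error) {
	switch goarch {
	case "amd64":
		return 0x138, nil
	case "arm64":
		return 0x148, nil
	default:
		// Fail loudly instead of guessing an offset for an unknown layout.
		return 0, fmt.Errorf("no running_ec offset known for %s", goarch)
	}
}

func main() {
	off, err := runningECOffset(runtime.GOARCH)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("running_ec offset on %s: %#x\n", runtime.GOARCH, off)
}
```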
I've added.
Can export the actual data files (
Regenerate expected frames using coredump rebase to include the correct class prefixes (Object#, Kernel#) for Ruby methods.
I already had Shopify#20 prepared, but was waiting for #1101 to merge. Now that it has merged, this PR needs to be updated to include the updated objspace structs for finding the GC bits. I can submit a PR to your branch to reconcile these if you'd like, @liad-miggo.
dalehamel left a comment
Thanks for this. I filed liad-miggo#1 to try to reconcile this with the other offsets I had set for this, but I still need to verify with rbenv and ruby-install on amd64.
vms.iseq_constant_body.insn_info_size = 128
vms.iseq_constant_body.succ_index_table = 136
vms.iseq_constant_body.local_iseq = 176
vms.iseq_constant_body.size_of_iseq_constant_body = 304
with ruby-install via:
ruby-install 4.0.1 -- --enable-shared
I get 360 for this. Likewise with rbenv (installed from git, with ruby-build plugin installed from git also):
rbenv install 4.0.1
I likewise get 360 for this. Both done in colima (ubuntu 24.04) on aarch64, m4 macbook.
In my draft PR it was 354, but I think that is also wrong and should be 360. I verified it is the same on 4.0.0.
On amd64, the JIT members at the end of the struct are not added by default. They are not read anyway, so the smaller size should be safer.
Additional ruby 4.0+ offset updates
vms.iseq_constant_body.insn_info_size = 128
vms.iseq_constant_body.succ_index_table = 136
vms.iseq_constant_body.local_iseq = 176
if runtime.GOARCH == "amd64" {
I discovered that the difference here is actually not architecture dependent per se, it has to do with the ruby build toolchain:
The JIT fields at the end of the struct get added if you have rustc installed. It just so happened that my development environment on aarch64 has them, but the cleaner build box I used for amd64 did not.
If we take a look at the pahole output for this struct, we see that the layout is the same on both platforms until you get to the JIT members at the end:
struct rb_iseq_constant_body {
enum rb_iseq_type type; /* 0 4 */
unsigned int iseq_size; /* 4 4 */
VALUE * iseq_encoded; /* 8 8 */
struct rb_iseq_parameters param; /* 16 48 */
/* --- cacheline 1 boundary (64 bytes) --- */
rb_iseq_location_t location; /* 64 48 */
struct iseq_insn_info insns_info; /* 112 32 */
/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
const ID * local_table; /* 144 8 */
enum lvar_state * lvar_states; /* 152 8 */
struct iseq_catch_table * catch_table; /* 160 8 */
const struct rb_iseq_struct * parent_iseq; /* 168 8 */
struct rb_iseq_struct * local_iseq; /* 176 8 */
union iseq_inline_storage_entry * is_entries; /* 184 8 */
/* --- cacheline 3 boundary (192 bytes) --- */
struct rb_call_data * call_data; /* 192 8 */
struct {
rb_snum_t flip_count; /* 200 8 */
VALUE script_lines; /* 208 8 */
VALUE coverage; /* 216 8 */
VALUE pc2branchindex; /* 224 8 */
VALUE * original_iseq; /* 232 8 */
} variable; /* 200 40 */
unsigned int local_table_size; /* 240 4 */
unsigned int ic_size; /* 244 4 */
unsigned int ise_size; /* 248 4 */
unsigned int ivc_size; /* 252 4 */
/* --- cacheline 4 boundary (256 bytes) --- */
unsigned int icvarc_size; /* 256 4 */
unsigned int ci_size; /* 260 4 */
unsigned int stack_max; /* 264 4 */
unsigned int builtin_attrs; /* 268 4 */
_Bool prism; /* 272 1 */
/* XXX 7 bytes hole, try to pack */
union {
iseq_bits_t * list; /* 280 8 */
iseq_bits_t single; /* 280 8 */
} mark_bits; /* 280 8 */
struct rb_id_table * outer_variables; /* 288 8 */
const rb_iseq_t * mandatory_only_iseq; /* 296 8 */
rb_jit_func_t jit_entry; /* 304 8 */
...
We never actually read any members past mandatory_only_iseq. In fact, the last member of the struct that we read is much earlier: local_iseq at offset 176.
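The offsets in the pahole output can be cross-checked with a small Go mirror of the layout. This is only a sketch: the type `iseqBodyPrefix` and its field names are made up for illustration, opaque byte arrays stand in for the nested C structs, pointers are modeled as `uint64` (assuming a 64-bit target), and padding is written out explicitly rather than left to the compiler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// iseqBodyPrefix mirrors rb_iseq_constant_body up to, but not including,
// the JIT members. Hypothetical type; sizes are taken from the pahole output.
type iseqBodyPrefix struct {
	typ               uint32    // 0
	iseqSize          uint32    // 4
	iseqEncoded       uint64    // 8
	param             [48]byte  // 16
	location          [48]byte  // 64
	insnsInfo         [32]byte  // 112
	localTable        uint64    // 144
	lvarStates        uint64    // 152
	catchTable        uint64    // 160
	parentISeq        uint64    // 168
	localISeq         uint64    // 176
	isEntries         uint64    // 184
	callData          uint64    // 192
	variable          [40]byte  // 200
	counters          [8]uint32 // 240: local_table_size .. builtin_attrs
	prism             bool      // 272
	_                 [7]byte   // 273: the 7-byte hole reported by pahole
	markBits          uint64    // 280
	outerVariables    uint64    // 288
	mandatoryOnlyISeq uint64    // 296
}

// localISeqOffset returns the byte offset of local_iseq in the mirror.
func localISeqOffset() uintptr {
	var b iseqBodyPrefix
	return unsafe.Offsetof(b.localISeq)
}

// prefixSize returns the size of the struct without the JIT members.
func prefixSize() uintptr {
	return unsafe.Sizeof(iseqBodyPrefix{})
}

func main() {
	fmt.Println("local_iseq offset:", localISeqOffset())
	fmt.Println("size without JIT members:", prefixSize())
}
```

If the mirror is faithful to the pahole output, the two printed values should match the 176 and 304 discussed in this thread.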
I would argue that 304 is the safer value for the size of the struct and will work on more deployments: if we attempt the larger 360-byte read when the JIT members are absent, the read could hit invalid memory and cause problems.
I don't think this issue is unique to Ruby 4 either. The layout appears to have been more or less the same since at least 3.1.0 https://github.com/ruby/ruby/blob/fb4df44d1670e9d25aef6b235a7281199a177edb/vm_core.h#L485, but making these trailing members conditional on ZJIT / YJIT was added later.
I'd like some feedback from the other otel maintainers on what is preferred here. I'm leaning towards just using 304 on both platforms, with the knowledge that if we ever do access jit members of the struct, we'll need to be more careful about this.
> I'm leaning towards just using 304 on both platforms, with the knowledge that if we ever do access jit members of the struct, we'll need to be more careful about this.
I agree with this, and it should be documented.
Can we get rid of the GOARCH test, switch to using 304 for length and document your findings in the source?
> Can we get rid of the GOARCH test, switch to using 304 for length and document your findings in the source?
Added a suggestion for @liad-miggo that should address this
Agreed, and accepted the suggestion.
dalehamel left a comment
I've verified these offsets and resolved this with another draft branch I had for ruby 4 support.
I will call out the comment I've made on one of the structs, though: its size depends on whether rustc is installed at build time.
Co-authored-by: Dale Hamel <dalehamel@users.noreply.github.com>
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
Summary
Background
Ruby 4.0 was released December 2025 with significant internal changes:
- New ZJIT compiler replaces some YJIT fields in rb_iseq_constant_body
- New lvar_states field added to rb_iseq_constant_body
- Complete redesign of rb_ractor_sync with Port-based API
- Removed receiving_mutex and barrier_wait_cond from rb_ractor_struct
Changes
Offsets determined via GDB analysis of Ruby 4.0.1 binaries:
- iseq_constant_body: local_iseq = 176
- iseq_constant_body: size = 304
- rb_ractor_struct: running_ec = 0x138 (amd64)
- rb_ractor_struct: running_ec = 0x148 (arm64)
Test plan
- Coredump test cases: tools/coredump/testdata/amd64/ruby-4.0.1-loop.json and tools/coredump/testdata/arm64/ruby-4.0.1-loop.json