
@r1viollet

@r1viollet r1viollet commented Jan 6, 2026

What does this PR do?

This PR reimplements part of the heap profiler to support Ruby 4 (heap profiling on Ruby 4 was disabled in #5148).

Motivation:

Support heap profiling on Ruby 4!

Change log entry

Yes. Add support for heap profiling on Ruby 4.

Additional Notes:

The Datadog Ruby heap profiler tracks live heap objects by storing their object_id when they're allocated, then later using ObjectSpace._id2ref to check if those objects are still alive. This mechanism is currently incompatible with Ruby 4.x.

Key Components
  1. collectors_cpu_and_wall_time_worker.c - Main sampling coordinator
  2. heap_recorder.c - Tracks live heap objects using object IDs
Allocation Flow (Before Fix)
on_newobj_event()  →  start_heap_allocation_recording()  →  end_heap_allocation_recording()
                              ↓
                      rb_obj_id(new_obj)  ← PROBLEM HERE
Liveness Check Flow
heap_recorder_update()  →  st_object_record_update()  →  ruby_ref_from_id()
                                                              ↓
                                                      ObjectSpace._id2ref(obj_id)
The Ruby 4.x Problem
What Changed

Ruby 4.x changed how object_id works internally. The key issue:

  • on_newobj_event is called during object allocation, while the object is still in an "in-between" state
  • Calling rb_obj_id() during this event mutates the object (assigns an ID)
  • This mutation is not safe during the allocation tracepoint in Ruby 4.x
  • Reference: Ruby Issue #21710
Implemented Solution: Deferred Object ID Recording

We defer the rb_obj_id() call to after the allocation tracepoint completes using rb_postponed_job_trigger.

Allocation Flow (After Fix - Ruby 4+)
on_newobj_event()
    ↓
start_heap_allocation_recording()
    - Store VALUE in heap_recorder->active_deferred_object
    - Store metadata in heap_recorder->active_deferred_object_data
    ↓
end_heap_allocation_recording()
    - Move to pending_recordings[] array
    ↓
rb_postponed_job_trigger()
    ↓
finalize_heap_allocation_from_postponed_job()  ← Runs outside allocation tracepoint
    - heap_recorder_finalize_pending_recordings()
    - Call rb_obj_id() safely
    - Commit object_record to heap_record
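The deferred flow above can be modelled in plain C. This is a toy sketch under stated assumptions, not the profiler's code: `fake_object`, `obj_id_for` and `finalize_pending` are hypothetical stand-ins for a Ruby object, `rb_obj_id()` and the postponed-job callback, and dropping samples when the buffer is full is an assumption. Only the split between capturing the reference in the allocation hook and asking for the id later, plus `MAX_PENDING_RECORDINGS` (64), come from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING_RECORDINGS 64

/* Hypothetical stand-in for a Ruby object; id == 0 means "no id assigned yet". */
typedef struct { long id; } fake_object;

/* Hypothetical stand-in for rb_obj_id(): assigns an id lazily, mutating the object.
 * This mutation is what must not happen inside the allocation tracepoint. */
static long next_id = 1;
static long obj_id_for(fake_object *obj) {
  if (obj->id == 0) obj->id = next_id++;
  return obj->id;
}

typedef struct {
  fake_object *object_ref;  /* the real code also has to GC-mark this reference */
  size_t weight;            /* hypothetical per-sample metadata */
} pending_recording;

typedef struct {
  pending_recording pending[MAX_PENDING_RECORDINGS];
  size_t pending_count;
  size_t committed_total;
} toy_recorder;

/* Allocation-hook side: only stores the reference, never asks for an id. */
static bool record_allocation(toy_recorder *r, fake_object *obj, size_t weight) {
  assert(r != NULL && obj != NULL);
  if (r->pending_count == MAX_PENDING_RECORDINGS) return false;  /* assumption: drop when full */
  r->pending[r->pending_count++] = (pending_recording){ .object_ref = obj, .weight = weight };
  return true;
}

/* Postponed-job side: runs outside the allocation hook, so asking for the id is safe.
 * Returns how many recordings were committed. */
static size_t finalize_pending(toy_recorder *r) {
  size_t committed = r->pending_count;
  for (size_t i = 0; i < r->pending_count; i++) {
    obj_id_for(r->pending[i].object_ref);  /* the real code would store id + metadata */
    r->pending[i].object_ref = NULL;
    r->committed_total++;
  }
  r->pending_count = 0;
  return committed;
}
```

Recording two fresh objects leaves both ids at 0; calling `finalize_pending` afterwards assigns them ids 1 and 2 and empties the buffer.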

⚠️ These changes are AI-assisted and will require careful review and analysis of their performance impact.

How to test the change?

The existing test suite exercises the new code path when run on Ruby 4.

We've also added additional coverage in DataDog/prof-correctness#89 + internally in the reliability environment.

@github-actions github-actions bot added the profiling Involves Datadog profiling label Jan 6, 2026
@github-actions

github-actions bot commented Jan 6, 2026

Thank you for updating Change log entry section 👏

Visited at: 2026-01-29 11:52:43 UTC

@pr-commenter

pr-commenter bot commented Jan 6, 2026

Benchmarks

Benchmark execution time: 2026-01-29 12:15:52

Comparing candidate commit a64a546 in PR branch r1viollet/heap-profiling-4.0 with baseline commit e700369 in branch master.

Found 0 performance improvements and 1 performance regression! Performance is the same for 43 metrics, 2 unstable metrics.

scenario:profiling - intern_all 1000 repeated strings

  • 🟥 throughput [-3830.713op/s; -3735.016op/s] or [-14.617%; -14.251%]

@r1viollet r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from c4fdb76 to 49477bf January 6, 2026 17:46
@r1viollet
Copy link
Author

Reminder: to measure this, we should compare ON vs. OFF runs, since some of the cost will be in the VM itself.

@datadog-official

datadog-official bot commented Jan 9, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 87.10%
Overall Coverage: 95.21% (-0.00%)

Commit SHA: a64a546

Member

@ivoanjo ivoanjo left a comment

I ran out of time today, so here's what I have so far! In particular, I still need to stare at heap_recorder.c for longer: I'm not yet convinced it's correct, since I'm seeing some state being carried across calls that I'm not confident is right.

The current notes are small-ish, other than the unneeded extra overhead (which I believe should be easy to fix) and the code exposing the heap recorder directly to the CPU and wall-time collector, which I'd ideally like to avoid too, if possible.

Comment on lines 93 to 92
#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
static rb_postponed_job_handle_t finalize_heap_allocation_from_postponed_job_handle;
#endif
Member

Minor: More of a style thing: in general, I've avoided over-#ifdef'ing things out when they're harmless.

That is, the cost of a leftover extra field on Rubies that don't need it is so small that I usually prefer the advantage of less code, which is also easier to reason about thanks to less #ifdef branching. (The same applies to most spots in this file.)
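One way to follow this style is to compile the extra fields in unconditionally and turn the preprocessor switch into a compile-time constant, so both branches are type-checked on every Ruby and an optimizing compiler typically drops the dead one. A minimal sketch: only the `DEFERRED_HEAP_ALLOCATION_RECORDING` macro name comes from the PR; `toy_state`, `record` and `USE_DEFERRED_RECORDING` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The build would define DEFERRED_HEAP_ALLOCATION_RECORDING on Ruby 4+;
 * this is the only place the preprocessor is involved. */
#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
static const bool USE_DEFERRED_RECORDING = true;
#else
static const bool USE_DEFERRED_RECORDING = false;
#endif

typedef struct {
  /* Both fields always exist, even on Rubies that only use one of them:
   * a few extra bytes in exchange for a single struct layout. */
  size_t pending_count;
  long last_recorded_id;
} toy_state;

/* Returns true if the recording was deferred, false if it was done inline. */
static bool record(toy_state *state, long object_id) {
  assert(state != NULL);
  if (USE_DEFERRED_RECORDING) {
    state->pending_count++;               /* id will be fetched later, outside the hook */
    return true;
  }
  state->last_recorded_id = object_id;    /* legacy path: record the id immediately */
  return false;
}
```

Both branches compile on every build, so a typo in the "other" path is caught even on Rubies that never execute it.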

Comment on lines 335 to 348
 context "on Ruby 4.0 or newer" do
   let(:testing_version) { "4.0.0" }
 
-  it "initializes StackRecorder without heap sampling support and warns" do
+  before do
+    settings.profiling.allocation_enabled = true
+    allow(logger).to receive(:debug)
+  end
+
+  it "initializes StackRecorder with heap sampling support" do
     expect(Datadog::Profiling::StackRecorder).to receive(:new)
-      .with(hash_including(heap_samples_enabled: false, heap_size_enabled: false))
+      .with(hash_including(heap_samples_enabled: true, heap_size_enabled: true))
       .and_call_original
-
-    expect(logger).to receive(:warn).with(/Datadog Ruby heap profiler is currently incompatible with Ruby 4/)
 
     build_profiler_component
Member

If we do add an extra setting to toggle the new code, this test case is still worth keeping. On the other hand, if we don't, this test case has become redundant: since Ruby 4 is no longer special, the existing tests below already cover this situation. (The test was added precisely because there was an exception for Ruby 4.)

Comment on lines 82 to 92
#ifdef DEFERRED_HEAP_ALLOCATION_RECORDING
// A pending recording is used to defer the object_id call on Ruby 4+
// where calling rb_obj_id during on_newobj_event is unsafe.
typedef struct {
  VALUE object_ref;
  heap_record *heap_record;
  live_object_data object_data;
} pending_recording;

#define MAX_PENDING_RECORDINGS 64
#endif
Member

Minor: I believe the same note about not over-#ifdef'ing things in the CPU and wall-time collector applies here as well.

Comment on lines 522 to 553
VALUE obj = heap_recorder->pending_recordings[i].object_ref;
if (obj != Qnil) {
  rb_gc_mark(obj);
}
Member

@ivoanjo ivoanjo Jan 13, 2026

The object can't ever be Qnil (or if it is, we have a bug...), since it was set from an allocation and Qnil is not a heap-allocated object. (It's a tagged pointer)

Comment on lines 529 to 559
if (heap_recorder->active_deferred_object != Qnil) {
  rb_gc_mark(heap_recorder->active_deferred_object);
}
Member

Minor: This can indeed trivially be Qnil; but btw, it's OK to mark Qnil, so maybe remove the branch anyway? Less code ;)

Author

feels weird to do so ^^

Member

@ivoanjo ivoanjo left a comment

OK, here are my comments from the full pass :)

I think the key change needed is to make sure we call during_sample_enter to avoid any other parts of the profiler firing in the middle of finalization.

Having said that, I'm not a huge fan of some of the duplication, especially in the heap profiler: that code is ultra-fiddly, and I worry that having complex logic duplicated across #ifdefs makes it easy to forget to update one of the versions.

Comment on lines 1430 to 1440
static void finalize_heap_allocation_from_postponed_job(DDTRACE_UNUSED void *_unused) {
  cpu_and_wall_time_worker_state *state = active_sampler_instance_state;

  if (state == NULL) return;

  if (!ddtrace_rb_ractor_main_p()) {
    return;
  }

  // Get the heap_recorder from the thread_context_collector
  heap_recorder *recorder = thread_context_collector_get_heap_recorder(state->thread_context_collector_instance);
Member

Ah, on a second pass, there are two things missing here that can create a bit of a sharp edge:

a. We're missing during_sample_enter and during_sample_exit. These set a flag that allows us to avoid nested operations inside the profiler, e.g. sharp edges along the lines of "the profiler is sampling something else -> it calls some VM API that causes the VM to check for interruptions -> the VM decides now is a really nice time to flush heap things -> our current state may not be consistent" (or the reverse: maybe this is the function that started first, it triggers an allocation, and the flipped situation happens).

b. (Minor) We're missing the discrete_dynamic_sampler_before_sample and discrete_dynamic_sampler_after_sample calls that update the dynamic sampling rate mechanism. In practice, this means that work done inside this function isn't accounted for as profiler overhead. TBH, what we're doing right now isn't a lot, but... maybe at least leave a comment saying "This is not being accounted for in the dynamic sampling rate update, and it's OK because the amount of work we do in this case is very small".

Both of these already happen in, e.g., on_newobj_event, so doing the same here should be enough.

Member

Edit: Actually, I wonder if we could even get recursion, where finalize gets called -> goes into the VM -> the VM calls finalize again (e.g. if there's an allocation), or something similar. during_sample_enter/during_sample_exit would also protect against that.

Comment on lines 465 to 489
bool heap_recorder_has_pending_recordings(heap_recorder *heap_recorder) {
  if (heap_recorder == NULL) {
    return false;
  }
  return heap_recorder->pending_recordings_count > 0;
}
Member

Minor: Since this is only for debugging, I wonder if we should just return the count instead of a boolean. Same cost for the logic, and a bit easier to debug from the Ruby side.
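Returning the count would look roughly like this; the `pending_recordings_count` field and the NULL check come from the snippet above, while `toy_heap_recorder` and `heap_recorder_pending_recordings_count` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  size_t pending_recordings_count;
} toy_heap_recorder;

/* Same cost as the boolean version, but a debugging caller sees how many
 * recordings are waiting, not just whether any are. */
static size_t heap_recorder_pending_recordings_count(const toy_heap_recorder *heap_recorder) {
  if (heap_recorder == NULL) return 0;
  return heap_recorder->pending_recordings_count;
}

/* The boolean check becomes a trivial wrapper (or just `count > 0` at the call site). */
static bool heap_recorder_has_pending_recordings(const toy_heap_recorder *heap_recorder) {
  return heap_recorder_pending_recordings_count(heap_recorder) > 0;
}
```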

Author

As this is a style point, I'll leave it until the end.

Comment on lines 704 to 742
describe "pending heap recordings cleanup" do
  def has_pending_recordings?
    described_class::Testing.send(:_native_has_pending_heap_recordings?, stack_recorder)
  end

  def track_object_without_finalize(obj)
    described_class::Testing._native_track_object(stack_recorder, obj, sample_rate, obj.class.name)
    Datadog::Profiling::Collectors::Stack::Testing
      ._native_sample(Thread.current, stack_recorder, metric_values, labels, numeric_labels)
  end

  it "clears pending recordings after finalization" do
    skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"

    test_object = Object.new

    track_object_without_finalize(test_object)

    expect(has_pending_recordings?).to be true

    described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)

    expect(has_pending_recordings?).to be false
  end

  it "clears pending recordings after multiple allocations" do
    skip "Only applies to Ruby 4+ with deferred heap allocation recording" if RUBY_VERSION < "4"

    3.times do
      test_object = Object.new
      track_object_without_finalize(test_object)
    end

    expect(has_pending_recordings?).to be true

    described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)

    expect(has_pending_recordings?).to be false
  end
Member

Minor: I'm not sure these tests add a lot... Specifically, because they don't assert on any results, it's easy for them to pass while not doing the correct thing.

Furthermore, we have existing coverage where we already check that the objects we intend to sample/track do get sampled/tracked.

So... I'd be inclined to remove these tests (and the pending_heap_recordings? helper maybe as well?).

Author

Things have changed slightly; please refer to the comment above (in this file).

@r1viollet r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch 2 times, most recently from e966c43 to ea9eba7 January 16, 2026 15:50
@github-actions github-actions bot added the core Involves Datadog core libraries label Jan 16, 2026
@r1viollet
Author

I've addressed the major comments.
Next step: perf tests.
Then we can do style fixups.

@r1viollet r1viollet force-pushed the r1viollet/heap-profiling-4.0 branch from ea9eba7 to 4dbdd06 January 19, 2026 08:32

expect(has_pending_recordings?).to be true

described_class::Testing._native_finalize_pending_heap_recordings(stack_recorder)
Author

The test shows that you need these bits, which is not ideal (but I still like that the test forces me to do this).

@ivoanjo ivoanjo changed the title Heap profiling for ruby 4.x - Prototype [PROF-13510] Heap profiling for ruby 4.x - Prototype Jan 20, 2026
r1viollet and others added 15 commits January 21, 2026 14:28
The idea is to delay the time at which we record object ids.
Once we are outside of the allocation code path, we can request the object ID.
- Avoid scheduling many postponed jobs
This fixes some of the issues we had with accuracy
Also I suspect that this has less overhead
- Avoid re-entrancy based on Ivo's comments
Although some of this code is dead code on legacy Rubies, always
compiling it in means less ifdefs spread throughout and it helps
keep the code focused on modern rubies, rather than on legacy ones.
This check is already covered by
`heap_recorder->active_recording != NULL` (they're set and unset
together).
This reverts commit e153759.

(Avoid touching CHANGELOG for nicer diff)
…is needed

This will replace the more heavy-handed query in
`thread_context_collector_heap_pending_buffer_pressure`.
…filer directly

This avoids other parts of the profiler needing to care about this --
they only need to care to run the `after_sample` callback.
…ons directly

We no longer need to ask other parts of the code to raise instead :)
This probably needs adjusting for non-4.0 rubies, will do it as
a separate pass.
@ivoanjo ivoanjo force-pushed the r1viollet/heap-profiling-4.0 branch from c0e60ce to f04eb52 January 21, 2026 17:10
@ivoanjo
Member

ivoanjo commented Jan 23, 2026

Quick side-by-side results from Ruby 3.3 vs 4.0 in our Ruby on Rails test app:

[Screenshot: side-by-side results, Ruby 3.3 vs 4.0]

ivoanjo added a commit to DataDog/prof-correctness that referenced this pull request Jan 29, 2026
**What does this PR do?**

This PR adds Ruby 4 heap profiling test variants. As per
DataDog/dd-trace-rb#5201, we need a different
implementation of heap profiling on Ruby 4, so we want to validate
it's still sane with prof-correctness.

These variants are effectively the same as the regular ones but:
* Replace Ruby 3.3 with Ruby 4.0
* Point to the branch from
  DataDog/dd-trace-rb#5201 (we'll need to
  change this back to master once landed)
* Set the `DD_PROFILING_EXPERIMENTAL_HEAP_RUBY4_ENABLED` env
  variable we're using to gate the new feature
* Update the `expected_profile.json` to take into account that
  object allocation in Ruby 4 produces a slightly different stack
  trace (there's no `new` method on the stack -- it gets inlined
  into the caller)

**Motivation:**

Validate Ruby 4 heap profiling.

**Additional Notes:**

There are still a few anomalies in the results that we're looking into...

**How to test the change?**

Run as usual!
Allow heap profiling to be enabled on Ruby 4 without any extra flags.
@ivoanjo ivoanjo changed the title [PROF-13510] Heap profiling for ruby 4.x - Prototype [PROF-13510] Heap profiling for ruby 4.x Jan 29, 2026
@ivoanjo
Member

ivoanjo commented Jan 29, 2026

TODO: We're still investigating a few benchmarks / validation results, but this should be ready for review.

@ivoanjo ivoanjo marked this pull request as ready for review January 29, 2026 11:53
@ivoanjo ivoanjo requested review from a team as code owners January 29, 2026 11:53
@ivoanjo ivoanjo requested a review from AlexJF January 29, 2026 11:53
Contributor

@AlexJF AlexJF left a comment

Looks mostly good to me. Some minor feedback to try to prevent so much split logic.

Also, if the deferred mechanism is also compatible with Ruby < 4, could it make sense to unify the code paths and defer everything? Or would that incur a performance penalty?

Comment on lines +371 to +380
// The intuition here is: Usually we ask for an `after_allocation` callback only when the buffer is about to go
// from empty -> non-empty, because this is going to be mapped onto a postponed job, so after it gets queued once
// it doesn't seem worth it to keep spamming requests.
// Yet, if for some reason the postponed job doesn't flush the pending list (or if e.g. it ran with `during_sample == true` and thus
// was skipped) we need to have some mechanism to recover -- and so if the buffer starts accumulating too much we
// start always requesting the callback to happen so that we eventually flush the buffer.
bool needs_after_allocation =
  heap_recorder->pending_recordings_count == 0 || heap_recorder->pending_recordings_count >= MAX_PENDING_RECORDINGS / 2;

return needs_after_allocation;
Contributor

Nit: I think this logic is tied to the #ifdef above (which is why the return true on line 352 mentions this code block). We could promote needs_after_allocation to a top-level variable with default value false, move this logic into the first #ifdef, and leave the second #ifdef to focus on capturing the deferred object data.

I feel doing this makes it harder to forget to update things if our logic for needs_after_allocation changes in the future?
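A sketch of the suggested restructuring, with the empty-or-half-full rule from the snippet above computed in exactly one place. The `deferred` flag is a runtime bool here (the real code uses #ifdef) only so both paths can be exercised in one build; `toy_heap_recorder`, `captured_object`, `end_recording` and `compute_needs_after_allocation` are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING_RECORDINGS 64

typedef struct {
  size_t pending_recordings_count;
  const void *captured_object;  /* hypothetical stand-in for the captured VALUE */
} toy_heap_recorder;

/* Request the postponed-job callback when the buffer goes from empty to non-empty,
 * or, as a recovery mechanism, whenever it is at least half full. */
static bool compute_needs_after_allocation(const toy_heap_recorder *r) {
  return r->pending_recordings_count == 0 ||
         r->pending_recordings_count >= MAX_PENDING_RECORDINGS / 2;
}

static bool end_recording(toy_heap_recorder *r, const void *new_object, bool deferred) {
  assert(r != NULL);
  bool needs_after_allocation = false;  /* one top-level variable, false by default */

  /* First block: the only place that decides whether to request the callback. */
  if (deferred) {
    needs_after_allocation = compute_needs_after_allocation(r);
  }

  /* ...shared recording logic would go here... */

  /* Second block: only captures the deferred object; no callback decisions here. */
  if (deferred && r->pending_recordings_count < MAX_PENDING_RECORDINGS) {
    r->captured_object = new_object;
    r->pending_recordings_count++;
  }

  return needs_after_allocation;
}
```

The first deferred allocation requests the callback, the second does not, and once the buffer is half full every call requests it again; the legacy path never does.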

Comment on lines +148 to +149
VALUE active_deferred_object;
live_object_data active_deferred_object_data;
Contributor

Minor: This duplication of "responsibility/intent" between active_recording and active_deferred feels weird and error-prone (e.g. I'm not sure the rate-based skipping on line 343 plays well with your #ifdef handling logic at the end; after all, it sounds possible for the end to not see a valid active_deferred_object reference in this case). Could we maybe just have active_recording be a flexible_object_record, i.e. an object_record with either a long obj_id; or a VALUE object_ref;?

Author

@r1viollet r1viollet left a comment

LGTM (I can't approve, since I opened this PR), but feel free to merge.
I think a test exercising some of the raise code paths could be good, though it's not blocking for this PR.
Performance looks the same as other heap instrumentation.
Accuracy tests are OK.


Labels

core Involves Datadog core libraries profiling Involves Datadog profiling


4 participants