fix(profiling): upper bound on iterations for `TaskInfo::unwind` by KowalskiThomas · Pull Request #16510 · DataDog/dd-trace-py

KowalskiThomas · 2026-02-14T23:05:15Z

Description

This PR updates the Profiler's Task unwinding logic to put an upper bound on the number of (1) Tasks unwound in the Task chain and (2) coroutines unwound in the coroutine chain.

This is important because the Profiler can end up reading corrupted memory. That is very possible: we don't take a snapshot of the interpreter memory but rather copy select "chunks" over time, and the state of Tasks can change while we copy those "chunks". Without a bound, we could end up looping infinitely (which is bad for obvious reasons) and, as a result, try to add an infinite number of items to the Frame Stack (which is arguably significantly worse, as it would mean trying to allocate an infinite amount of memory 💣).
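To make the idea concrete, here is a minimal sketch of a bounded chain walk. It is not the code from `tasks.cc`: `Node`, `unwind_bounded`, and `kMaxChainLength` are hypothetical names, and the limit value is an assumption chosen only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Task (or coroutine) copied from the
// interpreter; `next` plays the role of the link to the next element
// in the chain.
struct Node {
    int id;
    const Node* next;
};

// Assumed limit, for illustration only; not the value used in the PR.
constexpr std::size_t kMaxChainLength = 128;

// Walks the chain starting at `head` and appends each id to `out`.
// Returns false if the walk was cut short by the bound, which is what
// happens when inconsistent links form a cycle.
bool unwind_bounded(const Node* head,
                    std::vector<int>& out,
                    std::size_t max_len = kMaxChainLength)
{
    std::size_t steps = 0;
    for (const Node* n = head; n != nullptr; n = n->next) {
        if (steps++ >= max_len) {
            return false; // bound hit: stop instead of growing `out` forever
        }
        out.push_back(n->id);
    }
    return true;
}
```

With a cyclic chain, the loop above pushes at most `max_len` entries and then returns, so both the running time and the memory added to the output stay bounded.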

We spotted this issue when we deployed 4.5.0rc2 to internal Rapid Python HTTP services, see IR-49542.

cit-pr-commenter-54b7da · 2026-02-14T23:09:24Z

Codeowners resolved as

ddtrace/internal/datadog/profiling/stack/src/echion/tasks.cc            @DataDog/profiling-python
ddtrace/internal/datadog/profiling/stack/src/echion/threads.cc          @DataDog/profiling-python
releasenotes/notes/profiling-fix-max-iterations-unwind-tasks-671d743912c7d600.yaml  @DataDog/apm-python

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb0aecac07

ddtrace/internal/datadog/profiling/stack/src/echion/tasks.cc

P403n1x87

Should we try to detect potential cycles as well?

KowalskiThomas · 2026-02-16T12:40:17Z

> Should we try to detect potential cycles as well?

Funnily(?) I had a PR that did exactly this but never merged it (see #15712). We discussed this recently because detecting cycles means using hash maps, which is more costly than just using a counter.

We should probably decide on one way forward -- only counters or only hash sets (or bloom filters, possibly...) -- but the latest thing we settled on was "let's not introduce more hash sets", so that's what I followed here.
Given the cost and our current overhead, I think a bloom filter would probably be the best tradeoff, but I only thought of that this weekend and we've never discussed it before.
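For readers comparing the options mentioned above, the following is a rough sketch of the two alternatives to a plain counter. It is not code from this PR or from #15712: `ChainNode`, `PointerBloom`, both `walk_*` functions, the filter size, and the hash mixing are all assumptions made for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical chain element; `next` stands in for the Task/coroutine link.
struct ChainNode {
    const ChainNode* next;
};

// Option A: exact cycle detection with a hash set.
// Returns the number of nodes visited before the chain ends or repeats.
std::size_t walk_with_hash_set(const ChainNode* head)
{
    std::unordered_set<const ChainNode*> seen; // heap-allocates as it grows
    std::size_t visited = 0;
    for (const ChainNode* n = head; n != nullptr; n = n->next) {
        if (!seen.insert(n).second) {
            break; // address already visited: this is a cycle
        }
        ++visited;
    }
    return visited;
}

// Option B: approximate detection with a fixed-size Bloom filter.
// No heap allocation, no false negatives, but false positives are possible.
class PointerBloom {
public:
    // Returns true if `p` may have been recorded before, then records it.
    bool test_and_set(const void* p)
    {
        const auto v = static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(p));
        const std::size_t h1 = mix(v) % kBits;
        const std::size_t h2 = mix(v ^ 0x9e3779b97f4a7c15ULL) % kBits;
        const bool maybe_seen = bits_[h1] && bits_[h2];
        bits_.set(h1);
        bits_.set(h2);
        return maybe_seen;
    }

private:
    static constexpr std::size_t kBits = 1024; // assumed size

    // Simple 64-bit mixing step (an arbitrary choice for this sketch).
    static std::uint64_t mix(std::uint64_t x)
    {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;
        return x;
    }

    std::bitset<kBits> bits_;
};

// Same walk as option A, but a false positive can end it early.
std::size_t walk_with_bloom(const ChainNode* head)
{
    PointerBloom seen;
    std::size_t visited = 0;
    for (const ChainNode* n = head; n != nullptr; n = n->next) {
        if (seen.test_and_set(n)) {
            break; // possibly a repeat; treated as a cycle
        }
        ++visited;
    }
    return visited;
}
```

The hash set stops exactly at the first repeated address but allocates while sampling. The Bloom filter keeps a fixed footprint and always terminates on a cycle, at the price of occasionally truncating a legitimate chain. A plain counter is the cheapest of the three but only notices a cycle once the limit is reached.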

releasenotes/notes/profiling-fix-max-iterations-unwind-tasks-671d743912c7d600.yaml

KowalskiThomas · 2026-02-16T16:09:56Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-02-16T16:10:00Z

View all feedbacks in Devflow UI.

2026-02-16 16:09:59 UTC ℹ️ Start processing command /merge

2026-02-16 16:10:06 UTC ℹ️ MergeQueue: waiting for PR to be ready

This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals. View in MergeQueue UI.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

2026-02-16 17:16:08 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 5h (p90).

2026-02-16 19:17:23 UTC ❌ MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 96780885 took longer than expected. The current limit for the base branch 'main' is 120 minutes.

KowalskiThomas · 2026-02-16T22:44:17Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-02-16T22:44:21Z

2026-02-16 22:44:20 UTC ℹ️ Start processing command /merge

2026-02-16 22:44:26 UTC ℹ️ MergeQueue: waiting for PR to be ready

2026-02-16 23:36:09 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 5h (p90).

2026-02-17 00:45:01 UTC ❌ MergeQueue: The checks failed on this merge request

Tests failed on this commit f6718bc:

What to do next?

Investigate the failures and when ready, re-add your pull request to the queue!
If your PR checks are green, try to rebase/merge. It might be because the CI run is a bit old.
Any question, go check the FAQ.

KowalskiThomas · 2026-02-17T07:54:58Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-02-17T07:55:02Z

2026-02-17 07:55:01 UTC ℹ️ Start processing command /merge

2026-02-17 07:55:05 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 5h (p90).

2026-02-17 08:41:06 UTC ❌ MergeQueue: The checks failed on this merge request

Tests failed on this commit 04dbd01:

KowalskiThomas · 2026-02-17T09:32:55Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-02-17T09:32:59Z

2026-02-17 09:32:58 UTC ℹ️ Start processing command /merge

2026-02-17 09:33:04 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 5h (p90).

2026-02-17 10:21:06 UTC ❌ MergeQueue: The checks failed on this merge request

Tests failed on this commit f5a3b91:

KowalskiThomas · 2026-02-17T13:23:16Z

/merge

gh-worker-devflow-routing-ef8351 · 2026-02-17T13:23:19Z

2026-02-17 13:23:19 UTC ℹ️ Start processing command /merge

2026-02-17 13:23:28 UTC ℹ️ MergeQueue: waiting for PR to be ready

2026-02-17 14:19:05 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 5h (p90).

2026-02-17 14:44:02 UTC ℹ️ MergeQueue: Readding this merge request to the queue because another merge request processed with yours failed. No action is needed from your side.

2026-02-17 14:45:54 UTC ℹ️ MergeQueue: Readding this merge request to the queue because another merge request processed with yours failed. No action is needed from your side.

2026-02-17 16:01:20 UTC ℹ️ MergeQueue: This merge request was merged

github-actions · 2026-02-17T19:11:09Z

This change is marked for backport to 4.5 and it does not conflict with that branch.
The command used to test backporting was

git checkout 4.5 && git cherry-pick -x --mainline 1 0cfe067b01a81fd4ea886950eb02a9a05bbfdf17

) ## Description

This PR updates the Task unwinding logic for the Profiler to have an upper bound on the number of (1) Tasks in the Task chain unwound (2) coroutines in the coroutine chain unwound.

This is important because if somehow we have some memory corruption (very possible, as we don't take a snapshot of the interpreter memory but rather copy select "chunks" over time, and the state of Tasks can change as we copy those "chunks"), we could otherwise end up looping infinitely (which is bad for obvious reasons) and as a result try to add an infinite number of items to the Frame Stack (which is arguably significantly worse, as this would mean trying to allocate an infinite amount of memory 💣).

We spotted this issue when we deployed `4.5.0rc2` to internal Rapid Python HTTP services, see IR-49542.

Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
(cherry picked from commit 0cfe067)
Signed-off-by: Emmett Butler <emmett.butler321@gmail.com>

…kport 4.5] (#16542) Backport #16510 to 4.5 Signed-off-by: Emmett Butler <emmett.butler321@gmail.com> Co-authored-by: Thomas Kowalski <thomas.kowalski@datadoghq.com>

KowalskiThomas added the changelog/no-changelog A changelog entry is not required for this PR. label Feb 14, 2026

KowalskiThomas force-pushed the kowalski/test-profiling-upper-bound-on-iterations-for-taskinfo-unwind branch from 0cedab4 to 47da8ea Compare February 14, 2026 23:05

KowalskiThomas changed the title ~~test(profiling): upper bound on iterations for taskinfo::unwind~~ test(profiling): upper bound on iterations for TaskInfo::unwind Feb 16, 2026

KowalskiThomas added the Profiling Continuous Profiling label Feb 16, 2026

KowalskiThomas changed the title ~~test(profiling): upper bound on iterations for TaskInfo::unwind~~ fix(profiling): upper bound on iterations for TaskInfo::unwind Feb 16, 2026

KowalskiThomas marked this pull request as ready for review February 16, 2026 09:30

KowalskiThomas requested review from a team as code owners February 16, 2026 09:30

KowalskiThomas requested review from christophe-papazian and vlad-scherbich February 16, 2026 09:30

chatgpt-codex-connector bot reviewed Feb 16, 2026

ddtrace/internal/datadog/profiling/stack/src/echion/tasks.cc (outdated, resolved)

christophe-papazian approved these changes Feb 16, 2026

KowalskiThomas removed the changelog/no-changelog A changelog entry is not required for this PR. label Feb 16, 2026

P403n1x87 approved these changes Feb 16, 2026

taegyunkim approved these changes Feb 16, 2026

taegyunkim reviewed Feb 16, 2026

releasenotes/notes/profiling-fix-max-iterations-unwind-tasks-671d743912c7d600.yaml (resolved)

KowalskiThomas force-pushed the kowalski/test-profiling-upper-bound-on-iterations-for-taskinfo-unwind branch from 5abd871 to 8d12117 Compare February 16, 2026 16:09

KowalskiThomas force-pushed the kowalski/test-profiling-upper-bound-on-iterations-for-taskinfo-unwind branch from 8d12117 to 0bc5769 Compare February 16, 2026 22:44

test(profiling): upper bound on iterations for taskinfo::unwind

bfc0343

KowalskiThomas added 3 commits February 17, 2026 14:04

more upper bounds

14f1df2

chore(profiling): add release note

48eeeed

chore(profiling): change coro chain logic

974ff92

KowalskiThomas force-pushed the kowalski/test-profiling-upper-bound-on-iterations-for-taskinfo-unwind branch from 0bc5769 to 974ff92 Compare February 17, 2026 13:05

gh-worker-dd-mergequeue-cf854d bot merged commit 0cfe067 into main Feb 17, 2026
392 checks passed

gh-worker-dd-mergequeue-cf854d bot deleted the kowalski/test-profiling-upper-bound-on-iterations-for-taskinfo-unwind branch February 17, 2026 16:01

emmettbutler added the backport 4.5 label Feb 17, 2026

dd-octo-sts bot mentioned this pull request Feb 17, 2026

fix(profiling): upper bound on iterations for TaskInfo::unwind [backport 4.5] #16542

Merged

5 participants
