fix(profiling): upper bound on iterations for TaskInfo::unwind #16510
Conversation
Force-pushed from 0cedab4 to 47da8ea
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb0aecac07
P403n1x87 left a comment:
Should we try to detect potential cycles as well?
Funnily(?), I had a PR that did exactly this but that I never merged (see here: #15712). We had discussions around this recently, because detecting cycles means using hash maps, and that is more costly than just using a counter. We should probably decide on one way forward, only counters or only hash sets (or possibly Bloom filters), but the latest thing we settled on was "let's not introduce more hash sets", so that's what I followed here.
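For contrast, the hash-set approach weighed in the comment above (the unmerged #15712) looks roughly like the sketch below. The `Node` struct and `has_cycle` function are illustrative names, not the actual PR's code; the point is the cost profile: exact cycle detection, but every step pays for a hash insert, and the set itself allocates.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical node in a task/coroutine chain (illustrative only;
// the real structures in the profiler are more involved).
struct Node {
    Node* next;
};

// Detect a cycle by remembering every node already visited. This is
// exact, unlike a plain iteration counter, but each step costs a hash
// insert/lookup and the set grows with the chain length.
bool has_cycle(const Node* node) {
    std::unordered_set<const Node*> seen;
    while (node != nullptr) {
        if (!seen.insert(node).second) {
            return true;  // insert failed: node was already visited
        }
        node = node->next;
    }
    return false;
}
```

A plain counter bound, by comparison, is O(1) in memory and never allocates; it cannot tell a cycle from a merely long chain, but for a crash-safety guard that distinction rarely matters.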
releasenotes/notes/profiling-fix-max-iterations-unwind-tasks-671d743912c7d600.yaml (resolved)
Force-pushed from 5abd871 to 8d12117
/merge
This pull request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts, but it could also be blocked by other repository rules or settings.
The merge request has been interrupted because the build 96780885 took longer than expected. The current limit for the base branch 'main' is 120 minutes.
/merge
Force-pushed from 8d12117 to 0bc5769
Tests failed on commit f6718bc: What to do next?
/merge
Tests failed on commit 04dbd01: What to do next?
/merge
Tests failed on commit f5a3b91: What to do next?
Force-pushed from 0bc5769 to 974ff92
/merge
This change is marked for backport to 4.5 and does not conflict with that branch.
Co-authored-by: thomas.kowalski <thomas.kowalski@datadoghq.com>
(cherry picked from commit 0cfe067)
Signed-off-by: Emmett Butler <emmett.butler321@gmail.com>
Description
This PR updates the Task unwinding logic for the Profiler to enforce an upper bound on the number of (1) Tasks unwound in the Task chain and (2) coroutines unwound in the coroutine chain.
This is important because memory corruption is entirely possible: we don't take a snapshot of the interpreter memory but rather copy select "chunks" over time, and the state of Tasks can change while we copy those "chunks". Without a bound, we could end up looping infinitely (which is bad for obvious reasons) and, as a result, try to add an infinite number of items to the Frame Stack, which is arguably significantly worse, as this would mean trying to allocate an infinite amount of memory 💣.
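The guard described above can be sketched as a counter-bounded walk over the chain. This is a minimal illustration, not the actual dd-trace-py implementation: the `TaskInfo` fields, `unwind` signature, and `kMaxUnwindIterations` value here are all assumed names chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the profiler's task record; the real
// TaskInfo in the profiler carries much more state.
struct TaskInfo {
    int frame_id;       // illustrative payload
    TaskInfo* awaiter;  // next task in the chain (may be corrupt or cyclic)
};

// Upper bound on chain length; the actual limit is a tuning choice.
constexpr std::size_t kMaxUnwindIterations = 128;

// Walk the awaiter chain, collecting at most kMaxUnwindIterations
// entries. A corrupted or cyclic chain terminates at the bound instead
// of looping forever and growing the frame stack without limit.
std::vector<int> unwind(const TaskInfo* task) {
    std::vector<int> frames;
    for (std::size_t i = 0;
         task != nullptr && i < kMaxUnwindIterations;
         ++i) {
        frames.push_back(task->frame_id);
        task = task->awaiter;
    }
    return frames;
}
```

The trade-off versus cycle detection: a cyclic chain silently yields a truncated (and partially repeated) stack rather than an error, but the loop stays allocation-free and strictly bounded, which is the property that matters when the input memory cannot be trusted.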
We spotted this issue when we deployed `4.5.0rc2` to internal Rapid Python HTTP services; see IR-49542.