fix(python): handle cold interpreter func ranges #414
korniltsev wants to merge 4 commits into open-telemetry:main
- handle cold interpreter func chunks
- add coredump alpine320, alpine320-nobuildid
I don't really like the debug-file symbol lookup. I'm thinking of rolling back this change (and keeping only the hardcoded table for now) until we find a better fix.
Hacky, but I solved a similar problem for luajit by deriving the size of the function from the stack deltas. So maybe if the size of the function is below some cutoff (e.g. 1000), we decide we have a .cold situation, walk the stack deltas, and assume the largest function in the binary is the rest of the interpreter. The 3 largest functions in python: So it seems like there's a large margin for error.
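A minimal Go sketch of the size heuristic proposed in this comment, for illustration only. The `Func` type, the `guessColdChunk` helper and the `coldCutoff` constant are hypothetical names, not part of the profiler; the two `_PyEval_EvalFrameDefault` entries use the sizes from the PR description, and the third entry is made up. Note that the reply that follows doubts the heuristic holds on alpine 3.21.

```go
package main

import "fmt"

// Func is a hypothetical record of one function's address range,
// for example as recovered from stack deltas.
type Func struct {
	Name  string
	Start uint64
	Size  uint64
}

// coldCutoff is the assumed threshold mentioned in the comment (1000 bytes).
const coldCutoff = 1000

// guessColdChunk returns the largest other function when the interpreter
// entry is smaller than coldCutoff, assuming that function holds the rest
// of the interpreter loop. ok is false when no guess is made.
func guessColdChunk(entry Func, funcs []Func) (cold Func, ok bool) {
	if entry.Size >= coldCutoff {
		return Func{}, false
	}
	for _, f := range funcs {
		if f.Start == entry.Start {
			continue // skip the entry itself and any alias at the same address
		}
		if f.Size > cold.Size {
			cold, ok = f, true
		}
	}
	return cold, ok
}

func main() {
	entry := Func{Name: "_PyEval_EvalFrameDefault", Start: 0x171780, Size: 362}
	funcs := []Func{
		entry,
		{Name: "_PyEval_EvalFrameDefault.cold", Start: 0x88a9a, Size: 56543},
		{Name: "some_other_function", Start: 0x1000, Size: 4096}, // made-up entry
	}
	if cold, ok := guessColdChunk(entry, funcs); ok {
		fmt.Printf("assuming %s (%d bytes) is the cold chunk\n", cold.Name, cold.Size)
	}
}
```

The guess is only as good as the assumption that the interpreter loop is the largest function in the binary, which is exactly what the thread goes on to question.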
I don't think we can use the size of the function unfortunately. This is from alpine 3.21
Nice one, I will take a look.
Dumping another idea for validation:
Interesting. Does the hot function call the cold function as a normal function call, or does it just jmp between the two? I guess it must be a jmp or this wouldn't be an issue?
It jumps: either a direct/conditional jump, or an indirect jump through a switch/case jump table.
merging open-telemetry#414
Squashed commit of the following:
commit 43b26dc
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Mon Mar 24 11:18:47 2025 +0700
revert debug-file-lookup changes
commit 71ce508
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 18:41:36 2025 +0700
update arm kernel blobs
commit 1346f70
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 18:36:21 2025 +0700
Lint
commit 85861b1
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 16:16:16 2025 +0700
fix(python): handle cold interpreter func chunks
- handle cold interpreter func chunks
- add coredump alpine320, alpine320-nobuildid
Well, that escalated quickly. I found the following images with cold function chunks: I will mark the PR as a draft, as I don't believe this should be merged in this form.
I propose to split this PR into two pieces.
I will close this and submit separate PRs for the above.
Squashed commit of the following:
commit 69d4ab0e79a6073b7e67ef3e0fa732647f928c1f
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Tue Mar 25 13:13:01 2025 +0700
add more alpines
commit 43b26dc
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Mon Mar 24 11:18:47 2025 +0700
revert debug-file-lookup changes
commit 71ce508
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 18:41:36 2025 +0700
update arm kernel blobs
commit 1346f70
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 18:36:21 2025 +0700
Lint
commit 85861b1
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date: Fri Mar 21 16:16:16 2025 +0700
fix(python): handle cold interpreter func chunks
- handle cold interpreter func chunks
- add coredump alpine320, alpine320-nobuildid
# Conflicts:
# interpreter/loaderinfo.go
cherry-pick open-telemetry#414 but without hardcoded cold ranges
alpine:3.20 python3.12 has a separate _PyEval_EvalFrameDefault and _PyEval_EvalFrameDefault.cold:

    readelf -s -W /usr/lib/debug/usr/lib/libpython3.12.so.1.0.debug | grep EvalFrameDef
      2143: 0000000000171780   362 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.localalias
      2234: 0000000000088a9a 56543 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.cold
     43791: 0000000000171780   362 FUNC    GLOBAL DEFAULT    9 _PyEval_EvalFrameDefault

As a result, the eBPF code assumes the interpreter function is only 362 bytes long and fails to select the Python unwinder, because the bulk of the interpreter is in the cold chunk.
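To make the failure concrete, here is a small Go sketch of the kind of range check involved. It is an illustration under assumptions, not the profiler's actual code: `Range` and `inAny` are hypothetical names, and the addresses and sizes are the ones from the readelf output in this description.

```go
package main

import "fmt"

// Range is a hypothetical half-open address range [Start, End).
type Range struct{ Start, End uint64 }

// inAny reports whether pc falls inside any of the given ranges.
func inAny(pc uint64, ranges []Range) bool {
	for _, r := range ranges {
		if pc >= r.Start && pc < r.End {
			return true
		}
	}
	return false
}

func main() {
	// Values taken from the symbol table listing.
	hot := Range{Start: 0x171780, End: 0x171780 + 362}
	cold := Range{Start: 0x88a9a, End: 0x88a9a + 56543}

	// A program counter somewhere inside the cold chunk.
	pc := uint64(0x88a9a + 0x100)

	fmt.Println(inAny(pc, []Range{hot}))       // false: only the 362-byte range is known
	fmt.Println(inAny(pc, []Range{hot, cold})) // true: the cold chunk is also registered
}
```

With only the 362-byte range known, any sample taken while executing in the cold chunk is not recognised as interpreter code; tracking both ranges is one way to address that.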
In this PR I fix this issue in two ways.
Debug file lookup (at least gives someone an opportunity to install a debug package and allow profiling). The downside is that we run this check even for interpreters that have no cold functions. Maybe we can come up with some heuristic to not waste resources.
Edit: I've reverted the change to keep the PR smaller and not introduce unnecessary resource consumption. I've created a tracking issue: python: handle cold interpreter func ranges #416
the debug file lookup (EDIT: the second test alpine320-nobuildid.json is skipped for now)