Skip to content

chore: handle interpreters with multiple ranges#441

Merged
fabled merged 10 commits intoopen-telemetry:mainfrom
grafana:python-interpreter-cold-chunk-fix-part1
Jun 19, 2025
Merged

chore: handle interpreters with multiple ranges#441
fabled merged 10 commits intoopen-telemetry:mainfrom
grafana:python-interpreter-cold-chunk-fix-part1

Conversation

@korniltsev
Copy link
Copy Markdown
Contributor

@korniltsev korniltsev commented Apr 17, 2025

alpine:3.20 python3.12 has a separate _PyEval_EvalFrameDefault _PyEval_EvalFrameDefault.cold

readelf -s -W /usr/lib/debug/usr/lib/libpython3.12.so.1.0.debug | grep EvalFrameDef
  2143: 0000000000171780   362 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.localalias
  2234: 0000000000088a9a 56543 FUNC    LOCAL  DEFAULT    9 _PyEval_EvalFrameDefault.cold
 43791: 0000000000171780   362 FUNC    GLOBAL DEFAULT    9 _PyEval_EvalFrameDefault

Therefore ebpf code only assumes the interpreter function is only 362 bytes long and fails to select python unwinder as the meat of the interpreter is in the cold one.

This PR is a second attempt of #414 (comment)

Specifically

Add support for handling interpreter functions with two ranges. Both in userspace and kernel. Without actually providing second range just yet.

This is a preparation for an actual fix of #416 which will be submitted as a followup

@korniltsev korniltsev requested review from a team as code owners April 17, 2025 04:37
@korniltsev korniltsev force-pushed the python-interpreter-cold-chunk-fix-part1 branch 4 times, most recently from 77a8035 to d1fc8f3 Compare April 17, 2025 10:35
@korniltsev
Copy link
Copy Markdown
Contributor Author

hehe I think hit the 4.19 kernel insn limit.

kprobe/unwind_native has 4111 instructions

@korniltsev
Copy link
Copy Markdown
Contributor Author

korniltsev commented Apr 17, 2025

I removed 4.19 tests in the light of #440 Edit: rebased on top of main

@korniltsev korniltsev force-pushed the python-interpreter-cold-chunk-fix-part1 branch from a498c32 to 42bc6f1 Compare April 17, 2025 11:16
Comment thread processmanager/ebpf/ebpf.go Outdated
Comment thread processmanager/ebpf/ebpf.go Outdated
Comment thread processmanager/ebpf/ebpf.go Outdated

func InterpreterOffsetKeyValue(ebpfProgIndex uint16, fileID host.FileID,
offsetRanges []util.Range) (key uint64, value C.OffsetRange, err error) {
if len(offsetRanges) != 1 && len(offsetRanges) != 2 {
Copy link
Copy Markdown
Member

@christos68k christos68k May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment here (or a few lines later, when we check for len(offetRanges) == 2) clarifying why len(offsetRanges) can be 1 or 2 (I know it's in the eBPF code but it's not obvious here)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've updated the comment in the ebpf code and duplicated it into go code.

@korniltsev
Copy link
Copy Markdown
Contributor Author

@christos68k Thanks for the review. I will address your comments on Monday.

korniltsev and others added 3 commits May 2, 2025 17:59
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
@korniltsev korniltsev requested a review from christos68k June 1, 2025 15:33
Copy link
Copy Markdown
Member

@christos68k christos68k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (didn't test)

Comment thread interpreter/python/python.go Outdated
Comment thread processmanager/ebpf/ebpf.go Outdated
Comment thread support/ebpf/types.h Outdated
korniltsev and others added 2 commits June 4, 2025 11:53
Co-authored-by: Christos Kalkanis <christos.kalkanis@elastic.co>
@korniltsev
Copy link
Copy Markdown
Contributor Author

korniltsev commented Jun 4, 2025

@christos68k Thank you for your review

(didn't test)

One can test it like this - add an extra range manually
For example use python from the current alpine:3.20 docker image

diff --git a/interpreter/python/python.go b/interpreter/python/python.go
index 7a1a6cc..1e25e37 100644
--- a/interpreter/python/python.go
+++ b/interpreter/python/python.go
@@ -858,5 +858,6 @@ func findInterpreterRanges(info *interpreter.LoaderInfo) ([]util.Range, error) {
        // TODO(korniltsev): find cold ranges
        // see tools/coredump/testdata/amd64/alpine320-nobuildid.json
        // https://github.com/open-telemetry/opentelemetry-ebpf-profiler/issues/416
+       interpRanges = append(interpRanges, util.Range{Start: 0x8a934, End: 0x8a934 + 56456}) // obtained with readelf from the debug elf
        return interpRanges, nil
 }

Before:
Screenshot_20250604_120406
After:
Screenshot_20250604_120541

I have a proposal branch that recovers cold ranges automatically here main...grafana:opentelemetry-ebpf-profiler:python-interpreter-cold-chunk-fix-part2
The branch is big, I will be submitting the proposal in two separate followups later. (the first chunk is #447 )

Comment thread interpreter/python/python.go Outdated
Comment thread support/ebpf/tracemgmt.h Outdated
Comment on lines +443 to +445
if (
((section_offset >= range->lower_offset1) && (section_offset <= range->upper_offset1)) ||
((section_offset >= range->lower_offset2) && (section_offset <= range->upper_offset2))) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This adds ebpf code and makes it slower. And there's no users for this yet. Could this be commented out? Perhaps also split this out to a function to make this a bit more readable. Then the commented code could be in a #if 0 instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you anticipate the follow up PR to happen soon, we can already leave this test enabled. Maybe still separate the test to a helper function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted to a separate function. I prefer to leave the test enabled.

Comment thread tools/coredump/coredump_test.go Outdated
Comment on lines +14 to +18
var skip = map[string]bool{
// https://github.com/open-telemetry/opentelemetry-ebpf-profiler/issues/416
"testdata/amd64/alpine320-nobuildid.json": true,
"testdata/amd64/alpine320.json": true,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather add the skip field to the json file perhaps as a string, where the string is the link to the ticket explaining why it fails.

@@ -0,0 +1,87 @@
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps rename the file to python3-alpine320-nobuildid.json or similar to include the interpreter in the name

@@ -0,0 +1,85 @@
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this too to include python312

@korniltsev korniltsev requested a review from fabled June 19, 2025 06:02
Comment thread tools/coredump/json.go Outdated
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
@fabled fabled merged commit d656b55 into open-telemetry:main Jun 19, 2025
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants