Move native unwinder impl to a .h by gnurizen · Pull Request #1280 · open-telemetry/opentelemetry-ebpf-profiler

gnurizen · 2026-03-23T16:38:46Z

This is a prep the patient PR to make room for a hybrid python/native
unwinder that we found necessary to unwind large pytorch stacks that
go back and forth from python to native more times than the tail call
limit will allow.

This change is pure code motion and changes nothing functionally.

gnurizen · 2026-03-23T16:46:27Z

This is the hybrid python/native unwinder we're currently rolling with: parca-dev@3695a06

fabled · 2026-03-23T16:52:42Z

Since we now allow bounded loops. I wonder if we could merge all unwinders to one program. Then this would automatically on all combinations.

See: #1171 (comment)

gnurizen · 2026-03-23T16:57:14Z

The problem is the verifier doesn't seem to care about bounded loops vs unwound code. We still hit the 1M instruction count limit. The changes we currently have allow the hybrid unwinder to unwind around ~260 frames. We're looking at bumping that up by changing the DEBUG_PRINT statements but that's probably a separate discussion.

This is a prep the patient PR to make room for a hybrid python/native unwinder that we found necessary to unwind large pytorch stacks that go back and forth from python to native more times than the tail call limit will allow. This change is pure code motion and changes nothing functionally.

gnurizen · 2026-03-27T19:16:42Z

Actually we've abandoned trying to futz with DEBUG_PRINTs to bring the instruction count down and found a better approach, will outline more in the final PR #1288.

fabled · 2026-04-08T12:48:45Z

The problem is the verifier doesn't seem to care about bounded loops vs unwound code. We still hit the 1M instruction count limit. The changes we currently have allow the hybrid unwinder to unwind around ~260 frames. We're looking at bumping that up by changing the DEBUG_PRINT statements but that's probably a separate discussion.

Did you do the match on the 1M instruction count limit being near?

On main currently:
kprobe/unwind_beam has 987 instructions
kprobe/unwind_dotnet has 1642 instructions
kprobe/unwind_dotnet10 has 1538 instructions
kprobe/go_labels has 665 instructions
kprobe/unwind_hotspot has 2009 instructions
kprobe/unwind_stop has 1563 instructions
kprobe/unwind_native has 1516 instructions
kprobe/unwind_perl has 1613 instructions
kprobe/unwind_php has 957 instructions
kprobe/unwind_python has 1529 instructions
kprobe/unwind_ruby has 1575 instructions
kprobe/unwind_v8 has 1657 instructions

987+1642+1538+665+2009+1563+1516+1613+957+1529+1575+1656 = 17250

1000000/17250 is roughly 58. So we should get 58 iterations per 1M instructions on a function containing all unwinders.

We can probably eliminate some glue code from each per-interpreter thing if going this direction.

Not sure if verifier can handle all this easily. But I think this should be doable.

Actually we've abandoned trying to futz with DEBUG_PRINTs to bring the instruction count down and found a better approach, will outline more in the final PR #1288.

Nice trick to put the iteration count to a read only variable.

While I'm not particularly fond of special casing python. The work seems to be done and it improves things. Should this be closed and continue review at #1288. I'd rather do this in one review cycle if possible.

gnurizen · 2026-04-08T16:35:21Z

Thanks for looking!

Raw instructions and the 1M instruction limit of the verifier aren't counting the same thing. The 1M instruction limit is based on the maximal instructions visited on its walk of the instructions. The only good way I've found to measure this is to load the program with the verifier in verbose instruction mode and parse the output. For this branch I see:

Program (loops)                           Verified     Raw  Limit
---------------------------------------- --------- -------  --------------------
perf_unwind_python(6)                       452849    2497   45.3% ######################
perf_unwind_native(5)                       219518    1305   22.0% ##########
perf_unwind_ruby(32)                         85103    1130    8.5% ####
perf_unwind_beam(8)                          41898     860    4.2% ##
perf_unwind_dotnet(6)                        28544    1389    2.9% #
perf_unwind_dotnet10(9)                      27734    1300    2.8% #
perf_unwind_hotspot(4)                       22909    1733    2.3% #
perf_unwind_v8(8)                            18755    1441    1.9%
perf_unwind_php(19)                          15552     812    1.6%
perf_unwind_perl(12)                         14642    1393    1.5%

So while my hybrid python/native unwinder only has 2497 instructions the verifier counts 452k. One more loop iteration and it hits the limit (there's a fudge factor because some kernels are more/less efficient at pruning the search space I'm guessing).

Main looks like this:

Program (loops)                           Verified     Raw  Limit
---------------------------------------- --------- -------  --------------------
perf_unwind_native(5)                       222994    1305   22.3% ###########
perf_unwind_ruby(32)                         85103    1130    8.5% ####
perf_unwind_beam(8)                          41898     860    4.2% ##
perf_unwind_python(12)                       31904    1329    3.2% #
perf_unwind_dotnet(6)                        28544    1389    2.9% #
perf_unwind_dotnet10(9)                      27734    1300    2.8% #
perf_unwind_hotspot(4)                       22909    1733    2.3% #
perf_unwind_v8(8)                            18755    1441    1.9%
perf_unwind_php(19)                          15552     812    1.6%
perf_unwind_perl(12)                         14642    1393    1.5%

I'll close this PR in favor of #1288, wasn't sure if folks preferred one PR with multiple commits or multiple PRs with fewer commits each.

gnurizen marked this pull request as ready for review March 23, 2026 16:46

gnurizen requested review from a team as code owners March 23, 2026 16:46

gnurizen closed this Mar 23, 2026

gnurizen reopened this Mar 23, 2026

gnurizen force-pushed the py-nat-hybrid branch from 856ed32 to f35b934 Compare March 26, 2026 16:14

gnurizen closed this Apr 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move native unwinder impl to a .h#1280

Move native unwinder impl to a .h#1280
gnurizen wants to merge 1 commit into
open-telemetry:mainfrom
parca-dev:py-nat-hybrid

gnurizen commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 23, 2026

Uh oh!

fabled commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 27, 2026

Uh oh!

fabled commented Apr 8, 2026

Uh oh!

gnurizen commented Apr 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

gnurizen commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 23, 2026

Uh oh!

fabled commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 23, 2026

Uh oh!

gnurizen commented Mar 27, 2026

Uh oh!

fabled commented Apr 8, 2026

Uh oh!

gnurizen commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

gnurizen commented Apr 8, 2026 •

edited

Loading