Skip to content

Move native unwinder impl to a .h#1280

Closed
gnurizen wants to merge 1 commit into
open-telemetry:mainfrom
parca-dev:py-nat-hybrid
Closed

Move native unwinder impl to a .h#1280
gnurizen wants to merge 1 commit into
open-telemetry:mainfrom
parca-dev:py-nat-hybrid

Conversation

@gnurizen
Copy link
Copy Markdown
Contributor

This is a prep the patient PR to make room for a hybrid python/native
unwinder that we found necessary to unwind large pytorch stacks that
go back and forth from python to native more times than the tail call
limit will allow.

This change is pure code motion and changes nothing functionally.

@gnurizen
Copy link
Copy Markdown
Contributor Author

This is the hybrid python/native unwinder we're currently rolling with: parca-dev@3695a06

@gnurizen gnurizen marked this pull request as ready for review March 23, 2026 16:46
@gnurizen gnurizen requested review from a team as code owners March 23, 2026 16:46
@fabled
Copy link
Copy Markdown
Contributor

fabled commented Mar 23, 2026

Since we now allow bounded loops. I wonder if we could merge all unwinders to one program. Then this would automatically on all combinations.

See: #1171 (comment)

@gnurizen gnurizen closed this Mar 23, 2026
@gnurizen gnurizen reopened this Mar 23, 2026
@gnurizen
Copy link
Copy Markdown
Contributor Author

The problem is the verifier doesn't seem to care about bounded loops vs unwound code. We still hit the 1M instruction count limit. The changes we currently have allow the hybrid unwinder to unwind around ~260 frames. We're looking at bumping that up by changing the DEBUG_PRINT statements but that's probably a separate discussion.

This is a prep the patient PR to make room for a hybrid python/native
unwinder that we found necessary to unwind large pytorch stacks that
go back and forth from python to native more times than the tail call
limit will allow.

This change is pure code motion and changes nothing functionally.
@gnurizen
Copy link
Copy Markdown
Contributor Author

Actually we've abandoned trying to futz with DEBUG_PRINTs to bring the instruction count down and found a better approach, will outline more in the final PR #1288.

@fabled
Copy link
Copy Markdown
Contributor

fabled commented Apr 8, 2026

The problem is the verifier doesn't seem to care about bounded loops vs unwound code. We still hit the 1M instruction count limit. The changes we currently have allow the hybrid unwinder to unwind around ~260 frames. We're looking at bumping that up by changing the DEBUG_PRINT statements but that's probably a separate discussion.

Did you do the match on the 1M instruction count limit being near?

On main currently:
kprobe/unwind_beam has 987 instructions
kprobe/unwind_dotnet has 1642 instructions
kprobe/unwind_dotnet10 has 1538 instructions
kprobe/go_labels has 665 instructions
kprobe/unwind_hotspot has 2009 instructions
kprobe/unwind_stop has 1563 instructions
kprobe/unwind_native has 1516 instructions
kprobe/unwind_perl has 1613 instructions
kprobe/unwind_php has 957 instructions
kprobe/unwind_python has 1529 instructions
kprobe/unwind_ruby has 1575 instructions
kprobe/unwind_v8 has 1657 instructions

987+1642+1538+665+2009+1563+1516+1613+957+1529+1575+1656 = 17250

1000000/17250 is roughly 58. So we should get 58 iterations per 1M instructions on a function containing all unwinders.

We can probably eliminate some glue code from each per-interpreter thing if going this direction.

Not sure if verifier can handle all this easily. But I think this should be doable.

Actually we've abandoned trying to futz with DEBUG_PRINTs to bring the instruction count down and found a better approach, will outline more in the final PR #1288.

Nice trick to put the iteration count to a read only variable.

While I'm not particularly fond of special casing python. The work seems to be done and it improves things. Should this be closed and continue review at #1288. I'd rather do this in one review cycle if possible.

@gnurizen
Copy link
Copy Markdown
Contributor Author

gnurizen commented Apr 8, 2026

Thanks for looking!

Raw instructions and the 1M instruction limit of the verifier aren't counting the same thing. The 1M instruction limit is based on the maximal instructions visited on its walk of the instructions. The only good way I've found to measure this is to load the program with the verifier in verbose instruction mode and parse the output. For this branch I see:

Program (loops)                           Verified     Raw  Limit
---------------------------------------- --------- -------  --------------------
perf_unwind_python(6)                       452849    2497   45.3% ######################
perf_unwind_native(5)                       219518    1305   22.0% ##########
perf_unwind_ruby(32)                         85103    1130    8.5% ####
perf_unwind_beam(8)                          41898     860    4.2% ##
perf_unwind_dotnet(6)                        28544    1389    2.9% #
perf_unwind_dotnet10(9)                      27734    1300    2.8% #
perf_unwind_hotspot(4)                       22909    1733    2.3% #
perf_unwind_v8(8)                            18755    1441    1.9%
perf_unwind_php(19)                          15552     812    1.6%
perf_unwind_perl(12)                         14642    1393    1.5%

So while my hybrid python/native unwinder only has 2497 instructions the verifier counts 452k. One more loop iteration and it hits the limit (there's a fudge factor because some kernels are more/less efficient at pruning the search space I'm guessing).

Main looks like this:

Program (loops)                           Verified     Raw  Limit
---------------------------------------- --------- -------  --------------------
perf_unwind_native(5)                       222994    1305   22.3% ###########
perf_unwind_ruby(32)                         85103    1130    8.5% ####
perf_unwind_beam(8)                          41898     860    4.2% ##
perf_unwind_python(12)                       31904    1329    3.2% #
perf_unwind_dotnet(6)                        28544    1389    2.9% #
perf_unwind_dotnet10(9)                      27734    1300    2.8% #
perf_unwind_hotspot(4)                       22909    1733    2.3% #
perf_unwind_v8(8)                            18755    1441    1.9%
perf_unwind_php(19)                          15552     812    1.6%
perf_unwind_perl(12)                         14642    1393    1.5%

I'll close this PR in favor of #1288, wasn't sure if folks preferred one PR with multiple commits or multiple PRs with fewer commits each.

@gnurizen gnurizen closed this Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants