Move native unwinder impl to a .h #1280
|
This is the hybrid python/native unwinder we're currently rolling with: parca-dev@3695a06 |
|
Since we now allow bounded loops, I wonder if we could merge all unwinders into one program. Then this would automatically work on all combinations. See: #1171 (comment) |
|
The problem is that the verifier doesn't seem to treat bounded loops any differently from unrolled code, so we still hit the 1M instruction count limit. The changes we currently have allow the hybrid unwinder to unwind about 260 frames. We're looking at bumping that up by changing the DEBUG_PRINT statements, but that's probably a separate discussion. |
This is a prep-the-patient PR to make room for a hybrid Python/native unwinder that we found necessary to unwind large PyTorch stacks that go back and forth between Python and native code more times than the tail call limit allows. This change is pure code motion and changes nothing functionally.
|
Actually, we've abandoned trying to futz with DEBUG_PRINTs to bring the instruction count down and found a better approach. I'll outline more in the final PR, #1288. |
Did you do the math on the 1M instruction count limit being near? 987+1642+1538+665+2009+1563+1516+1613+957+1529+1575+1656 = 17250, and 1000000/17250 is roughly 58. So we should get about 58 iterations per 1M instructions on a function containing all unwinders. We can probably eliminate some glue code from each per-interpreter unwinder if we go in this direction. I'm not sure the verifier can handle all this easily, but I think it should be doable.
Nice trick to put the iteration count in a read-only variable. While I'm not particularly fond of special-casing Python, the work seems to be done and it improves things. Should this be closed and the review continued at #1288? I'd rather do this in one review cycle if possible. |
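For reference, the back-of-the-envelope estimate in the comment above can be reproduced with a few lines of Python. This is only a sketch of that arithmetic: the per-unwinder counts are the ones quoted in the comment, while the names `UNWINDER_INSNS`, `VERIFIER_LIMIT`, and `naive_iterations` are made up for illustration. It assumes each loop iteration costs the verifier exactly the raw instruction count, which is the assumption the reply below disputes.

```python
# Per-unwinder raw instruction counts, as quoted in the comment above.
UNWINDER_INSNS = [987, 1642, 1538, 665, 2009, 1563,
                  1516, 1613, 957, 1529, 1575, 1656]

# The verifier's instruction limit discussed in this thread.
VERIFIER_LIMIT = 1_000_000


def naive_iterations(insn_counts, limit=VERIFIER_LIMIT):
    """Return (total raw instructions, iterations that fit in the limit).

    Hypothetical helper: assumes one loop iteration costs the verifier
    exactly sum(insn_counts) instructions, with no extra path exploration.
    """
    total = sum(insn_counts)
    return total, limit / total


total, iterations = naive_iterations(UNWINDER_INSNS)
print(total, round(iterations))  # prints "17250 58"
```

The estimate is an upper bound at best, since it ignores any instructions the verifier visits more than once per iteration.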
|
Thanks for looking! Raw instruction counts and the verifier's 1M instruction limit aren't counting the same thing. The 1M limit applies to the total number of instructions the verifier visits on its walk of the program, not to the program's size. The only good way I've found to measure this is to load the program with the verifier in verbose instruction mode and parse the output. For this branch I see: So while my hybrid Python/native unwinder only has 2497 instructions, the verifier counts 452k. One more loop iteration and it hits the limit (I'm guessing there's a fudge factor because some kernels are more or less efficient at pruning the search space). Main looks like this: I'll close this PR in favor of #1288. I wasn't sure if folks preferred one PR with multiple commits or multiple PRs with fewer commits each. |
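A minimal sketch of the "parse the output" step described above, assuming the verifier log ends with the kernel's usual summary line of the form `processed N insns (limit M) ...`. The helper name `verifier_insn_count` and the sample log are hypothetical and are not the output from this branch; very old kernels that omit the `(limit M)` part would need a looser pattern.

```python
import re

# Matches the verifier's summary line, e.g.
# "processed 3 insns (limit 1000000) max_states_per_insn 0 ..."
SUMMARY_RE = re.compile(r"processed (\d+) insns \(limit (\d+)\)")


def verifier_insn_count(log_text):
    """Return (processed, limit) from the last summary line, or None.

    Hypothetical helper: log_text is the verbose verifier log captured
    when loading the program.
    """
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    processed, limit = matches[-1]
    return int(processed), int(limit)


# Made-up sample log for illustration only, not real output from this PR.
SAMPLE_LOG = """\
0: (bf) r6 = r1
1: (b7) r0 = 0
2: (95) exit
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
"""

result = verifier_insn_count(SAMPLE_LOG)
print(result)  # prints "(3, 1000000)"
```

Comparing the `processed` value against the raw instruction count of the program shows how much of the budget goes to path exploration rather than program size.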