Skip to content

Fix python3.12 on ubuntu x86-64#284

Closed
gnurizen wants to merge 2 commits intoopen-telemetry:mainfrom
parca-dev:fix-py312-up
Closed

Fix python3.12 on ubuntu x86-64#284
gnurizen wants to merge 2 commits intoopen-telemetry:mainfrom
parca-dev:fix-py312-up

Conversation

@gnurizen
Copy link
Copy Markdown
Contributor

@gnurizen gnurizen commented Dec 17, 2024

Fix from: DataDog@630fd97

This error occurs on stock python 3.12 on ubuntu x86-64.

Fixes issue: #251

@gnurizen gnurizen requested review from a team as code owners December 17, 2024 14:24
@Gandem
Copy link
Copy Markdown
Member

Gandem commented Dec 17, 2024

Hey @gnurizen 👋 The commit I authored in DataDog@630fd97 unfortunately doesn't fully fix #251.

I think we should consider the alternative described in:

b. Add more logic in disassembly code: Add logic in decode_amd64.c to take into account variables in temporary registers, and the case where %rdi is populated via mov then add. We could potentially fix both issues at the cost of additionally complexity.

So we better cover all the edge cases discussed in that issue (didn't get around to implementing this yet though).

Additionally, as far as I can tell, the packaged python in ubuntu is compiled with --enable-optimizations, --with-lto, so this shouldn't impact stock python 3.12 on ubuntu x86-64. Do you have more information on how you reproduced this? We might have missed something on the original issue.

@athre0z
Copy link
Copy Markdown
Member

athre0z commented Mar 15, 2025

Maybe it would still make sense to update this branch merge it until someone finds the time to look into the concerns raised by @Gandem above, turning them into an issue to be addressed later? The proposed patch certainly improves things already compared to present state on main where Python 3.12 just doesn't work at all. I think everyone is cherry-picking this on their forks already anyway.

@Gandem
Copy link
Copy Markdown
Member

Gandem commented Mar 15, 2025

There's an open issue in #251 (comment)

My understanding is that the issue fixed by this change only impacts python 3.12 when it's compiled without LTO and without optimizations (which is not the case for most distributions of python), and the fix should only work for that very specific edge case.

@athre0z Do you have more information on how you were able to reproduce this?

I'm definitely onboard with getting this merged independently of the longer term fix, especially if this is indeed impacting python 3.12 more generally.

@gnurizen
Copy link
Copy Markdown
Contributor Author

gnurizen commented Mar 18, 2025

so this shouldn't impact stock python 3.12 on ubuntu x86-64. Do you have more information on how you reproduced this? We might have missed something on the original issue.

$ lsb_release -d
No LSB modules are available.
Description:	Ubuntu 24.04.2 LTS
$ python3 --version
Python 3.12.3
(lldb) disass -n PyGILState_GetThisThreadState
python3`PyGILState_GetThisThreadState:
python3[0x608580] <+0>:  endbr64 
python3[0x608584] <+4>:  pushq  %rbp
python3[0x608585] <+5>:  movl   $0xb352c8, %edi ; imm = 0xB352C8 
python3[0x60858a] <+10>: movq   %rsp, %rbp
python3[0x60858d] <+13>: callq  0x610840       ; PyThread_tss_is_created at thread.c:102:1
python3[0x608592] <+18>: testl  %eax, %eax
python3[0x608594] <+20>: je     0x4b2906       ; PyGILState_GetThisThreadState.cold
python3[0x60859a] <+26>: movl   $0xb352c8, %edi ; imm = 0xB352C8 
python3[0x60859f] <+31>: popq   %rbp
python3[0x6085a0] <+32>: jmp    0x610850       ; PyThread_tss_get at thread_pthread.h:922:1

So the stock python has the same problem as described in #251

Upon further digging, the stock python 3.12 works fine, the code extracts the autoTLSKey from the first arg to PyThread_tss_is_created just fine. I'm not sure what I thought I was fixing with this patch, it might have been the python 3.12 docker images and not the ubuntu package.

@gnurizen
Copy link
Copy Markdown
Contributor Author

I've also confirmed that as @Gandem says in the bug the workaround fails for bundled python in the google-cloud-sdk so this isn't a complete solution.

Fix from: DataDog@630fd97

This error occurs on stock python 3.12 on ubuntu x86-64.
There's a ton of variety in how these functions get compiled on amd64
with inlining and other optimizations.  Fall back to what appear to
be relatively stable hard coded offsets for the autoTLSkey on amd64.
@korniltsev
Copy link
Copy Markdown
Contributor

I've submitted an alternative draft for proposal B described in the tracking issue, PTAL

#412

@athre0z
Copy link
Copy Markdown
Member

athre0z commented Mar 20, 2025

@athre0z Do you have more information on how you were able to reproduce this?

I'm not entirely sure, to be honest. I know that the patch has significantly reduced the number of errors about autoTLSkey not being found on my machine in the past, and I had a println in the new branch trigger and resolve to correct key with this patch. However after I added enough printing to actually pin down the process / container, it suddenly stopped reproducing. Seems to be some Python instance / container that occasionally pops up in the background on my machine that profits from this, but I'm not sure which one it is.

[...] fails for bundled python in the google-cloud-sdk so this isn't a complete solution.

Yeah, that much I can confirm -- had the same failure for the GCP SDK on my box even with this patch.

@gnurizen
Copy link
Copy Markdown
Contributor Author

gnurizen commented Mar 20, 2025

I'll leave this open for now in case folks want to opine on the worthiness of the hardcoded approach vs. smarter decoder approach in #412 but I suspect the #412 approach will be more palatable.

@fabled
Copy link
Copy Markdown
Contributor

fabled commented Apr 9, 2025

I'll leave this open for now in case folks want to opine on the worthiness of the hardcoded approach vs. smarter decoder approach in #412 but I suspect the #412 approach will be more palatable.

Yes, the other PR is preferred. Any objection if we just close this one?

@fabled fabled closed this Apr 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants