fix(python): fix stub decoding routine#412
Conversation
| uint64_t decode_stub_argument(const uint8_t* code, size_t codesz, uint8_t argument_no, | ||
| uint64_t rip_base, uint64_t memory_base) { | ||
| // todo update comment | ||
| // todo rewrite in go |
There was a problem hiding this comment.
For what its worth we've gotten pretty good mileage out of go's builtin disassembler: https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/extractor_x86.go
Its shortcomings don't really manifest for simple cases.
There was a problem hiding this comment.
The long history is that we needed Zydis originally for the stack delta extraction filtering, and the go disassembler did not recognize all opcodes. We wanted to keep only single disassembler so Zydis was used for all other purposes as well. I think the go built in disassembler has improved since, and the stub disassembly is more constrained, this could probably an option. Especially since we use the go builtin disassembler for all arm64 side.
There was a problem hiding this comment.
The go x86 disassembler is definitely still broken for anything meaty but I found for simple tasks like this it works well. Personally I find avoiding cgo when possible is worth it and if I was gonna undertake any new decoder tasks I'd probably take a hard look at https://github.com/zyantific/zydis-go.
There was a problem hiding this comment.
It's a bit slower but I think it's worth having less unsafe code in the program running as privileged root parsing untrusted user data.
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
DecodeAmd64-12 2.000µ ± 13% 2.732µ ± 8% +36.63% (p=0.000 n=10)
I think rewriting in go should be a separate issue though. I will submit a separate change if we merge #412
Did a quick look into zydis-go and If I understand correctly - it embeds some precompiled native binaries and I'm not excited to use this for the same reasons described above to not use memory unsafe languages and lack of transparency.
There was a problem hiding this comment.
Although I'm happy to apply it here in this PR as well. Both go and C versions looks like almost complete rewrite (touches all the lines )
There was a problem hiding this comment.
@christos68k @florianl @athre0z Do you see any blockers for not using the built-in Go disassembler for amd64?
Currently we have Zydis used in interpreter/php, interpreter/python, and tpbase (for kernel/fsbase and libc/pthread).
The commit referenced would fix php. I'm available to rewrite the three others which are simpler. This would allow cross-running amd64 tests on arm64 builds, and remove the 12M .c drop from this repository.
I believe the Go code will be more maintainable and safer. The original reason for including Zydis was the stack delta extraction code inspecting call return locations to determine which stack deltas to keep and thus had requirements the Go disassembler was not able to handle. But this code is long gone, and I'd like to simplify our codebase now.
Does any anyone object this? If no, I would prefer to review the Go version for this fix.
|
This is great and thanks for undertaking this! You might consider a docker based testing approach (https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/offsets_test.go) but I think this approach here is fine too. As long as we have tests! |
|
The nice thing about docker tests is that you can have one be for the "latest" so when new dot releases CI can fail if something changes. I suspect most of the "interesting" instruction sequences we're seeing are coming from PGO/LTO builds which the stock docker builds are not, but apparently the google cloud sdk builds are so we could use those in addition. |
Thank you for suggestion. Overall I like the idea to have more tests with the help of docker images. We should create a separate issue to not overwhelm the scope of this PR. |
merging open-telemetry#412 Squashed commit of the following: commit b0d9b46 Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Date: Fri Mar 21 11:18:40 2025 +0700 lint commit aa5ef11 Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Date: Thu Mar 20 23:53:53 2025 +0700 more tests, debug dump code on failure commit e4f1c2a Merge: 6a8e9fc 192111e Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Date: Thu Mar 20 14:57:17 2025 +0700 Merge branch 'main' into python-fix commit 6a8e9fc Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Date: Thu Mar 20 14:37:15 2025 +0700 lint commit b8f926a Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com> Date: Thu Mar 20 11:39:13 2025 +0700 python stub decoding improvements
|
Tagging this as review-ready to get some eyes on the pull request. Still want and intend to upload some coredumps once we have a green light. |
fabled
left a comment
There was a problem hiding this comment.
Thanks! Looks pretty good already. Some suggestions added. I'd mostly like to see the debug flag go away and use normal logging. The useful pieces of the code can be moved to a disassembly helper module in a follow up PR if preferred that way.
| import "C" | ||
| const debugDecodeAMD64 = false | ||
|
|
||
| func regIndex(reg x86asm.Reg) int { |
There was a problem hiding this comment.
We may want this in adm64helpers or similar module to share with the other places to be converted. Similar to armhelpers which already exist. Perhaps those should go to some better descriptive subdirectory like disasm/amd64 and disasm/arm64. This can be also done in a follow up MR if preferred.
| value uint64 | ||
| } | ||
|
|
||
| func decodeStubArgumentAMD64(code []byte, codeAddress, memoryBase uint64) uint64 { |
There was a problem hiding this comment.
This should be in decode.go where the equivalent arm decoder is. That allows running the amd64 tests in arm64 builds.
| func decodeStubArgumentWrapper(code []byte, argNumber uint8, symbolValue, | ||
| addrBase libpf.SymbolValue) libpf.SymbolValue { | ||
| return decodeStubArgumentWrapperX64(code, argNumber, symbolValue, addrBase) | ||
| func decodeStubArgumentWrapper( |
There was a problem hiding this comment.
Purpose of this wrapper was to hide the CGO stuff from other files. But now there is no CGO, so perhaps the wrapper could be just directly implementation, or drop the wrapper portion from name and update call sites?
| @@ -0,0 +1,204 @@ | |||
| // Copyright The OpenTelemetry Authors | |||
There was a problem hiding this comment.
This whole file should be merged to decode_test.go once the disassembler implementation is moved to decode.go. Then test suite works identically on arm64/amd64 and does not matter which CPU is on developers computer.
| dumpCode := func() { | ||
| log.Debugf("python stub code: %s", hex.Dump(code)) |
There was a problem hiding this comment.
How about changing debugStub to return also an error and put the dump there?
| if len(rem) >= 4 && | ||
| code[instructionOffset] == 0xf3 && | ||
| code[instructionOffset+1] == 0x0f && | ||
| code[instructionOffset+2] == 0x1e && | ||
| code[instructionOffset+3] == 0xfa { |
There was a problem hiding this comment.
So this is a instruction the disassembler does not support, but we need to recognize and skip? This should go to a helper module. But if preferred, I can do this when working with the other conversions and try to see what would be a nice api.
| // #include "decode_amd64.h" | ||
| // #include "../../support/ebpf/types.h" | ||
| import "C" | ||
| const debugDecodeAMD64 = false |
There was a problem hiding this comment.
I would prefer to drop this. The debug prints that are not important can be dropped, and the meaningful ones should use the same logging mechanism as other code. It was #ifdef in the C code only because adding interface to the logger from C was not found important enough.
|
sent coredump blobs here https://cloud-native.slack.com/archives/C03J794L0BV/p1743940210220949?thread_ts=1742556865.154259&cid=C03J794L0BV @fabled Thanks for review. I've updated the code accordingly. ptal |
| "fib+3 in /home/korniltsev/trash/fib.py:4", | ||
| "fib+3 in /home/korniltsev/trash/fib.py:4", | ||
| "<module>+6 in /home/korniltsev/trash/fib.py:7", | ||
| "ffffffffffffffffffffffffffffffff+0x0", |
There was a problem hiding this comment.
This worth creating an issue
fabled
left a comment
There was a problem hiding this comment.
LGTM. Perhaps the modulestore change should be separate PR? I think this is good enough to go in, and we can do rest of the asm helper improvements in follow up PRs when transforming the other instances. Thanks!
| if resp.StatusCode != http.StatusOK { | ||
| errorResponse, _ := io.ReadAll(resp.Body) | ||
| return "", fmt.Errorf("store returned %d %s", resp.StatusCode, errorResponse) | ||
| } | ||
|
|
||
| // Download the file to a temporary location to prevent half-complete modules on crashes. | ||
| file, err := os.CreateTemp(store.localCachePath, localTempPrefix) | ||
| if err != nil { | ||
| return "", fmt.Errorf("failed to create local file: %w", err) | ||
| } | ||
| defer file.Close() |
There was a problem hiding this comment.
This perhaps should/could be a separate PR?
|
|
||
| package amd // import "go.opentelemetry.io/ebpf-profiler/asm/amd" | ||
|
|
||
| func IsEndbr64(code []byte) (isEndbr bool, size int) { |
There was a problem hiding this comment.
Can we add a comment to the function?
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
| // IsEndbr64 returns true if the first 4 bytes of the code is endbr64 instruction | ||
| // https://www.felixcloutier.com/x86/endbr64 | ||
| // The second returned argument is the size of the instruction which is always 4 | ||
| func IsEndbr64(code []byte) (isEndbr bool, size int) { |
There was a problem hiding this comment.
Could this instead just a Decode, DecodeSkippable or similar? I suspect we may need to add few more here...
fabled
left a comment
There was a problem hiding this comment.
Some stylistic nits that I'd like fixed. I can do moving of code to make the helpers more useful later on (so not a requirement for now).
| if len(code) >= 4 && | ||
| code[0] == 0xf3 && | ||
| code[1] == 0x0f && | ||
| code[2] == 0x1e && | ||
| code[3] == 0xfa { |
There was a problem hiding this comment.
opcodeEndBr64 := []byte{0xf3,0x0f,0x1e,0xfa}
...
switch {
cast bytes.HasPrefix(code, opcodeEndbr64):
...
}
Possibly needs another case to keep lint happy?
| case x86asm.RSP, x86asm.ESP: | ||
| return 8 | ||
| case x86asm.RIP: | ||
| return 9 |
There was a problem hiding this comment.
This is missing registers r8 ... r15. But probably not needed yet?
| if (inst.Op == x86asm.LEA || inst.Op == x86asm.MOV) && inst.Args[0] != nil { | ||
| if reg, ok := inst.Args[0].(x86asm.Reg); ok { | ||
| var value uint64 | ||
| var loadedFrom uint64 | ||
|
|
||
| switch src := inst.Args[1].(type) { | ||
| case x86asm.Imm: | ||
| value = uint64(src) | ||
| case x86asm.Mem: | ||
| baseAddr, _ := regs.Get(src.Base) | ||
| displacement := uint64(src.Disp) | ||
|
|
||
| if inst.Op == x86asm.MOV { | ||
| value = memoryBase | ||
| loadedFrom = baseAddr + displacement | ||
| if src.Index != 0 { | ||
| indexValue, _ := regs.Get(src.Index) | ||
| loadedFrom += indexValue * uint64(src.Scale) | ||
| } | ||
| } else if inst.Op == x86asm.LEA { | ||
| value = baseAddr + displacement | ||
| if src.Index != 0 { | ||
| indexValue, _ := regs.Get(src.Index) | ||
| value += indexValue * uint64(src.Scale) | ||
| } | ||
| } | ||
|
|
||
| case x86asm.Reg: | ||
| value, _ = regs.Get(src) | ||
| } | ||
|
|
||
| regs.Set(reg, value, loadedFrom) | ||
| } | ||
| } | ||
|
|
||
| if inst.Op == x86asm.ADD && inst.Args[0] != nil && inst.Args[1] != nil { | ||
| if reg, ok0 := inst.Args[0].(x86asm.Reg); ok0 { | ||
| if _, ok1 := inst.Args[1].(x86asm.Mem); ok1 { | ||
| oldValue, _ := regs.Get(reg) | ||
| value := oldValue + memoryBase | ||
| regs.Set(reg, value, 0) | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
This probably could go to the helpers. This probably directly reusable in the other modules. Though, I can do this in followup PRs.
There was a problem hiding this comment.
Yes, I would prefer to do this in a followup when switching some other decoding routine from c to go
I have a draft for libc + reusability, but it's not ready for review yet
a9f1733#diff-0cf1ffa2fb979e85389ef109707ee2440dbc759f65a2a2f1bf51ea19ef259defR71
| if runtime.GOARCH == "arm64" { | ||
| return decodeStubArgumentWrapperARM64(code, memoryBase), nil | ||
| } | ||
| if runtime.GOARCH == "amd64" { | ||
| return decodeStubArgumentAMD64(code, uint64(codeAddress), uint64(memoryBase)) | ||
| } |
There was a problem hiding this comment.
switch runtime.GOARCH {
case "arm64":
...
default:
return error
}
|
Can I have another round of review please? I'd like to work on a followup which depends on this. |
fabled
left a comment
There was a problem hiding this comment.
Thanks! LGTM. Let's do more stuff in follow up PRs.
FIxes #251 #284
This PR changes amd64 python stub decoding routine.
addinstructionmovThis fixes the failure to resolve autoTLSKey
with my self compiled cpython 312 , 313 as well as
/usr/lib/google-cloud-sdk/platform/bundledpythonunix/bin/python3.11fromgoogle/cloud-sdk:502.0.0docker image.This is a draft to initiate discussion. More testdata will be added.