Skip to content

fix(python): fix stub decoding routine#412

Merged
florianl merged 28 commits intoopen-telemetry:mainfrom
grafana:python-fix
Apr 22, 2025
Merged

fix(python): fix stub decoding routine#412
florianl merged 28 commits intoopen-telemetry:mainfrom
grafana:python-fix

Conversation

@korniltsev
Copy link
Copy Markdown
Contributor

@korniltsev korniltsev commented Mar 20, 2025

FIxes #251 #284

This PR changes amd64 python stub decoding routine.

  • registers state is tracked
  • adds support for add instruction
  • adds support for register mov
  • returns the last target register state value at the termination instructions (call/jmp)
  • removes argno as it is always 0

This fixes the failure to resolve autoTLSKey
with my self compiled cpython 312 , 313 as well as /usr/lib/google-cloud-sdk/platform/bundledpythonunix/bin/python3.11 from google/cloud-sdk:502.0.0 docker image.

This is a draft to initiate discussion. More testdata will be added.

Comment thread interpreter/python/decode_amd64.c Outdated
uint64_t decode_stub_argument(const uint8_t* code, size_t codesz, uint8_t argument_no,
uint64_t rip_base, uint64_t memory_base) {
// todo update comment
// todo rewrite in go
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what its worth we've gotten pretty good mileage out of go's builtin disassembler: https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/extractor_x86.go

Its shortcomings don't really manifest for simple cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The long history is that we needed Zydis originally for the stack delta extraction filtering, and the go disassembler did not recognize all opcodes. We wanted to keep only single disassembler so Zydis was used for all other purposes as well. I think the go built in disassembler has improved since, and the stub disassembly is more constrained, this could probably an option. Especially since we use the go builtin disassembler for all arm64 side.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The go x86 disassembler is definitely still broken for anything meaty but I found for simple tasks like this it works well. Personally I find avoiding cgo when possible is worth it and if I was gonna undertake any new decoder tasks I'd probably take a hard look at https://github.com/zyantific/zydis-go.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've drafted a go version.

It's a bit slower but I think it's worth having less unsafe code in the program running as privileged root parsing untrusted user data.

cpu: 13th Gen Intel(R) Core(TM) i7-1365U
               │   old.txt    │               new.txt               │
               │    sec/op    │   sec/op     vs base                │
DecodeAmd64-12   2.000µ ± 13%   2.732µ ± 8%  +36.63% (p=0.000 n=10)

I think rewriting in go should be a separate issue though. I will submit a separate change if we merge #412
Did a quick look into zydis-go and If I understand correctly - it embeds some precompiled native binaries and I'm not excited to use this for the same reasons described above to not use memory unsafe languages and lack of transparency.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I'm happy to apply it here in this PR as well. Both go and C versions looks like almost complete rewrite (touches all the lines )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@christos68k @florianl @athre0z Do you see any blockers for not using the built-in Go disassembler for amd64?

Currently we have Zydis used in interpreter/php, interpreter/python, and tpbase (for kernel/fsbase and libc/pthread).

The commit referenced would fix php. I'm available to rewrite the three others which are simpler. This would allow cross-running amd64 tests on arm64 builds, and remove the 12M .c drop from this repository.

I believe the Go code will be more maintainable and safer. The original reason for including Zydis was the stack delta extraction code inspecting call return locations to determine which stack deltas to keep and thus had requirements the Go disassembler was not able to handle. But this code is long gone, and I'd like to simplify our codebase now.

Does any anyone object this? If no, I would prefer to review the Go version for this fix.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

@gnurizen
Copy link
Copy Markdown
Contributor

This is great and thanks for undertaking this! You might consider a docker based testing approach (https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/offsets_test.go) but I think this approach here is fine too. As long as we have tests!

@gnurizen
Copy link
Copy Markdown
Contributor

The nice thing about docker tests is that you can have one be for the "latest" so when new dot releases CI can fail if something changes. I suspect most of the "interesting" instruction sequences we're seeing are coming from PGO/LTO builds which the stock docker builds are not, but apparently the google cloud sdk builds are so we could use those in addition.

Comment thread interpreter/python/decode_amd64_test.go Outdated
Copy link
Copy Markdown
Contributor

@gnurizen gnurizen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@korniltsev
Copy link
Copy Markdown
Contributor Author

This is great and thanks for undertaking this! You might consider a docker based testing approach (https://github.com/parca-dev/opentelemetry-ebpf-profiler/blob/main/interpreter/luajit/offsets_test.go) but I think this approach here is fine too. As long as we have tests!

Thank you for suggestion.
I love the idea with docker images overall and I think we should add them but not in this particular case. Maybe for better integration testing, or automated-coredump-generation-testing.
The reason I don't see how docker images could be useful here - is that the tests needs some expected values to assert and currently in this PR it is a manual process to prepare them. it's probably possible to reliably extract the expected values from debug packages, but they may not be available in all cases/distros.

Overall I like the idea to have more tests with the help of docker images. We should create a separate issue to not overwhelm the scope of this PR.

korniltsev added a commit to grafana/opentelemetry-ebpf-profiler that referenced this pull request Mar 24, 2025
merging open-telemetry#412

Squashed commit of the following:

commit b0d9b46
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date:   Fri Mar 21 11:18:40 2025 +0700

    lint

commit aa5ef11
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date:   Thu Mar 20 23:53:53 2025 +0700

    more tests, debug dump code on failure

commit e4f1c2a
Merge: 6a8e9fc 192111e
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date:   Thu Mar 20 14:57:17 2025 +0700

    Merge branch 'main' into python-fix

commit 6a8e9fc
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date:   Thu Mar 20 14:37:15 2025 +0700

    lint

commit b8f926a
Author: Tolya Korniltsev <korniltsev.anatoly@gmail.com>
Date:   Thu Mar 20 11:39:13 2025 +0700

    python stub decoding improvements
@korniltsev
Copy link
Copy Markdown
Contributor Author

Tagging this as review-ready to get some eyes on the pull request. Still want and intend to upload some coredumps once we have a green light.

@korniltsev korniltsev marked this pull request as ready for review March 27, 2025 12:29
@korniltsev korniltsev requested review from a team as code owners March 27, 2025 12:29
Copy link
Copy Markdown
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks pretty good already. Some suggestions added. I'd mostly like to see the debug flag go away and use normal logging. The useful pieces of the code can be moved to a disassembly helper module in a follow up PR if preferred that way.

Comment thread interpreter/python/decode_amd64.go Outdated
import "C"
const debugDecodeAMD64 = false

func regIndex(reg x86asm.Reg) int {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want this in adm64helpers or similar module to share with the other places to be converted. Similar to armhelpers which already exist. Perhaps those should go to some better descriptive subdirectory like disasm/amd64 and disasm/arm64. This can be also done in a follow up MR if preferred.

Comment thread interpreter/python/decode_amd64.go Outdated
value uint64
}

func decodeStubArgumentAMD64(code []byte, codeAddress, memoryBase uint64) uint64 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in decode.go where the equivalent arm decoder is. That allows running the amd64 tests in arm64 builds.

Comment thread interpreter/python/decode_amd64.go Outdated
func decodeStubArgumentWrapper(code []byte, argNumber uint8, symbolValue,
addrBase libpf.SymbolValue) libpf.SymbolValue {
return decodeStubArgumentWrapperX64(code, argNumber, symbolValue, addrBase)
func decodeStubArgumentWrapper(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Purpose of this wrapper was to hide the CGO stuff from other files. But now there is no CGO, so perhaps the wrapper could be just directly implementation, or drop the wrapper portion from name and update call sites?

Comment thread interpreter/python/decode_amd64_test.go Outdated
@@ -0,0 +1,204 @@
// Copyright The OpenTelemetry Authors
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file should be merged to decode_test.go once the disassembler implementation is moved to decode.go. Then test suite works identically on arm64/amd64 and does not matter which CPU is on developers computer.

Comment thread interpreter/python/python.go Outdated
Comment on lines +673 to +674
dumpCode := func() {
log.Debugf("python stub code: %s", hex.Dump(code))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing debugStub to return also an error and put the dump there?

Comment thread interpreter/python/decode_amd64.go Outdated
Comment on lines +53 to +57
if len(rem) >= 4 &&
code[instructionOffset] == 0xf3 &&
code[instructionOffset+1] == 0x0f &&
code[instructionOffset+2] == 0x1e &&
code[instructionOffset+3] == 0xfa {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a instruction the disassembler does not support, but we need to recognize and skip? This should go to a helper module. But if preferred, I can do this when working with the other conversions and try to see what would be a nice api.

Comment thread interpreter/python/decode_amd64.go Outdated
// #include "decode_amd64.h"
// #include "../../support/ebpf/types.h"
import "C"
const debugDecodeAMD64 = false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to drop this. The debug prints that are not important can be dropped, and the meaningful ones should use the same logging mechanism as other code. It was #ifdef in the C code only because adding interface to the logger from C was not found important enough.

@korniltsev
Copy link
Copy Markdown
Contributor Author

sent coredump blobs here https://cloud-native.slack.com/archives/C03J794L0BV/p1743940210220949?thread_ts=1742556865.154259&cid=C03J794L0BV

@fabled Thanks for review. I've updated the code accordingly. ptal

"fib+3 in /home/korniltsev/trash/fib.py:4",
"fib+3 in /home/korniltsev/trash/fib.py:4",
"<module>+6 in /home/korniltsev/trash/fib.py:7",
"ffffffffffffffffffffffffffffffff+0x0",
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worth creating an issue

Copy link
Copy Markdown
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Perhaps the modulestore change should be separate PR? I think this is good enough to go in, and we can do rest of the asm helper improvements in follow up PRs when transforming the other instances. Thanks!

Comment thread tools/coredump/modulestore/store.go Outdated
Comment on lines +435 to +445
if resp.StatusCode != http.StatusOK {
errorResponse, _ := io.ReadAll(resp.Body)
return "", fmt.Errorf("store returned %d %s", resp.StatusCode, errorResponse)
}

// Download the file to a temporary location to prevent half-complete modules on crashes.
file, err := os.CreateTemp(store.localCachePath, localTempPrefix)
if err != nil {
return "", fmt.Errorf("failed to create local file: %w", err)
}
defer file.Close()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This perhaps should/could be a separate PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, lets split #433

Comment thread asm/amd/insn.go Outdated

package amd // import "go.opentelemetry.io/ebpf-profiler/asm/amd"

func IsEndbr64(code []byte) (isEndbr bool, size int) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment to the function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Comment thread interpreter/python/decode.go
Comment thread interpreter/python/python.go Outdated
Comment thread interpreter/python/python.go Outdated
Comment thread interpreter/python/python.go Outdated
Comment thread interpreter/python/python.go Outdated
korniltsev and others added 4 commits April 7, 2025 20:40
Comment thread asm/amd/insn.go Outdated
// IsEndbr64 returns true if the first 4 bytes of the code is endbr64 instruction
// https://www.felixcloutier.com/x86/endbr64
// The second returned argument is the size of the instruction which is always 4
func IsEndbr64(code []byte) (isEndbr bool, size int) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this instead just a Decode, DecodeSkippable or similar? I suspect we may need to add few more here...

Copy link
Copy Markdown
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some stylistic nits that I'd like fixed. I can do moving of code to make the helpers more useful later on (so not a requirement for now).

Comment thread asm/amd/insn.go Outdated
Comment on lines +10 to +14
if len(code) >= 4 &&
code[0] == 0xf3 &&
code[1] == 0x0f &&
code[2] == 0x1e &&
code[3] == 0xfa {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opcodeEndBr64 := []byte{0xf3,0x0f,0x1e,0xfa}

...

switch {
cast bytes.HasPrefix(code, opcodeEndbr64):
...
}

Possibly needs another case to keep lint happy?

Comment thread asm/amd/regs_state.go
case x86asm.RSP, x86asm.ESP:
return 8
case x86asm.RIP:
return 9
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing registers r8 ... r15. But probably not needed yet?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add them

Comment on lines +140 to +184
if (inst.Op == x86asm.LEA || inst.Op == x86asm.MOV) && inst.Args[0] != nil {
if reg, ok := inst.Args[0].(x86asm.Reg); ok {
var value uint64
var loadedFrom uint64

switch src := inst.Args[1].(type) {
case x86asm.Imm:
value = uint64(src)
case x86asm.Mem:
baseAddr, _ := regs.Get(src.Base)
displacement := uint64(src.Disp)

if inst.Op == x86asm.MOV {
value = memoryBase
loadedFrom = baseAddr + displacement
if src.Index != 0 {
indexValue, _ := regs.Get(src.Index)
loadedFrom += indexValue * uint64(src.Scale)
}
} else if inst.Op == x86asm.LEA {
value = baseAddr + displacement
if src.Index != 0 {
indexValue, _ := regs.Get(src.Index)
value += indexValue * uint64(src.Scale)
}
}

case x86asm.Reg:
value, _ = regs.Get(src)
}

regs.Set(reg, value, loadedFrom)
}
}

if inst.Op == x86asm.ADD && inst.Args[0] != nil && inst.Args[1] != nil {
if reg, ok0 := inst.Args[0].(x86asm.Reg); ok0 {
if _, ok1 := inst.Args[1].(x86asm.Mem); ok1 {
oldValue, _ := regs.Get(reg)
value := oldValue + memoryBase
regs.Set(reg, value, 0)
}
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably could go to the helpers. This probably directly reusable in the other modules. Though, I can do this in followup PRs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would prefer to do this in a followup when switching some other decoding routine from c to go

I have a draft for libc + reusability, but it's not ready for review yet
a9f1733#diff-0cf1ffa2fb979e85389ef109707ee2440dbc759f65a2a2f1bf51ea19ef259defR71

Comment thread interpreter/python/decode.go Outdated
Comment on lines +193 to +198
if runtime.GOARCH == "arm64" {
return decodeStubArgumentWrapperARM64(code, memoryBase), nil
}
if runtime.GOARCH == "amd64" {
return decodeStubArgumentAMD64(code, uint64(codeAddress), uint64(memoryBase))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

switch runtime.GOARCH {
case "arm64":
...
default:
  return error
}

@korniltsev korniltsev requested a review from fabled April 14, 2025 04:42
@korniltsev
Copy link
Copy Markdown
Contributor Author

Can I have another round of review please? I'd like to work on a followup which depends on this.

Copy link
Copy Markdown
Contributor

@fabled fabled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM. Let's do more stuff in follow up PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Python: frames missing due to failure to resolve autoTLSKey

5 participants