
Reusable amd64 interpreter for extraction of values from disassembly #447

Merged
fabled merged 39 commits into open-telemetry:main from grafana:reusable-amd-interpreter
Jul 11, 2025

@korniltsev (Contributor) commented Apr 23, 2025

This PR is a continuation of #412.
In #412, I added amd64 instruction-interpreting logic to the Python stub decoding routine.

In this follow-up, I extract that interpreting logic into a reusable amd.Interpreter.

To make it reusable from other packages, I introduce a new variable package.

The main new type is the variable.Expression interface, which provides common ground for arithmetic operations on register values and unknown memory values.
The variable package supports the following:

  • immediate values (variable.Imm);
  • arithmetic operations such as Add and Mul on Expression operands;
  • zero/sign extension (used, for example, to load EAX from RAX);
  • unknown values loaded from memory (variable.Mem);
  • unknown variable values (variable.Variable), which can Match and extract immediate values.
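As a rough illustration of the extend operations listed above, here is a minimal Go sketch. The types Expression, Imm, Named and the functions ZeroExtend/SignExtend below are hypothetical stand-ins, not the package's real definitions. Extensions of known immediates are folded into a new constant, while unknown values stay symbolic; reading EAX from a known RAX amounts to taking the low 32 bits and zero-extending them.

```go
package main

import "fmt"

// Expression is a hypothetical, minimal stand-in for variable.Expression.
type Expression interface{ String() string }

// Imm is a known 64-bit constant.
type Imm uint64

func (i Imm) String() string { return fmt.Sprintf("0x%x", uint64(i)) }

// Named is an unknown value identified only by its name.
type Named string

func (n Named) String() string { return string(n) }

// extend is an unknown value truncated to its low `bits` bits and then
// zero- or sign-extended back to 64 bits.
type extend struct {
	inner  Expression
	bits   uint
	signed bool
}

func (e extend) String() string {
	kind := "zext"
	if e.signed {
		kind = "sext"
	}
	return fmt.Sprintf("%s%d(%s)", kind, e.bits, e.inner)
}

// ZeroExtend keeps the low `bits` bits of v and clears the rest.
// Known immediates are folded into a new Imm.
func ZeroExtend(v Expression, bits uint) Expression {
	if imm, ok := v.(Imm); ok {
		if bits >= 64 {
			return imm
		}
		return Imm(uint64(imm) & (uint64(1)<<bits - 1))
	}
	return extend{inner: v, bits: bits}
}

// SignExtend keeps the low `bits` bits of v and replicates bit (bits-1) upward.
func SignExtend(v Expression, bits uint) Expression {
	if imm, ok := v.(Imm); ok {
		if bits >= 64 {
			return imm
		}
		shift := 64 - bits
		return Imm(uint64(int64(uint64(imm)<<shift) >> shift))
	}
	return extend{inner: v, bits: bits, signed: true}
}

func main() {
	fmt.Println(ZeroExtend(Imm(0x1_0000_0005), 32))   // EAX from a known RAX: 0x5
	fmt.Println(SignExtend(Imm(0xff), 8))             // 0xffffffffffffffff
	fmt.Println(ZeroExtend(Named("initial RAX"), 32)) // zext32(initial RAX)
}
```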

The most useful method of Expression is Match. Here is an excerpt from its doc comment:

	// Match compares this Expression value against a pattern Expression.
	// The order of the arguments matters: a.Match(b) and b.Match(a) may
	// produce different results. The pattern should be passed as the argument,
	// not the other way around.
	// It returns true if the values are considered equal or compatible according to
	// the type-specific rules:
	// - For operations (add, mul): checks if operation types and operands match
	// - For immediate: checks if values are equal and extracts value into a Variable
	// - For memory references: checks if segments and addresses match
	// - For extend operations: checks if sizes and inner values match
	// - For variables: checks if they are pointing to the same object instance.
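To make the argument-order rule concrete, here is a small Go sketch. It assumes hypothetical types Imm, Add, Named and Capture, which are illustrative only; segments, Mul, memory and extend cases from the doc comment are omitted. The computed value is the receiver and the pattern is the argument. Swapping them fails because a capture placeholder acts as a wildcard only when it appears in the argument, and on the receiver side it matches nothing but itself.

```go
package main

import "fmt"

// Expression is a hypothetical sketch of the Match contract quoted above.
type Expression interface {
	Match(pattern Expression) bool
}

// Imm is a known constant.
type Imm uint64

// Add is the sum of two operands.
type Add struct{ A, B Expression }

// Named is an unknown value; it matches only the same *Named instance.
type Named struct{ Name string }

// Capture is a pattern placeholder that matches any Imm and records it.
type Capture struct{ Value uint64 }

func (i Imm) Match(p Expression) bool {
	switch p := p.(type) {
	case Imm:
		return i == p
	case *Capture:
		p.Value = uint64(i) // extract the immediate into the pattern
		return true
	}
	return false
}

func (a Add) Match(p Expression) bool {
	q, ok := p.(Add)
	return ok && a.A.Match(q.A) && a.B.Match(q.B)
}

func (n *Named) Match(p Expression) bool { return n == p }

// On the receiver side a Capture is not a wildcard, which is why
// value.Match(pattern) and pattern.Match(value) can disagree.
func (c *Capture) Match(p Expression) bool { return c == p }

func main() {
	rip := &Named{Name: "initial RIP"}
	value := Add{A: rip, B: Imm(0x1234)} // e.g. the result of lea reg, [rip+0x1234]

	disp := &Capture{}
	pattern := Add{A: rip, B: disp}

	ok := value.Match(pattern)
	fmt.Printf("%v 0x%x\n", ok, disp.Value) // true 0x1234
	fmt.Println(pattern.Match(value))       // false
}
```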

With the help of the new variable package, the new Interpreter can interpret amd64 instructions up to a certain point. Previously, we used regState as the register state, which had limited reusability:

type regState struct {
	LoadedFrom uint64
	Value      uint64
}

Now we use variable.Expression as the state of registers.

This lets us interpret the needed instructions up to a certain point, then Match the resulting Expression against an expected pattern Expression and extract the needed offsets/addresses along the way.
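The following Go sketch illustrates why an Expression-valued register file is more flexible than regState. It uses hypothetical types and a toy `load` helper, not the real amd.Interpreter. Two chained loads produce a nested expression, and one pattern match extracts both offsets. A single LoadedFrom/Value pair cannot express that directly.

```go
package main

import "fmt"

// Hypothetical minimal expression types (not the real package).
type Expression interface{ Match(p Expression) bool }

type Imm uint64
type Add struct{ A, B Expression }
type Mem struct{ At Expression }    // unknown value loaded from address At
type Named struct{ Name string }    // unknown initial value, matches only itself
type Capture struct{ Value uint64 } // pattern-only: matches any Imm and records it

func (i Imm) Match(p Expression) bool {
	switch p := p.(type) {
	case Imm:
		return i == p
	case *Capture:
		p.Value = uint64(i)
		return true
	}
	return false
}

func (a Add) Match(p Expression) bool {
	q, ok := p.(Add)
	return ok && a.A.Match(q.A) && a.B.Match(q.B)
}

func (m Mem) Match(p Expression) bool {
	q, ok := p.(Mem)
	return ok && m.At.Match(q.At)
}

func (n *Named) Match(p Expression) bool   { return n == p }
func (c *Capture) Match(p Expression) bool { return c == p }

// load models "mov dst, [base+disp]" on a register file of expressions
// (hypothetical helper for illustration).
func load(regs map[string]Expression, dst, base string, disp uint64) {
	regs[dst] = Mem{At: Add{A: regs[base], B: Imm(disp)}}
}

func main() {
	rip := &Named{Name: "initial RIP"}
	regs := map[string]Expression{"rip": rip}

	// mov rax, [rip+0x100] ; mov rax, [rax+0x18]
	load(regs, "rax", "rip", 0x100)
	load(regs, "rax", "rax", 0x18)

	global, field := &Capture{}, &Capture{}
	pattern := Mem{At: Add{A: Mem{At: Add{A: rip, B: global}}, B: field}}
	if regs["rax"].Match(pattern) {
		fmt.Printf("global at RIP+0x%x, field offset 0x%x\n", global.Value, field.Value)
	}
}
```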

In this PR, I only change the Python decoding routine to keep the change small. You can see how the interpreter will be used in other decoding routines (php, libc, fsbase) in this branch: https://github.com/open-telemetry/opentelemetry-ebpf-profiler/compare/main...grafana:opentelemetry-ebpf-profiler:libc?expand=1
I've uploaded the php and libc changes as well.

@korniltsev requested review from a team as code owners April 23, 2025 10:17
@korniltsev force-pushed the reusable-amd-interpreter branch from 99d75f5 to 1815c7a
@korniltsev marked this pull request as draft May 6, 2025 05:36
@fabled (Contributor) left a comment

Wow. Cool stuff! First round of comments added. Any particular reason why this is marked draft?

Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/variable/u64.go Outdated
Comment thread asm/variable/u64.go Outdated
Comment thread asm/variable/variable_test.go Outdated
@korniltsev (Contributor, Author)

Wow. Cool stuff! First round of comments added. Any particular reason why this is marked draft?

I found that I need to rework Crop into SignExtend and ZeroExtend, and I haven't had time to finish this yet. I hope to get back to this PR this week.

Thanks for the review. I will address your comments.

@korniltsev marked this pull request as ready for review June 4, 2025 07:30
@korniltsev (Contributor, Author)

@fabled Thank you for your review. I've addressed your questions and marked the PR as ready for review.
I'd appreciate it if you could take another look. 🙏

@fabled (Contributor) left a comment

First quick round done. I will also do a more detailed review of the code now that the draft state has been removed.

Mostly just naming things. The other thing missing is somewhat better documentation: typically, most structs and functions have some sort of comment attached. But I'd love to start building things on this, so perhaps we can leave the documentation for a follow-up PR if other maintainer(s) agree.

Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/amd/insn.go Outdated
Comment thread asm/amd/regs_state.go Outdated
Comment thread asm/variable/add.go Outdated
Comment thread asm/variable/u64.go
@korniltsev requested a review from fabled June 12, 2025 05:55
Comment thread asm/amd/regs_state.go Outdated
bits int
}

var regs [128]regEntry
Contributor

The number of registers does not make sense to me. Perhaps make a const maxRegs, and explain where it comes from?

Currently it seems that the maximum register is TR7 = 154, or the maximum register used in this code is RIP = 71. Could the register name be used directly in the definition of the maximum?

If I read the code correctly, regEntry is a mapping from x86asm registers to the register indexes used internally in this code? How about making this registerMappings [maxX86Register]registerMapping or similar?

Contributor Author

Made it RIP+1.

Contributor Author

On second thought, I think this code may break if x86asm decides to change the values of the constants. I decided to change this to a switch statement.
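Here is a sketch of what a switch-based mapping could look like. It assumes a hypothetical local Reg type standing in for x86asm.Reg, plus hypothetical internal index constants (regRAX, regRCX, regRIP, maxRegs). The code refers to the decoder's constants only by name, so renumbering them upstream cannot silently corrupt the mapping. Only a few registers are shown, and high-byte registers such as AH are left out.

```go
package main

import "fmt"

// Reg is a hypothetical stand-in for x86asm.Reg. The numeric values of
// these constants are deliberately never relied upon below.
type Reg uint8

const (
	_ Reg = iota
	AL
	AX
	EAX
	RAX
	CL
	CX
	ECX
	RCX
	RIP
)

// regMapping says which internal 64-bit register slot a (sub)register
// lives in and how many low bits of that slot the name refers to.
type regMapping struct {
	idx  int // internal register index; regInvalid means "unsupported"
	bits int
}

// Internal indexes are our own constants, independent of the decoder's.
const (
	regInvalid = iota
	regRAX
	regRCX
	regRIP
	maxRegs // size of the internal register file
)

// regMappingFor switches on named constants instead of indexing an array
// by their numeric values.
func regMappingFor(r Reg) regMapping {
	switch r {
	case AL:
		return regMapping{regRAX, 8}
	case AX:
		return regMapping{regRAX, 16}
	case EAX:
		return regMapping{regRAX, 32}
	case RAX:
		return regMapping{regRAX, 64}
	case CL:
		return regMapping{regRCX, 8}
	case CX:
		return regMapping{regRCX, 16}
	case ECX:
		return regMapping{regRCX, 32}
	case RCX:
		return regMapping{regRCX, 64}
	case RIP:
		return regMapping{regRIP, 64}
	}
	return regMapping{regInvalid, 0}
}

func main() {
	fmt.Println(regMappingFor(EAX), regMappingFor(Reg(200)), maxRegs) // {1 32} {0 0} 4
}
```

A side benefit of this layout is that maxRegs follows from the internal constants, which answers the earlier question of where the array size comes from.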

Comment thread asm/amd/regs_state.go Outdated
return e.idx
}

func regEntryFor(reg x86asm.Reg) regEntry {
Contributor

regMappingFor?

Comment thread asm/amd/regs_state.go Outdated
var regs [128]regEntry

func init() {
regs[x86asm.AL] = regEntry{idx: 1, bits: 8}
Contributor

The idx values used here are internal register assignments. Perhaps constants could be added for these?

Contributor Author

Comment thread asm/amd/regs_state.go Outdated
return regEntry{}
}

type RegsState struct {
Contributor

nit: MachineState? or RegStates/RegisterStates? or just Registers?

Comment thread asm/amd/regs_state.go Outdated
Comment on lines +198 to +215
i.Regs.regs[0] = expression.Var("invalid reg")
i.Regs.regs[regIndex(x86asm.RAX)] = expression.Var("initial RAX")
i.Regs.regs[regIndex(x86asm.RCX)] = expression.Var("initial RCX")
i.Regs.regs[regIndex(x86asm.RDX)] = expression.Var("initial RDX")
i.Regs.regs[regIndex(x86asm.RBX)] = expression.Var("initial RBX")
i.Regs.regs[regIndex(x86asm.RSP)] = expression.Var("initial RSP")
i.Regs.regs[regIndex(x86asm.RBP)] = expression.Var("initial RBP")
i.Regs.regs[regIndex(x86asm.RSI)] = expression.Var("initial RSI")
i.Regs.regs[regIndex(x86asm.RDI)] = expression.Var("initial RDI")
i.Regs.regs[regIndex(x86asm.R8)] = expression.Var("initial R8")
i.Regs.regs[regIndex(x86asm.R9)] = expression.Var("initial R9")
i.Regs.regs[regIndex(x86asm.R10)] = expression.Var("initial R10")
i.Regs.regs[regIndex(x86asm.R11)] = expression.Var("initial R11")
i.Regs.regs[regIndex(x86asm.R12)] = expression.Var("initial R12")
i.Regs.regs[regIndex(x86asm.R13)] = expression.Var("initial R13")
i.Regs.regs[regIndex(x86asm.R14)] = expression.Var("initial R14")
i.Regs.regs[regIndex(x86asm.R15)] = expression.Var("initial R15")
i.Regs.regs[regIndex(x86asm.RIP)] = expression.Var("initial RIP")
Contributor

This would also look a lot neater if we had our own constants for the registers. Or could this be made a loop?

Contributor Author

Made it a loop.
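Here is one way such a loop could look. The register list, the Named type and the Registers struct are illustrative assumptions, not the merged code. Each slot gets its own *Named, so every initial register value stays a distinct unknown. Slot 0 keeps the "invalid reg" sentinel, as in the hand-written version.

```go
package main

import "fmt"

// Named is a hypothetical unknown value identified by name.
type Named struct{ Name string }

// regNames lists internal register slots in index order; index 0 is a
// sentinel for unsupported registers (illustrative list).
var regNames = [...]string{
	"invalid reg", "RAX", "RCX", "RDX", "RBX", "RSP", "RBP", "RSI", "RDI",
	"R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15", "RIP",
}

// Registers holds one expression per internal register slot.
type Registers struct {
	regs [len(regNames)]*Named
}

// NewRegisters replaces the per-register assignments with a single loop.
func NewRegisters() *Registers {
	r := &Registers{}
	for i, name := range regNames {
		label := "initial " + name
		if i == 0 {
			label = name // keep the sentinel name as-is
		}
		r.regs[i] = &Named{Name: label}
	}
	return r
}

func main() {
	r := NewRegisters()
	fmt.Println(r.regs[0].Name, "|", r.regs[1].Name, "|", r.regs[len(regNames)-1].Name)
}
```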

Comment thread asm/expression/variable.go Outdated
Comment thread asm/expression/variable.go Outdated
Comment thread asm/expression/variable.go Outdated
Comment on lines +12 to +18
func Any() *Variable {
v := Var("any")
v.isAny = true
return v
}

func Var(name string) *Variable {
@fabled (Contributor) Jun 16, 2025

I feel that Any() and Var() are a bit ambiguous and overloaded, since Var is used to construct both non-capturing variants (register initial states; basically just a named unknown value) and capturing variants (Match targets used to extract the matched subexpression).

Perhaps the constructors could be something like:

  • CaptureExpression()
  • CaptureImmediate()
  • NamedState()
  • Constant()

or similar?

Perhaps even construct a different type of struct for each of these, unless it makes the Match implementation much more complicated. But I'd assume that a NamedState, such as an initial register state, when used as a match target should not match anything other than itself?

Contributor Author

I've removed Any. I propose to keep the single Var for now. It has a name, and it can match and capture an immediate. Sometimes we use the capture functionality and sometimes we don't. It looks fine to me.

Let me know if you strongly believe we need to split Var to NamedState and CaptureImmediate, I will do this.

Contributor

Let me know if you strongly believe we need to split Var to NamedState and CaptureImmediate, I will do this.

From a programming-safety point of view, I think it would be good to split these. The problem I see is that a developer unfamiliar with this code might misuse Var. It also creates ambiguity as a Match target. I would feel much safer if these two use cases had different backing data types.

Contributor Author

OK. Split into Named and ImmediateCapture.
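Here is a minimal sketch of what the split could look like. It assumes hypothetical NewNamed/NewImmediateCapture constructors, with names chosen for illustration; the merged API may differ. Giving each use case its own type makes its behaviour as a Match target explicit. A Named value matches only its own instance, while an ImmediateCapture matches any immediate and records it. A capture can also never swallow an unknown Named value.

```go
package main

import "fmt"

// Hypothetical sketch of the split; not the merged expression package.
type Expression interface{ Match(pattern Expression) bool }

// Imm is a known constant.
type Imm uint64

// Named is an unknown value such as an initial register state.
// It matches only the identical *Named instance.
type Named struct{ name string }

func NewNamed(name string) *Named        { return &Named{name: name} }
func (n *Named) Match(p Expression) bool { return n == p }

// ImmediateCapture is meant to be used as a pattern: it matches any Imm and
// remembers its value. As a receiver it matches only itself.
type ImmediateCapture struct {
	name  string
	value uint64
}

func NewImmediateCapture(name string) *ImmediateCapture {
	return &ImmediateCapture{name: name}
}
func (c *ImmediateCapture) Match(p Expression) bool { return c == p }
func (c *ImmediateCapture) Value() uint64           { return c.value }

func (i Imm) Match(p Expression) bool {
	switch p := p.(type) {
	case Imm:
		return i == p
	case *ImmediateCapture:
		p.value = uint64(i)
		return true
	}
	return false
}

func main() {
	rax, rcx := NewNamed("initial RAX"), NewNamed("initial RCX")
	off := NewImmediateCapture("off")

	fmt.Println(rax.Match(rax), rax.Match(rcx)) // a named state matches only itself

	matched := Imm(0x28).Match(off)
	fmt.Println(matched, off.Value()) // the capture records the immediate

	fmt.Println(rax.Match(off)) // an unknown value is never captured
}
```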

Comment thread asm/expression/variable.go Outdated
Comment thread asm/expression/variable.go Outdated
@korniltsev force-pushed the reusable-amd-interpreter branch from b0184bd to dc208b8 June 20, 2025 03:57
@korniltsev requested a review from fabled June 20, 2025 06:34
@fabled (Contributor) left a comment

A few comments still, but it's looking good enough, so I'll give an early approval to get a second review from @christos68k or @florianl?

Comment thread asm/expression/named.go Outdated
Comment thread interpreter/python/wrapper_decode.go Outdated
Comment on lines +13 to +17
func decodeStubArgumentWrapper(
code []byte,
codeAddress libpf.SymbolValue,
memoryBase libpf.SymbolValue,
) (libpf.SymbolValue, error) {
Contributor

It seems this was already like this, but why not inline it at the only call site and use pfelf.File.Machine to determine the machine type?

@korniltsev force-pushed the reusable-amd-interpreter branch from f73df49 to 54108ed June 26, 2025 06:05
@florianl (Member) left a comment

Just minor comments. Thanks for working on this!

Once merge conflicts are resolved, this should be merged :)

Comment thread asm/amd/interpreter.go Outdated
Comment thread asm/amd/interpreter.go Outdated
Comment thread asm/amd/interpreter.go
i.pc += inst.Len
i.code = i.code[inst.Len:]
i.Regs.setX86asm(x86asm.RIP, expression.Add(i.CodeAddress, expression.Imm(uint64(i.pc))))
switch inst.Op {
Member

General observation: all switch cases call i.Regs.setX86asm() if everything works as expected. Should we inform the user that something unexpected happened if i.Regs.setX86asm() is not called in some case?

Contributor Author

Some instructions and instruction types currently do nothing in the interpreter and are ignored, for example writing or reading memory, tests, and jumps. We should not report those. We have tests for the stub decodings; they should be enough.
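To illustrate the behaviour discussed here, below is a toy Go sketch of a single interpreter step. It assumes a hypothetical Inst type and uses plain uint64 register values instead of Expressions to keep it short. It follows the snippet above: advance pc, set RIP to CodeAddress + pc, then dispatch on the opcode. Unhandled instructions such as jumps and tests fall through silently without being reported.

```go
package main

import "fmt"

// Op is a hypothetical toy opcode; the real interpreter decodes x86asm.Inst.
type Op int

const (
	MOVImm Op = iota // dst = imm
	ADDImm           // dst += imm
	JMP              // control flow: ignored
	TEST             // flags only: ignored
)

// Inst is a hypothetical decoded instruction.
type Inst struct {
	Op  Op
	Dst int
	Imm uint64
	Len int
}

// Interpreter tracks register values for a straight-line code sequence.
type Interpreter struct {
	CodeAddress uint64
	pc          int
	regs        map[int]uint64
	rip         uint64
}

// Step executes one instruction. RIP is updated first so RIP-relative
// operands would see the address of the next instruction, as on amd64.
func (i *Interpreter) Step(inst Inst) {
	i.pc += inst.Len
	i.rip = i.CodeAddress + uint64(i.pc)
	switch inst.Op {
	case MOVImm:
		i.regs[inst.Dst] = inst.Imm
	case ADDImm:
		i.regs[inst.Dst] += inst.Imm
	default:
		// Jumps, tests and other instructions that do not change the tracked
		// register values are skipped without reporting anything.
	}
}

func main() {
	in := &Interpreter{CodeAddress: 0x1000, regs: map[int]uint64{}}
	for _, inst := range []Inst{
		{Op: MOVImm, Dst: 0, Imm: 5, Len: 5},
		{Op: TEST, Len: 3},
		{Op: ADDImm, Dst: 0, Imm: 3, Len: 4},
		{Op: JMP, Len: 2},
	} {
		in.Step(inst)
	}
	fmt.Printf("r0=%d rip=0x%x\n", in.regs[0], in.rip) // r0=8 rip=0x100e
}
```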

korniltsev and others added 2 commits July 11, 2025 11:37
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
@fabled merged commit e78bcc4 into open-telemetry:main Jul 11, 2025
27 checks passed
gnurizen pushed a commit to parca-dev/opentelemetry-ebpf-profiler that referenced this pull request Aug 12, 2025
…pen-telemetry#447)

Co-authored-by: Timo Teräs <timo.teras@iki.fi>
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
gnurizen pushed a commit to parca-dev/opentelemetry-ebpf-profiler that referenced this pull request Aug 13, 2025
…pen-telemetry#447)

Co-authored-by: Timo Teräs <timo.teras@iki.fi>
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>

5 participants