Reusable amd64 interpreter for extraction of values from disassembly#447
Reusable amd64 interpreter for extraction of values from disassembly#447fabled merged 39 commits intoopen-telemetry:mainfrom
Conversation
99d75f5 to
1815c7a
Compare
fabled
left a comment
There was a problem hiding this comment.
Wow. Cool stuff! First round of comments added. Any particular reason why this is marked draft?
I found a need to rework Thanks for the review. I will address your comments |
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
|
@fabled Thank you for your review. I've addressed your questions and marked the PR as ready for review. |
fabled
left a comment
There was a problem hiding this comment.
First quick round done. Will do also a bit more detailed review on the code since the draft state was now removed.
Mostly just naming things things. The other thing missing is a bit improved documentation. Typically most structs and functions have some sort of comment attached. But I'd love to start building things on this, so perhaps we can leave the documentation as a follow up PR if other maintainer(s) agree.
Co-authored-by: Timo Teräs <timo.teras@iki.fi>
| bits int | ||
| } | ||
|
|
||
| var regs [128]regEntry |
There was a problem hiding this comment.
The number of registers does not make sense to me. Perhaps make a const maxRegs, and explain where it comes from?
Currently it seems: maximum register is TR7 = 154, or maximum register used in this code is RIP = 71. Could use directly the register name in the definition of the maximum?
If I read the code correctly regEntry is a mapping from x86asm register to the register indexes used internally in this code? How about making this registerMappings [maxX86Register]registerMapping or similar?
There was a problem hiding this comment.
on a second thought. I think this code may break if x86asm decides to change the values for the constants. I decoded to change this to a switch case code.
| return e.idx | ||
| } | ||
|
|
||
| func regEntryFor(reg x86asm.Reg) regEntry { |
| var regs [128]regEntry | ||
|
|
||
| func init() { | ||
| regs[x86asm.AL] = regEntry{idx: 1, bits: 8} |
There was a problem hiding this comment.
The idx values used here are internal register assignments. Perhaps constants could be added for these?
| return regEntry{} | ||
| } | ||
|
|
||
| type RegsState struct { |
There was a problem hiding this comment.
nit: MachineState? or RegStates/RegisterStates? or just Registers?
| i.Regs.regs[0] = expression.Var("invalid reg") | ||
| i.Regs.regs[regIndex(x86asm.RAX)] = expression.Var("initial RAX") | ||
| i.Regs.regs[regIndex(x86asm.RCX)] = expression.Var("initial RCX") | ||
| i.Regs.regs[regIndex(x86asm.RDX)] = expression.Var("initial RDX") | ||
| i.Regs.regs[regIndex(x86asm.RBX)] = expression.Var("initial RBX") | ||
| i.Regs.regs[regIndex(x86asm.RSP)] = expression.Var("initial RSP") | ||
| i.Regs.regs[regIndex(x86asm.RBP)] = expression.Var("initial RBP") | ||
| i.Regs.regs[regIndex(x86asm.RSI)] = expression.Var("initial RSI") | ||
| i.Regs.regs[regIndex(x86asm.RDI)] = expression.Var("initial RDI") | ||
| i.Regs.regs[regIndex(x86asm.R8)] = expression.Var("initial R8") | ||
| i.Regs.regs[regIndex(x86asm.R9)] = expression.Var("initial R9") | ||
| i.Regs.regs[regIndex(x86asm.R10)] = expression.Var("initial R10") | ||
| i.Regs.regs[regIndex(x86asm.R11)] = expression.Var("initial R11") | ||
| i.Regs.regs[regIndex(x86asm.R12)] = expression.Var("initial R12") | ||
| i.Regs.regs[regIndex(x86asm.R13)] = expression.Var("initial R13") | ||
| i.Regs.regs[regIndex(x86asm.R14)] = expression.Var("initial R14") | ||
| i.Regs.regs[regIndex(x86asm.R15)] = expression.Var("initial R15") | ||
| i.Regs.regs[regIndex(x86asm.RIP)] = expression.Var("initial RIP") |
There was a problem hiding this comment.
This would also look a lot neater if we had our own constants for the registers. Or could this be made a loop?
| func Any() *Variable { | ||
| v := Var("any") | ||
| v.isAny = true | ||
| return v | ||
| } | ||
|
|
||
| func Var(name string) *Variable { |
There was a problem hiding this comment.
I feel that Any() and Var() are bit ambiguous and overloaded. Since Var is used to construct both non-capturing (register initial states; basically just named unknown value) and capturing (Match target to extract matched subexpression) variants.
Perhaps the constructors could be something like:
- CaptureExpression()
- CaptureImmediate()
- NamedState()
- Constant()
or similar?
Perhaps even construct different type of struct out of these unless it makes the Match implementation too much more complicated. But I'd assume the NamedState type such as the initial register states when used as match target should match anything else then itself?
There was a problem hiding this comment.
I've removed Any. I propose to keep the single Var for now. It has a name and it can match & capture an immediate. Sometimes we use the capture functionality sometimes we don't. It looks fine to me.
Let me know if you strongly believe we need to split Var to NamedState and CaptureImmediate, I will do this.
There was a problem hiding this comment.
Let me know if you strongly believe we need to split
VartoNamedStateandCaptureImmediate, I will do this.
From programming safety point of view, I think it would be good to split these. The problem I see is that unknown developer might misuse Var. It also creates ambiguity as Match target. I would feel much more safer if these two functional use cases have different backing data type.
There was a problem hiding this comment.
ok. Split into Named and ImmediateCapture
b0184bd to
dc208b8
Compare
fabled
left a comment
There was a problem hiding this comment.
Few comments still, but looking good enough so I'll give an early approval to get a second review from @christos68k or @florianl ?
| func decodeStubArgumentWrapper( | ||
| code []byte, | ||
| codeAddress libpf.SymbolValue, | ||
| memoryBase libpf.SymbolValue, | ||
| ) (libpf.SymbolValue, error) { |
There was a problem hiding this comment.
Seems this was like this, but why not inline this at the only call site, and use the pfelf.File.Machine to determine the machine type.
f73df49 to
54108ed
Compare
| i.pc += inst.Len | ||
| i.code = i.code[inst.Len:] | ||
| i.Regs.setX86asm(x86asm.RIP, expression.Add(i.CodeAddress, expression.Imm(uint64(i.pc)))) | ||
| switch inst.Op { |
There was a problem hiding this comment.
General observation: All switch cases call i.Regs.setX86asm() if everything works as expected. Should we inform the user, that something unexpected happened, if i.Regs.setX86asm() is not used in some cases?
There was a problem hiding this comment.
There are instructions and instruction types that currently do nothing in the interpreter and ignored, for example writing or reading memory, tests, jumps. We should not inform on those. We have tests for stub decodings, they should be enough.
Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
…pen-telemetry#447) Co-authored-by: Timo Teräs <timo.teras@iki.fi> Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
…pen-telemetry#447) Co-authored-by: Timo Teräs <timo.teras@iki.fi> Co-authored-by: Florian Lehner <florianl@users.noreply.github.com>
This PR is continuation of #412
In #412, I've added amd instruction interpreting logic to the python stub decode routine.
In this followup, I extract this interpreting logic into reusable
amd.Interpreter.To make it reusable in other packages I introduce new
variablepackage.The main new type here is
variable.Expressioninterface which is common ground to facilitate arithmetic operations on register and unknown memory values.The
variablepackage support the following:variable.Imm),Add,MulonExpressionvalue operandseaxfromRAX)variable.Mem,variable.Variable, which canMatchand extract immediate valuesThe most useful method of
ExpressionisMatch. Here is the snippet from its doc:Now with the help of the new
variablepackage, the newInterpretercan interpret amd instructions up to a certain point. Previously we used to haveregStateas the state of registers which had limited reusablityNow we use
variable.Expressionas the state of registers.This allows us to interpret needed instructions up to a certain point and then
Matchthe result of the computation (Expression) against some expected pattern(Expression) and extract needed offsets/addresses along the way.In this PR I only change the python decoding routine to keep the change smaller. You can see how it will be used in other decoding routines (php, libc, fsbase) in this branch https://github.com/open-telemetry/opentelemetry-ebpf-profiler/compare/main...grafana:opentelemetry-ebpf-profiler:libc?expand=1I've uploaded php, libc changes as well.