Use control flow info from IDA #7

Open
PeterMatula opened this issue Jan 3, 2018 · 3 comments

@PeterMatula
Collaborator

At the moment, when a user selects a function to decompile, retdec is given the function's address range and a JSON file, generated from the IDA database, with meta-information to use. The range is decoded by the retdec decoder without any control flow hints from IDA, just as if retdec were used without the IDA plugin.

Although the retdec decoder aims to be as good as IDA, it is not there yet. Moreover, even if it were as good or better, users running retdec from retdec-idaplugin would probably appreciate it if the decompiled result had exactly the same control flow as the IDA disassembly; things might get confusing otherwise.

Therefore, it would be best if retdec could use control flow information provided by IDA (or any other JSON producer). The following needs to be done to make this happen:

  1. Add control flow representation capabilities to retdec-config: BB ranges, control-flow-changing instructions, their types, and their targets (e.g. (un)conditional branches, returns, calls).
  2. Modify retdec-idaplugin to write this info into the JSON.
  3. Modify retdec's decoder pass to use this info. The decoder inspects each control-flow-changing instruction to determine its type and targets (i.e. the next BBs); it should not be hard to query the JSON for this info, when present, before the decoder tries to determine it on its own.
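
A rough sketch of what steps 1 and 3 might look like. Python is used only for brevity, and every name here (`ControlFlowInsn`, `BasicBlock`, the `basicBlocks` key, the `kind` strings) is a hypothetical illustration, not the actual retdec-config schema. The idea is that the decoder consults an address-indexed table of hints first and only runs its own analysis when no hint exists.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List


@dataclass
class ControlFlowInsn:
    # Hypothetical record for one control-flow-changing instruction.
    address: int
    kind: str  # assumed values: "cond_branch", "uncond_branch", "call", "return"
    targets: List[int] = field(default_factory=list)


@dataclass
class BasicBlock:
    # Hypothetical basic block: [start, end) plus its control flow instructions.
    start: int
    end: int
    control_flow: List[ControlFlowInsn] = field(default_factory=list)


def to_json(blocks: List[BasicBlock]) -> str:
    """Serialize blocks the way a JSON producer (e.g. the IDA plugin) might."""
    return json.dumps({"basicBlocks": [asdict(b) for b in blocks]}, indent=2)


def index_hints(doc: str) -> Dict[int, ControlFlowInsn]:
    """Build an address -> instruction lookup table from the JSON document."""
    hints: Dict[int, ControlFlowInsn] = {}
    for bb in json.loads(doc).get("basicBlocks", []):
        for insn in bb.get("control_flow", []):
            hints[insn["address"]] = ControlFlowInsn(**insn)
    return hints


def resolve(
    address: int,
    hints: Dict[int, ControlFlowInsn],
    fallback: Callable[[int], ControlFlowInsn],
) -> ControlFlowInsn:
    """Prefer the provided hint; otherwise let the decoder decide on its own."""
    hint = hints.get(address)
    return hint if hint is not None else fallback(address)
```

With this shape, a missing or partial JSON degrades gracefully: any address without a hint simply goes through the existing decoder logic passed in as `fallback`.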
PeterMatula self-assigned this Jan 3, 2018
@Alexey-Danilchenko

I think it needs to go one step further. Currently retdec is only useful as an IDA plugin if it can decode the file itself. IDA, however, has a lot of loaders that can take care of complicated binary formats and even process things like firmware. Ideally retdec should be used for what it is designed for, decompilation, and take as much as possible from IDA. I.e. for function decompilation it should be possible for IDA to pass on the entire disassembly of the function, with resolved symbols (if any), to aid decompilation. There should be no need for retdec to decode the binary.

@PeterMatula
Collaborator Author

@Alexey-Danilchenko

  1. It is true that it would be best if RetDec could do a high-quality decompilation of any given chunk of binary data. That would allow the IDA plugin to send any data + metadata to RetDec; it would no longer matter what the underlying source was or whether RetDec can handle that source on its own. This is something to aim for. Exporting control flow metadata from IDA is a good first step.
  2. If the IDA plugin passes binary data (the selected range in the binary) to decompile + metadata about basic blocks and their relations, the result should be as good as if it passed IDA's assembly itself, provided that RetDec's disassembler decodes the same data into the same instructions as IDA's disassembler would. However, RetDec needs to disassemble the data on its own; IDA's disassembly is useless to us. It is the disassembling/decoding phase that creates the LLVM intermediate representation we work with. RetDec does not know how to create LLVM IR from IDA's disassembly, and it would not be an easy thing to do.
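
The condition in (2), that both disassemblers see the same instructions, could in principle be checked cheaply. The sketch below is only an illustration under assumed inputs: it takes the instructions decoded on the RetDec side as `(address, size)` pairs and the basic blocks from the metadata as `(start, end)` pairs with an exclusive end, and reports blocks whose boundaries do not fall on decoded instruction boundaries. The function name and the input shapes are hypothetical.

```python
from typing import Iterable, List, Tuple


def misaligned_blocks(
    insns: Iterable[Tuple[int, int]],
    blocks: Iterable[Tuple[int, int]],
) -> List[Tuple[int, int]]:
    """Return the blocks whose start or end is not an instruction boundary."""
    starts = set()
    ends = set()
    for address, size in insns:
        starts.add(address)         # an instruction begins here
        ends.add(address + size)    # and ends right before this address
    return [(s, e) for s, e in blocks if s not in starts or e not in ends]
```

An empty result does not prove the two disassemblies are identical, but a non-empty one is a clear sign that the metadata and the decoded instructions disagree.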

@Alexey-Danilchenko

@PeterMatula

I see, (2) is certainly a blocker for this. Perhaps the approach to explore here is for the plugin to pass a binary block to decompile together with the metadata? For example, if a function is selected for decompilation, pass the function body as raw bytes (these are accessible from the IDA API and, since the file has already been loaded, are not file-format specific) together with any symbolic information (variables, types, etc.). I got particularly interested in this since RetDec includes PowerPC and I maintain a custom loader for firmware that uses PowerPC code.

Generally speaking, though, the main strength of IDA is that disassembly happens interactively, and in the process the data gets refined, variables can be named and typed, etc. So just taking the binaries and decoding them in RetDec loses all of that information.
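
To make the "bytes plus metadata" hand-off concrete, here is a minimal sketch of a request a plugin could build and a decompiler could parse. Nothing in it is an existing RetDec or IDA interface: the function names, the JSON keys, and the `arch` string are all assumptions, and the function bytes are taken as a plain `bytes` argument rather than read through any particular IDA call.

```python
import base64
import json
from typing import Dict, Tuple


def build_request(start: int, code: bytes, symbols: Dict[int, str], arch: str) -> str:
    """Pack a function's raw bytes and symbol names into one JSON document."""
    return json.dumps({
        "arch": arch,                      # assumed free-form architecture tag
        "start": start,                    # address of the first byte
        "size": len(code),
        "code": base64.b64encode(code).decode("ascii"),
        "symbols": [
            {"address": addr, "name": name}
            for addr, name in sorted(symbols.items())
        ],
    })


def parse_request(doc: str) -> Tuple[int, bytes, Dict[int, str]]:
    """Recover the start address, raw bytes, and symbol table from the JSON."""
    req = json.loads(doc)
    code = base64.b64decode(req["code"])
    if len(code) != req["size"]:
        raise ValueError("decoded code length does not match declared size")
    symbols = {s["address"]: s["name"] for s in req["symbols"]}
    return req["start"], code, symbols
```

Because the payload carries only loaded bytes and addresses, it does not depend on the original file format, which is the point of the suggestion for custom loaders.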
