Bin2llvmir: decoder refactoring #116

Closed
PeterMatula opened this issue Jan 29, 2018 · 6 comments

Comments

@PeterMatula
Collaborator

The current decoder is a relic that was hacked to use capstone2llvmir when we got rid of the old semantic models. We need to refactor/rewrite it in order to improve the quality of decompilation.

Once #115 is resolved, refactor or rewrite the decoder so that the control-flow pass is embedded in the new decoder and the hacks in the current decoder are no longer needed.

Notes:

@PeterMatula
Collaborator Author

In Delphi samples, data structures are often stored in the code section. Do not decode them as code; where possible, use them to identify code.

42d43b.zip
In this sample:

  • A function table at 0x403010 is loaded into eax by the 3rd instruction at the entry point.
  • Some vtable structures starting at address 0x4020A8 are loaded into edx later at the entry point.
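
A minimal sketch of how such a table could be used to identify code, assuming (the issue does not confirm this) that the table is a flat array of little-endian pointers terminated by the first value that does not point into the code section. The function name `scan_pointer_table` and all of its parameters are hypothetical and are not part of RetDec:

```python
import struct

def scan_pointer_table(image, image_base, table_addr, code_start, code_end, ptr_size=4):
    """Read little-endian pointers from table_addr while they land in [code_start, code_end).

    Returns (targets, table_range): candidate code addresses, and the
    half-open address range occupied by the table (to be treated as data).
    All names here are illustrative assumptions, not RetDec API.
    """
    fmt = "<I" if ptr_size == 4 else "<Q"
    targets = []
    addr = table_addr
    # Stop at the end of the image or at the first non-code pointer.
    while addr - image_base + ptr_size <= len(image):
        (value,) = struct.unpack_from(fmt, image, addr - image_base)
        if not (code_start <= value < code_end):
            break
        targets.append(value)
        addr += ptr_size
    return targets, (table_addr, addr)
```

The returned targets could seed function detection, while the returned range marks bytes that should not be decoded as instructions.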

@PeterMatula
Collaborator Author

PeterMatula commented Jan 29, 2018

HV16_DebugM3

  • Make sure the enter instruction at 0x401000 is decoded correctly.
  • There is data at 0x4010cf (a 256-byte array); do not decode it as code.

@PeterMatula
Collaborator Author

Check whether we detect all functions in this Delphi binary: d7ab65f96a211ae3e822eae251d6f05ce8c05f910d9ec80fe7006cfca6b9c753.

@0xBEEEF

0xBEEEF commented Apr 3, 2018

A general question about the Delphi examples. As you can see in IDR, a Delphi program still contains almost all of the required metadata somewhere in the binary. IDR therefore succeeds in restoring inheritance and recognizing constructors and destructors. Would it be conceivable to use all this information here? Unfortunately, IDR is no longer being developed, but it has already done a lot of the preparatory work. Maybe you could contact the author; perhaps he can provide some code snippets for the Delphi recognition. Then you would not have to reimplement what IDR already does.

@PeterMatula
Collaborator Author

@0xBEEEF I don't know much about Delphi, so I will have to look into it: how IDR works, what information the binaries contain, whether we can use it, etc.

@PeterMatula
Collaborator Author

The decoder phase was completely rewritten. It is still not perfect, but the basic principles are sound now: control flow is fully reconstructed during decoding. It will take much more fine-tuning to consistently get close to IDA quality.

However, it turns out that most problems with slow and memory-consuming decompilations are caused by the currently used LLVM IR to BIR converter, not by the decoding in bin2llvmir. So this is the next most important thing to solve (#211).
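
For readers unfamiliar with the idea of reconstructing control flow while decoding, here is a rough sketch of recursive-traversal decoding. It is not the bin2llvmir implementation: the instruction set `TOY_ISA`, the function `decode`, and the `data_ranges` parameter are all invented for illustration. Decoding follows jumps and calls from the entry point, so bytes that are never reached (or that fall in a known data range) are never decoded as code:

```python
from collections import deque

# Toy ISA (hypothetical, for illustration only): opcode -> (size, kind).
# Two-byte instructions carry a one-byte absolute target as their operand.
TOY_ISA = {
    0x00: (1, "nop"),
    0x01: (2, "jmp"),   # unconditional jump
    0x02: (2, "jcc"),   # conditional jump
    0x03: (1, "ret"),
    0x04: (2, "call"),
}

def decode(code, entry, data_ranges=()):
    """Decode only bytes reachable from entry; return (instructions, function starts)."""
    def is_data(addr):
        return any(lo <= addr < hi for lo, hi in data_ranges)

    insns = {}            # address -> instruction kind
    functions = {entry}   # addresses known to start a function
    worklist = deque([entry])
    while worklist:
        addr = worklist.popleft()
        # Decode linearly until control flow leaves this path.
        while 0 <= addr < len(code) and addr not in insns and not is_data(addr):
            op = code[addr]
            if op not in TOY_ISA:
                break
            size, kind = TOY_ISA[op]
            if addr + size > len(code):
                break
            insns[addr] = kind
            if kind in ("jmp", "jcc", "call"):
                target = code[addr + 1]
                worklist.append(target)   # decode the target later
                if kind == "call":
                    functions.add(target)
            if kind in ("jmp", "ret"):
                break                     # no fall-through
            addr += size
    return insns, functions
```

With `code = bytes([0x04, 0x06, 0x02, 0x05, 0x03, 0x03, 0x00, 0x03, 0x01, 0x00])` and `entry = 0`, the trailing two bytes look like a `jmp` but are never reached, so they are left undecoded. A real decoder additionally has to handle indirect jumps, overlapping instructions, and targets that land inside already-decoded instructions, all of which this sketch ignores.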
