Refactor capstone2llvmir #115

Closed
PeterMatula opened this issue Jan 29, 2018 · 3 comments

Comments

@PeterMatula
Collaborator

The current implementation is an early prototype. We should refactor it to make it cleaner, easier to use, and easier for others to develop. Do the following:

  • Separate the public API needed by users from the implementation API needed by developers.
  • Refactor code.
    • Solve todos.
    • Move common code to parent classes.
    • Add public API function to allow instruction-by-instruction decoding.
    • Throw different kinds of exceptions.
  • Add unit tests for x86 and x86_64 registers according to the issues discussed here.
  • Auto-generate handling of unhandled instructions based on information from Capstone: generate pseudo calls to functions named after the ASM instructions.
  • Add a configuration API to enable/disable asserts, exceptions, handling of unknown instructions, etc.
@PeterMatula
Collaborator Author

Make sure rep movsb (and other such instructions) are translated correctly. This is an older report, so it might be fine now. If so, at least add a unit or regression test.

This ASM code:

0x403975:   be d8 42 40 00                      mov esi, 0x4042d8
0x40397a:   83 c6 05                            add esi, 0x5
0x40397d:   b9 04 00 00 00                      mov ecx, 0x4
0x403982:   bf fe 42 40 00                      mov edi, 0x4042fe
0x403987:   f3 a4                               rep movsb
0x403989:   46                                  inc esi
0x40398a:   b9 04 00 00 00                      mov ecx, 0x4
0x40398f:   f3 a4                               rep movsb
0x403991:   46                                  inc esi
0x403992:   b9 04 00 00 00                      mov ecx, 0x4
0x403997:   f3 a4                               rep movsb
0x403999:   46                                  inc esi
0x40399a:   b9 04 00 00 00                      mov ecx, 0x4
0x40399f:   f3 a4                               rep movsb

was decompiled to (at the time of the report):

char * g110; // 0x4042d8
char * g111; // 0x4042dd
char * g112; // 0x4042de
char * g113; // 0x4042df
char * g114; // 0x4042e0
...
char * g117; // 0x4042fe
...
memcpy((char *)&g117, (char *)&g111, 4);
memcpy((char *)&g117, (char *)&g112, 4);
memcpy((char *)&g117, (char *)&g113, 4);
g4 = (int32_t)memcpy((char *)&g117, (char *)&g114, 4);

Here edi and esi are restored to the values they had before each rep movsb, which does not happen on a real CPU.

It should be something like:

memcpy(g117, g111, 4);
memcpy(g117+4, g111+5, 4);
memcpy(g117+8, g111+10, 4);
memcpy(g117+12, g111+15, 4);

Here the second (source) operand advances by 5 bytes per copy (4 from rep movsb plus 1 from inc esi), not 1, and the first (destination) operand advances by 4 bytes, not 0.

@PeterMatula
Collaborator Author

TODO for me: once this is done, write a few words about the basic principles in the wiki. Something like this comment in #193.

@PeterMatula
Collaborator Author

Fixed by c2ed626.
There will probably be further improvements, but the most important things were addressed.
