Localize all registers #652

Closed
PeterMatula opened this issue Sep 19, 2019 · 0 comments

The current state:

  • Our binary to LLVM IR decoding represents registers as global variables.
  • Our low-level analyses make heavy use of Reaching Definition Analysis (RDA), which stops tracking [registers] at function starts, i.e. it is not inter-procedural, and therefore none of the analyses using it are inter-procedural either.
  • LLVM IR analyses are very strict: they do not make simplifying assumptions, and if they cannot prove an optimization correct, they do not perform it. Some of them are inter-procedural, and therefore very complex and expensive.
  • Some of our high-level analyses are inter-procedural (e.g. -global-to-local, -dead-global-assign), and therefore very complex and expensive.
  • The backend (llvmir2hll) is also strict and takes inter-procedural relations into account.
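
To illustrate why such analyses stop at function boundaries, here is a minimal C sketch of reaching-definition analysis restricted to a single straight-line basic block. The toy IR (`Instr`, `OP_DEF`/`OP_USE`, `REG_EAX`/`REG_ECX`) and the function `reaching_defs()` are hypothetical and are not RetDec's data structures. A use whose register has no definition earlier in the function gets `-1`: the analysis does not look past the function start, so it cannot tell which caller (if any) defined the value.

```c
#include <stddef.h>

/* Hypothetical toy IR: every instruction either defines or uses one register. */
typedef enum { OP_DEF, OP_USE } OpKind;
typedef enum { REG_EAX, REG_ECX, REG_COUNT } Reg;

typedef struct {
    OpKind kind;
    Reg reg;
} Instr;

/* Reaching-definition analysis for ONE straight-line basic block (a sketch,
 * not a full CFG-based RDA). For every OP_USE, out[i] receives the index of
 * the definition that reaches it, or -1 if no definition inside this
 * function reaches it. For OP_DEF entries, out[i] is set to -1.
 * -1 on a use means "defined somewhere before function entry": tracking
 * halts at the function start, so the defining site stays unknown. */
void reaching_defs(const Instr *code, size_t n, int *out) {
    int last_def[REG_COUNT];
    for (int r = 0; r < REG_COUNT; ++r)
        last_def[r] = -1;                   /* nothing is known at function entry */

    for (size_t i = 0; i < n; ++i) {
        if (code[i].kind == OP_USE) {
            out[i] = last_def[code[i].reg]; /* latest def in this block wins */
        } else {
            out[i] = -1;                    /* not a use */
            last_def[code[i].reg] = (int)i; /* new def kills the previous one */
        }
    }
}
```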

All of this has the following consequences:

  • Many analyses are very complex, expensive, and not necessarily even correct (they are very error-prone).
  • A lot of clutter in the resulting decompilation.

Proposal:

  • Transform all registers to local variables at some point (i.e. localize them).
    • Do not translate them (binary to LLVM IR) as local variables; that would make the translation less general.
    • The cleanest solution would probably be to localize them right after the decoding, so that all analyses (ours and LLVM's) work on the same register representation. This would however require modifications to all of our analyses, so don't do it right away.
    • Do the localization after our low-level passes, and before LLVM passes. LLVM does not care about the nature of our registers, and therefore no modifications are needed.
    • Reduce the number of our high-level passes - some will become obsolete after localization, others can be moved.
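
A minimal sketch of what localization does to one function, on a hypothetical toy IR rather than on LLVM IR: every operand that names a global register is rewritten to a function-private local slot, the same register always maps to the same slot within a function, and new slots are appended after the function's existing locals. `Operand`, `Function`, `localize_registers()` and `MAX_REGS` are illustrative names, not RetDec's API. In LLVM IR, the analogous step would presumably be creating one `alloca` per used register in the entry block and redirecting the loads and stores of the register global to it.

```c
#include <stddef.h>

#define MAX_REGS 8  /* hypothetical size of the register file */

/* Hypothetical toy IR operand: either a global register or a local slot. */
typedef enum { LOC_GLOBAL_REG, LOC_LOCAL } LocKind;
typedef struct { LocKind kind; int index; } Operand;

/* A function is just its operands plus the number of locals it already has. */
typedef struct { Operand *ops; size_t n; int n_locals; } Function;

/* Localizes one function: every global register it touches is replaced by a
 * function-private local slot. Assumes register indices are < MAX_REGS.
 * Existing local operands keep their indices; new slots are appended. */
void localize_registers(Function *fn) {
    int slot_of[MAX_REGS];
    for (int r = 0; r < MAX_REGS; ++r)
        slot_of[r] = -1;                     /* register not seen yet */

    for (size_t i = 0; i < fn->n; ++i) {
        Operand *op = &fn->ops[i];
        if (op->kind != LOC_GLOBAL_REG)
            continue;                        /* already a local, leave it */
        int r = op->index;
        if (slot_of[r] < 0)
            slot_of[r] = fn->n_locals++;     /* first use: allocate a slot */
        op->kind = LOC_LOCAL;
        op->index = slot_of[r];
    }
}
```

Running it separately on `f1()` and `f2()` from the Hex-Rays example below gives each function its own copy of `ecx`, so the value written in `f1()` can no longer flow into `f2()` through the register.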

Pros:

  • Cleaner and more compact decompiled code.
  • Less complex analyses.
  • Less expensive (i.e. faster) analyses.
  • This will uncover some other RetDec problems -> more issues.

Cons:

  • Loss of information needed for inter-procedural register analysis. This information is probably not really needed (see the Hex-Rays experiments below).
  • This will uncover some other RetDec problems -> more issues.

Hex-Rays experiments:

  • Experiments with the Hex-Rays decompiler showed that it probably does a version of this and does not care about the possible loss of inter-procedural relations through registers.
  • Example:
    • Original code:
    #include <stdio.h>
    #include <stdlib.h>

    int g1, g2;
    void f1() {
       g1 = rand();
       g2 = rand(); // -> ecx = rand();
    }
    void f2() {
       printf("%d\n", g1);
       printf("%d\n", g2); // -> printf("%d\n", ecx);
    }
    int main() {
       f1();
       f2();
       return 0;
    }
    • At the assembly level, I changed the instructions to write to ecx instead of g2 in f1(), and to read from ecx instead of g2 in f2().
    • Even though an inter-procedural analysis (like the one RetDec currently performs) would find out that ecx = rand(); in f1() is used in the subsequently called function f2() and therefore must not be removed, Hex-Rays ignores this and throws the assignment away. In f2(), it uses an uninitialized variable to represent ecx.
    • Decompilation of modified binary:
    int g1;
    void f1() {
       g1 = rand();
       // missing (optimized-out) ecx = rand();
    }
    void f2() {
       int v1; // ecx
       printf("%d\n", g1);
       printf("%d\n", v1);
    }
    int main() {
       f1();
       f2();
    }
    • This happens for both selective decompilation (function-by-function) and full decompilation (Produce file -> Create C file...).
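
The behaviour seen in Hex-Rays follows from a purely intra-procedural liveness analysis once registers are local. The sketch below, again on a hypothetical toy IR (`Instr`, `OP_SET`/`OP_READ` and `find_dead_stores()` are illustrative names), marks a register write as dead if it is not read before being overwritten or before the function returns. With `regs_live_at_exit == true` it models global registers under the conservative assumption that any register may be read after the return; this is a simplification, not the inter-procedural analysis RetDec currently performs. With `false` it models localized registers, and the write to ecx in `f1()` becomes removable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: straight-line code that writes or reads registers. */
typedef enum { OP_SET, OP_READ } OpKind;
typedef enum { REG_EAX, REG_ECX, REG_COUNT } Reg;
typedef struct { OpKind kind; Reg reg; } Instr;

/* Backward liveness over one straight-line function. Sets dead[i] to true
 * for every OP_SET whose value is never read before being overwritten or
 * before the function returns. regs_live_at_exit is the assumption made at
 * the return: true ~ global registers (conservative: someone may read them
 * later), false ~ localized registers (nothing outlives the function). */
void find_dead_stores(const Instr *code, size_t n, bool regs_live_at_exit,
                      bool *dead) {
    bool live[REG_COUNT];
    for (int r = 0; r < REG_COUNT; ++r)
        live[r] = regs_live_at_exit;

    for (size_t i = n; i-- > 0; ) {      /* walk backwards from the return */
        const Instr *in = &code[i];
        if (in->kind == OP_READ) {
            dead[i] = false;              /* reads are never removed */
            live[in->reg] = true;         /* the value must be available here */
        } else {
            dead[i] = !live[in->reg];     /* nobody reads this value later */
            live[in->reg] = false;        /* earlier values are overwritten */
        }
    }
}
```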

P.S.
Thanks to the discrete LLVM pass system used in RetDec, the whole localization will be implemented as a single, independent pass. It will be enabled by default, but it can easily be disabled on demand if needed.
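
A minimal sketch of a pass that is enabled by default and can be switched off on demand, assuming a hypothetical pipeline: the `Pass`/`Module` types, `disable_pass()`, `run_pipeline()` and the pass names are illustrative, not RetDec's or LLVM's API. The order mirrors the proposal: our low-level passes, then localization, then LLVM passes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an IR module; only counts how many passes ran. */
typedef struct { int passes_run; } Module;
typedef void (*PassFn)(Module *);

typedef struct {
    const char *name;   /* hypothetical pass name, e.g. "register-localization" */
    PassFn run;
    bool enabled;       /* passes start enabled; the user may switch them off */
} Pass;

/* Switches off the pass called name. Returns false if there is no such pass. */
bool disable_pass(Pass *pipeline, size_t n, const char *name) {
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(pipeline[i].name, name) == 0) {
            pipeline[i].enabled = false;
            return true;
        }
    }
    return false;
}

/* Runs the enabled passes in pipeline order and returns how many ran. */
size_t run_pipeline(const Pass *pipeline, size_t n, Module *m) {
    size_t ran = 0;
    for (size_t i = 0; i < n; ++i) {
        if (!pipeline[i].enabled)
            continue;                  /* disabled passes are simply skipped */
        pipeline[i].run(m);
        ++ran;
    }
    return ran;
}
```

Disabling the localization pass in such a pipeline simply leaves the registers as global variables, which is the current state described above.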
