0.6.0 - Decompiler improvements for loops, if-statements, try-catch, and synchronized blocks.
TeamworkGuy2 committed Oct 22, 2022
1 parent 3653372 commit 43e8f80
Showing 115 changed files with 3,379 additions and 1,608 deletions.
23 changes: 22 additions & 1 deletion CHANGELOG.md
@@ -2,9 +2,30 @@
All notable changes to this project will be documented in this file.
This project does its best to adhere to [Semantic Versioning](http://semver.org/).

--------
### [0.6.0](N/A) - 2022-10-22
__Add loop and if-statements detection to decompilation, also handle basic try-catch and synchronized blocks.__
#### Added
* `StringBuilderIndent` provides the same API as `StringBuilder` (unfortunately we can't extend StringBuilder because it is final) and implements `Indent` so that this class can be used to easily build source code strings
* `src/twg2/jbcm/toSource/structures` with state handlers for inserting more complex structures such as loops and try-catch statements into source code during opcode iteration

#### Changed
* `twg2.jbcm.classFormat.attributes.Code` `toClassString()` renamed `toClassCodeString()`
* `CpIndexChanger` is now stateful, holds the old and new index, and uses a proper visitor pattern to handle index changes
* `Indent` changed from a class to an interface with a public static `Impl` subclass
* `IterateCode` renamed `CodeIterator`
* `RuntimeReloadMain` refactored, more complex threaded loading and invocation of methods from updated class files, some code moved to new classes `twg2.jbcm.runtime.ClassLoaders` and `FileUtility`
* `twg2.jbcm.runtimeLoading` package renamed `twg2.jbcm.runtime`
* `CodeFlow` contains algorithms for detecting loops and if-statements in byte code
* `DataCountingInputStream` added and used in `ClassFile` when parsing a class to improve debug and error messages with exact byte index locations
* Several new unit/integration tests added, `CompileTest` renamed `CompileJava`
* Fixed compiling code during runtime to support class names with arbitrary package paths, required extensive changes to `CompileSource`
* `JumpConditionInfo` rewritten to support representing loop and nested if-statement conditions
* `TypeUtility.classNameFieldDescriptor()` renamed `toBinaryClassName()`


--------
### [0.5.1](N/A) - 2020-12-05
### [0.5.1](https://github.com/TeamworkGuy2/ClassLoading/commit/3653372d7564f749135a7db119a70f77df8b1696) - 2020-12-05
__Fix `switch` statements to decompile much more accurately based on code flow analysis. Start work on `if` statements.__
#### Added
* new `Indent` class to handle `SourceWriter` indentation
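The `StringBuilderIndent` entry exists because `java.lang.StringBuilder` is `final`, so indentation support has to be added by composition (wrapping a builder and delegating to it) rather than by inheritance. The following is a minimal sketch of that approach; `IndentSketch` and `IndentingBuilder` are hypothetical stand-ins, not the project's actual `Indent` or `StringBuilderIndent` API.

```java
/** Hypothetical indentation contract; the project's real Indent interface may differ. */
interface IndentSketch {
    IndentSketch indent();
    IndentSketch dedent();
    int level();
}

/** Wraps a (final) StringBuilder and delegates to it, adding indentation to new lines. */
final class IndentingBuilder implements IndentSketch, CharSequence {
    private final StringBuilder sb = new StringBuilder();
    private final String unit; // text emitted once per indentation level
    private int level;

    IndentingBuilder(String unit) {
        this.unit = unit;
    }

    // covariant return types keep call chains on IndentingBuilder
    @Override public IndentingBuilder indent() { level++; return this; }
    @Override public IndentingBuilder dedent() { if (level > 0) level--; return this; }
    @Override public int level() { return level; }

    /** Appends one line prefixed by the current indentation, followed by '\n'. */
    IndentingBuilder appendLine(String line) {
        for (int i = 0; i < level; i++) {
            sb.append(unit);
        }
        sb.append(line).append('\n');
        return this;
    }

    /** Plain delegation, mirroring StringBuilder.append(String). */
    IndentingBuilder append(String s) {
        sb.append(s);
        return this;
    }

    // CharSequence methods forward straight to the wrapped builder
    @Override public int length() { return sb.length(); }
    @Override public char charAt(int index) { return sb.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return sb.subSequence(start, end); }
    @Override public String toString() { return sb.toString(); }
}
```

For example, `new IndentingBuilder("  ").appendLine("while (i < n) {").indent().appendLine("i++;").dedent().appendLine("}")` builds `"while (i < n) {\n  i++;\n}\n"`. The cost of composition is that every `StringBuilder` method a caller needs must be forwarded by hand.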
2 changes: 1 addition & 1 deletion README.md
@@ -24,7 +24,7 @@ Also see [extract-opcodes.js](extract-opcodes.js) file for how the enum literals
### twg2.jbcm.dynamicModification & twg2.jbcm.parserExamples
Classes used by the example and test packages.

### twg2.jbcm.runtimeLoading
### twg2.jbcm.runtime
Runtime class loading.

### twg2.jbcm.main
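The `twg2.jbcm.runtime` package covers runtime class loading, and the changelog notes that runtime compilation in `CompileSource` was reworked to support class names with arbitrary package paths. The sketch below is not the project's code. It uses the standard `javax.tools` and `URLClassLoader` APIs, assumes a JDK is present (so `ToolProvider.getSystemJavaCompiler()` is non-null), and shows the two places a package path matters: the in-memory source URI must end in `<SimpleName>.java`, and javac's `-d` option creates the `a/b/` output directories that the class loader then resolves. `RuntimeCompileSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

final class RuntimeCompileSketch {
    /** Source held in memory; the URI mirrors the package path, e.g. string:///a/b/Hello.java */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String binaryClassName, String code) {
            super(URI.create("string:///" + binaryClassName.replace('.', '/')
                    + JavaFileObject.Kind.SOURCE.extension), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one top-level class into a fresh temp directory and loads it with a new class loader. */
    static Class<?> compileAndLoad(String binaryClassName, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no system Java compiler available (JRE instead of JDK?)");
        }
        try {
            Path outDir = Files.createTempDirectory("runtime-compile");
            // "-d" makes javac write a/b/Hello.class under outDir, matching the package path
            boolean ok = compiler.getTask(null, null, null, List.of("-d", outDir.toString()), null,
                    List.of(new StringSource(binaryClassName, code))).call();
            if (!ok) {
                throw new IllegalStateException("compilation failed: " + binaryClassName);
            }
            // a new loader per compile allows a modified version of the same class to be loaded later
            URLClassLoader loader = new URLClassLoader(new URL[] { outDir.toUri().toURL() },
                    RuntimeCompileSketch.class.getClassLoader());
            return loader.loadClass(binaryClassName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Invokes a public static no-arg method, wrapping reflection's checked exceptions. */
    static Object callStatic(Class<?> cls, String methodName) {
        try {
            return cls.getMethod(methodName).invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A fresh `URLClassLoader` per compile is what lets an updated version of an already-loaded class be loaded again; the previous loader and its classes become collectable once nothing references them.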
Binary file modified bin/class_loading.jar
2 changes: 1 addition & 1 deletion package-lib.json
@@ -1,5 +1,5 @@
{
"version" : "0.5.1",
"version" : "0.6.0",
"name" : "class-loading",
"description" : "Java class file parsing, manipulation, and to human readable representation",
"homepage" : "https://github.com/TeamworkGuy2/ClassLoading",
2 changes: 1 addition & 1 deletion res/build.txt
@@ -2,4 +2,4 @@ Run the following command to compile java files in the source\classLoading folde
This folder is for various java files used to test dynamic reloading that cannot be part of the eclipse project's source/build path.
Navigate to the classpath folder, then run the following command:

"C:\Program Files\Java\jdk1.8.0_112\bin\javac" -sourcepath source -classpath C:\Users\TeamworkGuy2\Documents\Java\Projects\ClassLoading\res source\classLoading\*.java source\classLoading\base\*.java source\classLoading\load\*.java -d destination
"C:\Program Files\Java\jdk1.8.0_25\bin\javac" -sourcepath source -classpath C:\Users\TeamworkGuy2\Documents\Java\Projects\ClassLoading\res source\classLoading\*.java source\classLoading\base\*.java source\classLoading\load\*.java -d destination
257 changes: 244 additions & 13 deletions src/twg2/jbcm/CodeFlow.java
@@ -1,15 +1,25 @@
package twg2.jbcm;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import twg2.collections.primitiveCollections.IntArrayList;
import twg2.collections.primitiveCollections.IntListReadOnly;
import twg2.jbcm.Opcodes.Type;
import twg2.jbcm.ir.JumpConditionInfo;
import twg2.jbcm.ir.JumpConditionInfo.UsageHint;

/** Trace all possible paths through the code in a method. A code flow follows jump, branch/condition, return, and throw instructions.
* Circular paths end at the first jump/branch destination which already exists in the code flow.
* @author TeamworkGuy2
* @since 2020-12-03
*/
public class CodeFlow {
/** The size of a GOTO instruction, 1 byte opcode + 2 byte operand */
public static final int GOTO_SIZE = 3;


/** Starting at a given point in a bytecode array, follow code jumps and branches to all termination (return/throw) points potentially reachable from the starting point
* @param idx the starting point
@@ -20,22 +30,32 @@ public class CodeFlow {
* and can easily be converted back by negating them again. This differentiates non-terminal indexes from all
* valid terminal indexes because valid code indexes cannot be less than 0.
*/
public static IntArrayList getFlowPaths(int idx, byte[] instr, IntArrayList dstPath) {
for(int i = idx, size = instr.length; i < size; i++) {
Opcodes opc = Opcodes.get(instr[i] & 0xFF);
public static IntArrayList getFlowPaths(byte[] code, int idx) {
var dstPath = new IntArrayList();
getFlowPaths(code, idx, code.length, dstPath, 0);
return dstPath;
}


public static int getFlowPaths(byte[] code, int idx, int max, IntArrayList dstPath, int pathJumps) {
for(int i = idx; i < max; i++) {
Opcodes opc = Opcodes.get(code[i]);
int numOperands = opc.getOperandCount();

// Type.JUMP instruction set includes all Type.CONDITION instructions
if(opc.hasBehavior(Type.JUMP)) {
// follow the jump path if it has not already been followed (to avoid loops)
if(!dstPath.contains(~i)) {
dstPath.add(~i);
// skip the jump path if it has already been followed and this is the beginning (to avoid loops)
if(dstPath.contains(~i) && pathJumps == 0) {
break;
}
int jumpDst = opc.getJumpDestination(instr, i);
dstPath.add(~i);
pathJumps++;
int jumpDst = opc.getJumpDestination(code, i);
if(jumpDst < 0) {
jumpDst = opc.getJumpDestination(instr, i);
jumpDst = opc.getJumpDestination(code, i);
}
getFlowPaths(jumpDst, instr, dstPath);
int subPathJumps = getFlowPaths(code, jumpDst, max, dstPath, pathJumps);
pathJumps = subPathJumps;

// end this code path if the jump path is unconditional (i.e. GOTO or JSR)
if(!opc.hasBehavior(Type.CONDITION)) {
@@ -45,13 +65,224 @@ public static IntArrayList getFlowPaths(int idx, byte[] instr, IntArrayList dstP
// end this code flow path once a terminal instruction is reached
else if(opc.hasBehavior(Type.RETURN) || opc == Opcodes.ATHROW) {
dstPath.add(i);
pathJumps = 0;
break;
}

i += (numOperands < 0 ? 0 : numOperands);
}

return dstPath;
return pathJumps;
}
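`getFlowPaths` stores two kinds of entries in one `IntArrayList`: jump/branch points as `~i` and terminal return/throw points as plain `i`, and `flowPathToString` tells them apart by sign. The bitwise complement is used rather than unary minus because `-0 == 0`, so a jump at index 0 would otherwise be indistinguishable from a terminal at index 0. A minimal sketch of the encoding using a `java.util.List` in place of the project's `IntArrayList`; `FlowPathEncoding` and its methods are illustrative names.

```java
import java.util.List;

/** Illustrative encoding of a flow path as plain ints (not the project's IntArrayList). */
final class FlowPathEncoding {
    /** Records a jump/branch point as ~idx, which equals -idx - 1 and is negative for every idx >= 0. */
    static void addJump(List<Integer> path, int idx) {
        path.add(~idx);
    }

    /** Records a terminal (return/throw) point as the unchanged, non-negative index. */
    static void addTerminal(List<Integer> path, int idx) {
        path.add(idx);
    }

    static boolean isJump(int entry) {
        return entry < 0;
    }

    /** Recovers the code index from either kind of entry; ~ is its own inverse. */
    static int codeIndex(int entry) {
        return entry < 0 ? ~entry : entry;
    }

    /** Renders a path in the same "idx -> ... idx], " shape as flowPathToString, minus opcode names. */
    static String describe(List<Integer> path) {
        var sb = new StringBuilder();
        for (int entry : path) {
            sb.append(codeIndex(entry)).append(isJump(entry) ? " -> " : "], ");
        }
        return sb.toString();
    }
}
```

A path with jumps at 0 and 7 that ends at a return at 12 is stored as `[-1, -8, 12]` and rendered as `0 -> 7 -> 12], `.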


/**
* @param code the code array
* @param offset the offset into the code array at which to start finding instructions
* @param length the number of bytes of the code array to check through
* @return the jump conditions (loops and if-statements) found in the code range, sorted by lower index
*/
public static List<JumpConditionInfo> findFlowConditions(byte[] code, int offset, int length) {
var conditions = new ArrayList<JumpConditionInfo>(); // track GOTO/IF_* loops detected in the code

// BYTECODE LOOP:
for(int i = offset, size = offset + length; i < size; i++) {
Opcodes opc = Opcodes.get(code[i]);
int numOperands = opc.getOperandCount();
// Special handling for instructions with unpredictable byte code lengths
if(numOperands == Opcodes.Const.UNPREDICTABLE) {
if(Opcodes.WIDE.is(code[i])) {
i++; // WIDE opcodes are nested around other operations
opc = Opcodes.get(code[i]);
numOperands = opc.getOperandCount() * 2; // WIDE opcodes double the operands of the widened opcode
}
else if(Opcodes.TABLESWITCH.is(code[i])) {
throw new IllegalStateException("tableswitch code handling not implemented");
}
else if(Opcodes.LOOKUPSWITCH.is(code[i])) {
throw new IllegalStateException("lookupswitch code handling not implemented");
}
}
int jumpRelative = CodeUtility.loadOperands(numOperands, code, i);

// form 1: [..., GOTO <setup_if[0]>, instructions[], setup_if[], IF_* <instructions[0]>, ...] - for()/while() forward GOTO, condition after loop with backward jump
// form 2: [..., setup_if[], IF_* <after[0]>, instructions[], GOTO <setup_if[0]>, after[], ...] - for()/while() condition before loop with forward jump, backward GOTO
// form 3: [..., instructions[], setup_if[], IF_* <instructions[0]>, after[], ...] - do{}while() condition after loop with backward jump
var isJump = opc.hasBehavior(Opcodes.Type.JUMP);
// backward jump, required for a loop (thought experiment: create a loop, using Java bytecodes, that does not jump backward)
// although a code obfuscator could re-arrange code and include backward jumps so not all backward jumps are loops
if(isJump && jumpRelative < 0) {
conditions.add(JumpConditionInfo.loadConditionFlow(opc, i, jumpRelative, code, UsageHint.FOR_OR_WHILE_LOOP));
// 'for' or 'while' loop has to evaluate the condition first so it needs an IF or GOTO at the beginning
// 'do-while' loop evaluates condition after loop runs once, only compiled form seen so far is: no GOTO and one backward jump at the end
}
else if(opc.hasBehavior(Opcodes.Type.CONDITION)) {
conditions.add(JumpConditionInfo.loadConditionFlow(opc, i, jumpRelative, code, UsageHint.IF));
}
i += (numOperands < 0) ? 0 : numOperands;
}

Collections.sort(conditions, JumpConditionInfo.LOWER_INDEX_SORTER);

// post processing - convert special cases
for(int i = 0, size = conditions.size(); i < size; i++) {
var loop = conditions.get(i);
// find and convert if-conditions that may have been misidentified as loops
// case: an if-statement inside a loop where there are no instructions after the if-statement and before the
// end of the loop may be compiled as a condition with a backward jump and thus look like a loop, we can tell
// in the case when it shares the same jump destination as the closest parent loop that contains it
// form: [..., loop_start, instructions[], setup_if[], IF_* <loop_start>, instructions_in_if[], loop_end, ...]
if(loop.targetOffset < 0) {
var targetIndex = loop.getTargetIndex();
var loopUpperIndex = loop.getUpperIndex();
// look at conditions beyond the current one since they are later in the code or contained within the
// current loop and a nested if-statement is contained within the nearest parent loop
for(int j = i + 1; j < size; j++) {
var loopJ = conditions.get(j);
if(loopJ.opcIdx > loopUpperIndex) {
break; // skip remaining conditions once we're past the bounds of the current one
}
if(loopJ.targetOffset < 0 && targetIndex == loopJ.getTargetIndex() && containsIndex(loop, loopJ.opcIdx)) {
// TODO debugging
System.out.println("converted loop to nested IF-within-loop at " + loopJ.opcIdx + " (" + loopJ.opc + ") contained in " + loop + " to " + targetIndex);

conditions.set(j, loopJ.withLoopEndIndexForIf(loopUpperIndex));
}
}
}

// set the potential-if-index of loops
if(UsageHint.isLoop(loop.usageHint) && loop.potentialIfIndex < 0) {
var loopConditionIdx = findFirstIfConditionPointingToEndOf(conditions, i);

if(loopConditionIdx >= 0) {
loop = loop.withPotentialIfIndex(conditions.get(loopConditionIdx).opcIdx);
conditions.set(i, loop);

// TODO debugging
System.out.println("converted if index for loop: " + loop + " found IF " + (loopConditionIdx >= 0 ? conditions.get(loopConditionIdx) : "-1"));

conditions.remove(loopConditionIdx);
size--;
if(loopConditionIdx <= i) {
i--;
}
}
}
}

return conditions;
}
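`findFlowConditions` reads each jump's operand as an offset relative to the jump opcode's own index, treats a negative (backward) offset as a loop candidate, and treats a forward conditional jump as an `if`; a forward `GOTO` is not recorded. The sketch below applies that rule to the branch instructions that carry a signed 16-bit offset, using opcode values from the JVM specification (`ifeq` through `if_acmpne` = 0x99 to 0xA6, `goto` = 0xA7, `ifnull`/`ifnonnull` = 0xC6/0xC7). It stands in for `CodeUtility.loadOperands` and `JumpConditionInfo`, whose exact behavior is not shown here, and it ignores `jsr`, `goto_w`, and `jsr_w`. `BranchClassifier` is a hypothetical name.

```java
/** Hypothetical stand-in for the loop/if rule in findFlowConditions, 2-byte-offset branches only. */
final class BranchClassifier {
    enum Kind { LOOP_CANDIDATE, IF, NONE }

    static final int GOTO = 0xA7;

    /** True for the conditional branches ifeq..if_acmpne (0x99-0xA6), ifnull (0xC6) and ifnonnull (0xC7). */
    static boolean isConditional(int opcode) {
        return (opcode >= 0x99 && opcode <= 0xA6) || opcode == 0xC6 || opcode == 0xC7;
    }

    /** Signed 16-bit branch offset, stored big-endian in the two bytes after the opcode at idx. */
    static int branchOffset(byte[] code, int idx) {
        return (short) (((code[idx + 1] & 0xFF) << 8) | (code[idx + 2] & 0xFF));
    }

    /** Backward jumps are loop candidates, forward conditionals are ifs, anything else is ignored. */
    static Kind classify(byte[] code, int idx) {
        int opcode = code[idx] & 0xFF;
        boolean conditional = isConditional(opcode);
        if (!conditional && opcode != GOTO) {
            return Kind.NONE; // not a branch this sketch models
        }
        int offset = branchOffset(code, idx);
        if (offset < 0) {
            return Kind.LOOP_CANDIDATE; // target (idx + offset) lies before the jump
        }
        return conditional ? Kind.IF : Kind.NONE; // a forward goto is not recorded as a condition
    }
}
```

As the source's own comment points out, an obfuscator can emit backward jumps that are not loops, so this classification is a heuristic; the post-processing pass above exists to reclassify some of those candidates as if-statements.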


/** Find the first IF* condition that is contained within the condition located at {@code startIdx} in the {@code conditions} list.
* @param conditions list of conditions, should include all IF* and GOTO instructions in the code,
* sorted based on {@link JumpConditionInfo#getLowerIndex()}
* @param startIdx the index into the {@code conditions} list of the condition to find an IF* condition within
* @return the {@code conditions} index of the first matching IF* condition, else -1 if none is found
*/
public static int findFirstIfConditionPointingToEndOf(List<JumpConditionInfo> conditions, int startIdx) {
var withinThis = conditions.get(startIdx);
int maxIdx = withinThis.getUpperIndex();
int lowestOpcIdxFound = Integer.MAX_VALUE;
int lowestOpcIdxI = -1;

for(int i = startIdx + 1, size = conditions.size(); i < size; i++) {
var cond = conditions.get(i);
// stop once the condition isn't contained within the target condition, we can safely break because the loops are sorted by lower bound index
if(cond.getLowerIndex() > maxIdx) {
break;
}
if(cond != withinThis && cond.opcIdx < lowestOpcIdxFound && containsIfAndEndsWith(withinThis, cond)) {
lowestOpcIdxFound = cond.opcIdx;
lowestOpcIdxI = i;
}
}
return lowestOpcIdxI;
}


public static boolean containsJumpTo(byte[] code, int offset, int length, int targetIndex) {
// BYTECODE LOOP:
for(int i = offset, size = offset + length; i < size; i++) {
Opcodes opc = Opcodes.get(code[i]);
int numOperands = opc.getOperandCount();
// Special handling for instructions with unpredictable byte code lengths
if(numOperands == Opcodes.Const.UNPREDICTABLE) {
if(Opcodes.WIDE.is(code[i])) {
i++; // WIDE opcodes are nested around other operations
opc = Opcodes.get(code[i]);
numOperands = opc.getOperandCount() * 2; // WIDE opcodes double the operands of the widened opcode
}
else if(Opcodes.TABLESWITCH.is(code[i])) {
throw new IllegalStateException("tableswitch code handling not implemented");
}
else if(Opcodes.LOOKUPSWITCH.is(code[i])) {
throw new IllegalStateException("lookupswitch code handling not implemented");
}
}
if(opc.hasBehavior(Opcodes.Type.JUMP)) {
int jumpRelative = CodeUtility.loadOperands(numOperands, code, i);
if(i + jumpRelative == targetIndex) {
return true;
}
}

i += (numOperands < 0) ? 0 : numOperands;
}
return false;
}


public static int findLastOpcodeIndex(byte[] instr, int start, int end) {
AtomicInteger lastIdx = new AtomicInteger(-1);
CodeUtility.forEach(instr, start, end - start, (opc, instrs, idx) -> {
lastIdx.set(idx);
});
return lastIdx.get();
}


public static int findContainsIfIndex(List<JumpConditionInfo> loops, int index) {
for(int i = 0, size = loops.size(); i < size; i++) {
if(loops.get(i).potentialIfIndex == index) {
return i;
}
}
return -1;
}


public static int findOpcIndex(List<JumpConditionInfo> loops, int index) {
for(int i = 0, size = loops.size(); i < size; i++) {
if(loops.get(i).opcIdx == index) {
return i;
}
}
return -1;
}


public static boolean containsIndex(JumpConditionInfo cond, int index) {
var condTarget = cond.opcIdx + cond.targetOffset;
// avoid branch logic (ternary statements such as Math.min/max)
return (index >= cond.opcIdx && index <= condTarget) || (index >= condTarget && index <= cond.opcIdx);
}


/**
* Check that an {@code ifCond}'s lower bound (generally its opcode index) is within a loop condition's
* bounds and that the {@code ifCond}'s upper bound (generally its target index) is the instruction immediately after
* the loop end instruction.
* ASSUMPTION: the {@code loopCond}'s opcode index is its upper bound (i.e. the loop ends with a backward jump instruction)
* @param loopCond the loop condition
* @param ifCond the other condition, could be a loop or if
* @return true if the conditions described above hold, false if not
*/
public static boolean containsIfAndEndsWith(JumpConditionInfo loopCond, JumpConditionInfo ifCond) {
return loopCond.getTargetIndex() <= ifCond.getLowerIndex() &&
// require the match to be a condition that jumps to the instruction after the loop
loopCond.getOpcodeIndex() + loopCond.opc.getOperandCount() + 1 == ifCond.getUpperIndex();
}
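`containsIndex` treats a jump as the closed interval between its opcode index and its target, whichever direction the jump goes, and `containsIfAndEndsWith` accepts an `if` as a loop's exit condition when it starts at or after the loop's target and jumps to the instruction just past the loop's closing backward jump (opcode index + operand count + 1, i.e. + 3 for a 2-byte-offset branch, matching `GOTO_SIZE`). The standalone sketch below reproduces both checks with a hypothetical `Jump` class in place of `JumpConditionInfo`; it assumes `getLowerIndex()`/`getUpperIndex()` return the smaller/larger of the opcode and target indexes, which this diff does not show.

```java
/** Standalone versions of CodeFlow.containsIndex and containsIfAndEndsWith. */
final class JumpRanges {
    /** Hypothetical stand-in for JumpConditionInfo: a branch at opcIdx jumping by targetOffset. */
    static final class Jump {
        final int opcIdx;
        final int targetOffset;
        final int operandCount; // 2 for goto/if* with a 16-bit offset

        Jump(int opcIdx, int targetOffset, int operandCount) {
            this.opcIdx = opcIdx;
            this.targetOffset = targetOffset;
            this.operandCount = operandCount;
        }

        int targetIndex() { return opcIdx + targetOffset; }
        int lowerIndex() { return Math.min(opcIdx, targetIndex()); } // assumed semantics
        int upperIndex() { return Math.max(opcIdx, targetIndex()); } // assumed semantics
    }

    /** Is index between the jump's opcode and its target, for forward and backward jumps alike? */
    static boolean containsIndex(Jump cond, int index) {
        int target = cond.targetIndex();
        return (index >= cond.opcIdx && index <= target) || (index >= target && index <= cond.opcIdx);
    }

    /** Assumes the loop ends with a backward jump located at loop.opcIdx. */
    static boolean containsIfAndEndsWith(Jump loop, Jump ifCond) {
        int afterLoopEnd = loop.opcIdx + loop.operandCount + 1; // first instruction after the loop
        return loop.targetIndex() <= ifCond.lowerIndex() && afterLoopEnd == ifCond.upperIndex();
    }
}
```

In the "form 2" loop layout described in `findFlowConditions`, a `goto` at 20 back to 0 closes the loop and an `if_*` at 2 that jumps forward to 23 is its exit test; `containsIfAndEndsWith` accepts that pair, but rejects an `if_*` at 2 that jumps to 22.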


@@ -66,18 +297,18 @@ public static int maxIndex(IntListReadOnly codeFlow) {
}


public static String flowPathToString(byte[] instr, IntListReadOnly codeFlow) {
public static String flowPathToString(byte[] code, IntListReadOnly codeFlow) {
var sb = new StringBuilder();
for(int i = 0, size = codeFlow.size(); i < size; i++) {
var idx = codeFlow.get(i);
// a conditional/jump point
if(idx < 0) {
var opc = Opcodes.get(instr[~idx] & 0xFF);
var opc = Opcodes.get(code[~idx]);
sb.append(~idx).append(' ').append(opc).append(" -> ");
}
// a terminal point
else {
var opc = Opcodes.get(instr[idx] & 0xFF);
var opc = Opcodes.get(code[idx]);
sb.append(idx).append(' ').append(opc).append("], ");
}
}
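Throughout `CodeFlow` the diff replaces `Opcodes.get(instr[i] & 0xFF)` with `Opcodes.get(code[i])`. Java's `byte` is signed, so every opcode at or above 0x80 (for example `goto`, 0xA7) reads as a negative number unless it is masked; presumably the new `Opcodes.get` overload that takes a `byte` does that masking internally, although its body is not part of this diff. A small sketch of a byte-indexed lookup with both overloads; `OpcodeTable` is a hypothetical class with only a handful of opcode names filled in.

```java
/** Hypothetical opcode-name lookup showing why raw bytecode bytes must be read as unsigned. */
final class OpcodeTable {
    private static final String[] NAMES = new String[256];

    static {
        NAMES[0x00] = "nop";
        NAMES[0x99] = "ifeq";
        NAMES[0xA7] = "goto";
        NAMES[0xAC] = "ireturn";
        NAMES[0xB1] = "return";
        NAMES[0xBF] = "athrow";
    }

    /** int overload: callers must already pass an unsigned value in 0..255. */
    static String get(int opcode) {
        return NAMES[opcode];
    }

    /** byte overload: masks so (byte) 0xA7, which is -89 as a signed byte, maps to index 167. */
    static String get(byte opcode) {
        return NAMES[opcode & 0xFF]; // equivalent to Byte.toUnsignedInt(opcode)
    }
}
```

Because Java picks the exact-match `byte` overload for `get(code[i])`, callers no longer need the mask; widening the same byte to `int` without masking (`get((int) code[i])`) would index -89 and throw `ArrayIndexOutOfBoundsException`.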