Skip to content

Commit

Permalink
feature: Add support for dataflow analysis in new module spoon-datafl…
Browse files Browse the repository at this point in the history
…ow (#2970)
  • Loading branch information
Egor18 authored and monperrus committed May 24, 2019
1 parent 0c26805 commit deb0dfc
Show file tree
Hide file tree
Showing 58 changed files with 8,975 additions and 0 deletions.
11 changes: 11 additions & 0 deletions chore/travis/travis-verify.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,3 +43,14 @@ mvn verify license:check javadoc:jar install -DskipTests

# checkstyle in src/tests
mvn checkstyle:checkstyle -Pcheckstyle-test

# Spoon-dataflow
cd ../spoon-dataflow

# download and install z3 lib
wget https://github.com/Z3Prover/z3/releases/download/z3-4.8.4/z3-4.8.4.d6df51951f4c-x64-ubuntu-14.04.zip
unzip z3-4.8.4.d6df51951f4c-x64-ubuntu-14.04.zip
export LD_LIBRARY_PATH=./z3-4.8.4.d6df51951f4c-x64-ubuntu-14.04/bin

# build and run tests
./gradlew build
19 changes: 19 additions & 0 deletions spoon-dataflow/LICENSE.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
Copyright (c) 2019 Egor Bredikhin

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
135 changes: 135 additions & 0 deletions spoon-dataflow/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,135 @@
# spoon-dataflow

spoon-dataflow is a Z3 solver based data-flow analyzer for Java source code.

## Capabilities
At the moment spoon-dataflow has checkers which allow to detect logical errors (always true/false expressions) and null pointer dereferences.
For example:
```java
void f(boolean c) {
int x = 5;
int y = c ? 3 : 2;
if (x < y) {} // <= always false
}

void g(int x, int y) {
if (2 * x + 3 * y == 12) {
if (5 * x - 2 * y == 11) {
if (x == 3) {} // <= always true
if (y == 2) {} // <= always true
}
}
}

void h(Something x) {
if (x == null) {
x.f(); // <= null dereference
}
}
```
Check out test directory for more examples.

In general, spoon-dataflow is capable to evaluate expressions statically, perform symbolic execution, handle control flow of a program, and so on. It also features a proper memory model, so it nicely deals with reference aliasing.

## Build and run
In order to build spoon-dataflow you need JDK 8 or newer. Also, you have to download and install Z3.

### Windows:
1. Download z3-4.8.4 here: https://github.com/Z3Prover/z3/releases
2. Add bin directory to your PATH environment variable
3. Run `gradlew build`

Note: Visual C++ 2015 Redistributable may be required.

### Linux:
1. Download z3-4.8.4 here https://github.com/Z3Prover/z3/releases
2. Add bin directory to your LD_LIBRARY_PATH: `export LD_LIBRARY_PATH=/path/to/z3/bin`
3. Run `./gradlew build`

### macOS:
1. Download z3-4.8.4 here https://github.com/Z3Prover/z3/releases
2. Add bin directory to your DYLD_LIBRARY_PATH: `export DYLD_LIBRARY_PATH=/path/to/z3/bin`
3. Run `./gradlew build`

Note that you may still need to set up environment variables or java.lang.path in your IDE.

Now you can go to the build directory and run the resulting jar:
`java -jar spoon-dataflow.jar -sources <arg...> [-classpath <arg...>]`

## Under the hood
### AST
spoon-dataflow uses Spoon library to build and traverse AST in a post-order manner.
While traversing the AST, it translates the source code into a form understandable by Z3 solver. Then the checks are performed by the solver.

### Static single assignment form (SSA)
First of all, in order to be used with Z3, the code should be present in [Static Single Assignment form](https://en.wikipedia.org/wiki/Static_single_assignment_form). We do this to maintain the order of a program in a logic formula.
Here is a simple example:

Original form:
```
x = 1;
y = 2;
x = x + y;
```
SSA form:
```
x0 = 1;
y0 = 2;
x1 = x0 + y0;
```

### Conditions
After traversing if statement, we create a new variable, which is the result of a special if-then-else function provided by Z3. This function essentially joins values from branches over a condition.

Original form:
```
if (c) {
x = 1;
} else {
x = 2;
}
```
SSA form:
```
if (c) {
x0 = 1;
} else {
x1 = 2;
}
x3 = ITE(c0, x0, x1);
```

### Functions and interprocedural analysis
The most obvious approach to deal with function calls is to inline them. Unfortunately, we can do it only for small functions (like getters or setters), because of the performance considerations.
Another approach is to build Function Summaries (automatically or manually), which contain information about contracts, purity, return values and so on.
But in general, any unknown function resets the values of its arguments.

### Loops
The most obvious approach to deal with loops is to unroll them. Unfortunately, we can do it only when the number of iterations is known statically and relatively small.
So in general, we have to find loop invariants and reset everything else, like for the functions.
However, some special cases could be treated differently.

### Memory model
spoon-dataflow uses type-indexed memory model (Burstall’s memory model).
In this memory model each data type or field has its own memory array.
It allows to detect something like that:
```java
void m() {
C a = new C();
a.x = 10;
C b = a;
b.x = 20;
if (a.x == 20) {} // <= always true
}
```
Here are some useful links with a more detailed explanation of different memory models:
- https://llvm.org/pubs/2009-01-VMCAI-ScalableMemoryModel.pdf
- https://www.researchgate.net/publication/221430855_A_Memory_Model_for_Static_Analysis_of_C_Programs


## To Do list
At the moment this project is just a proof of concept, so there is a lot of work to do:
- Interprocedural analysis and Function Summaries;
- Parallel analysis and classes dependency graph;
- Support for the remaining language constructions;
- Better loop analysis;
38 changes: 38 additions & 0 deletions spoon-dataflow/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
plugins {
id 'java'
}

group 'fr.inria.spoon'
version '1.0'

sourceCompatibility = 1.8

repositories {
mavenCentral()
maven { url 'https://raw.github.com/SpoonLabs/spoon-dependencies/master' }
}

dependencies {
compile group: 'fr.inria.gforge.spoon', name: 'spoon-core', version: '7.2.0'
compile group: 'commons-cli', name: 'commons-cli', version: '1.4'
compile group: 'com.microsoft', name: 'z3', version: '4.8.4'
testCompile group: 'junit', name: 'junit', version: '4.12'
}

task copyDependencies(type: Copy) {
from configurations.compile
into "${buildDir}/libs"
}

compileJava {
dependsOn copyDependencies
}

jar {
archiveName 'spoon-dataflow.jar'
destinationDir buildDir
manifest {
attributes 'Main-Class': 'fr.inria.spoon.dataflow.Main'
attributes 'Class-Path': configurations.compile.collect { "libs/" + it.getName() }.join(' ')
}
}
Binary file added spoon-dataflow/gradle/wrapper/gradle-wrapper.jar
Binary file not shown.
6 changes: 6 additions & 0 deletions spoon-dataflow/gradle/wrapper/gradle-wrapper.properties
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#Mon Apr 22 16:55:18 EDT 2019
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
zipStoreBase=GRADLE_USER_HOME
zipStorePath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-4.10-all.zip
172 changes: 172 additions & 0 deletions spoon-dataflow/gradlew
Original file line number Diff line number Diff line change
@@ -0,0 +1,172 @@
#!/usr/bin/env sh

##############################################################################
##
## Gradle start up script for UN*X
##
##############################################################################

# Attempt to set APP_HOME
# Resolve links: $0 may be a link
PRG="$0"
# Need this for relative symlinks.
while [ -h "$PRG" ] ; do
ls=`ls -ld "$PRG"`
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
PRG=`dirname "$PRG"`"/$link"
fi
done
SAVED="`pwd`"
cd "`dirname \"$PRG\"`/" >/dev/null
APP_HOME="`pwd -P`"
cd "$SAVED" >/dev/null

APP_NAME="Gradle"
APP_BASE_NAME=`basename "$0"`

# Add default JVM options here. You can also use JAVA_OPTS and GRADLE_OPTS to pass JVM options to this script.
DEFAULT_JVM_OPTS=""

# Use the maximum available, or set MAX_FD != -1 to use that value.
MAX_FD="maximum"

warn () {
echo "$*"
}

die () {
echo
echo "$*"
echo
exit 1
}

# OS specific support (must be 'true' or 'false').
cygwin=false
msys=false
darwin=false
nonstop=false
case "`uname`" in
CYGWIN* )
cygwin=true
;;
Darwin* )
darwin=true
;;
MINGW* )
msys=true
;;
NONSTOP* )
nonstop=true
;;
esac

CLASSPATH=$APP_HOME/gradle/wrapper/gradle-wrapper.jar

# Determine the Java command to use to start the JVM.
if [ -n "$JAVA_HOME" ] ; then
if [ -x "$JAVA_HOME/jre/sh/java" ] ; then
# IBM's JDK on AIX uses strange locations for the executables
JAVACMD="$JAVA_HOME/jre/sh/java"
else
JAVACMD="$JAVA_HOME/bin/java"
fi
if [ ! -x "$JAVACMD" ] ; then
die "ERROR: JAVA_HOME is set to an invalid directory: $JAVA_HOME
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi
else
JAVACMD="java"
which java >/dev/null 2>&1 || die "ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
Please set the JAVA_HOME variable in your environment to match the
location of your Java installation."
fi

# Increase the maximum file descriptors if we can.
if [ "$cygwin" = "false" -a "$darwin" = "false" -a "$nonstop" = "false" ] ; then
MAX_FD_LIMIT=`ulimit -H -n`
if [ $? -eq 0 ] ; then
if [ "$MAX_FD" = "maximum" -o "$MAX_FD" = "max" ] ; then
MAX_FD="$MAX_FD_LIMIT"
fi
ulimit -n $MAX_FD
if [ $? -ne 0 ] ; then
warn "Could not set maximum file descriptor limit: $MAX_FD"
fi
else
warn "Could not query maximum file descriptor limit: $MAX_FD_LIMIT"
fi
fi

# For Darwin, add options to specify how the application appears in the dock
if $darwin; then
GRADLE_OPTS="$GRADLE_OPTS \"-Xdock:name=$APP_NAME\" \"-Xdock:icon=$APP_HOME/media/gradle.icns\""
fi

# For Cygwin, switch paths to Windows format before running java
if $cygwin ; then
APP_HOME=`cygpath --path --mixed "$APP_HOME"`
CLASSPATH=`cygpath --path --mixed "$CLASSPATH"`
JAVACMD=`cygpath --unix "$JAVACMD"`

# We build the pattern for arguments to be converted via cygpath
ROOTDIRSRAW=`find -L / -maxdepth 1 -mindepth 1 -type d 2>/dev/null`
SEP=""
for dir in $ROOTDIRSRAW ; do
ROOTDIRS="$ROOTDIRS$SEP$dir"
SEP="|"
done
OURCYGPATTERN="(^($ROOTDIRS))"
# Add a user-defined pattern to the cygpath arguments
if [ "$GRADLE_CYGPATTERN" != "" ] ; then
OURCYGPATTERN="$OURCYGPATTERN|($GRADLE_CYGPATTERN)"
fi
# Now convert the arguments - kludge to limit ourselves to /bin/sh
i=0
for arg in "$@" ; do
CHECK=`echo "$arg"|egrep -c "$OURCYGPATTERN" -`
CHECK2=`echo "$arg"|egrep -c "^-"` ### Determine if an option

if [ $CHECK -ne 0 ] && [ $CHECK2 -eq 0 ] ; then ### Added a condition
eval `echo args$i`=`cygpath --path --ignore --mixed "$arg"`
else
eval `echo args$i`="\"$arg\""
fi
i=$((i+1))
done
case $i in
(0) set -- ;;
(1) set -- "$args0" ;;
(2) set -- "$args0" "$args1" ;;
(3) set -- "$args0" "$args1" "$args2" ;;
(4) set -- "$args0" "$args1" "$args2" "$args3" ;;
(5) set -- "$args0" "$args1" "$args2" "$args3" "$args4" ;;
(6) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" ;;
(7) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" ;;
(8) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" ;;
(9) set -- "$args0" "$args1" "$args2" "$args3" "$args4" "$args5" "$args6" "$args7" "$args8" ;;
esac
fi

# Escape application args
save () {
for i do printf %s\\n "$i" | sed "s/'/'\\\\''/g;1s/^/'/;\$s/\$/' \\\\/" ; done
echo " "
}
APP_ARGS=$(save "$@")

# Collect all arguments for the java command, following the shell quoting and substitution rules
eval set -- $DEFAULT_JVM_OPTS $JAVA_OPTS $GRADLE_OPTS "\"-Dorg.gradle.appname=$APP_BASE_NAME\"" -classpath "\"$CLASSPATH\"" org.gradle.wrapper.GradleWrapperMain "$APP_ARGS"

# by default we should be in the correct project dir, but when run from Finder on Mac, the cwd is wrong
if [ "$(uname)" = "Darwin" ] && [ "$HOME" = "$PWD" ]; then
cd "$(dirname "$0")"
fi

exec "$JAVACMD" "$@"
Loading

0 comments on commit deb0dfc

Please sign in to comment.