This is the repository for my contribution to Google’s Summer of Code 2017.
- Link
- https://summerofcode.withgoogle.com/projects/#5306561080590336
- Student
- Nicolas Jeannerod
- Project Name
- Verification and Testing of Heap-based Programs with Symbolic PathFinder
- Organization
- The Java Pathfinder Team
- Mentors
-
- Nikos Gorogiannis
- Aymeric Fromherz
Symbolic Pathfinder (SPF) is an open-source symbolic execution tool, based on the NASA Java Pathfinder (JPF) model checker, which is used in research and industry labs. It executes Java bytecode using a custom JVM to perform its analysis.
It currently uses lazy initialization, a brute-force enumeration of all heap objects that can bind to the structured inputs accessed by the program. This explicit enumeration may lead to path explosion, a huge amount of false alarms and a lot of memory and time wasted for nothing.
In this project, we tried to explore an alternative way of representing constraints over the heap. This would allow SPF to avoid a complete enumeration of all the possible cases, eliminating the ones violating the data structures properties. We focused in particular on separation logic and tried to determine whether it would bring an improvement compared to the lazy initialization.
This repository contains a separation logic-augmented SPF (SPF+SL). The extension will be detailed hereafter.
The project contains a Dockerfile to create a complete environment for executing the tool and reproducing the results. Building the docker image and running it on an example can be done easily:
docker build -t gsoc .
docker run gsoc jpf-symbc/src/examples/seplogic/TestExtends.jpf
It is of course possible to build and run JPF/SPF locally. This will at least require the following dependencies to be installed: Java, ANT, Autoconf, Swig2, GMP and ANTLR3.
On Debian, installing the following packages should be sufficient:
openjdk-8-jdk-headless
, openjdk-8-jre-headless
, ant
, build-essential
,
autoconf
, libtool
, swig2.0
, libgmp-dev
, antlr3
, libantlr3c-dev
,
libboost-dev
.
JPF/SPF runs on Java .class
files and JPF-specific .jpf
files
containing the necessary configuration. Here is an example of a .jpf
file:
target = seplogic.TestNoncyclic
classpath = ${jpf-symbc}/build/examples
sourcepath = ${jpf-symbc}/src/examples
type_classpath = ${jpf-symbc}/build/examples/seplogic
symbolic.debug = true
symbolic.seplogic = true
search.multiple_errors = true
symbolic.method = seplogic.TestNoncyclic.length_noncyclic(sym)
symbolic.seplogic.precondition = seplogic.TestNoncyclic.length_noncyclic#0->Tree(next)
search.depth_limit = 10
This configuration file is an extended key-value file (see the full description here in JPF’s documentation). Important generic options are:
target
that points to the.class
file we are interested in;symbolic.debug
that enables the debugging of SPF;search.multiple_errors
that tells JPF to keep going when encountering an exception;search.depth_limit
that tells JPF to stop exploring after reaching a certain depth. This is very important in the case of symbolic execution of objects of potentially-unbounded depth (lists, trees);symbolic.method
that tells SPF which methods should have their arguments considered symbolic. In that example, it is the methodlength_noncyclic
in the classseplogic.TestNoncyclic
, that takes only one argument that should be considered symbolic. A common mistake is to think that this method will be the entry point of JPF/SPF. This is not true: the entry point is themain
method, as usual in Java. However, whenever JPF will encounter the methodlength_noncyclic
and run it, SPF will start executing and do its job.
There are now options that are specific to SPF+SL:
symbolic.seplogic
that enables the seplogic module, just likesymbolic.lazy
would enable the lazy-initialization module. Note thatsymbolic.seplogic
will take priority oversymbolic.lazy
if both are set to true.symbolic.seplogic.precondition
that gives information about the variables of the methods insymbolic.method
. The syntax of the preconditions is described hereafter.
The precondition language is quite
limited. symbolic.seplogic.precondition
is a comma-separated list of
preconditions. These preconditions can be either:
- an equality
=
(resp. a disequality!=
) between two variables - an equality
=
(resp. a disequality!=
) between a variable andnil
; - a unary predicate applied to a variable.
All the variables are considered separated. They must be described by their absolute name composed of the name of the class followed by the name of the method and the number of the variable. Here is an example:
seplogic.TestNoncyclic . length_noncyclic # 0 ^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ ^ class method variable
The only predicate/s available at the moment is/are “tree”. This predicate can be indexed by a set of strings. For instance:
ClassName.methodName#3 -> Tree(right,left)
specifies that the fourth variable of methodName
in className
is
either null
or points to an object with at least two fields, right
and
left
, which are themselves Tree(right,left)
. In particular, Tree(next)
would represent a linked list.
There is no way to define a new predicate through the precondition language. However, it is quite easy to add them to SPF+SL. For now, the predicates can only be unary, and can not involve a branching of the tool. For instance, a predicate like the following (in Cyclist’s syntax) cannot be implemented:
BinTreeSeg {
x=y => BinTreeSeg(x,y) |
x->x',y' * BinTreeSeg(x',y) * BinTree(y') => BinTreeSeg(x,y) |
x->x',y' * BinTree(x') * BinTreeSeg(y',y) => BinTreeSeg(x,y)
} ;
The base idea is to overload the JVM’s instructions that talk about the heap, and to use them to keep a constraint talking about the state of the heap up-to-date. Each branch of the symbolic execution gets its own constraint. Every once in a while, these constraints are be sent to a prover and checked for unsatisfiability. All the branches corresponding to an unsatisfiable constraint are then killed, avoiding to spend time in branches that actually unreachable.
By default, the constraints built by SPF+SL are always satisfiable because they represent states of the memory that are built using the JVM’s instructions. This is only in the presence of preconditions that we can start killing branches. These preconditions have to be provided by the user.
The first goal was to have a modular interface on which we could plug any prover. We showed that it was possible against both CVC4 and Cyclist. However, we encountered two major difficulties, coming from two features unsupported by most separation logic provers:
- non-separating clauses – that is, the logical “and” –; that means
that we have to handle the unfolding of the predicates by
ourselves. Indeed, when we have both
x -> {| ... |}
andx -> A
, for instance:- we can’t write
x -> {| ... |} ∧ x -> A
as provers do not support that, - we can’t write
x -> {| ... |} * x -> A
as the separation would make this constraint unsatisfiable, although this is not the case.
- we can’t write
- record update; that means that we have to handle the update of a constraint ourselves.
Having to handle the two by ourselves led to the creation of a more subtle data-structure that would make efficient these two operations. This structure is a union-find structure (for an efficient handling of aliases) where the representant of each equivalence class carry the information that we have on it.
In fact, once this structure exists, there is not much to add to obtain a full separation logic prover. In addition, keeping the prover inside SPF had a few other advantages:
- this does not require the additional translation step nor the spawning of an external process;
- the symbolic engine already takes care of a part of what we need, making the SL-prover much easier to write;
- the check for unsatisfiability can be checked incrementally while we update the structure, allowing an important speedup.
For this reason, we decided to forget about sending the constraints to external provers and to have it in SPF.
The tool that is present in this repository is a modification of SPF in which:
- some symbolic JVM instructions have been overriden (see
bytecode/seplogic
theSymbolicInstructionFactory
as well asheap/seplogic/Helper
), - specific
HeapChoiceGenerator
andPathCondition
have been written (inheap/seplogic
), - and more importantly, a constraint structure has been implemented
(in
heap/seplogic/constraint
).
This constraint structure is an efficiently-written union-find
structure on SymbolicInteger
(think of it as the variables in SPF). It
also carries an information for each equivalence class that can be:
- that they are
nil
, - that they point to a record, and the fields of the records (represented as other nodes of the structure directly, for the sake of efficiency),
- that there is (or are) a predicate that gives us information about them.
All the operations that this structure allows of this structure may
raise an UnsatException
. They are:
- the
find
operation of the union-find that takes a node and return the node representing its equivalence class. This can allow to test the equality of two nodes, but this is not the case here. However, it allows to find the representant, which is the one carrying the important information. - the
union
operation of the union-find that takes two nodes and merges their equivalence classes. This corresponds to adding an equality between two variables. This merge operation also checks that the merge is allowing – that is, that there is no disequality about those two equivalence classes. Finally, it also merges the information carried by the nodes, which is where everything subtle happens. - the addition of some information to a node. This also triggers a merge of this information with the information that is already there.
- the update of a record information. This is much easier than in a standard SL prover. In a standard SL prover, there is a phase of rearrangement that makes sure that this field update will make sense (that is, there is indeed a record onto which this variable points containing the right field). In our case, and since we are working with a symbolic engine that already checked that, there is no need.
The merge of two information is where we really check the
unsatisfiability of a formula. Some things are trivial to merge (nil
with nil
is nil
, nil
with a record is unsatisfiable, …) and some are
more difficult.
For instance, merging two records is something important. We build a
record whose fields will be the union of the fields of the two records
to merge. Whenever a field is present in both the records, we trigger
a union
of the nodes they contain, which may eventually lead to an
UnsatException
.
Everything becomes more complex as soon as there are predicates. The merge of two predicates is quite harmless: we keep them both. However, as soon as we know something else about the node, we need to unfold the predicate/s, and see which branch of the unfolding makes sense. For now, there may be only one.
In the code, all these “information” extend the abstract class
heap/seplogic/Information
. The predicates extend the abstract class
heap/seplogic/Predicate
(which is, itself, extending
Information
). They have to provide a unify
function that takes the
other information and deals with the merge.
When unifying a predicate with an other information, the only
important cases are the case of nil
and the case of a record, which
makes predicates easy to write.
The union-find structure has been written in an efficient way. There is however some room of improvement on the disequality test. We believe that the main bottleneck comes from the fact that, each time SPF branches (which happens quite a lot), we have to copy the whole structure. It would be much efficient to work with persistent data structures.
Nevertheless, the tool seems quite efficient: given the right preconditions, it can kill a lot of branches. The overhead it involves in time is very little (actually, it is even much better than the usual prover’s overhead, which may be explained by the incrementallity of our prover and the fact that we don’t need to translate the whole constraint every time).
We attempted to run benchmarks on SPF vs. SPF+SL to highlight those
facts. These benchmarks can be reproduced by simply running the
benchmarks
script in the root of the repository. There are two
benchmarks only right now. The first one compares the SPF against
SPF+SL with preconditions, to show that killing branches does save a
lot of time. The second one compares SPF and SPF+SL but makes sure
that the same branches are killed in both cases. It aims at showing
that, even when killing branches does not make a difference, we save a
lot of time just because of the implementation that is incremental and
without external solver.
Although the tool seems quite efficient as it is, there is still a lot of room for improvement. In particular:
- It would be nice to have a way to support non-separated variable, as it is sometimes wanted.
- We would like to make SPF branch from the SL structure. That would allow the predicates to be written in an easier way (and maybe even read from the preconditions), and to support predicates where we do not know which branch to choose.
- We would like to support non-unary predicates, like the segments. This goes with the support of branching in SPF, as most of the natural branching predicates are not unary.
- We would like SPF to carry enough information on the concrete nodes
to have more precise constraints. For now, the constraint looses a
lot of precision each time it hash to handle a
PUTFIELD
, as we cannot determine what has been put in the field (and we can only create a fresh variable).