Skip to content
/ GSoC17 Public

Google Summer of Code 2017 – Verification and Testing of Heap-based Programs with Symbolic PathFinder

Notifications You must be signed in to change notification settings

Niols/GSoC17

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Verification and Testing of Heap-based Programs with Symbolic PathFinder

This is the repository for my contribution to Google’s Summer of Code 2017.

Project description

Link
https://summerofcode.withgoogle.com/projects/#5306561080590336
Student
Nicolas Jeannerod
Project Name
Verification and Testing of Heap-based Programs with Symbolic PathFinder
Organization
The Java Pathfinder Team
Mentors
  • Nikos Gorogiannis
  • Aymeric Fromherz

Abstract

Symbolic Pathfinder (SPF) is an open-source symbolic execution tool, based on the NASA Java Pathfinder (JPF) model checker, which is used in research and industry labs. It executes Java bytecode using a custom JVM to perform its analysis.

It currently uses lazy initialization, a brute-force enumeration of all heap objects that can bind to the structured inputs accessed by the program. This explicit enumeration may lead to path explosion, a huge amount of false alarms and a lot of memory and time wasted for nothing.

In this project, we tried to explore an alternative way of representing constraints over the heap. This would allow SPF to avoid a complete enumeration of all the possible cases, eliminating the ones violating the data structures properties. We focused in particular on separation logic and tried to determine whether it would bring an improvement compared to the lazy initialization.

This repository contains a separation logic-augmented SPF (SPF+SL). The extension will be detailed hereafter.

Building and running

With Docker

The project contains a Dockerfile to create a complete environment for executing the tool and reproducing the results. Building the docker image and running it on an example can be done easily:

docker build -t gsoc .
docker run gsoc jpf-symbc/src/examples/seplogic/TestExtends.jpf

Without Docker

It is of course possible to build and run JPF/SPF locally. This will at least require the following dependencies to be installed: Java, ANT, Autoconf, Swig2, GMP and ANTLR3.

On Debian, installing the following packages should be sufficient: openjdk-8-jdk-headless, openjdk-8-jre-headless, ant, build-essential, autoconf, libtool, swig2.0, libgmp-dev, antlr3, libantlr3c-dev, libboost-dev.

Trying it out on your own examples

The JPF file

JPF/SPF runs on Java .class files and JPF-specific .jpf files containing the necessary configuration. Here is an example of a .jpf file:

target = seplogic.TestNoncyclic

classpath = ${jpf-symbc}/build/examples
sourcepath = ${jpf-symbc}/src/examples
type_classpath = ${jpf-symbc}/build/examples/seplogic

symbolic.debug = true
symbolic.seplogic = true
search.multiple_errors = true

symbolic.method = seplogic.TestNoncyclic.length_noncyclic(sym)
symbolic.seplogic.precondition = seplogic.TestNoncyclic.length_noncyclic#0->Tree(next)

search.depth_limit = 10

This configuration file is an extended key-value file (see the full description here in JPF’s documentation). Important generic options are:

  • target that points to the .class file we are interested in;
  • symbolic.debug that enables the debugging of SPF;
  • search.multiple_errors that tells JPF to keep going when encountering an exception;
  • search.depth_limit that tells JPF to stop exploring after reaching a certain depth. This is very important in the case of symbolic execution of objects of potentially-unbounded depth (lists, trees);
  • symbolic.method that tells SPF which methods should have their arguments considered symbolic. In that example, it is the method length_noncyclic in the class seplogic.TestNoncyclic, that takes only one argument that should be considered symbolic. A common mistake is to think that this method will be the entry point of JPF/SPF. This is not true: the entry point is the main method, as usual in Java. However, whenever JPF will encounter the method length_noncyclic and run it, SPF will start executing and do its job.

There are now options that are specific to SPF+SL:

  • symbolic.seplogic that enables the seplogic module, just like symbolic.lazy would enable the lazy-initialization module. Note that symbolic.seplogic will take priority over symbolic.lazy if both are set to true.
  • symbolic.seplogic.precondition that gives information about the variables of the methods in symbolic.method. The syntax of the preconditions is described hereafter.

The preconditions language

The precondition language is quite limited. symbolic.seplogic.precondition is a comma-separated list of preconditions. These preconditions can be either:

  • an equality = (resp. a disequality !=) between two variables
  • an equality = (resp. a disequality !=) between a variable and nil;
  • a unary predicate applied to a variable.

All the variables are considered separated. They must be described by their absolute name composed of the name of the class followed by the name of the method and the number of the variable. Here is an example:

seplogic.TestNoncyclic . length_noncyclic # 0
^^^^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^   ^
        class                 method        variable

The only predicate/s available at the moment is/are “tree”. This predicate can be indexed by a set of strings. For instance:

ClassName.methodName#3 -> Tree(right,left)

specifies that the fourth variable of methodName in className is either null or points to an object with at least two fields, right and left, which are themselves Tree(right,left). In particular, Tree(next) would represent a linked list.

There is no way to define a new predicate through the precondition language. However, it is quite easy to add them to SPF+SL. For now, the predicates can only be unary, and can not involve a branching of the tool. For instance, a predicate like the following (in Cyclist’s syntax) cannot be implemented:

BinTreeSeg {
  x=y => BinTreeSeg(x,y) |
  x->x',y' * BinTreeSeg(x',y) * BinTree(y') => BinTreeSeg(x,y) |
  x->x',y' * BinTree(x') * BinTreeSeg(y',y) => BinTreeSeg(x,y)
} ;

How it works

The idea

The base idea is to overload the JVM’s instructions that talk about the heap, and to use them to keep a constraint talking about the state of the heap up-to-date. Each branch of the symbolic execution gets its own constraint. Every once in a while, these constraints are be sent to a prover and checked for unsatisfiability. All the branches corresponding to an unsatisfiable constraint are then killed, avoiding to spend time in branches that actually unreachable.

By default, the constraints built by SPF+SL are always satisfiable because they represent states of the memory that are built using the JVM’s instructions. This is only in the presence of preconditions that we can start killing branches. These preconditions have to be provided by the user.

The first goal was to have a modular interface on which we could plug any prover. We showed that it was possible against both CVC4 and Cyclist. However, we encountered two major difficulties, coming from two features unsupported by most separation logic provers:

  • non-separating clauses – that is, the logical “and” –; that means that we have to handle the unfolding of the predicates by ourselves. Indeed, when we have both x -> {| ... |} and x -> A, for instance:
    • we can’t write x -> {| ... |} ∧ x -> A as provers do not support that,
    • we can’t write x -> {| ... |} * x -> A as the separation would make this constraint unsatisfiable, although this is not the case.
  • record update; that means that we have to handle the update of a constraint ourselves.

Having to handle the two by ourselves led to the creation of a more subtle data-structure that would make efficient these two operations. This structure is a union-find structure (for an efficient handling of aliases) where the representant of each equivalence class carry the information that we have on it.

In fact, once this structure exists, there is not much to add to obtain a full separation logic prover. In addition, keeping the prover inside SPF had a few other advantages:

  • this does not require the additional translation step nor the spawning of an external process;
  • the symbolic engine already takes care of a part of what we need, making the SL-prover much easier to write;
  • the check for unsatisfiability can be checked incrementally while we update the structure, allowing an important speedup.

For this reason, we decided to forget about sending the constraints to external provers and to have it in SPF.

In practice

The tool that is present in this repository is a modification of SPF in which:

  • some symbolic JVM instructions have been overriden (see bytecode/seplogic the SymbolicInstructionFactory as well as heap/seplogic/Helper),
  • specific HeapChoiceGenerator and PathCondition have been written (in heap/seplogic),
  • and more importantly, a constraint structure has been implemented (in heap/seplogic/constraint).

This constraint structure is an efficiently-written union-find structure on SymbolicInteger (think of it as the variables in SPF). It also carries an information for each equivalence class that can be:

  • that they are nil,
  • that they point to a record, and the fields of the records (represented as other nodes of the structure directly, for the sake of efficiency),
  • that there is (or are) a predicate that gives us information about them.

All the operations that this structure allows of this structure may raise an UnsatException. They are:

  • the find operation of the union-find that takes a node and return the node representing its equivalence class. This can allow to test the equality of two nodes, but this is not the case here. However, it allows to find the representant, which is the one carrying the important information.
  • the union operation of the union-find that takes two nodes and merges their equivalence classes. This corresponds to adding an equality between two variables. This merge operation also checks that the merge is allowing – that is, that there is no disequality about those two equivalence classes. Finally, it also merges the information carried by the nodes, which is where everything subtle happens.
  • the addition of some information to a node. This also triggers a merge of this information with the information that is already there.
  • the update of a record information. This is much easier than in a standard SL prover. In a standard SL prover, there is a phase of rearrangement that makes sure that this field update will make sense (that is, there is indeed a record onto which this variable points containing the right field). In our case, and since we are working with a symbolic engine that already checked that, there is no need.

The merge of information

The merge of two information is where we really check the unsatisfiability of a formula. Some things are trivial to merge (nil with nil is nil, nil with a record is unsatisfiable, …) and some are more difficult.

For instance, merging two records is something important. We build a record whose fields will be the union of the fields of the two records to merge. Whenever a field is present in both the records, we trigger a union of the nodes they contain, which may eventually lead to an UnsatException.

Everything becomes more complex as soon as there are predicates. The merge of two predicates is quite harmless: we keep them both. However, as soon as we know something else about the node, we need to unfold the predicate/s, and see which branch of the unfolding makes sense. For now, there may be only one.

In the code, all these “information” extend the abstract class heap/seplogic/Information. The predicates extend the abstract class heap/seplogic/Predicate (which is, itself, extending Information). They have to provide a unify function that takes the other information and deals with the merge.

When unifying a predicate with an other information, the only important cases are the case of nil and the case of a record, which makes predicates easy to write.

Efficiency

The union-find structure has been written in an efficient way. There is however some room of improvement on the disequality test. We believe that the main bottleneck comes from the fact that, each time SPF branches (which happens quite a lot), we have to copy the whole structure. It would be much efficient to work with persistent data structures.

Nevertheless, the tool seems quite efficient: given the right preconditions, it can kill a lot of branches. The overhead it involves in time is very little (actually, it is even much better than the usual prover’s overhead, which may be explained by the incrementallity of our prover and the fact that we don’t need to translate the whole constraint every time).

We attempted to run benchmarks on SPF vs. SPF+SL to highlight those facts. These benchmarks can be reproduced by simply running the benchmarks script in the root of the repository. There are two benchmarks only right now. The first one compares the SPF against SPF+SL with preconditions, to show that killing branches does save a lot of time. The second one compares SPF and SPF+SL but makes sure that the same branches are killed in both cases. It aims at showing that, even when killing branches does not make a difference, we save a lot of time just because of the implementation that is incremental and without external solver.

Future work

Although the tool seems quite efficient as it is, there is still a lot of room for improvement. In particular:

  • It would be nice to have a way to support non-separated variable, as it is sometimes wanted.
  • We would like to make SPF branch from the SL structure. That would allow the predicates to be written in an easier way (and maybe even read from the preconditions), and to support predicates where we do not know which branch to choose.
  • We would like to support non-unary predicates, like the segments. This goes with the support of branching in SPF, as most of the natural branching predicates are not unary.
  • We would like SPF to carry enough information on the concrete nodes to have more precise constraints. For now, the constraint looses a lot of precision each time it hash to handle a PUTFIELD, as we cannot determine what has been put in the field (and we can only create a fresh variable).