GSoC 2019 Project Ideas

Project Ideas

Please note that this list is not exclusive. If you have other ideas and topics related to JPF, please let us know on the JPF Google group. A possible proposal template can be found at the bottom of our GSoC page: JPF Google Summer of Code 2019.

Projects Descriptions

Support Java 11 for jpf-core

jpf-core is essentially a JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. The JPF source itself has already been made compatible with Java 11. Now, JPF should support new features of Java 11 bytecode and archives. Among new features of Java 11 are multi-version archives (JAR files) and the ability to link JAR files before they are used by the JVM, and bootstrap methods that are generated at load time. This is a high-priority project, as support for Java 8 is limited to the near future.

Support Java 11 for jpf-symbc

jpf-symbc is essentially a (symbolic) JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. This is a high-priority project, as support for Java 8 is limited to the near future.

Support for gradle for jpf-core and extensions

We have recently moved the build for jpf-core from ant to gradle. However, gradle support for Java 11 is broken, and we have not migrated extension to gradle yet (or even the extension template). The goal of this project is to (1) fix gradle support for Java 11 (branch "java-10-gradle"), (2) to update the extension template, including gradle support and updated documentation, and (3) update widely used extensions with gradle support.

Visualization of Execution Traces v2

JPF is able to find notorious concurrency bugs such as deadlocks. Although finding bugs is one of the major strengths of JPF, providing feedback to the programmer is one of its main weaknesses. For example, for a deadlock JPF provides the programmer at which line each thread is stuck. Although this is of some use, what is much more valuable is to report how each thread got to that point. This project is concerned with extending jpf-visual, a visual analytics tool from GSoC 2017. The web site has about a dozen possible enhancements listed as open issues Some of these may be feasible to implement as a GSoC project. Other ideas for enhancements are of course also welcome.

Model Checking Distributed Java Applications

jpf-nas is an extension of JPF that provides support for model checking distributed multithreaded Java applications. It relies on the multiprocess support included in the JPF core which provides basic functionality to verify the bytecode of distributed applications. jpf-nas supports interprocess communication via TCP sockets by modeling the Java networking package java.net. This tool can handle simple multi-client server applications. Some examples can be found in the jpf-nas distribution (at jpf-nas/src/examples/). The goal of this project is to extend the functionality of jpf-nas in various ways, such as extending the communication model supported by the tool towards an existing open source Java library/framework, called QuickServer, increasing the performance of the tool by improving the mechanism used to manage the state of communication objects, extending the tool with the cache-based approach used in net-iocache, etc.

Verification of Multi Agent Systems

The goal of this project is to develop techniques that analyze key properties in multi-agent systems. The jpf-mas extension will initially provide the ability to generate the reachable state space of Brahms models. The reachable state space can then be encoded into input for a variety of model checkers such as SPIN, NuSMV and PRISM, thereby enabling the verification of LTL, CTL and PCTL properties. The project will also need to investigate how to generate the set of reachable states for other kinds of models, such as Jason models, and how to compose reachable states of different modelling languages both at run-time and off-line.

Automatic program repair using annotations

Automated program repair (APR) techniques often generate overfitting patches due to the reliance on test cases for patch generation and validation. In this project, we propose to overcome the overffiting issue in APR by leveraging developer-provided partial annotations to aid semantic reasoning. Developer annotations can come in different forms, e.g., JPF annotation. The advantage of developer annotations is two-fold. First, in addition to test cases, it helps augment the specifications of the program under analysis and thus provides more complete specifications. These annotations, despite being simple, can help significantly in semantic reasoning, e.g., null pointer analysis. Second, these annotations are not required to be complex so that to reduce the burden of manual effort by developers. For example, to reason about null pointer exception errors, developers are only required to add a few Nullable or Non-Nullable annotations to class fields or method parameters, etc. We will use JPF and SPF for symbolically reasoning about the semantics of programs under analysis and generating repairs. We will also use JPF-Annotation as a way for developers to provide annotations.

Verification of Event-Driven Applications

The goal of this project is to evaluate and/or advance existing state-of-the-art tools for analysis of event-driven applications, for example jpf-awt, jpf-android. Some of the issues that need to be addressed include event sequence generation, search heuristics, and scalability. Other related problems and solutions are welcome.

Refactoring SPF constraint library

SPF constraints need to be refactored to allow different kinds of constraints to be combined during the construction of a path condition. An example of how it should be after the refactoring is the Abstract Syntax Tree constructed by GREEN.

Whitebox Fuzzer and Grammar Learner

Build a whitebox fuzzing tool on top of Symbolic PathFinder, that can learn the input grammar of a piece of code in an iterative fashion. The idea would be to first run the code on symbolic input of a fixed length and then learn a possible grammar for this length, at that point extend the length and generalise the grammar. The main research goal behind this project is to see if one can do whitebox fuzzing without a pre-determined seed file (which is the way most whitebox fuzzers work at the moment).

Fuzzing and Symbolic Execution

Develop a fuzzer for Java that can be integrated with SPF (or another Java based symbolic execution engine). The idea would be that when fuzzing gets stuck and makes no progress that the symbolic analysis can create a new seed file to allow analysis to progress.

Comparison between concolic and classical symbolic execution

Comparison between concolic execution, e.g. DEEPSEA and JDart, and classical symbolic execution, e.g. SPF.

Analysis of Android Applications

Various ideas are welcome here. Here are a couple of possible subprojects:

The goal of this project is collecting interesting Android applications, and evaluating and applying JPF-Android to analyze them. JPF-Android is an extension to JPF used to model check Android Applications. Android applications have many dependencies which make them hard to test and verify. They also require events to drive the execution of the applications. The goal of this project is to identify interesting Android applications to run on ]JPF-Android and then evaluate the efficiency and effectiveness of the tool on these apps. You will be using/improving existing approaches to generate stubs and models for applications and then compare the coverage and runtime on JPF-Android to other dynamic analysis tools.
This project includes using JPF-Android to generated test sequences for android applications, and implementing a tool to convert these sequences into tests that can be run on the emulator. JPF-Android verifies Android applications outside of the Android software stack on JPF using a model environment to improve coverage and efficiency. It generates event sequences to drive the execution of the application during exploration. Each sequence also includes the configuration of the environment (device) for which the sequence was executed. This project uses the AndroidViewClient API in Python to run the set of event sequences as detected by JPF-Android on an emulator to find the number of valid sequences and the code coverage they obtain compared to JPF-Android.
An extension of jpf-mobile-devices to generate the right initialization sequence for applications running inside JPF on Android. This project is different from the ones above in the sense that JPF is run as an Android application that can use the underlying Android environment, not as a normal application that models the Android environment. Also see the paper on jpf-mobile-devices.

Smart Contract Analysis

Develop a mechanism to allow the analysis of Ethereum Virtual Machine (EVM) bytecode by replacing the JVM bytecodes with EVM bytecodes within JPF. The second part of the project would be to extend the bytecodes further to allow symbolic execution as well.

JDart for Dynamic Taint Analysis

JDart is a dynamic symbolic execution engine Java based on Java PathFinder (JPF). The tool executes Java programs with concrete and symbolic values at the same time and records symbolic constraints describing all the decisions along a particular path of the execution. These path constraints are then used to find new paths in the program. Concrete data values for exercising these paths are generated using a constraint solver.

Currently JDart has two mechanisms for marking data values in a program for symbolic analysis. First, JDart analyzes methods during starting symbolic analysis. All parameters of an analyzed method are treated symbolically. Second, special annotations can be used to mark class members that should be analyzed symbolically. This mode of operation is geared towards generating test cases as well as the symbolic summaries of methods.

The goal of this project is to add a new mode of operation to JDart, in which symbolic variables can be introduced dynamically during execution – and also be made purely concrete again. This can be done by introducing symbolic variables for (or removing symbolic annotations from) the return values of specific methods (e.g., Random.nextInt()) – using listeners provided by JPF. This new mode of operation would have two benefits:

Existing programs can be analyzed without modifications. And analysis is not confined to individual methods. This will, e.g., enable JDart to analyze the Java programs in the SVCOMP benchmarks (Injection Flaws).
It is one step towards enabling using JDart for dynamic taint analysis, e.g., for analyzing potential for injection attacks (Minepump). Web-applications tend to contain code of form:

u = Request.getValueOf(“user”); //Taint u (symbolically) 
query = “SELECT user FROM table WHERE uid=”+u; //query becomes tainted
query = santitize(query); //Un-taint query (symbolically.) 
db.select(query);

If the value of u is not properly analyzed and sanitized, an attacker can exploit this code to gain access to arbitrary information in the applications database. Making the return value of Request.getValueOf(“user”) symbolic, is the first step towards tracing taint of this value.

Please note: For a full dynamic taint analysis, JDart would also have to be extended to analyze strings and arrays symbolically --- this is out of the scope of this project.

New Features for JDart

JDart is an open-source, dynamic symbolic analysis framework built on Java PathFinder. It has been applied to industrial scale software, including complex NASA systems.

We have many ideas for improving JDart and welcome additional ideas too:

JDart supports various (fixed-size) symbolic data structures, e.g., arrays and HashMap. It would be interesting to improve this, in particular with symbolic array indexes and support for unbounded data structures possibly with Lazy Initialization.
Add more exploration strategies to JDart. In particular, new exploration heuristics could be interesting, e.g., for targeted concolic execution. It could also combine other static analyses, such as, program slicing, where the computed slice can be used to constrain the exploration.
JConstraints is a solver abstraction layer used by JDart to interact transparently with SMT solvers. JConstraints has support for some solvers, e.g., Z3 and SMTinterpol. Often, however, one needs to select a solver that is best suited for the constraints generated (e.g., linear, non-linear). It could be very useful adding additional solvers to JDart based on JConstraints and evaluate them to understand better their applicability. This could also comprise adding a general interface to solvers that support the SMTLib format.

Concolic Execution for Android Apps

JDart is an open-source, dynamic symbolic analysis framework built on Java PathFinder. It has been applied to industrial scale software, including complex NASA systems. This project seeks to extend this capability to Android applications by supporting the Dalvik instruction set, e.g., by using jpf-pathdroid. This would enable analyses build on JDart to also work for Android, e.g., automated test case generation, finding bugs, and program understanding.

Support for parallel or distributed exploration in JDart and Regression tests for JDart

Here is an incomplete list of ideas and projects for extending and improving JDart. If you would like to work on any of the projects or have your own ideas (or just want to contribute to JDart), let us know and we can talk more.

Add regression test suite
Add visualization capabilities to JDart. This could be really useful -- especially for program understanding and debugging. It could be as simple as translating the constraints tree to DOT, but it would be a great feature to have more powerful, interactive visualizations, e.g., browser-based with d3, or using Gephi, yEd, jung, prefuse, or jgraph.
Finish (or re-implement) JConstraints SMTLib interface and experiment with other solvers. dReal is (partly) integrated using this, but it would be interesting to experiment more
Add more search heuristics and evaluate
Make support for PathDroid. This would allow JDart to analyze Dalvik bytecode programs
Work on constraints caching in a similar fashion as Green (or make support for Green)
Add support for parallel/distributed exploration
Add support for symbolic data structures. Maybe ala Lazy Initialization
Improve test suite generation from symbolic analysis. Currently, static target methods are fully supported while instance methods are only partly supported
General refactoring and code improvement

Environment and Test Case Generation for Specific Domains

When model checking applications belonging to specific domains (e.g., Swing, Android), JPF users have to provide application environment, consisting of test drivers and models for libraries that are too complex for JPF to run. The goal of this project is to evaluate the existing (or provide new) semi-automated support for generation of test drivers and library models/stubs based on the results of domain-specific static analysis, specifications, run-time information, or other suitable techniques. Once generated, such drivers and stubs can be used to verify applications belonging to specific domains using appropriate jpf extensions (e.g., jpf-awt, jpf-android). The project can be implemented on top of OCSEGen or another suitable tool.

Test Case Generation/Model-based Testing with Modbat for JPF

JPF requires test cases as a starting point to explore a system. It is therefore suitable to use test case generation to create test cases automatically. Modbat is an open-source tool for test case generation. For testing concurrent software, an obvious choice would be to combine Modbat (to generate tests) with JPF (to execute tests and find concurrency problems). This has been done once as a proof of concept but is not supported in the current version of Modbat. The main reason for this is that Modbat's reporting has to read and parse bytecode, which requires access to some native code that JPF does not support. The goal is to find all problems where Modbat requires native access, and to use jpf-nhandler to resolve as many of these cases as possible. Remaining cases can be handled with custom model/peer classes.

Symbolic data-race detection for Habanera Java

Habanero Java is a Java implementation of the Habanero Extreme-scale programming model for multithreaded applications. The model is based on X10 and supports fork/join semantics as well as futures, isolation, and phasers. The advantage of structured parallelism such as Habanero is that the language itself provides concurrency guarantees such as deadlock freedom and determinacy if and only the program is free of data-race. A data-race occurs when two or more threads of execution access the same memory location and at least one of those accesses is a write. An additional advantage of structured parallelism is that run-times and analysis can be optimized based on the language structure itself. Recent work adds to JPF the ability to model check Habanero Java programs using a verification specific runtime and an algorithm that constructs and analyzes a computation graph representing the happens-before relation of the program execution (1, 2. The analysis is predictive because it infers from the single observed schedule the presence or absence of data-race in other non-observed schedules and only needs to enumerate schedules around isolation. Enumeration schedules around isolation though is still expensive and leads to state explosion in JPF. The work in this project is to mitigate this state explosion in enumerating schedules around isolation by building a symbolic computation graph from the program execution that adds constraints on the graph edges indicating under what condition the edge is active, and then using an SMT solver to find a set of edges on which a data-race exists. A first step in the project is to add a dynamic partial order reduction to JPF that is able to inform the symbolic computation graph about dependencies.