Skip to content

Our Refinement Based Formal Verification Approach

Daejun Park edited this page Apr 13, 2020 · 6 revisions

We illustrate our refinement-based formal verification approach.

Illustrating Example

Let us consider a sum program that computes the summation from 1 to n, as follows:

int sum(int n) {
    int s = 0;
    int i = 1;
    while(i <= n) {
        s = s + i;
        i = i + 1;
    }
    return s;
}
Abstract Model

Given this program, we first manually write an abstract model of the program in the K framework. Such a K model is essentially a state transition system of the program, and can be written as follows:

rule: sum(N) => loop(s: 0, i: 1, n: N)
rule: loop(s: S, i: I, n: N) => loop(s: S + I, i: I + 1, n: N) when I <= N
rule: loop(s: S, i: I, n: N) => return(S) when I > N

The first transition rule corresponds to the initialization (i.e., int s = 0; int i = 1;); the second one corresponds to the loop; and the third one corresponds to the return statement. (Here, the indexed tuple (s: S, i: I, n: N) represents the state of the program variables s, i, and n.)

Note that this abstract model can be also automatically derived by instantiating the language semantics with the particular program, if a formal semantics of the language is available.

Functional Correctness Proof

Then, given the abstract model, we specify the functional correctness property in the K reachability logic, as follows:

claim: sum(N) => return(N * (N + 1) / 2) when N > 0

This reachability claim says that sum(N) will eventually return N * (N + 1) / 2 in all possible execution paths, if N is positive. We verify this specification using the K reachability logic theorem prover, which requires us to provide the following loop invariant:

invariant: loop(s: I * (I - 1) / 2, i: I, n: N) => return(N * (N + 1) / 2) when I <= N + 1
Refinement

Once we prove the desired property of the abstract model, we manually refine the model to a bytecode specification, by translating each rule of the abstract model into a reachability claim at the bytecode level, as follows:

claim: evm(pc: BEGIN, calldata: #bytes(32, N), stack: [], ...) => evm(pc: LOOPHEAD, stack: [0, 1, N], ...)
claim: evm(pc: LOOPHEAD, stack: [S, I, N], ...) => evm(pc: LOOPHEAD, stack: [S + I, I + 1, N], ...) when I <= N
claim: evm(pc: LOOPHEAD, stack: [S, I, N], ...) => evm(pc: END, stack: [], output: #bytes(32, S), ...) when I > N

Here, the indexed tuple evm(pc: Int, calldata: Bytes, stack: IntList, output: Bytes) represents (part of) the EVM state, and #bytes(SIZE, DATA) denotes a sequence of SIZE bytes of the two's complement representation of the DATA value.

We verify this bytecode specification against the compiled bytecode using the K reachability theorem prover. Note that no loop invariant needs to be provided in this bytecode verification, since each reachability claim involves only a bounded number of execution steps (that is, especially, the second claim involves only a single iteration of the loop).

Refinement Proof

Then, we manually prove the soundness of the refinement, which can be stated as follows:

For any EVM states s1 and s2, if s1 => s2, then \alpha(s1) => \alpha(s2)

where the abstraction function \alpha is defined as follows: (here, _ means an arbitrary value.)

\alpha(evm(pc: BEGIN, calldata: #bytes(32, N), stack: [], output: _)) = sum(N)
\alpha(evm(pc: LOOPHEAD, calldata: _, stack: [S, I, N], output: _)) = loop(s: S, i: I, n: N)
\alpha(evm(pc: END, calldata: _, stack: [], output: #bytes(32, S))) = return(S)

Putting all the results together, we finally conclude that the compiled bytecode will return #bytes(32, N * (N + 1) / 2).

Trust Base

Note that the abstract model and the compiler are not in the trust base, thanks to the refinement, while the K reachability logic theorem prover and the EVM semantics are.


Comparison: Direct Approach without Refinement

For reference, here we provide a direct way to verify the functional correctness of the bytecode without using the refinement.

The functional correctness of the bytecode can be directly specified in the K reachability logic as follows:

claim: evm(pc: BEGIN, calldata: #bytes(32, N), stack: [], output: _)
    => evm(pc: END,   calldata: #bytes(32, N), stack: [], output: #bytes(32, N * (N + 1) / 2))

which can be automatically verified by the K reachability logic theorem prover provided the following loop invariant:

invariant: evm(pc: LOOPHEAD, calldata: #bytes(32, N), stack: [I * (I - 1) / 2, I, N], output: _)
        => evm(pc: END,      calldata: #bytes(32, N), stack: [], output: #bytes(32, N * (N + 1) / 2))
           when I <= N + 1

Note that the abstract model, the functional correctness proof of the abstract model, the refinement, and the refinement proof are not needed at all in this direct approach. Indeed, for the sum program, the direct approach is much easier, but for more complicated programs and/or properties to verify, the refinement-based approach can be more efficient.


Remark: The loop invariants of the K reachability logic mentioned here look different from that of the Hoare logic. See the comparison between two logic proof systems in Section 4 (esp. Figure 6) of the paper "Semantics-Based Program Verifiers for All Languages". These loop invariants can be seen as transition invariants.