Skip to content

Verification

Lennart Oldenburg edited this page Jul 4, 2017 · 11 revisions

Disclamer: This page contains our thoughts about the verification of the pluto-concept. We are excited to discuss promising approaches. Feel free to leave a comment or an issue to get in touch. Keep in mind that the following text is not intended to be full of details.

Challenge

The main challenge of the verification is to show that the application state converges. By converge we mean, that whenever pluto workers are modifying their local state, the state of the workers will be identical after all messages are exchanged.

We make several assumptions to the system model and the underlying network layer, which we discuss on this page.

System Model

Modeling a system is tough. In general, a model provides a certain abstraction from real-world details and implementations. In the same time, we do not want to loose the connection to the working system by over-abstracting all interesting features. On the other hand, we need to be able to express our implementation and the environment precisely, but without rewriting the source code. The major challenge is to find the right level of abstraction for this purpose.

For pluto, we identified the implemented CRDT layer as most interesting to reason about. A suitable system model therefore should cover how the IMAP state is stored, and how all IMAP commands are processed. Because pluto promises convergence of diverged replica, the surrounding components and the network architecture must be modeled as well. This leads to a classical model of a distributed system. Luckily, we are able to build on top of decades of distributed systems research.

For now, we build our model with the essentials provided by Nancy Lynch. Hint: her book Distributed Algorithms is one of the most fundamental books for distributed systems. We begin by defining plutos environment before we discuss the implementation of the IMAP state.

Our system consists of:

  • A set of processes P, which can be seen as multiple instances of pluto.
  • A network of communication channels between all processes.
  • One process can send and receive messages, we distinguish between the following types of messages:
    • IMAP messages, such as CREATE, DELETE, APPEND, STORE, etc. with their parameters.
    • Internal messages, such as downstream operations from the CRDT layer.
  • We have an asynchronous system, i.e. there is no relative speed between the processes.
  • For now, we assume causal delivery of the messages.
  • Processes can crash, links can break. However, we assume that the state of a crashed process can be recovered and that unsuccessful transmissions are noticed.

Having this relatively traditional model of a distributed system, we can start to implement the processes. For now, we do not want to go into details. The basic intuition of the implementations is, that a process stores the application state in a CRDT. All incoming IMAP messages are translated to an operation on the CRDT. According to the specification of the CRDT, downstream operations are sent to all replica in order to achieve convergence.

Convergence

In order to actually verify something, we need to express which properties are desirable for our system. In our case, the leading property is convergence. Now, what do we mean by that?

Eventual consistency essentially contains the notion of convergence. In the simplest definition, a system provides eventual consistency when an identical state of all replica is reached when all messages are exchanged. We note that there are several definitions and multiple levels of granularity.

In our system, we need to express this property and prove that this property holds as long as the assumptions are fulfilled. A description of this property could look like this:

Convergence Theorem: in every finite and complete trace of our system, we see that the application state of all processes is identical in the first and in the last configuration.

Now, this definition is relatively in concrete for the moment. We note that several things are not well defined (complete trace, configuration, etc.). However, we plan to make things more concrete over the following weeks. The current state can be followed in the verification repository.

How to prove?

Well, there are many options to verify that the convergence property holds in our system. Again, we can be lucky that we can build on top of existing definitions and do not need to start from scratch. Especially reasoning about eventual consistency has been a thing in the past years (e.g. Buajjani et al). Moreover we have seen work from Peter Zeller on verifying CRDTs. Thankfully Peter has provided us with some interesting starting points to explore the world of (semi) automatic proofs:

We are happy to announce that we already achieved the first milestone by verifying the commutativity of pluto's underlying IMAP CmRDT. The proof is fully implemented in Isabelle/HOL and the code can be found in our verification repository. Moreover, we base the IMAP CmRDT on the classical op-based OR-Set, which we also implemented in Isabelle proved to be commutative.

Clone this wiki locally