This paper shows that a new division of responsibility between the network and the application can eliminate nearly all replication overhead. The authors rely on an insight that the communication layer should provide a new ordered unreliable multicast (OUM) primitive. OUM guarantees all receivers to process multicast messages in the same order, but messages may be lost. Based on OUM, they propose a new replication protocol, Network-Ordered Paxos (NOPaxos). In the normal case, NOPaxos can avoid coordination by relying on the ordering property provided by the network layer. Once packet losses occur, NOPaxos uses application level coordination to handle dropped packets. In this paper, the authors also introduce several different ways to build the OUM communication layer, all of which benefit the performance of NOPaxos. At last, the authors also prove the correctness of NOPaxos and evaluate its performance.
- The authors propose the ordered unreliable multicast model for data center networks.
- The authors design an algorithm, NOPaxos, which provides state machine replication on an ordered, unreliable network.
- Three implementations are presented in this paper: (1) an implementation in P4 for programmable switches, (2) a middlebox-style prototype using a Cavium Octeon network processor, and (3) a software-based implementation that requires no specialized hardware.
- OUM provides network ordering guarantee instead of no guarantee or best-effort ordering property.
- NOPaxos does not require any coordination on every incoming request. Instead, it uses application-level coordination only when request losses or certain failures of network components occur in the network.
- NOPaxos outperforms classic leader-based Paxos by 54% in latency and 4.7x in throughput.
- NOPaxos uses the SDN network switch to order concurrent multicasts, but it does not exploit RDMA to provide the support of complex structures with subgroups and shards, durable replicated storage for versioned data.
- The application scenario of NOPaxos is limited because it is based on the scenario of data centers, which means NOPaxos is not suitable for the large-scale distributed system over the Internet.