Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels
Amazon runs a worldwide e-commerce platform using tens of thousands of servers located in many data centers around the world. From Amazon's point of view, reliability is one of the most important requirements. Amazon cannot tolerate the slightest outage, which has significant financial consequences and impacts customer trust. Besides, the platform has to be highly scalable in order to support continuous growth.
This paper describes the design and implementation of Dynamo, a highly available and scalable distributed data store, which is used to manage the state of services that have very high-reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness, and performance. Dynamo even allows read and write operations to continue during network partitions and resolves update conflicts using different conflict resolution mechanisms. When compared to traditional database systems, Dynamo sacrifice some features such as richness of schema representation and strong consistency in favor of higher performance and availability at massive scales.
- The authors combine different techniques to provide a single highly-available system.
- Dynamo demonstrates that an eventually-consistent storage system can be used in production with demanding applications.
- This paper provides insight into the tuning of different techniques to meet the requirements of production systems with very strict performance demands.
- Dynamo employs a mechanism to dynamically partition the data over the set of nodes in the system, which makes incrementally scale possible.
- To further achieve high availability and durability, Dynamo replicates the data on multiple hosts.
- Dynamo exposes data consistency and reconciliation logic issues to the developers. For any new applications that want to use Dynamo, some analysis is required during the initial stages of the development to pick the right conflict resolution mechanisms that meet the business case appropriately, which requires extra efforts.
- A write operation in Dynamo also requires a read to be performed for managing the vector timestamps. This can be very limiting in environments where systems need to handle a very high write throughput.