Skip to content

Latest commit

 

History

History
28 lines (18 loc) · 2.65 KB

2018.03.25-Bigtable.md

File metadata and controls

28 lines (18 loc) · 2.65 KB

Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber

Summary

In this paper, the authors present Bigtable, a distributed storage system for storing structured data at Google. They mainly focus on addressing scaling problem in the real-world environment. Bigtable is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Instead of providing a full relational data model, Bigtable uses a simple data model, which supports dynamic control over data layout and format. Besides, Bigtable allows clients to reason about the locality properties of the data represented in the underlying storage. A Bigtable can be considered as a sparse, distributed, persistent multi-dimensional sorted map. Each value in the map is an uninterpreted array of bytes and is indexed by a row key, column key, and a timestamp. At Google, Bigtable is built atop several Google's internal infrastructures, such as GFS for log and data storage, SSTable file format for Bigtable data, and Chubby. Apart from the design details, the authors also illustrate Bigtable's three major components: a client-side library, one master server, and many tablet servers. After that, the authors describe many refinements for the implementation to achieve the high performance, availability, and reliability. At last, they evaluate the performance of Bigtable and conclude useful experiences of developing Bigtable.

Contributions

  • The authors design and implement Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
  • As a corporation software, Bigtable has already been used in many Google's projects, such as Google Finance, Google Earth, which further demonstrate its effectiveness.
  • The authors discuss several optimizations and refinements of Bigtable in the production environment.

Strengths

  • Bigtable has high throughput, high scalability, and high availability.
  • Bigtable is more flexible than a traditional database, because it treats all data as uninterpreted strings but does not enforce any schemas.
  • Bigtable also allows clients dynamically control whether to serve data out of memory or from disk.

Weaknesses

  • Compactions of Bigtable can be expensive, and the split and migration of tablets may cause the imbalance on the load system.
  • The paper does not introduce any fault-tolerance techniques in Bigtable.
  • The scalability of the random read operation is quite worse, which can be a possible performance bottleneck.