Skip to content

Latest commit

 

History

History
29 lines (19 loc) · 2.53 KB

2018.03.25-GFS.md

File metadata and controls

29 lines (19 loc) · 2.53 KB

The Google File System

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Summary

Corporation software often has different requirements so that co-designing the applications and the file system API with practical scenarios can benefit the overall system. GFS can be considered as an optimized file system especially for append operations and sequential read operations. Apart from this, GFS is mainly designed for large GB-level files. In this paper, the authors describe the design and implementation of GFS, which consists of a master server and multiple chunkservers. The master periodically communicates with the chunkservers through HeartBeat messages to give it instructions and collect its state. Files are stored as multiple fixed-size chunks, which are replicated on different chunkservers. The authors also relax GFS's consistency model. They introduce an atomic append operation so that multiple clients can append concurrently to a file without extra synchronization between them. At last, the authors evaluate GFS in both real-world environment and micro-benchmarks.

Contributions

  • The authors reexamine traditional choices of the distributed file system for large distributed data-intensive applications.
  • The authors design and implement the Google File System.
  • The authors also highlight that co-designing the applications and the file system API can benefit the overall system.

Strengths

  • GFS provides high availability by applying two effective strategies: fast recovery and replication. Both the master and the chunkserver are designed to restore their state and start in seconds no matter how they terminated.
  • Besides, GFS also supports chunk replication, and users can specify different replication levels for different parts of the file namespace. As well, GFS can also detect corrupted replicas through checksum verification. Therefore, GFS ensures high reliability.
  • GFS provides high throughput for large sequential reads and appends. Clients caches chunk metadata after initial requests to the master node.
  • GFS is built from many inexpensive commodity components, and it still delivers high aggregate performance to a large number of clients.

Weaknesses

  • GFS is incompatible with POSIX APIs.
  • Due to the practical scenario limitations, GFS does not optimize small writes at an arbitrary position in a file.
  • In GFS, the master node can be a possible bottleneck, in that GFS only uses a single master to simplify the design, which also helps the master make sophisticated chunk placement and replication decisions.