
Adding checksum to each page #492

Open
cenkalti opened this issue May 15, 2023 · 7 comments

Comments

@cenkalti
Member

This may help detect corruption. I'm creating this issue to discuss:

  • feasibility
  • backwards compatibility
  • performance implications
@cenkalti
Member Author

Copying my comment from #505 (comment).


There are many sources of corruption that we cannot control:

  • Faulty hardware (disk, memory, CPU)
  • Unsafe filesystems, misconfiguration
  • Other faulty/malicious processes on the same host
  • Cosmic rays

Most databases add a checksum to each page.
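As a rough illustration of the idea, a per-page checksum can be stored in the page header and computed with the checksum field zeroed out. This is a minimal Go sketch under assumed parameters (a 4 KiB page, a made-up `checksumOffset`, CRC-32C as the checksum); bbolt's actual page layout differs.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const pageSize = 4096

// checksumOffset is a hypothetical location for an 8-byte checksum
// field inside the page header; bbolt's real page layout differs.
const checksumOffset = 16

var castagnoli = crc32.MakeTable(crc32.Castagnoli)

// writeChecksum computes a CRC-32C over the page with the checksum
// field zeroed, then stores the result in the header.
func writeChecksum(page []byte) {
	binary.LittleEndian.PutUint64(page[checksumOffset:], 0)
	sum := crc32.Checksum(page, castagnoli)
	binary.LittleEndian.PutUint64(page[checksumOffset:], uint64(sum))
}

// verifyChecksum recomputes the CRC and compares it to the stored value.
func verifyChecksum(page []byte) bool {
	want := binary.LittleEndian.Uint64(page[checksumOffset:])
	binary.LittleEndian.PutUint64(page[checksumOffset:], 0)
	got := uint64(crc32.Checksum(page, castagnoli))
	binary.LittleEndian.PutUint64(page[checksumOffset:], want) // restore
	return got == want
}

func main() {
	page := make([]byte, pageSize)
	copy(page[64:], "some key/value payload")
	writeChecksum(page)
	fmt.Println("valid:", verifyChecksum(page)) // valid: true

	page[100] ^= 0x01 // simulate a bit flip
	fmt.Println("after bit flip:", verifyChecksum(page)) // after bit flip: false
}
```

Verification on every page read is where the performance cost would show up; CRC-32C is hardware-accelerated on common CPUs, which keeps that cost modest.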

@cenkalti cenkalti self-assigned this May 19, 2023
@serathius
Member

serathius commented May 19, 2023

This ties into the Merkle tree proposal.
For the etcd project we would like to have a mechanism for fast calculation of a checksum over any range of keys. So the checksum needs to be calculated not only per page but also per subtree: https://en.wikipedia.org/wiki/Merkle_tree.

If someone is interested in this issue, please reach out to me; I'm happy to provide requirements and help with the design.
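For illustration, a minimal Merkle tree over sorted key/value leaves might look like the sketch below. The leaf encoding, the promotion of odd nodes, and the tree shape are assumptions for the example, not an etcd or bbolt design; the point is that the root hash identifies the whole key space, and any leaf change propagates to it.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash derives a leaf from a key/value pair (illustrative encoding).
func leafHash(key, value string) [32]byte {
	return sha256.Sum256([]byte(key + "\x00" + value))
}

// merkleRoot builds a binary Merkle tree bottom-up over a sorted list
// of leaf hashes and returns the root hash.
func merkleRoot(leaves [][32]byte) [32]byte {
	if len(leaves) == 0 {
		return sha256.Sum256(nil)
	}
	level := leaves
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) {
				next = append(next, level[i]) // odd node promoted as-is
				break
			}
			joined := append(level[i][:], level[i+1][:]...)
			next = append(next, sha256.Sum256(joined))
		}
		level = next
	}
	return level[0]
}

func main() {
	leaves := [][32]byte{
		leafHash("a", "1"), leafHash("b", "2"),
		leafHash("c", "3"), leafHash("d", "4"),
	}
	root := merkleRoot(leaves)

	// Changing any key/value changes the root hash.
	leaves[2] = leafHash("c", "changed")
	fmt.Println("root changed:", merkleRoot(leaves) != root) // root changed: true
}
```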

@cenkalti
Member Author

cenkalti commented May 19, 2023

A Merkle tree makes sense where content is mostly read-only, such as BitTorrent v2 or IPFS, but I don't recommend using it for a database where the write/read ratio can be high. With a Merkle tree, updating a leaf page would cause every page up to the root to be updated, causing very high write amplification, which affects performance dramatically.

My proposal is about ensuring the physical integrity of database pages at a low level (protection against faulty hardware, bit flips, etc.).

etcd's problem looks related but does not exactly overlap with bbolt's problem.

  • bbolt needs a mechanism for detecting corruption in any part of the database.
  • etcd needs a mechanism for an identifier that represents the state of the database at a point in time.

Because etcd uses FreelistMapType (and Go randomizes map iteration), replicas already have different B+ trees on disk, which makes it impossible to use Merkle-tree hashes at the page level. etcd's problem should be addressed in the key space rather than at the page level.
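The map-iteration point can be demonstrated directly: Go deliberately randomizes the starting point of map iteration, so two replicas that allocate pages by ranging over a freelist map can end up with physically different (though logically equivalent) trees. A small sketch:

```go
package main

import (
	"fmt"
	"strings"
)

// iterationOrder records the order in which one range loop visits the
// map's keys.
func iterationOrder(m map[int]bool) string {
	var b strings.Builder
	for k := range m {
		fmt.Fprintf(&b, "%d ", k)
	}
	return b.String()
}

func main() {
	// A stand-in for a freelist keyed by page id.
	m := map[int]bool{1: true, 2: true, 3: true, 4: true,
		5: true, 6: true, 7: true, 8: true}

	orders := map[string]bool{}
	for i := 0; i < 100; i++ {
		orders[iterationOrder(m)] = true
	}
	// With 8 keys and 100 iterations, observing only a single order
	// is astronomically unlikely.
	fmt.Println("more than one order observed:", len(orders) > 1)
}
```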

@serathius
Member

serathius commented May 19, 2023

With a Merkle tree, updating a leaf page would cause every page up to the root to be updated, causing very high write amplification, which affects performance dramatically.

Correct me if I'm wrong, but as an MVCC database, isn't bbolt already updating every page up to the root during a write?

My proposal is about ensuring the physical integrity of database pages at a low level (protection against faulty hardware, bit flips, etc.).

Database integrity depends not only on the integrity of the data, but also on the integrity of the executed instructions. As terrifying as it seems, a bit flip can happen in your if-branch logic and change the execution path. :P

Because etcd uses FreelistMapType (randomized iteration of maps in Go), replicas already have different B+ trees on disk, which makes it impossible to use Merkle-tree hashes at the page level.

Looks like a blocker for implementing it at the bbolt level. Still happy to collaborate and work on a design to address all the action items in https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md#action-items

@cenkalti
Member Author

Correct me if I'm wrong, but as an MVCC database, isn't bbolt already updating every page up to the root during a write?

You're right. I was wrong about that.
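The concession above can be made concrete: in a copy-on-write B+ tree, a single leaf update already copies every node on the root-to-leaf path, which is exactly the set of pages a per-page Merkle hash would have to refresh anyway. A toy Go sketch (the `node` type and key routing are illustrative, not bbolt's real page structures):

```go
package main

import "fmt"

// node is a toy copy-on-write tree node.
type node struct {
	keys     []string
	children []*node
	value    string // set on leaves only
}

// update returns a new root: every node on the path from the root to
// the updated leaf is copied, while untouched subtrees are shared.
func update(n *node, key, value string, copied *int) *node {
	c := *n // shallow copy (copy-on-write)
	*copied++
	if len(n.children) == 0 {
		c.value = value
		return &c
	}
	// Route by a simple key comparison (toy two-way split).
	idx := 0
	if key >= n.keys[0] {
		idx = 1
	}
	c.children = append([]*node(nil), n.children...)
	c.children[idx] = update(n.children[idx], key, value, copied)
	return &c
}

func main() {
	leafA := &node{keys: []string{"a"}, value: "1"}
	leafM := &node{keys: []string{"m"}, value: "2"}
	root := &node{keys: []string{"m"}, children: []*node{leafA, leafM}}

	var copied int
	newRoot := update(root, "m", "updated", &copied)

	fmt.Println("nodes copied:", copied)              // nodes copied: 2
	fmt.Println("old root untouched:", root != newRoot) // old root untouched: true
}
```

Since those path pages are rewritten regardless, storing a child hash in each copied parent would add hashing CPU cost but no additional page writes.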

@ahrtr
Member

ahrtr commented May 20, 2023

Adding checksums needs super careful consideration. The impact on performance is also a big concern. It needs an overall design; recovery is the key point, so it must be considered from the start. Note that the existing Check can already detect most data corruption.

Again, I'd like us to spend more effort analyzing all the existing data corruption cases before we move on to adding any big new features; otherwise, it will add more technical debt.

@github-actions github-actions bot added the stale label Apr 17, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale May 9, 2024
@serathius serathius reopened this May 9, 2024
@serathius serathius removed the stale label May 9, 2024
@github-actions github-actions bot added the stale label Aug 8, 2024