[DB] Data error checks and frequency changes #872

CaveSpectre11 · 2020-11-22T21:20:49Z

Problem

PindexWalk->pprev assert failures occur randomly on startup

Root Cause

Note that the root cause is not yet completely understood. This is not a fix, but rather a mitigation that reduces the risk of the problem occurring.

Occasionally the block database is corrupted during a write: an index entry no longer points to the correct block, and a different block is read off disk instead. Usually the block read happens to be an orphan block. The corruption is definitely connected to the writing of side chains and orphans, but the root cause is still unknown. As a result, a valid block cannot be read from disk, so when the node begins to connect the blocks together (for the staking calculations), it finds a block without a valid previous block; the walk effectively falls off the chain because a block is missing.
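
To illustrate the failure mode described above, here is a minimal sketch of a backwards chain walk that reports where the previous-block link is missing instead of asserting. The `BlockIndex` struct and `FindBrokenLink` function are hypothetical simplifications for illustration; they are not Veil's actual index type or startup code.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a block index entry (not Veil's real type).
struct BlockIndex {
    int height;
    const BlockIndex* pprev; // previous block in the chain, nullptr if missing
};

// Walk back from `tip` towards `stopHeight`.
// Returns the entry whose pprev is missing before stopHeight is reached,
// or nullptr if the walk completes without finding a gap.
const BlockIndex* FindBrokenLink(const BlockIndex* tip, int stopHeight)
{
    assert(tip != nullptr); // precondition: caller must supply a tip
    const BlockIndex* walk = tip;
    while (walk->height > stopHeight) {
        if (walk->pprev == nullptr) {
            return walk; // the walk "fell off" the chain at this entry
        }
        walk = walk->pprev;
    }
    return nullptr;
}
```

Returning the offending entry rather than asserting is one way a caller could log the problem block before deciding how to proceed.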

Mitigation

Note that this PR will not correct an already corrupt database; it will only minimize the occurrences. Anyone who currently has a corrupted database will have to recover their chain before running this PR.

Several changes were made. When a problem block is detected on startup, information identifying it is now added to the debug log file. When the header information is disconnected from the block information, the block is no longer written to disk (this generally occurs with orphan blocks). If a block is read while in this state, that is also reported to the log file.
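
As a rough sketch of the kind of check described above, the following compares the hash the index expects with the hash of the block actually read from disk, and writes a diagnostic line on a mismatch. The function name, parameters, and log wording are all assumptions made for illustration; the real checks and log messages are in the PR diff, not shown here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical check, not Veil's actual code: compare the hash the index
// expects with the hash of the block that was read from disk.
// Returns true on a match; on a mismatch, writes one diagnostic line to `log`.
bool BlockMatchesIndex(const std::string& expectedHash,
                       const std::string& readHash,
                       int height,
                       std::ostream& log)
{
    assert(height >= 0); // precondition: heights are non-negative
    if (expectedHash == readHash) {
        return true;
    }
    log << "Block read mismatch at height " << height
        << ": index expects " << expectedHash
        << ", disk returned " << readHash << "\n";
    return false;
}
```

Passing the log as a `std::ostream&` keeps the sketch independent of any particular logging facility.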

The frequency at which blocks are written to disk has been changed so that a similar number of blocks is written at a time as in Bitcoin. Since Veil's block creation is 10 times faster, the write timers have been set to 10% of their Bitcoin values. This significantly reduced the occurrences of the corruption. It is important to repeat that this is not a fix; there is still a lingering issue. However, in a heavy test with two nodes, one running with the new write frequency and one without it, the node without the change found 63 instances of corrupted blocks in 1923 blocks (about 3.3%). The node with the change saw zero corruptions.
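
The scaling argument can be checked with a little arithmetic. The sketch below assumes a 10-minute Bitcoin block spacing, a 1-hour write interval, and a 24-hour flush interval; these baseline values and all the names are assumptions for illustration and are not taken from the PR diff.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::seconds;

// Assumed baseline values, for illustration only (not taken from the PR diff).
constexpr seconds kBitcoinBlockSpacing{600};           // 10 minutes
constexpr seconds kBitcoinWriteInterval{60 * 60};      // 1 hour
constexpr seconds kBitcoinFlushInterval{24 * 60 * 60}; // 24 hours

// Block creation is "10 times faster", per the PR description.
constexpr int kSpeedup = 10;

// Scale an interval down by the speed-up factor (10% of the original).
inline seconds Scale(seconds interval)
{
    return interval / kSpeedup;
}

// Number of whole blocks expected within one interval at the given spacing.
inline long long BlocksPerInterval(seconds interval, seconds spacing)
{
    assert(spacing.count() > 0); // precondition: spacing must be positive
    return interval.count() / spacing.count();
}
```

Under these assumptions, `BlocksPerInterval(kBitcoinWriteInterval, kBitcoinBlockSpacing)` and `BlocksPerInterval(Scale(kBitcoinWriteInterval), Scale(kBitcoinBlockSpacing))` both come out to 6, which is the sense in which the scaled timers write a similar number of blocks at a time.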

For that reason, this PR is being pushed to greatly reduce the occurrences, which have become prevalent again as more people are in-wallet mining and staking at the same time. Issue #692 will remain open while work continues on finding the root cause and investigating ways to correct the corruption. This PR does, however, reduce the urgency of that research.

codeofalltrades

ACK 76abd68
I never received the error. I ran the wallet on Intel, on AMD, and in a VM.

[DB] Data error checks and frequency changes

76abd68

CaveSpectre11 requested a review from codeofalltrades November 22, 2020 21:20

CaveSpectre11 self-assigned this Nov 22, 2020

codeofalltrades approved these changes Nov 25, 2020

codeofalltrades merged commit ea827dc into Veil-Project:master Nov 25, 2020

CaveSpectre11 added the "Dev Status: Merged" label and removed the "Tag: Waiting For Code Review" label Dec 6, 2020

CaveSpectre11 deleted the dbCorrupt1 branch December 11, 2020 14:21

CaveSpectre11 added this to the v1.1.1 milestone Dec 17, 2020
