From 34d97c9569987dd2c4cec46301e23b6beb5e62ba Mon Sep 17 00:00:00 2001 From: Anatoly Yakovenko Date: Mon, 14 Oct 2019 23:34:08 -0700 Subject: [PATCH 01/11] leader slashing --- book/src/SUMMARY.md | 1 + .../leader-duplicate-block-slashing.md | 42 +++++++++++++++++++ 2 files changed, 43 insertions(+) create mode 100644 book/src/proposals/leader-duplicate-block-slashing.md diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index ff0a57f591e9d5..09b1da8ab901b9 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -61,6 +61,7 @@ * [Tick Verification](proposals/tick-verification.md) * [Block Confirmation](proposals/block-confirmation.md) * [ABI Management](proposals/abi-management.md) + * [Handling Duplicate Leader Blocks](proposals/leader-duplicate-block-slashing.md) * [Implemented Design Proposals](implemented-proposals/README.md) * [Blockstore](implemented-proposals/blockstore.md) * [Cluster Software Installation and Updates](implemented-proposals/installer.md) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md new file mode 100644 index 00000000000000..e7d3b28e3f27fd --- /dev/null +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -0,0 +1,42 @@ +# Leader Duplicate Block Slashing + +This design describes how slashing leaders that produce duplicate +blocks is implemented in the cluster. + +## Shred Format + +Shreds are produced by leaders during their scheduled slot. Each +shred is signed by the leader and is transmitted to the cluster via +Turbine. A shred contains the following + +* Signature +* header: slot, shred index +* msg + +The signature is of the merkle tree of the shred data. + +``` + merke root + / \ +(slot index, shred index) hash(msg) +``` + +## Proof of a Duplicate Shred + +Any two different signatures that show a merkle path to the same +`(slot index, shred index)` for the same leader are proof that the +leader produced two conflicting blocks. 
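
To make the proof condition concrete, here is a minimal Python sketch of the two-leaf merkle construction in the diagram above and the resulting duplicate check. SHA-256, the little-endian header encoding, and all function names here are illustrative assumptions, not the production shred layout, and verification of the leader's signature over each root is elided.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    # SHA-256 stands in for the cluster's hash function (an assumption).
    d = hashlib.sha256()
    for p in parts:
        d.update(p)
    return d.digest()

def shred_merkle_root(slot: int, index: int, msg: bytes) -> bytes:
    # Mirrors the two-leaf tree in the diagram:
    #   root = hash(leaf(slot index, shred index), hash(msg))
    header_leaf = h(slot.to_bytes(8, "little"), index.to_bytes(4, "little"))
    return h(header_leaf, h(msg))

def is_duplicate_proof(slot: int, index: int, msg_a: bytes, msg_b: bytes) -> bool:
    # Two distinct roots for the same (slot, index) imply two distinct leader
    # signatures over conflicting shreds -- the slashable evidence.
    return shred_merkle_root(slot, index, msg_a) != shred_merkle_root(slot, index, msg_b)
```

In the real protocol the evidence would be the pair of signed roots themselves; the sketch only shows why identical `(slot index, shred index)` leaves with different message data force different roots.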
+ +## Avoiding Accidental Slashing + +Leaders could be killed and restarted in the middle of their block, +and loose any information that they had started producing the block. +Leaders should store the last block that the leader had started +producing in persistent storage before sending the first shred. + +If the persistent storage is corrupted, leaders need to wait long +enough to observe a round of consensus in which the leader has +participated before starting producing blocks. Waiting for +confirmation that their own signatures are included in the next +block ensures that the blocks the leader is observing are new and +not replayed. From 6bab8b3d395f4e5974eec44882a58f21ba51bd61 Mon Sep 17 00:00:00 2001 From: Anatoly Yakovenko Date: Mon, 14 Oct 2019 23:38:32 -0700 Subject: [PATCH 02/11] nits --- book/src/proposals/leader-duplicate-block-slashing.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index e7d3b28e3f27fd..ce8704e196d00b 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -1,7 +1,10 @@ # Leader Duplicate Block Slashing -This design describes how slashing leaders that produce duplicate -blocks is implemented in the cluster. +This design describes how the cluster slashes leaders that produce +duplicate blocks. + +Leaders that produce multiple blocks for the same slot increase the +number of potential forks that the cluster has to resolve. 
## Shred Format From 7a5869e8cb1e3328bce5a075479ac9aba7b955a7 Mon Sep 17 00:00:00 2001 From: Anatoly Yakovenko Date: Tue, 15 Oct 2019 00:07:57 -0700 Subject: [PATCH 03/11] tag --- book/src/proposals/leader-duplicate-block-slashing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index ce8704e196d00b..a7fc5fe0bc20b1 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -18,7 +18,7 @@ Turbine. A shred contains the following The signature is of the merkle tree of the shred data. -``` +```text merke root / \ (slot index, shred index) hash(msg) From 1dbb545b8280b1c92df2ea869429ae4f3046d3c0 Mon Sep 17 00:00:00 2001 From: Anatoly Yakovenko Date: Tue, 15 Oct 2019 10:41:08 -0700 Subject: [PATCH 04/11] nits --- book/src/proposals/leader-duplicate-block-slashing.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index a7fc5fe0bc20b1..3f4d08114623eb 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -12,9 +12,9 @@ Shreds are produced by leaders during their scheduled slot. Each shred is signed by the leader and is transmitted to the cluster via Turbine. A shred contains the following -* Signature -* header: slot, shred index -* msg +* signature +* header: slot index, shred index +* message data The signature is of the merkle tree of the shred data. @@ -33,7 +33,7 @@ leader produced two conflicting blocks. ## Avoiding Accidental Slashing Leaders could be killed and restarted in the middle of their block, -and loose any information that they had started producing the block. +and lose any information that they had started producing the block. 
Leaders should store the last block that the leader had started producing in persistent storage before sending the first shred. From 709d1d4f3ee98dec75cdf6276bc0b87a1a455bc2 Mon Sep 17 00:00:00 2001 From: Carl Date: Mon, 23 Dec 2019 18:13:32 -0800 Subject: [PATCH 05/11] Add designs for multiple versions of a slot --- .../leader-duplicate-block-slashing.md | 161 ++++++++++++++++-- 1 file changed, 148 insertions(+), 13 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index 3f4d08114623eb..860b116a6cb779 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -30,16 +30,151 @@ Any two different signatures that show a merkle path to the same `(slot index, shred index)` for the same leader are proof that the leader produced two conflicting blocks. -## Avoiding Accidental Slashing - -Leaders could be killed and restarted in the middle of their block, -and lose any information that they had started producing the block. -Leaders should store the last block that the leader had started -producing in persistent storage before sending the first shred. - -If the persistent storage is corrupted, leaders need to wait long -enough to observe a round of consensus in which the leader has -participated before starting producing blocks. Waiting for -confirmation that their own signatures are included in the next -block ensures that the blocks the leader is observing are new and -not replayed. +## Storing Multiple Versions of the Same Slot +When there is a possibility of duplicate blocks, a validator need to track and +replay multiple versions of a slot in case any one of these versions is +finalized by the cluster. + +### Generating a Unique Identifier for Each Version of a Slot +We distinguish different versions of a slot by the ending blockhash. 
+This blockhash thus needs to be "functionally unique" (meaning a malicious +leader cannot grind out a duplicate in the amount of time they have to +generate a block) across all possible shred outputs generated by a leader. + +To this end it's important to note that: +1) Two different versions of a slot must generate a different set of shreds. +Two different versions of a slot must either: + a) Chain from different parents, in which case the PoH of the entries will + different + b) Contain different set of transactions, in which case the PoH of the + entries will be different + +2) The resulting merkle trees of two different sets of shreds have +"functionally unique" merkle roots. + +Then if we were to generate a merkle tree out of the shred batches for a slot, +and define the blockhash of a slot to be: + + `hash(last tick, shred merkle root)` + +This blockhash should be "functionally unique". + +### Generating the Merkle Tree for the Shreds +1) Leader blocks on Poh before last tick can be generated +2) Leader waits for BroadcastStage to finish generating all the shreds +3) Computes the Merkle tree of all the shreds +4) Inserts an entry into the Poh Stream with the Merkle root +5) Restart PoH + +### Indexing the Column Families by Blockhash +We augment all slot-related column families in blocktree to include a blockhash +to support tracking multiple versions of the same slot. We outline the notable +ones below: + +1) `Data` and `Erasure` column families: `(slot, blockhash, shred_index)` +Two cases here: + a) When leaders generate their own slot, the `blockhash` portion of the key + is set to `Hash::default()` until the blockhash is computed at the end of + the slot. Then the leader will have to store a mapping from + `Hash::default()` to the actual blockhash in a separate area of storage + (another column family?). This is important in order to respond to repairs + which will now specify a `blockhash` in addtion to a `slot` and `index`. 
+ Thus if the node crashes before this mapping can be stored, then on restart + the validator needs to recompute the blockhash in `blocktree_processor`. + + b) When validators receive slots from validators, the `blockhash` portion + of the key is set to `Hash::default()`, until they receive the entire + block, at whch point they follow the same procedure as case a) above to + map `Hash::default()`to the actual blockhash. + + If validators see a different version of the same shred for the same slot + and index, then that means there has been a duplicate transmission. If such + a conflicting version is detected before the block is completed, then + validators drop the block. If the block is already completed, the validators + simply drop the duplicate and rely on the procedure outlined in the section + `Repairing Multiple versions of the Same Slot` below. + +2) `SlotMeta` column family: `(slot, blockhash)` +Each version of a slot will have different `consumed`, `received` and +potentially different children as well. Thus you need a separate SlotMeta +for each version. + +3) `ErasureMeta` and `IndexMeta` column families: `(slot, blockhash)` +Each version of a slot can potentially have different erasure configurations, +FEC blocks etc. Each of these needs to be tracked separately + +4) `Roots` column family: `(slot, blockhash)` +Roots will have to know which version was rooted so we can serve the correct +version through Repairman. + +5) `Dead Slots` column family: `(slot, blockhash)` +Validators need to specify which version of a slot is dead. For instance if one +version of a slot doesn't pass entry verification, but another version does, +then the two versions must be distinguishable. + +An issue to note here is that if some version of a slot violates correctness +before the slot is finished, then the validator does not know what the ending +blockhash is and thus cannot store the slot as dead. In these cases we can drop +the entire slot and wait for repair. 
Because repair will include a merkle proof +of each shred for each repair (which includes the final blockhash `B`), +then the advantage there is if a repaired shred fails to play, we know the +entire version of that slot with blockhash `B` is no good, and we can store +`(slot, B)` in the Dead Slots column family and ignore all forks that build on + top of `(slot, B)`. + +### Repairing Multiple versions of the Same Slot +Repair is augmented with a blockhash. The various types of repairs: + +`Shred(slot, blockhash, index)` - Ask for a specific shred in a slot +with version `blockhash`. + +Notes: + +1) Setting `blockhash` to `Hash::default()` means this validator assumes there +is only one version of this slot in the cluster and will acccept any shred +for `slot` and `index`. However, if a validator gets a mixed bag of such +shreds that do not chain, they will then drop the entire `slot`, and only +repair `slot` if some later child slot chains back to `slot` (this will be +initiated by `Orphan` repairs below). + +2) Repair responses need to be tied to a particular blockhash. However, +because repair responses are limited to the size of a shred, we cannot +include the blockhash in the repair response. This means we need to include +some repair "cookie" in the request + response that maps some number of bits +to a particular `blockhash`. This breaks down if there are more versions of +this slot than can be tracked by the number of bits allocated to the cookie. + + +`Orphan(slot, first_tick_hash, num_hashes)` - Ask for the last shred +of the parent of the child slot, where the child has slot equal to `slot`. +This last shred must contain the last tick `T_p` in that parent slot such that +hashing `T_p` `num_hashes` times is equal to `first_tick_hash`. + +Notes: + +1) The response will include the last shred of the parent slot. From this +last shred the validator should be able to get the last tick and the +merkle root, which can be used to compute the `blockhash`. 
This `blockhash` +is then used to make requests of the form `Shred(slot, blockhash, index)` +to repair the rest of this slot. + +2) Also requires a cookie similar to the `Shred(slot, blockhash, index)` +requests above. + +### Chaining Multiple Versions of the Same Slot +When there are multiple versions for a slot `A`, a natural question that arises +is how validators know which version of slot `A` some descendant slot `B` +chains to. Shreds currently specify only a `parent` slot, but not which version +of that parent. The approach to figure out chaining behavior is then: + +1) Wait for all the shreds for the first entry `E_B` of slot `B` to arrive +(Implementation can make sure first shred `S_B` always contains only a tick +to avoid waiting for multiple shreds). +2) These shreds for slot `B` are stored under the version `Hash::default()` +(optimistically assume this child is the only version). If another conflicting +version of `B` is detected before this version is completed, we drop all the +shreds for slot `B`. +3) For all possible versions of slot `A` see which version chains to `E_B` +4) If no version of slot `A` chains, then deserializie `S_B` to find the first +tick `T_B`, then make a `Orphan(slot, T_B.hash, T_B.num_hashes)` request +to get the last shred in the version of slot `A` that chains to slot `B`. From bfce68164578f04192b94796b9481dcccfccace6 Mon Sep 17 00:00:00 2001 From: Carl Date: Mon, 23 Dec 2019 18:32:45 -0800 Subject: [PATCH 06/11] Add merkle proof fetching --- .../leader-duplicate-block-slashing.md | 53 ++++++++++++++++--- 1 file changed, 46 insertions(+), 7 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index 860b116a6cb779..999caeaa4cd671 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -115,12 +115,8 @@ then the two versions must be distinguishable. 
An issue to note here is that if some version of a slot violates correctness before the slot is finished, then the validator does not know what the ending blockhash is and thus cannot store the slot as dead. In these cases we can drop -the entire slot and wait for repair. Because repair will include a merkle proof -of each shred for each repair (which includes the final blockhash `B`), -then the advantage there is if a repaired shred fails to play, we know the -entire version of that slot with blockhash `B` is no good, and we can store -`(slot, B)` in the Dead Slots column family and ignore all forks that build on - top of `(slot, B)`. +the entire slot and wait for repair. More details in the `Replay Failures` +section below. ### Repairing Multiple versions of the Same Slot Repair is augmented with a blockhash. The various types of repairs: @@ -175,6 +171,49 @@ to avoid waiting for multiple shreds). version of `B` is detected before this version is completed, we drop all the shreds for slot `B`. 3) For all possible versions of slot `A` see which version chains to `E_B` -4) If no version of slot `A` chains, then deserializie `S_B` to find the first +4) If no version of slot `A` chains, then deserialize `S_B` to find the first tick `T_B`, then make a `Orphan(slot, T_B.hash, T_B.num_hashes)` request to get the last shred in the version of slot `A` that chains to slot `B`. + +### Replay Failures +As summarized under the `Dead Slots` column family in the +`Indexing the Column Families by Blockhash` section above, validators must now +account for the possibility that some versions of a slot have correctness +issues while other versions don't. + +Let `V_A` be a version of slot `A` with blockhash `B_A`. + +Assume that on replay of `V_A` the validator runs into some correctness issue +(entry verification failure, bad tick count, etc.) while replaying the entries. 
+ +Define `S` to be the set of shreds as follows: + +1) On entry verification failures of entries`E1` and `E2`: + +Let `S` be the set of all shreds that contain any part of `E1` and `E2`. + +2) On TransactionError in some entry `E`: + +Let `S` be the set of all shreds that contain any part of `E`. + +3) On Blocktree inability to deserialize an entry from a set of shreds: + +Let `S` be the FEC set that failed to deserialize + +4) On BlockErrors (InvalidTickCount, InvalidHashCount, TrailingEntry, etc.) +on some entry `E` + +Let `S` be the set of all shreds that contain any part of `E`. + + +Protocol: + +1) The validator queries for a merkle proof of all shreds in `S` to prove that +all the offending shreds were indeed part of the version `A` with blockhash `B_A`. + +2) If the merkle proof checks out, we add `(A, B_A)` to the `Dead Slots` column +family. No further forks chaining to this slot will be played. + +3) If the merkle proof instead shows that there is a different version of some +shred in `S`, that means we got maliciously sent the wrong shred for version +`B_A`. We must then drop those wrong shreds and repair them again. \ No newline at end of file From 57ff17fc4b320a5a0b1ee5c2790d56ad150baad6 Mon Sep 17 00:00:00 2001 From: Carl Date: Thu, 26 Dec 2019 09:01:04 -0500 Subject: [PATCH 07/11] Updates to formatting --- .../leader-duplicate-block-slashing.md | 57 ++++++++++--------- 1 file changed, 31 insertions(+), 26 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index 999caeaa4cd671..104da64a8cea92 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -31,7 +31,7 @@ Any two different signatures that show a merkle path to the same leader produced two conflicting blocks. 
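
The `(slot, blockhash)`-style keys used above (for example in the `Dead Slots` column family) could be encoded as a flat byte string so that byte-wise lexicographic order, as in a RocksDB-style store, groups every version of a slot together and orders shreds by index within a version. This is a sketch under assumed widths (8-byte slot and index, 32-byte blockhash); the names are not from the implementation.

```python
import struct

HASH_LEN = 32  # assumed blockhash length in bytes

# Stand-in for Hash::default(), used while a block's final hash is unknown.
DEFAULT_HASH = bytes(HASH_LEN)

def data_key(slot: int, blockhash: bytes, shred_index: int) -> bytes:
    # Big-endian integers make lexicographic byte order match numeric order,
    # so (slot, blockhash, shred_index) tuples sort as intended.
    assert len(blockhash) == HASH_LEN
    return struct.pack(">Q", slot) + blockhash + struct.pack(">Q", shred_index)
```

One consequence of this layout is that the `Hash::default()` placeholder (all zero bytes) sorts before any real blockhash for the same slot, so an incomplete version is easy to find and re-key once the final blockhash is known.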
## Storing Multiple Versions of the Same Slot -When there is a possibility of duplicate blocks, a validator need to track and +When there is a possibility of duplicate blocks, a validator needs to track and replay multiple versions of a slot in case any one of these versions is finalized by the cluster. @@ -42,12 +42,15 @@ leader cannot grind out a duplicate in the amount of time they have to generate a block) across all possible shred outputs generated by a leader. To this end it's important to note that: + 1) Two different versions of a slot must generate a different set of shreds. Two different versions of a slot must either: - a) Chain from different parents, in which case the PoH of the entries will - different - b) Contain different set of transactions, in which case the PoH of the - entries will be different + +* Chain from different parents, in which case the PoH of the entries will + differ + +* Contain different set of transactions, in which case the PoH of the + entries will be different 2) The resulting merkle trees of two different sets of shreds have "functionally unique" merkle roots. @@ -61,31 +64,33 @@ This blockhash should be "functionally unique". ### Generating the Merkle Tree for the Shreds 1) Leader blocks on Poh before last tick can be generated -2) Leader waits for BroadcastStage to finish generating all the shreds +2) Leader waits all shreds to be generated 3) Computes the Merkle tree of all the shreds 4) Inserts an entry into the Poh Stream with the Merkle root 5) Restart PoH -### Indexing the Column Families by Blockhash -We augment all slot-related column families in blocktree to include a blockhash +### Indexing Keyspaces by Blockhash +We augment all slot-related keyspaces in blocktree to include a blockhash to support tracking multiple versions of the same slot. 
We outline the notable ones below: -1) `Data` and `Erasure` column families: `(slot, blockhash, shred_index)` +1) `Data` and `Erasure` keyspaces: `(slot, blockhash, shred_index)` + Two cases here: - a) When leaders generate their own slot, the `blockhash` portion of the key - is set to `Hash::default()` until the blockhash is computed at the end of - the slot. Then the leader will have to store a mapping from - `Hash::default()` to the actual blockhash in a separate area of storage - (another column family?). This is important in order to respond to repairs - which will now specify a `blockhash` in addtion to a `slot` and `index`. - Thus if the node crashes before this mapping can be stored, then on restart - the validator needs to recompute the blockhash in `blocktree_processor`. - - b) When validators receive slots from validators, the `blockhash` portion + +* When leaders generate their own slot, the `blockhash` portion of the key + is set to `Hash::default()` until the blockhash is computed at the end of + the slot. Then the leader will have to store a mapping from + `Hash::default()` to the actual blockhash in a separate area of storage + (another column family?). This is important in order to respond to repairs + which will now specify a `blockhash` in addtion to a `slot` and `index`. + Thus if the node crashes before this mapping can be stored, then on restart + the validator needs to recompute the blockhash in `blocktree_processor`. + +* When validators receive slots from validators, the `blockhash` portion of the key is set to `Hash::default()`, until they receive the entire - block, at whch point they follow the same procedure as case a) above to - map `Hash::default()`to the actual blockhash. + block, at whch point they follow the same procedure as the first case above + to map `Hash::default()`to the actual blockhash. 
If validators see a different version of the same shred for the same slot and index, then that means there has been a duplicate transmission. If such @@ -94,20 +99,20 @@ Two cases here: simply drop the duplicate and rely on the procedure outlined in the section `Repairing Multiple versions of the Same Slot` below. -2) `SlotMeta` column family: `(slot, blockhash)` +2) `SlotMeta` keyspace: `(slot, blockhash)` Each version of a slot will have different `consumed`, `received` and potentially different children as well. Thus you need a separate SlotMeta for each version. -3) `ErasureMeta` and `IndexMeta` column families: `(slot, blockhash)` +3) `ErasureMeta` and `IndexMeta` keyspaces: `(slot, blockhash)` Each version of a slot can potentially have different erasure configurations, FEC blocks etc. Each of these needs to be tracked separately -4) `Roots` column family: `(slot, blockhash)` +4) `Roots` keyspace: `(slot, blockhash)` Roots will have to know which version was rooted so we can serve the correct version through Repairman. -5) `Dead Slots` column family: `(slot, blockhash)` +5) `Dead Slots` keyspace: `(slot, blockhash)` Validators need to specify which version of a slot is dead. For instance if one version of a slot doesn't pass entry verification, but another version does, then the two versions must be distinguishable. @@ -176,7 +181,7 @@ tick `T_B`, then make a `Orphan(slot, T_B.hash, T_B.num_hashes)` request to get the last shred in the version of slot `A` that chains to slot `B`. ### Replay Failures -As summarized under the `Dead Slots` column family in the +As summarized under the `Dead Slots` keyspace in the `Indexing the Column Families by Blockhash` section above, validators must now account for the possibility that some versions of a slot have correctness issues while other versions don't. 
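
The merkle-proof check that the `Replay Failures` protocol above relies on can be sketched as follows: each offending shred must prove membership in the version with the claimed root before `(slot, blockhash)` is marked dead. SHA-256 and the `(sibling, sibling_is_left)` audit-path encoding are illustrative assumptions, not the wire format.

```python
import hashlib

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()

def verify_merkle_path(shred: bytes, path, root: bytes) -> bool:
    # path: list of (sibling_hash, sibling_is_left) pairs, ordered leaf to root.
    node = hashlib.sha256(shred).digest()
    for sibling, sibling_is_left in path:
        node = node_hash(sibling, node) if sibling_is_left else node_hash(node, sibling)
    return node == root

def mark_dead_if_proven(offending, proofs, block_root, dead_slots, slot, blockhash):
    # Steps 1-2 of the protocol: if every offending shred is proven to belong
    # to the version with merkle root `block_root`, mark (slot, blockhash) dead.
    if all(verify_merkle_path(s, p, block_root) for s, p in zip(offending, proofs)):
        dead_slots.add((slot, blockhash))
        return True
    # Step 3: some shred was not part of this version; drop and re-repair it.
    return False
```

If any proof fails, the validator learns it was handed a shred from a different version and re-repairs rather than killing the fork.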
From 9f1852c3166ad997fe9e64dcadff477dd966d920 Mon Sep 17 00:00:00 2001 From: Carl Date: Wed, 1 Jan 2020 17:22:14 -0500 Subject: [PATCH 08/11] Alternate proposal --- .../leader-duplicate-block-slashing.md | 232 +++--------------- 1 file changed, 39 insertions(+), 193 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index 104da64a8cea92..188666bc5f320e 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -6,219 +6,65 @@ duplicate blocks. Leaders that produce multiple blocks for the same slot increase the number of potential forks that the cluster has to resolve. -## Shred Format +## Overview -Shreds are produced by leaders during their scheduled slot. Each -shred is signed by the leader and is transmitted to the cluster via -Turbine. A shred contains the following +1. Slashing Condition: If you have voted on version `S1` of a slot, it is a slashing condition to vote on any descendant of another version `S2` of the slot, unless that descendant of version `S2` has seen supermajority (> 66%) of the stake voting on it (See section `Slashing Condition` below for how this proof is to be generated). +2. If you see two blockhashes for the same block, and you are less than 2^THRESHOLD lockout on that block, drop it. Flag the block as "duplicated". +3. Only repair a version of the "duplicated" block if you see a `Repair Proof` as outlined in the `Repair Proof` section below -* signature -* header: slot index, shred index -* message data - -The signature is of the merkle tree of the shred data. - -```text - merke root - / \ -(slot index, shred index) hash(msg) -``` - -## Proof of a Duplicate Shred - -Any two different signatures that show a merkle path to the same -`(slot index, shred index)` for the same leader are proof that the -leader produced two conflicting blocks. 
- -## Storing Multiple Versions of the Same Slot -When there is a possibility of duplicate blocks, a validator needs to track and -replay multiple versions of a slot in case any one of these versions is -finalized by the cluster. - -### Generating a Unique Identifier for Each Version of a Slot -We distinguish different versions of a slot by the ending blockhash. -This blockhash thus needs to be "functionally unique" (meaning a malicious -leader cannot grind out a duplicate in the amount of time they have to -generate a block) across all possible shred outputs generated by a leader. - -To this end it's important to note that: - -1) Two different versions of a slot must generate a different set of shreds. -Two different versions of a slot must either: - -* Chain from different parents, in which case the PoH of the entries will - differ - -* Contain different set of transactions, in which case the PoH of the - entries will be different - -2) The resulting merkle trees of two different sets of shreds have -"functionally unique" merkle roots. - -Then if we were to generate a merkle tree out of the shred batches for a slot, -and define the blockhash of a slot to be: - - `hash(last tick, shred merkle root)` - -This blockhash should be "functionally unique". - -### Generating the Merkle Tree for the Shreds -1) Leader blocks on Poh before last tick can be generated -2) Leader waits all shreds to be generated -3) Computes the Merkle tree of all the shreds -4) Inserts an entry into the Poh Stream with the Merkle root -5) Restart PoH - -### Indexing Keyspaces by Blockhash -We augment all slot-related keyspaces in blocktree to include a blockhash -to support tracking multiple versions of the same slot. 
We outline the notable -ones below: - -1) `Data` and `Erasure` keyspaces: `(slot, blockhash, shred_index)` - -Two cases here: - -* When leaders generate their own slot, the `blockhash` portion of the key - is set to `Hash::default()` until the blockhash is computed at the end of - the slot. Then the leader will have to store a mapping from - `Hash::default()` to the actual blockhash in a separate area of storage - (another column family?). This is important in order to respond to repairs - which will now specify a `blockhash` in addtion to a `slot` and `index`. - Thus if the node crashes before this mapping can be stored, then on restart - the validator needs to recompute the blockhash in `blocktree_processor`. - -* When validators receive slots from validators, the `blockhash` portion - of the key is set to `Hash::default()`, until they receive the entire - block, at whch point they follow the same procedure as the first case above - to map `Hash::default()`to the actual blockhash. - - If validators see a different version of the same shred for the same slot - and index, then that means there has been a duplicate transmission. If such - a conflicting version is detected before the block is completed, then - validators drop the block. If the block is already completed, the validators - simply drop the duplicate and rely on the procedure outlined in the section - `Repairing Multiple versions of the Same Slot` below. - -2) `SlotMeta` keyspace: `(slot, blockhash)` -Each version of a slot will have different `consumed`, `received` and -potentially different children as well. Thus you need a separate SlotMeta -for each version. +## Primitives -3) `ErasureMeta` and `IndexMeta` keyspaces: `(slot, blockhash)` -Each version of a slot can potentially have different erasure configurations, -FEC blocks etc. Each of these needs to be tracked separately +For a bank `B`, let `A` be the latest ancestor of `B` that has gotten supermajority > 66% votes. 
-4) `Roots` keyspace: `(slot, blockhash)` -Roots will have to know which version was rooted so we can serve the correct -version through Repairman. +1) Define the function `Confirmed` to be `Confirmed(B) = A`. +2) Define `parent(B)` to be the parent bank of `B`. -5) `Dead Slots` keyspace: `(slot, blockhash)` -Validators need to specify which version of a slot is dead. For instance if one -version of a slot doesn't pass entry verification, but another version does, -then the two versions must be distinguishable. +The `bank_hash` of each bank `B` is now augmented to be: -An issue to note here is that if some version of a slot violates correctness -before the slot is finished, then the validator does not know what the ending -blockhash is and thus cannot store the slot as dead. In these cases we can drop -the entire slot and wait for repair. More details in the `Replay Failures` -section below. +`hash(Confirmed(A).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. -### Repairing Multiple versions of the Same Slot -Repair is augmented with a blockhash. The various types of repairs: +## Repair Proof -`Shred(slot, blockhash, index)` - Ask for a specific shred in a slot -with version `blockhash`. +Let `S'` be some version of slot `S` that a validator has already detected a +duplicate version of and dropped and marked as "duplicate" +(step 2 of the `Overview` section above). A validator will only repair +`S'` if it's provided a proof that `S'` has been confirmed +(been voted on by greater than 66% of the stake). -Notes: +Call such a proof `RepairProof(S')`. -1) Setting `blockhash` to `Hash::default()` means this validator assumes there -is only one version of this slot in the cluster and will acccept any shred -for `slot` and `index`. 
However, if a validator gets a mixed bag of such
-shreds that do not chain, they will then drop the entire `slot`, and only
-repair `slot` if some later child slot chains back to `slot` (this will be
-initiated by `Orphan` repairs below).
+This proof consists of a supermajority of validator's votes, where each vote `V`:
 
-2) Repair responses need to be tied to a particular blockhash. However,
-because repair responses are limited to the size of a shred, we cannot
-include the blockhash in the repair response. This means we need to include
-some repair "cookie" in the request + response that maps some number of bits
-to a particular `blockhash`. This breaks down if there are more versions of
-this slot than can be tracked by the number of bits allocated to the cookie.
+* Contains a bank hash `H(B)` for some bank `B` such that `Confirmed(B).slot < S'.slot` (if a supermajority confirmed `S'`, then such a set must exist because there must have been some initial set of validators that voted on `S'` before it was confirmed). This can be confirmed for each vote if the prover provides `state_hash(B)`, `Confirmed(B).slot`, `bank_hash(parent(B))`, `parent(B).slot` such that `H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`.
+* Show that `V.slot` is descended from `S'` by proving the bank hash `H(B)` can be derived from the bank hash and slot of `S'`. This is done by providing the trail of bank hashes and parent slots to
+recreate the hash.
 
-`Orphan(slot, first_tick_hash, num_hashes)` - Ask for the last shred
-of the parent of the child slot, where the child has slot equal to `slot`.
-This last shred must contain the last tick `T_p` in that parent slot such that
-hashing `T_p` `num_hashes` times is equal to `first_tick_hash`.
+## Slashing Condition -Notes: +### Goals +The proof of slashing aims to show a validator signed two votes on two bank hashes `H1` and `H2` for two banks `B1` and `B2` where both: +* `Confirmed(B1) < S.slot` and `Confirmed(B2) < S.slot`. +* Chain from two different versions of some slot `S` -1) The response will include the last shred of the parent slot. From this -last shred the validator should be able to get the last tick and the -merkle root, which can be used to compute the `blockhash`. This `blockhash` -is then used to make requests of the form `Shred(slot, blockhash, index)` -to repair the rest of this slot. +### Contents of the Proof +Let `S1` and `S2` be two versions of some slot `S`. A proof shows two signed votes for two +banks `B1` and `B2` with bank hashes `H(B1)` and `H(B2)`. The proof shows: -2) Also requires a cookie similar to the `Shred(slot, blockhash, index)` -requests above. +* `Proof of Minority`: For each of these hashes `H` and each bank `B`, show that `Confirmed(B) < S` by providing `state_hash(B)`, `Confirmed(B).slot`, `bank_hash(parent(B))`, `parent(B).slot` such that: -### Chaining Multiple Versions of the Same Slot -When there are multiple versions for a slot `A`, a natural question that arises -is how validators know which version of slot `A` some descendant slot `B` -chains to. Shreds currently specify only a `parent` slot, but not which version -of that parent. The approach to figure out chaining behavior is then: +`H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. -1) Wait for all the shreds for the first entry `E_B` of slot `B` to arrive -(Implementation can make sure first shred `S_B` always contains only a tick -to avoid waiting for multiple shreds). -2) These shreds for slot `B` are stored under the version `Hash::default()` -(optimistically assume this child is the only version). 
If another conflicting -version of `B` is detected before this version is completed, we drop all the -shreds for slot `B`. -3) For all possible versions of slot `A` see which version chains to `E_B` -4) If no version of slot `A` chains, then deserialize `S_B` to find the first -tick `T_B`, then make a `Orphan(slot, T_B.hash, T_B.num_hashes)` request -to get the last shred in the version of slot `A` that chains to slot `B`. +* `Proof of Chaining`: Prove that both banks are descended from a different version of `S.slot` -### Replay Failures -As summarized under the `Dead Slots` keyspace in the -`Indexing the Column Families by Blockhash` section above, validators must now -account for the possibility that some versions of a slot have correctness -issues while other versions don't. +From the protocol outlined in `Conditions for Repairing a Slot with Multiple Versions`, if a validator +is shown valid `Repair Proof(S1)` and `Repair Proof(S2)`, then that means at least 33% +of the validators must have votes for different versions of `S` in both repair proofs. We can then generate a slashing proof for these 33% by comparing the overlapping validator's votes that were +contained in both repair proofs (Both `Proof of Minority` and the conflicting `Proof of Chaining` will exist in the two `Repair Proof`'s). -Let `V_A` be a version of slot `A` with blockhash `B_A`. +### Guarantees: -Assume that on replay of `V_A` the validator runs into some correctness issue -(entry verification failure, bad tick count, etc.) while replaying the entries. 
+1) If a correct validator in the cluster is locked out greater than 2^THRESHOLD on a version "A" of a block, then that version is the only version that correct validators will repair, unless at least 33% of validators equivocate and get slashed
+2) If no correct validator is locked out greater than 2^THRESHOLD, then everybody must have dropped the block and picked another fork
 
-Define `S` to be the set of shreds as follows:
-1) On entry verification failures of entries`E1` and `E2`:
-
-Let `S` be the set of all shreds that contain any part of `E1` and `E2`.
-
-2) On TransactionError in some entry `E`:
-
-Let `S` be the set of all shreds that contain any part of `E`.
-
-3) On Blocktree inability to deserialize an entry from a set of shreds:
-
-Let `S` be the FEC set that failed to deserialize
-
-4) On BlockErrors (InvalidTickCount, InvalidHashCount, TrailingEntry, etc.)
-on some entry `E`
-
-Let `S` be the set of all shreds that contain any part of `E`.
-
-
-Protocol:
-
-1) The validator queries for a merkle proof of all shreds in `S` to prove that
-all the offending shreds were indeed part of the version `A` with blockhash `B_A`.
-
-2) If the merkle proof checks out, we add `(A, B_A)` to the `Dead Slots` column
-family. No further forks chaining to this slot will be played.
-
-3) If the merkle proof instead shows that there is a different version of some
-shred in `S`, that means we got maliciously sent the wrong shred for version
-`B_A`. We must then drop those wrong shreds and repair them again.
\ No newline at end of file From c33ca8e66567ac63925c214672e77ec30714219b Mon Sep 17 00:00:00 2001 From: Carl Date: Thu, 2 Jan 2020 14:18:04 -0500 Subject: [PATCH 09/11] Fix formatting --- .../leader-duplicate-block-slashing.md | 85 +++++++++++++------ 1 file changed, 57 insertions(+), 28 deletions(-) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/leader-duplicate-block-slashing.md index 188666bc5f320e..0c07307729f249 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/leader-duplicate-block-slashing.md @@ -1,70 +1,99 @@ # Leader Duplicate Block Slashing -This design describes how the cluster slashes leaders that produce -duplicate blocks. +This design describes how the cluster slashes leaders that produce duplicate +blocks. -Leaders that produce multiple blocks for the same slot increase the -number of potential forks that the cluster has to resolve. +Leaders that produce multiple blocks for the same slot increase the number of +potential forks that the cluster has to resolve. ## Overview -1. Slashing Condition: If you have voted on version `S1` of a slot, it is a slashing condition to vote on any descendant of another version `S2` of the slot, unless that descendant of version `S2` has seen supermajority (> 66%) of the stake voting on it (See section `Slashing Condition` below for how this proof is to be generated). -2. If you see two blockhashes for the same block, and you are less than 2^THRESHOLD lockout on that block, drop it. Flag the block as "duplicated". -3. Only repair a version of the "duplicated" block if you see a `Repair Proof` as outlined in the `Repair Proof` section below +1. 
Slashing Condition: If you have voted on version `S` of a slot, it is a +slashing condition to vote on any descendant of another version `S'` of the +slot, unless that descendant of version `S'` has seen supermajority (> 66%) +of the stake voting on it (See section `Slashing Condition` below for how this +proof is to be generated). +2. If you see two blockhashes for the same block, and you are less than +`2^THRESHOLD` lockout on that block, drop it. Flag the block as "duplicated". +3. Only repair a version of the "duplicated" block if you see a `Repair Proof` +as outlined in the `Repair Proof` section below ## Primitives -For a bank `B`, let `A` be the latest ancestor of `B` that has gotten supermajority > 66% votes. +For a bank `B`, let `A` be the latest ancestor of `B` that has gotten +supermajority > 66% votes. 1) Define the function `Confirmed` to be `Confirmed(B) = A`. 2) Define `parent(B)` to be the parent bank of `B`. The `bank_hash` of each bank `B` is now augmented to be: -`hash(Confirmed(A).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. +`hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. + +We include these components to support the proof generation described +in the sections below. ## Repair Proof Let `S'` be some version of slot `S` that a validator has already detected a -duplicate version of and dropped and marked as "duplicate" -(step 2 of the `Overview` section above). A validator will only repair -`S'` if it's provided a proof that `S'` has been confirmed -(been voted on by greater than 66% of the stake). +duplicate version of and dropped and marked as "duplicate" (step 2 of the +`Overview` section above). A validator will only repair `S'` if it's +provided a proof that `S'` has been confirmed (been voted on by greater than +66% of the stake). Call such a proof `RepairProof(S')`. 
This proof consists of a supermajority of validator's votes, where each vote `V`:
 
-* Contains a bank hash `H(B)` for some bank `B` such that `Confirmed(B).slot < S'.slot` (if a supermajority confirmed `S'`, then such a set must exist because there must have been some initial set of validators that voted on `S'` before it was confrmed). This can be confirmed for each vote if the prover provides `state_hash(B)`, `Confirmed(B).slot`, `bank_hash(parent(B))`, `parent(B).slot` such that `H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`.
+* Contains a bank hash `H(B)` for some bank `B` such that
+`Confirmed(B).slot < S'.slot` (if a supermajority confirmed `S'`, then such a
+set must exist because there must have been some initial set of validators that
+voted on `S'` before it was confirmed). This can be confirmed for each vote if
+the prover provides `state_hash(B)`, `Confirmed(B).slot`,
+`bank_hash(parent(B))`, `parent(B).slot` such that
+
+`H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`.
 
-* Show that `V.slot` is descended from `S'` by proving the bank hash `H(B)` can be derived from the bank hash and slot of `S'`. This is done by providing the trail of bank hashes and parent slots to
-recreate the hash.
+* Show that `V.slot` is descended from `S'` by proving the bank hash `H(B)`
+can be derived from the bank hash and slot of `S'`. This is done by providing
+the trail of bank hashes and parent slots to recreate the hash.
 
## Slashing Condition
 
### Goals
-The proof of slashing aims to show a validator signed two votes on two bank hashes `H1` and `H2` for two banks `B1` and `B2` where both:
+The proof of slashing aims to show a validator signed two votes on two bank
+hashes `H1` and `H2` for two banks `B1` and `B2` where both:
+* Chain from two different versions of the same slot `S`.
* `Confirmed(B1) < S.slot` and `Confirmed(B2) < S.slot`.
-* Chain from two different versions of some slot `S` ### Contents of the Proof -Let `S1` and `S2` be two versions of some slot `S`. A proof shows two signed votes for two +Let `S` and `S'` be two versions of some slot. A proof shows two signed votes for two banks `B1` and `B2` with bank hashes `H(B1)` and `H(B2)`. The proof shows: - -* `Proof of Minority`: For each of these hashes `H` and each bank `B`, show that `Confirmed(B) < S` by providing `state_hash(B)`, `Confirmed(B).slot`, `bank_hash(parent(B))`, `parent(B).slot` such that: +s +* `Proof of Minority`: For each of these hashes `H` and each bank `B`, show +that `Confirmed(B) < S` by providing `state_hash(B)`, `Confirmed(B).slot`, +`bank_hash(parent(B))`, `parent(B).slot` such that: `H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. -* `Proof of Chaining`: Prove that both banks are descended from a different version of `S.slot` +* `Proof of Chaining`: Prove that both banks are descended from a different +version of `S.slot` -From the protocol outlined in `Conditions for Repairing a Slot with Multiple Versions`, if a validator -is shown valid `Repair Proof(S1)` and `Repair Proof(S2)`, then that means at least 33% -of the validators must have votes for different versions of `S` in both repair proofs. We can then generate a slashing proof for these 33% by comparing the overlapping validator's votes that were -contained in both repair proofs (Both `Proof of Minority` and the conflicting `Proof of Chaining` will exist in the two `Repair Proof`'s). +From the protocol outlined in `Conditions for Repairing a Slot with Multiple Versions`, +if a validator is shown valid `Repair Proof(S)` and `Repair Proof(S')`, +then that means at least 33% of the validators must have votes for different +versions of the slot in both repair proofs. 
We can then generate a slashing
+proof for these 33% by comparing the overlapping validator's votes that were
+contained in both repair proofs (Both `Proof of Minority` and the conflicting
+`Proof of Chaining` will exist in the two `Repair Proof`'s).
 
### Guarantees:
 
-1) If a correct validator in the cluster is locked out greater than 2^THRESHOLD on a version "A" of a block, then that version is the only version that correct validators will repair, unless at least 33% of validators equivocate and get slashed
-2) If no correct valdiator is locked out greater than 2^THRESHOLD, then everybody must have dropped the block and picked another fork
+1) If a correct validator in the cluster is locked out greater than `2^THRESHOLD`
+on a version "A" of a block, then that version is the only version that correct
+validators will repair, unless at least 33% of validators equivocate and get
+slashed.
+2) If no correct validator is locked out greater than `2^THRESHOLD`, then
+everybody must have dropped the block and picked another fork.
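The augmented `bank_hash` and the per-vote preimage check described in the `Primitives` section can be sketched in Rust. This is an illustrative sketch only: the stand-in 64-bit hasher and function names are assumptions for the example, not the actual runtime implementation, which would use the cluster's real SHA-256 bank hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for a real SHA-256 bank hash (assumption for illustration).
type BankHash = u64;

// bank_hash(B) = hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)
fn bank_hash(
    confirmed_slot: u64,        // Confirmed(B).slot: latest ancestor with > 66% of stake voting
    state_hash: BankHash,       // hash of bank B's state
    parent_bank_hash: BankHash, // bank_hash(parent(B))
    parent_slot: u64,           // parent(B).slot
) -> BankHash {
    let mut hasher = DefaultHasher::new();
    confirmed_slot.hash(&mut hasher);
    state_hash.hash(&mut hasher);
    parent_bank_hash.hash(&mut hasher);
    parent_slot.hash(&mut hasher);
    hasher.finish()
}

// A prover reveals the four preimage components of a vote's claimed bank
// hash `H(B)`; any validator can then recompute and compare.
fn verify_preimage(
    claimed: BankHash,
    confirmed_slot: u64,
    state_hash: BankHash,
    parent_bank_hash: BankHash,
    parent_slot: u64,
) -> bool {
    bank_hash(confirmed_slot, state_hash, parent_bank_hash, parent_slot) == claimed
}

fn main() {
    let h = bank_hash(2, 42, 7, 4);
    // The revealed components check out against the claimed hash...
    assert!(verify_preimage(h, 2, 42, 7, 4));
    // ...but tampering with any component (here Confirmed(B).slot) fails.
    assert!(!verify_preimage(h, 3, 42, 7, 4));
    println!("preimage checks passed");
}
```

Because the preimage authenticates `Confirmed(B).slot` and `bank_hash(parent(B))` together, a verifier can both check a vote's confirmed slot and walk the parent-hash trail back to a specific version of a slot, which is what the `Repair Proof` and slashing proofs above rely on.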
From fd45dd0c2b102237027d4c1281e3c6a50fdd0d38 Mon Sep 17 00:00:00 2001 From: Carl Date: Tue, 7 Jan 2020 21:59:39 -0800 Subject: [PATCH 10/11] naming nits --- book/src/SUMMARY.md | 2 +- ...-duplicate-block-slashing.md => handle-duplicate-block.md} | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) rename book/src/proposals/{leader-duplicate-block-slashing.md => handle-duplicate-block.md} (97%) diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 09b1da8ab901b9..510e5c5780d064 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -61,7 +61,7 @@ * [Tick Verification](proposals/tick-verification.md) * [Block Confirmation](proposals/block-confirmation.md) * [ABI Management](proposals/abi-management.md) - * [Handling Duplicate Leader Blocks](proposals/leader-duplicate-block-slashing.md) + * [Handling Duplicate Leader Blocks](proposals/handle-duplicate-block.md) * [Implemented Design Proposals](implemented-proposals/README.md) * [Blockstore](implemented-proposals/blockstore.md) * [Cluster Software Installation and Updates](implemented-proposals/installer.md) diff --git a/book/src/proposals/leader-duplicate-block-slashing.md b/book/src/proposals/handle-duplicate-block.md similarity index 97% rename from book/src/proposals/leader-duplicate-block-slashing.md rename to book/src/proposals/handle-duplicate-block.md index 0c07307729f249..dba693fbd4a689 100644 --- a/book/src/proposals/leader-duplicate-block-slashing.md +++ b/book/src/proposals/handle-duplicate-block.md @@ -33,7 +33,7 @@ The `bank_hash` of each bank `B` is now augmented to be: We include these components to support the proof generation described in the sections below. -## Repair Proof +## Proof of Repair Safety Let `S'` be some version of slot `S` that a validator has already detected a duplicate version of and dropped and marked as "duplicate" (step 2 of the @@ -70,7 +70,7 @@ hashes `H1` and `H2` for two banks `B1` and `B2` where both: Let `S` and `S'` be two versions of some slot. 
A proof shows two signed votes for two banks `B1` and `B2` with bank hashes `H(B1)` and `H(B2)`. The proof shows: s -* `Proof of Minority`: For each of these hashes `H` and each bank `B`, show +* `Proof of Unconfirmed`: For each of these hashes `H` and each bank `B`, show that `Confirmed(B) < S` by providing `state_hash(B)`, `Confirmed(B).slot`, `bank_hash(parent(B))`, `parent(B).slot` such that: From f8eeef50396d2dd3a32ab56a38fc64d39c4ca75a Mon Sep 17 00:00:00 2001 From: Carl Date: Wed, 15 Jan 2020 19:24:34 -0800 Subject: [PATCH 11/11] Updated for ongoing implementation --- book/src/proposals/handle-duplicate-block.md | 169 +++++++++---------- 1 file changed, 78 insertions(+), 91 deletions(-) diff --git a/book/src/proposals/handle-duplicate-block.md b/book/src/proposals/handle-duplicate-block.md index dba693fbd4a689..5cb6a32909d32e 100644 --- a/book/src/proposals/handle-duplicate-block.md +++ b/book/src/proposals/handle-duplicate-block.md @@ -6,94 +6,81 @@ blocks. Leaders that produce multiple blocks for the same slot increase the number of potential forks that the cluster has to resolve. -## Overview - -1. Slashing Condition: If you have voted on version `S` of a slot, it is a -slashing condition to vote on any descendant of another version `S'` of the -slot, unless that descendant of version `S'` has seen supermajority (> 66%) -of the stake voting on it (See section `Slashing Condition` below for how this -proof is to be generated). -2. If you see two blockhashes for the same block, and you are less than -`2^THRESHOLD` lockout on that block, drop it. Flag the block as "duplicated". -3. Only repair a version of the "duplicated" block if you see a `Repair Proof` -as outlined in the `Repair Proof` section below - -## Primitives - -For a bank `B`, let `A` be the latest ancestor of `B` that has gotten -supermajority > 66% votes. - -1) Define the function `Confirmed` to be `Confirmed(B) = A`. -2) Define `parent(B)` to be the parent bank of `B`. 
- -The `bank_hash` of each bank `B` is now augmented to be: - -`hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. - -We include these components to support the proof generation described -in the sections below. - -## Proof of Repair Safety - -Let `S'` be some version of slot `S` that a validator has already detected a -duplicate version of and dropped and marked as "duplicate" (step 2 of the -`Overview` section above). A validator will only repair `S'` if it's -provided a proof that `S'` has been confirmed (been voted on by greater than -66% of the stake). - -Call such a proof `RepairProof(S')`. - -This proof consists of a supermajority of validator's votes, where each vote `V`: - -* Contains a bank hash `H(B)` for some bank `B` such that -`Confirmed(B).slot < S'.slot` (if a supermajority confirmed `S'`, then such a -set must exist because there must have been some initial set of validators that -voted on `S'` before it was confrmed). This can be confirmed for each vote if -the prover provides `state_hash(B)`, `Confirmed(B).slot`, -`bank_hash(parent(B))`, `parent(B).slot` such that - -`H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`. - -* Show that `V.slot` is descended from `S'` by proving the bank hash `H(B)` -can be derived from the bank hash and slot of `S'`. This is done by providing -the trail of bank hashes and parent slots to recreate the hash. - -## Slashing Condition - -### Goals -The proof of slashing aims to show a validator signed two votes on two bank -hashes `H1` and `H2` for two banks `B1` and `B2` where both: -* Chain from two different versions of the same slot `S`. -* `Confirmed(B1) < S.slot` and `Confirmed(B2) < S.slot`. - -### Contents of the Proof -Let `S` and `S'` be two versions of some slot. A proof shows two signed votes for two -banks `B1` and `B2` with bank hashes `H(B1)` and `H(B2)`. 
The proof shows:
-s
-* `Proof of Unconfirmed`: For each of these hashes `H` and each bank `B`, show
-that `Confirmed(B) < S` by providing `state_hash(B)`, `Confirmed(B).slot`,
-`bank_hash(parent(B))`, `parent(B).slot` such that:
-
-`H(B) == hash(Confirmed(B).slot, state_hash(B), bank_hash(parent(B)), parent(B).slot)`.
-
-* `Proof of Chaining`: Prove that both banks are descended from a different
-version of `S.slot`
-
-From the protocol outlined in `Conditions for Repairing a Slot with Multiple Versions`,
-if a validator is shown valid `Repair Proof(S)` and `Repair Proof(S')`,
-then that means at least 33% of the validators must have votes for different
-versions of the slot in both repair proofs. We can then generate a slashing
-proof for these 33% by comparing the overlapping validator's votes that were
-contained in both repair proofs (Both `Proof of Minority` and the conflicting
-`Proof of Chaining` will exist in the two `Repair Proof`'s).
-
-### Guarantees:
-
-1) If a correct validator in the cluster is locked out greater than `2^THRESHOLD`
-on a version "A" of a block, then that version is the only version that correct
-validators will repair, unless at least 33% of validators equivocate and get
-slashed.
-2) If no correct valdiator is locked out greater than `2^THRESHOLD`, then
-everybody must have dropped the block and picked another fork.
-
-
+## Procedure
+1. Blockstore changes:
+   * a. Augment the `DeadSlots` keyspace in Blockstore so that each dead slot
+   maps to an optional blockhash. This blockhash is called the
+   `approved_blockhash`.
+   * b. Augment the `Roots` column family to include the blockhash of the
+   rooted banks.
+
+2. Once the CheckDuplicate thread detects a duplicate slot, it:
+   * a. Stores a proof of the two duplicate shreds for that slot.
+   * b. Sends a signal to ReplayStage for that slot.
+
+3. Once ReplayStage receives a signal about a duplicate slot, it checks if
+the current version with blockhash `B` has less than `2^THRESHOLD` lockout.
+   * a. 
If so, then remove that slot and all its children from the progress
+   map. Mark the slot as `dead` in `DeadSlots`, with `approved_blockhash`
+   set to `None`.
+   * b. Otherwise, check `approved_blockhash.is_none()`.
+     * i. If true, set the `approved_blockhash` for that slot to `B` in
+     `DeadSlots`. This is why `1b` is important, because banks for slots
+     earlier than the root will have been purged by BankForks and thus
+     the blockhash will not be available in memory.
+     * ii. If false, this hash must have been set by step `6b`, in which
+     case, check that the existing hash matches `B`. If it does not, throw an
+     error because that means >33% of the cluster is malicious.
+   * c. When fetching new slots in ReplayStage, a slot is not replayed if it is
+   dead, unless an `approved_blockhash` is set.
+
+   Note: It's possible the slot is already dead when ReplayStage receives
+   the duplicate signal. In this case the `approved_blockhash` in `DeadSlots`
+   will also be set as `None`. This case will be handled by step `5` below if another
+   playable version of this slot gains approvals (the `approved_blockhash`
+   will be set).
+
+4. WindowService will now stop accepting shreds for dead slots or shreds with
+parents chaining to dead slots, unless the shred is also:
+   * a. A repair request
+   * b. For the `approved_blockhash` (TODO: Need a way to confirm this, as
+   repair requests are currently too small to contain another hash.
+   Probably will need a merkle proof)
+
+   Thus, a duplicate slot marked "dead" by ReplayStage will not receive further
+   shreds unless an `approved_blockhash` is set.
+
+5. Repair thread iterates over the set of slots in Blocktree that are:
+   * a. Greater than the root
+   * b. The slot exists in `DeadSlots` in Blockstore but the
+   `approved_blockhash` in `DeadSlots` is `None` (implies ReplayStage
+   has either gotten a signal from the CheckDuplicate thread, or
+   has seen a bad version of this slot).
+   * c. 
The slot exists in `DuplicateSlots` in Blockstore
+
+   For each of these slots that passes the above criteria, the repair thread
+   queries the cluster's validators about their `approved_blockhash`.
+
+   If the repair thread sees >33% of validators with the same `approved_blockhash`
+   `B`, then that means the following condition must be true:
+
+   `Greater than 66% of the stake has voted on blockhash B`
+
+   This is because there are <= 33% malicious validators on the network, and
+   `> 33%` responded with the same `approved_blockhash`, so there is at least
+   one correct validator that saw the above condition hold. This is then a
+   safe version of the slot to repair because no correct validator can be locked
+   out more than `2^THRESHOLD` on another version of this slot, and thus all
+   correct validators will either repair this version of the slot, or skip
+   this slot.
+
+   The repair thread then sends an `Approved Blockhash` signal to ReplayStage.
+
+6. Upon receiving the `Approved Blockhash` signal, ReplayStage checks if
+the slot's `approved_blockhash` is equal to `None`. If so:
+   * a. Clear the slot-related columns in Blockstore. This is safe because
+   there are no simultaneous writes to these columns from WindowService as
+   guaranteed by step `4`.
+   * b. Set the `approved_blockhash` in `DeadSlots`, allowing ReplayStage to once
+   again replay this slot (see step `3c`).
+
+   If the hash does exist, run step `3.b.ii` above.
\ No newline at end of file
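The >33% stake check that the repair thread performs in step `5` can be sketched as follows. The response type and field names here are hypothetical placeholders, not the actual repair protocol messages:

```rust
use std::collections::HashMap;

// Hypothetical response from querying a validator for its approved blockhash
// (illustrative names; the real repair protocol messages differ).
struct ApprovedBlockhashResponse {
    validator_stake: u64,
    // None if the responding validator has no approved version of the slot.
    approved_blockhash: Option<u64>,
}

// Returns a blockhash reported by validators holding strictly more than 1/3
// of the total stake, if one exists. With <= 1/3 of the stake malicious, at
// least one correct validator must have observed > 66% of the stake voting on
// that version, so the slot is safe to repair. Absent >33% equivocation (a
// slashable event), at most one hash can clear this threshold.
fn safe_to_repair(responses: &[ApprovedBlockhashResponse], total_stake: u64) -> Option<u64> {
    let mut stake_by_hash: HashMap<u64, u64> = HashMap::new();
    for response in responses {
        if let Some(hash) = response.approved_blockhash {
            *stake_by_hash.entry(hash).or_insert(0) += response.validator_stake;
        }
    }
    stake_by_hash
        .into_iter()
        .find(|&(_hash, stake)| 3 * stake > total_stake) // strictly > 33% of stake
        .map(|(hash, _stake)| hash)
}

fn main() {
    let responses = vec![
        ApprovedBlockhashResponse { validator_stake: 20, approved_blockhash: Some(7) },
        ApprovedBlockhashResponse { validator_stake: 14, approved_blockhash: Some(7) },
        ApprovedBlockhashResponse { validator_stake: 10, approved_blockhash: None },
    ];
    // 34 of 100 stake (> 33%) reported blockhash 7, so it is safe to repair.
    assert_eq!(safe_to_repair(&responses, 100), Some(7));
    // Against 200 total stake, the same 34 no longer clears the threshold.
    assert_eq!(safe_to_repair(&responses, 200), None);
    println!("repair threshold checks passed");
}
```

Note the comparison is stake-weighted, matching the document's supermajority definitions, rather than a count of validator identities.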