This repository has been archived by the owner on Aug 18, 2020. It is now read-only.

CBR-525: Fix failure to sync issue when switching to OBFT #4153

Merged
merged 8 commits into from
May 31, 2019


erikd
Member

@erikd erikd commented May 30, 2019

Description

Fix issue that caused the node to lose sync with the chain in the OBFT era and then fail to re-sync once out of sync.

Linked issue

https://iohk.myjetbrains.com/youtrack/issue/CBR-525

  • CHANGELOG entry has been added and is linked to the correct PR on GitHub.

Testing checklist

  • I have added tests to cover my changes.
  • All new and existing tests passed.

QA Steps

Synced the OBFT testnet numerous times.

@erikd erikd force-pushed the erikd/obft-validation branch 5 times, most recently from 464bbdf to ff01d02 on May 30, 2019 08:46
@mhuesch mhuesch force-pushed the erikd/obft-validation branch from ff01d02 to 9a5a3db on May 30, 2019 17:16
@mhuesch
Contributor

mhuesch commented May 30, 2019

It seems that Hydra ran out of disk space when running over Erik's last push (source):

```
/nix/store/w9avgn80c1vqsn9bpm1x5401qknlc966-cctools-binutils-darwin/bin/ranlib: file: dist/build/libHScardano-sl-utxo-3.0.1-KHxprOrWJ7jBRIg17UkM2t-ghc8.4.4.a(elf_reloc_aarch64.o) has no symbols
fatal error: /nix/store/w9avgn80c1vqsn9bpm1x5401qknlc966-cctools-binutils-darwin/bin/ranlib: can't write to output file (No space left on device)
running auto-GC to free 26179796992 bytes
finding garbage collector roots...
`ranlib' failed in phase `Ranlib'. (Exit code: 1)
```

Since it ran a GC, I thought it might work if I just retriggered CI, so I ran `git commit --amend` and force-pushed.


There are also errors with AppVeyor (1 & 2), but they are not required so at present I plan to ignore them.

Contributor

@intricate intricate left a comment


Just some comments about typos (no big deal). Going to run this on OBFT testnet and make sure it all goes well before approving on my end.

CHANGELOG.md (outdated, resolved)
db/src/Pos/DB/Block/Logic/SplitByEpoch.hs (outdated, resolved)
db/src/Pos/DB/Block/Logic/VAR.hs (outdated, resolved)
db/src/Pos/DB/Block/Logic/VAR.hs (outdated, resolved)
db/src/Pos/DB/Block/Logic/SplitByEpoch.hs (resolved)
Contributor

@mhuesch mhuesch left a comment


Nice job. I left some comments. Approving because I do not believe any of the few issues I raised are dealbreakers. The most important, I think, is this one, but I expect that in practice the two values are always equal.

```diff
-obftLeaderCanMint :: AddressHash PublicKey -> BlockCount -> OldestFirst [] LastSlotInfo -> Bool
-obftLeaderCanMint leaderAddrHash blkSecurityParam (OldestFirst lastBlkSlots) =
+obftLeaderCanMint :: AddressHash PublicKey -> BlockCount -> LastBlkSlots -> Bool
+obftLeaderCanMint leaderAddrHash blkSecurityParam lastBlkSlots =
     blocksMintedByLeaderInLastKSlots leaderAddrHash lastBlkSlots
         <= leaderMintThreshold blkSecurityParam
```
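For readers unfamiliar with the rule being checked, here is a minimal sketch of the same comparison over a plain list of leaders (oldest first), which is roughly the shape of the old `OldestFirst [] LastSlotInfo` argument. `LeaderId`, `blocksMintedBy`, `mintThreshold` and `leaderCanMint` are hypothetical names, and the threshold fraction is passed in as a parameter because this PR does not state the value cardano-sl uses; this is not the actual `leaderMintThreshold`.

```haskell
-- Hypothetical stand-in for `AddressHash PublicKey`.
type LeaderId = String

-- Count how many of the given blocks were minted by `leader`.
blocksMintedBy :: LeaderId -> [LeaderId] -> Int
blocksMintedBy leader = length . filter (== leader)

-- Hypothetical threshold: floor (fraction * k). The real fraction is an
-- assumption here, not taken from this PR.
mintThreshold :: Double -> Int -> Int
mintThreshold fraction k = floor (fraction * fromIntegral k)

-- Same shape as `obftLeaderCanMint`: only the newest k blocks are counted,
-- and the leader may mint if its count does not exceed the threshold.
leaderCanMint :: Double -> LeaderId -> Int -> [LeaderId] -> Bool
leaderCanMint fraction leader k oldestFirst =
  blocksMintedBy leader lastK <= mintThreshold fraction k
  where
    -- Keep only the newest k entries (the whole list if it is shorter).
    lastK = drop (length oldestFirst - k) oldestFirst
```

With a list, every call walks up to k entries, which is the cost the later `LastBlkSlots` change aims to avoid.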
Contributor


Isn't there a risk of a mismatch between `blkSecurityParam` and the `lbsCount` inside of `lastBlkSlots`? I think, but am not sure, that if those get out of sync we get an invalid comparison.

The `LastBlkSlots` is binned based on one k value, while the threshold demanded is based on another k value.

Perhaps we can just check that the values are equal at some point. They should always be equal, right?
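One way to act on the suggestion to check that the two k values agree is to have the window carry the k it was built with and refuse to compare when it differs from the k used for the threshold. This is a hypothetical sketch: `Window`, `windowK` (standing in for `lbsCount`), `CheckError` and `leaderCanMintChecked` are illustrative names, and the threshold fraction is again an assumed parameter.

```haskell
-- Hypothetical simplification of `LastBlkSlots`: the window remembers the
-- k it was binned with (standing in for `lbsCount`).
data Window = Window
  { windowK       :: Int
  , windowLeaders :: [String]   -- oldest first, at most windowK entries
  } deriving Show

data CheckError = SecurityParamMismatch Int Int  -- window k, threshold k
  deriving (Show, Eq)

-- Report a mismatch instead of silently comparing a count taken over one
-- window size against a threshold derived from another.
leaderCanMintChecked :: Double -> String -> Int -> Window -> Either CheckError Bool
leaderCanMintChecked fraction leader k w
  | windowK w /= k = Left (SecurityParamMismatch (windowK w) k)
  | otherwise      = Right (minted <= floor (fraction * fromIntegral k))
  where
    minted = length (filter (== leader) (windowLeaders w))
```

As the reply below notes, a mismatch would only arise from an update proposal changing k, so the guard mainly documents the invariant.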

Member Author

@erikd erikd May 30, 2019


> if those get out of sync we get an invalid comparison.

The only way they could get out of sync is if there were a successful update proposal to change k, and I really don't think there will be such a thing.

There is, however, another related issue. For OBFT we are checking that the block we are validating has not been signed too often in the last k blocks, when I think it should actually be the last k slots. These are only the same thing if there are no slots with missing blocks.
Confirmed with @dcoutts that checking the last k blocks is indeed correct.
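To illustrate why "last k blocks" and "last k slots" only coincide when no slot is empty, here is a small sketch that counts a leader's blocks both ways. The `Chain` representation and both functions are hypothetical, and measuring the slot window back from the newest block's slot is an assumption made for the example.

```haskell
-- Each entry is (slot number, leader), oldest first. Slots in which no
-- block was produced simply do not appear.
type Chain = [(Int, String)]

-- A leader's blocks among the last k *blocks*.
inLastKBlocks :: Int -> String -> Chain -> Int
inLastKBlocks k leader chain =
  length [ () | (_, l) <- drop (length chain - k) chain, l == leader ]

-- A leader's blocks whose slot lies within the last k *slots*, counted back
-- from the newest block's slot (an assumption for this sketch).
inLastKSlots :: Int -> String -> Chain -> Int
inLastKSlots _ _ [] = 0
inLastKSlots k leader chain =
  length [ () | (s, l) <- chain, s > tipSlot - k, l == leader ]
  where
    tipSlot = fst (last chain)
```

With k = 3 and blocks in slots 0, 1, 2 and 5 (slots 3 and 4 empty), the last 3 blocks span slots 1 to 5 while the last 3 slots contain only the block in slot 5, so the two counts can differ; on a chain with no empty slots they agree.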

db/src/Pos/DB/Block/Slog/Logic.hs (resolved)
db/src/Pos/DB/Update/Logic/Global.hs (resolved)
chain/test/Test/Pos/Chain/Block/Slog/LastBlkSlots.hs (outdated, resolved)
db/src/Pos/DB/Block/Logic/VAR.hs (outdated, resolved)
db/src/Pos/DB/Block/Logic/VAR.hs (resolved)
@mhuesch
Contributor

mhuesch commented May 30, 2019

I documented the delegator/delegatee story in this commit (also my auto-formatter apparently moved imports). I believe we are doing the right thing. Would appreciate a double-check from @intricate and @erikd.

erikd and others added 3 commits May 31, 2019 07:35
The old version of the struct was a list which made calculating the
OBFT parameters highly inefficient. New version is much more
efficient and has tests.
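The commit message says the old list-based struct made computing the OBFT parameters inefficient. One possible way to avoid rescanning a list, sketched here under our own assumptions and not taken from the PR's `LastBlkSlots`, is a bounded queue of the last k leaders paired with a per-leader count. `Window`, `push`, `evictOldest` and `mintedBy` are hypothetical names; the sketch uses `Data.Sequence` and `Data.Map.Strict` from the `containers` package that ships with GHC.

```haskell
import qualified Data.Map.Strict as Map
import Data.Sequence (Seq, ViewL (..), (|>))
import qualified Data.Sequence as Seq

-- Hypothetical window: the last k leaders plus how many blocks each minted.
data Window = Window
  { wK      :: Int
  , wQueue  :: Seq String          -- leaders, oldest on the left
  , wCounts :: Map.Map String Int  -- blocks per leader inside the window
  }

emptyWindow :: Int -> Window
emptyWindow k = Window k Seq.empty Map.empty

-- Record the newest block's leader, evicting the oldest entry once the
-- window holds more than k blocks.
push :: String -> Window -> Window
push leader (Window k q counts)
  | Seq.length q' > k = evictOldest (Window k q' counts')
  | otherwise         = Window k q' counts'
  where
    q'      = q |> leader
    counts' = Map.insertWith (+) leader 1 counts

-- Drop the oldest leader and decrement its count, removing zero entries.
evictOldest :: Window -> Window
evictOldest w@(Window k q counts) =
  case Seq.viewl q of
    EmptyL      -> w
    old :< rest -> Window k rest (Map.update dec old counts)
  where
    dec n = if n <= 1 then Nothing else Just (n - 1)

-- How many of the last k blocks this leader minted: a single map lookup.
mintedBy :: String -> Window -> Int
mintedBy leader = Map.findWithDefault 0 leader . wCounts
```

Each update touches one queue end and one or two map entries, so the per-leader count is available without a pass over up to k list elements.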
@erikd erikd force-pushed the erikd/obft-validation branch from 0488fd9 to 50e0eb5 on May 30, 2019 21:43
@erikd erikd force-pushed the erikd/obft-validation branch from 50e0eb5 to 5abdfa0 on May 30, 2019 21:57
@disassembler
Contributor

bors r+

iohk-bors bot added a commit that referenced this pull request May 31, 2019
4153: CBR-525: Fix failure to sync issue when switching to OBFT r=disassembler a=erikd

## Description

Fix issue that caused the node to lose sync with the chain in the OBFT era and then fail to re-sync once out of sync.

## Linked issue

https://iohk.myjetbrains.com/youtrack/issue/CBR-525

- [x] CHANGELOG entry has been added and is linked to the correct PR on GitHub.

## Testing checklist
- [x] I have added tests to cover my changes.
- [x] All new and existing tests passed.

## QA Steps
Synced the OBFT testnet numerous times.


Co-authored-by: Erik de Castro Lopo
Co-authored-by: Michael Hueschen
Co-authored-by: Samuel Leathers
@iohk-bors
Contributor

iohk-bors bot commented May 31, 2019

@iohk-bors iohk-bors bot merged commit be04da9 into develop May 31, 2019
@iohk-bors iohk-bors bot deleted the erikd/obft-validation branch May 31, 2019 19:39

4 participants