Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat|refactor(header/sync): network head determination #990

Merged
merged 12 commits into from
Sep 13, 2022
Merged

Conversation

Wondertan
Copy link
Member

@Wondertan Wondertan commented Aug 7, 2022

Context

In #978 we started adding the Head method to Syncer; however, it's not that straightforward, and the deeper we looked, the more issues we found in Syncer related to the new Head method, so it was decided to extract a separate preparation PR that ensures the new Head is safe to use by multiple routines and returns as recent header as possible.

The previous version of Syncer struggled with two issues:

  • On Syncer's start, synchronization didn't start and waited for a gossiped header trigger sync and set a sync target (only when the subjective head was not expired)
    • This is why Node could wait up to block time second to start syncing
  • There was no way to request the most recent objective header of the network
    • I.e. if the user wanted to request the latest possible state, it wasn't able to do that besides waiting for full sync to finish

Changes

The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing trustedHead into two methods subjectiveHead and networkHead. The networkHead is supposed to be used by the future Head.
Where the latter now relies on the latest known header timestamp and block time to determine its recency. If the header is not recent, we request it from the trusted peer(s), assuming it's always synced.

In fact two of our bootstrappers falling out of sync, creating syncing issues.

Besides that, three more issues under fix commits related to multithreading were fixed with supporting tests.

Other

Tested sync manually(thought with some struggles due to broken bootstrappers).
As always, review CBC and checkout locally to see the full picture.

TODO

  • Double check that the node correctly determines recency, as their DA network might be lagging one block behind
  • Do we need time drift?
  • Telemetry

@Wondertan Wondertan closed this Aug 7, 2022
@Wondertan Wondertan reopened this Aug 7, 2022
@Wondertan Wondertan force-pushed the hlib/syncer-head branch 2 times, most recently from 2d627be to a0bc2e4 Compare August 7, 2022 09:27
@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2022

Codecov Report

Merging #990 (44ab8a2) into main (4618e2b) will increase coverage by 0.34%.
The diff coverage is 63.35%.

@@            Coverage Diff             @@
##             main     #990      +/-   ##
==========================================
+ Coverage   56.58%   56.92%   +0.34%     
==========================================
  Files         135      136       +1     
  Lines        8994     9063      +69     
==========================================
+ Hits         5089     5159      +70     
+ Misses       3369     3366       -3     
- Partials      536      538       +2     
Impacted Files Coverage Δ
params/network.go 76.92% <ø> (ø)
header/sync/sync_head.go 58.77% <58.77%> (ø)
header/sync/sync.go 69.50% <62.50%> (+4.00%) ⬆️
node/services/service.go 80.83% <72.72%> (-0.93%) ⬇️
header/header.go 50.00% <100.00%> (+3.22%) ⬆️
header/sync/ranges.go 82.47% <100.00%> (ø)
header/verify.go 82.22% <100.00%> (+0.82%) ⬆️
fraud/pb/proof.pb.go 36.61% <0.00%> (+1.89%) ⬆️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@Wondertan Wondertan changed the title feat(params|header/sync): dirty intergration of block time into Syncer feat|refactor(header/sync): revision of Syncer's objective head deteremination logic Aug 7, 2022
@Wondertan Wondertan added the area:header Extended header label Aug 7, 2022
@Wondertan Wondertan self-assigned this Aug 7, 2022
@Wondertan Wondertan marked this pull request as ready for review August 7, 2022 11:21
@Wondertan
Copy link
Member Author

Wondertan commented Aug 7, 2022

Tests failed becauseSyncer.Start now requires initialized Store. Fixed now

header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial thoughts here -- will do a deeper review after our call.

Want to discuss objectiveHead method.

Also we should consider splitting out process/validation-related functions into a separate file (maybe head.go) and just leaving the sync loop logic in sync.go

header/header.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Show resolved Hide resolved
header/header.go Show resolved Hide resolved
header/verify.go Show resolved Hide resolved
@Wondertan Wondertan changed the title feat|refactor(header/sync): revision of Syncer's objective head deteremination logic feat|refactor(header/sync): network head deteremination Aug 8, 2022
@Wondertan
Copy link
Member Author

@renaynay, @distractedm1nd, pls give another read to the docs. I think it's a different level now.

@Wondertan
Copy link
Member Author

We should not merge this one until we downgrade our network to 0.34. Testing recency on Mamaki is impossible due to bigger block times than they should be. Also, this change does not bring ant value without stable block times
image

@Wondertan Wondertan changed the title feat|refactor(header/sync): network head deteremination feat|refactor(header/sync): network head determination Aug 8, 2022
distractedm1nd
distractedm1nd previously approved these changes Aug 9, 2022
Copy link
Collaborator

@distractedm1nd distractedm1nd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs are great

header/sync/sync.go Show resolved Hide resolved
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testing this against arabica and everything looks fine so far -- the problem is not enough blocks have been produced to be able to observe any larger range syncs.

I think we should each (@Wondertan and I) review this PR again with fresh eyes and then we can likely merge.

header/header.go Show resolved Hide resolved
@Wondertan
Copy link
Member Author

testing this against arabica and everything looks fine so far -- the problem is not enough blocks have been produced to be able to observe any larger range syncs.

@renaynay, How do you test? Do you compare that the node's instant head is equal to the core's one? Do you request core's one directly from the validator?

Copy link
Member

@liamsi liamsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks generally good. I think it is critical that we document the assumptions around subjective head, network head, trusted peers clearly and make any implicit assumptions very explicit to the user.

header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Show resolved Hide resolved
header/sync/sync_head.go Show resolved Hide resolved
renaynay
renaynay previously approved these changes Sep 12, 2022
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on all node types -- works very well. Thank you!

header/sync/sync.go Outdated Show resolved Hide resolved
header/header.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync.go Outdated Show resolved Hide resolved
header/sync/sync_head.go Show resolved Hide resolved
Wondertan and others added 12 commits September 13, 2022 12:36
…emination logic

The previous version of Syncer struggled with two issues:
* On Syncer's start, synchronization didn't start and waited for a gossiped header trigger sync and set a sync target (only when the subjective head was not expired)
	* This is why Node could wait up to block time second to start syncing
* There was no way to request the most recent objective header of the network
	* I.e. if the user wanted to request the latest possible state, it wasn't able to do that besides waiting for full sync to finish.
	The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing `trustedHead` into two methods `subjectiveHead` and `objectiveHead`.
 	Where the latter now relies on the latest known header timestamp and block time to determine its recency.

 The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing trustedHead into two methods subjectiveHead and objectiveHead.
 Where the latter now relies on the latest known header timestamp and block time to determine its recency. If the header is not recent, we request it from the trusted peer(s), assuming it's always synced.
Mainly, allow Start to error so that subsequent Stop does not panic. While also make lifecycle logic less confusing and less error-prone
Going further, there wiil be multiple readers that should not block each other
…vements

* Terminology change from the 'objective head' to the 'network head' consistent over docs and logs
* More logs for unhappy cases + more information for extisting logs
* Extracttion of head retrieval logic into a separate file
Co-authored-by: rene <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

morge

@renaynay renaynay merged commit 00d80c4 into main Sep 13, 2022
@renaynay renaynay deleted the hlib/syncer-head branch September 13, 2022 11:28
distractedm1nd pushed a commit to renaynay/celestia-node that referenced this pull request Sep 19, 2022
* feat(params|header/sync): dirty intergration of block time into Syncer

Co-authored-by: Rene <[email protected]>

* feat(header): new utility funcs for the ExtendedHeader in preparation for Syncer improvements

Co-authored-by: Rene <[email protected]>

* feat|refactor(header/sync): revision of Syncer's objective head deteremination logic

The previous version of Syncer struggled with two issues:
* On Syncer's start, synchronization didn't start and waited for a gossiped header trigger sync and set a sync target (only when the subjective head was not expired)
	* This is why Node could wait up to block time second to start syncing
* There was no way to request the most recent objective header of the network
	* I.e. if the user wanted to request the latest possible state, it wasn't able to do that besides waiting for full sync to finish.
	The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing `trustedHead` into two methods `subjectiveHead` and `objectiveHead`.
 	Where the latter now relies on the latest known header timestamp and block time to determine its recency.

 The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing trustedHead into two methods subjectiveHead and objectiveHead.
 Where the latter now relies on the latest known header timestamp and block time to determine its recency. If the header is not recent, we request it from the trusted peer(s), assuming it's always synced.

* docs(header/sync): add TODO for potential optimization

* refactor(header/sync): rework Syncer lifecycling

Mainly, allow Start to error so that subsequent Stop does not panic. While also make lifecycle logic less confusing and less error-prone

* fix(node): do not fail the Start for Syncer if it is not initialized, so that Node tests does not fail

* fix(header/sync): use RWLock for sync ranges

Going further, there wiil be multiple readers that should not block each other

* fix(header/sync): ensure objective head is requested only once when many at any moment

* chore(header/sync): cleanup syncing code and update the tests to use WaitSync

* chore(header/sync): documentation, logging and code dispoisiton improvements

* Terminology change from the 'objective head' to the 'network head' consistent over docs and logs
* More logs for unhappy cases + more information for extisting logs
* Extracttion of head retrieval logic into a separate file

* Apply docs suggestions from @renaynay and @liamsi

Co-authored-by: rene <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>

* Update header/header.go

Co-authored-by: rene <[email protected]>

Co-authored-by: Rene <[email protected]>
Co-authored-by: rene <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
distractedm1nd pushed a commit to distractedm1nd/celestia-node that referenced this pull request Sep 21, 2022
* feat(params|header/sync): dirty intergration of block time into Syncer

Co-authored-by: Rene <[email protected]>

* feat(header): new utility funcs for the ExtendedHeader in preparation for Syncer improvements

Co-authored-by: Rene <[email protected]>

* feat|refactor(header/sync): revision of Syncer's objective head deteremination logic

The previous version of Syncer struggled with two issues:
* On Syncer's start, synchronization didn't start and waited for a gossiped header trigger sync and set a sync target (only when the subjective head was not expired)
	* This is why Node could wait up to block time second to start syncing
* There was no way to request the most recent objective header of the network
	* I.e. if the user wanted to request the latest possible state, it wasn't able to do that besides waiting for full sync to finish.
	The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing `trustedHead` into two methods `subjectiveHead` and `objectiveHead`.
 	Where the latter now relies on the latest known header timestamp and block time to determine its recency.

 The new reimplementation fixes these two problems and improves code readability and docs. Mainly, it splits the existing trustedHead into two methods subjectiveHead and objectiveHead.
 Where the latter now relies on the latest known header timestamp and block time to determine its recency. If the header is not recent, we request it from the trusted peer(s), assuming it's always synced.

* docs(header/sync): add TODO for potential optimization

* refactor(header/sync): rework Syncer lifecycling

Mainly, allow Start to error so that subsequent Stop does not panic. While also make lifecycle logic less confusing and less error-prone

* fix(node): do not fail the Start for Syncer if it is not initialized, so that Node tests does not fail

* fix(header/sync): use RWLock for sync ranges

Going further, there wiil be multiple readers that should not block each other

* fix(header/sync): ensure objective head is requested only once when many at any moment

* chore(header/sync): cleanup syncing code and update the tests to use WaitSync

* chore(header/sync): documentation, logging and code dispoisiton improvements

* Terminology change from the 'objective head' to the 'network head' consistent over docs and logs
* More logs for unhappy cases + more information for extisting logs
* Extracttion of head retrieval logic into a separate file

* Apply docs suggestions from @renaynay and @liamsi

Co-authored-by: rene <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>

* Update header/header.go

Co-authored-by: rene <[email protected]>

Co-authored-by: Rene <[email protected]>
Co-authored-by: rene <[email protected]>
Co-authored-by: Ismail Khoffi <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:header Extended header
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

6 participants