Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

das: Parallelise DASer #988

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

walldiss
Copy link
Member

@walldiss walldiss commented Aug 5, 2022

Resolves

Distribute data availability sampling across pool of workers.

New DASer implementation bring pool of workers. Core features:

  • Sampling from header with height 1 to latest known one.
  • Subscription to new headers is running in parallel and pushes new headers to DASer callback on demand.
  • New found headers are processed with higher priority, but from only 1 worker at time. This way catch-up workers are not block by low speed of recent headers sampling.
  • Add very basic error handling: If header is impossible to sample its height is stored for retry upon restart. Amount of retries persist over checkpoint until header is sampled successfully
  • Store sampling state periodically in case of force quit.

TODO:

  • add tracing
  • add metrics

@walldiss walldiss self-assigned this Aug 5, 2022
das/daser.go Outdated Show resolved Hide resolved
das/state.go Outdated Show resolved Hide resolved
@walldiss walldiss force-pushed the vlad/daser-parallelization branch 3 times, most recently from 8424ef6 to d2af2e9 Compare August 15, 2022 14:58
Copy link
Collaborator

@distractedm1nd distractedm1nd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just looking at docs and names for now

das/stats.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving an intermediate review for now --

I like the general direction. I think there are things that could be simplified (e.g. the backgroundStore). Also, can we rename discovery of headers to something other than discovery? Just something that indicates it's recent/received through subscription rather than some active discovery process.

Another thing is the frequency/duplicity of storing the checkpoint to disk:

2022-08-18T10:34:56.270+0200	INFO	das	das/daser.go:200	discovered new header	{"height": 181780}
2022-08-18T10:34:56.270+0200	INFO	das	das/state.go:63	added recent headers maxKnown DASer priority queue 	{"from_height": 181779, "to_height": 181780}
2022-08-18T10:35:15.924+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}
2022-08-18T10:36:15.920+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}
2022-08-18T10:37:15.918+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}
2022-08-18T10:38:15.921+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}
2022-08-18T10:39:15.916+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}
2022-08-18T10:40:15.915+0200	INFO	das	das/daser.go:151	stored checkpoint to disk	{"checkpoint": {"min_sampled_height":173870,"max_known_height":181780,"skipped":{}}}

I am running your branch against mamaki and this is taken from my logs ^.

I think this PR would benefit from some further simplification as well as cleaning up of weird behaviours like the one I just mentioned. Also more documentation :) There is a doc.go file and it would be nice if you could update it.

das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
@renaynay
Copy link
Member

Also, when stopping the node (without force quitting), I get the following:

2022-08-18T10:45:07.166+0200	ERROR	node	node/node.go:142	Stopping Light Node: cannot close topic: outstanding event handlers or subscriptions; cannot close topic: outstanding event handlers or subscriptions
Error: cannot close topic: outstanding event handlers or subscriptions; cannot close topic: outstanding event handlers or subscriptions

@walldiss walldiss force-pushed the vlad/daser-parallelization branch 2 times, most recently from b9cefff to c2f8aaf Compare August 18, 2022 11:59
das/daser.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
das/state.go Outdated Show resolved Hide resolved
das/state.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
@Wondertan
Copy link
Member

Oops, sorry for making each comment a separate review. I've started using different software for reviews and not yet got used to it

@walldiss walldiss marked this pull request as ready for review August 19, 2022 14:12
@walldiss walldiss changed the title WIP: Daser parallelization Daser parallelization Aug 19, 2022
@Wondertan
Copy link
Member

As per the sync discussion, we decided to rework one by one header height handling in favor of height ranges

@walldiss walldiss force-pushed the vlad/daser-parallelization branch from a7dc00c to cb2e732 Compare August 24, 2022 12:41
@Wondertan
Copy link
Member

Wondertan commented Aug 24, 2022

TODO list from sync call

  • backgroundStore finish
  • test documentation
  • DASer state API
  • Docs
  • Metrics

Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! The logic is correct AFAICS, and I wasn't able to find any bugs. Left mostly stylistic nits/questions and thoughts on things. Looking forward to the above TODO's changes to land

das/worker.go Outdated Show resolved Hide resolved
das/checkpoint_store.go Outdated Show resolved Hide resolved
das/state.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/manager.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/worker.go Outdated Show resolved Hide resolved
@walldiss walldiss force-pushed the vlad/daser-parallelization branch from 92f0609 to 5f721ba Compare August 26, 2022 10:50
@walldiss
Copy link
Member Author

walldiss commented Sep 8, 2022

More changes from review comments will come in later commits

@walldiss walldiss force-pushed the vlad/daser-parallelization branch 5 times, most recently from c442f34 to 94e4526 Compare September 9, 2022 14:26
@walldiss
Copy link
Member Author

walldiss commented Sep 9, 2022

I’ve squashed DAS’er commit history so it is easier to track new changes. All previous reviews comments were addressed before the squash.

Wondertan
Wondertan previously approved these changes Sep 14, 2022
Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just comments and logs fixes. The logic is correct, so a solid approval.

das/daser.go Outdated Show resolved Hide resolved
das/store.go Outdated Show resolved Hide resolved
das/store.go Outdated Show resolved Hide resolved
das/subscriber.go Outdated Show resolved Hide resolved
das/worker.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/daser.go Outdated Show resolved Hide resolved
das/store.go Outdated Show resolved Hide resolved
Wondertan
Wondertan previously approved these changes Sep 15, 2022
Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reapproval

@renaynay renaynay changed the title Daser parallelization das: Parallelise DASer Sep 16, 2022
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome progress on this PR, it looks great now and much easier to follow :) thanks a lot for being so thorough with this.

Nit that can be left for a subsequent PR:
we should godoc every exported function, struct etc. (e.g. WorkerStats) even if it seems obvious as it's best practice

das/daser.go Outdated Show resolved Hide resolved
das/coordinator.go Outdated Show resolved Hide resolved
das/done.go Outdated Show resolved Hide resolved
das/state.go Show resolved Hide resolved
@walldiss walldiss force-pushed the vlad/daser-parallelization branch 2 times, most recently from 4bf660c to 6046d72 Compare September 19, 2022 11:53
 - add pool of workers
 - resume DAS'er and workers upon restart from last checkpoint
 - save checkpoint periodically
 - prioritise headers of newly produced blocks over old ones
@walldiss walldiss force-pushed the vlad/daser-parallelization branch from 6046d72 to 658139d Compare September 19, 2022 11:58
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

morge it

@Wondertan
Copy link
Member

Let's also resolve all the comments to be nice and tidy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:das Related to DASer area:shares Shares and samples kind:break! Attached to breaking PRs
Projects
No open projects
Status: Done
5 participants