Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: pipeline builder #1017

Merged
merged 12 commits into from
Jan 27, 2023
Merged

feat: pipeline builder #1017

merged 12 commits into from
Jan 27, 2023

Conversation

onbjerg
Copy link
Collaborator

@onbjerg onbjerg commented Jan 24, 2023

Supercedes #963 with a dedicated PipelineBuilder. I will make a follow up PR to clean up the config we use in the CLI and will use in #623. The most unwieldy parts are now:

  1. Constructing downloaders
  2. Easily loading StagesConfig to override values: this is mostly for the CLI. feat(net): test syncing from geth #623 should use the defaults in the stages set. (See last point in Changed)

Example usage can be found in crates/stages/src/lib.rs and bin/reth/src/node/mod.rs.

Added

  • Adds a way to construct a Pipeline using a builder (PipelineBuilder)
  • Adds StageSets that are logical containers of a group of stages
  • Adds a few default StageSets: DefaultStages (all), OnlineStages, OfflineStages, ExecutionStages and HashingStages

Changed

  • Replaces the pipeline event channel with a multi-listener approach (closes PipelineEvent channel should be unbounded #968)
  • Moves some arguments in LinearDownloaderBuilder::build out to their own methods since we would always pass Default::default()
  • Adds defaults to stages where possible - this is intended to supercede the duplicated config in StagesConfig later on
  • Refactored MerkleStage to an enum, I didn't find is_execute: bool to be very clear

Removed

  • Removes StatusUpdater from the headers stage - this was never meant to be in here, currently does nothing for us, and makes it easier to configure the header stage

@onbjerg onbjerg added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) labels Jan 24, 2023
.push(StorageHashingStage { clean_threshold: 500_000, commit_threshold: 100_000 })
// This merkle stage is used only for execute
.push(MerkleStage { is_execute: true });
.add_stages(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses the new stage sets: as mentioned, I want to make either the stage sets or the stages optionally ser/de to replace StagesConfig. This would make this a lot easier. Primary hurdles are:

  1. The downloaders, but it is possible to make them serializable too.
  2. Consensus. This should be determined by the chain spec, so should end up being ser/de too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this abstraction.

/// # Panics
///
/// Panics if the [`Stage`] is not in this set.
fn set<S: Stage<DB> + 'static>(self, stage: S) -> StageSetBuilder<DB> {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just convenience to avoid the .build call if you are only going to override a stage

///
/// This stage should be run with the above two stages, otherwise it is a no-op.
///
/// This stage is split in two: one for calculating hashes and one for unwinding. TODO: Why?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(cc @rakita?) I am not sure why this is split into an execution and an unwind part, but it should be in the docs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is best place to put the docs? merkle stage depends on Hashing stage, it needs to come before Hashing stages for both execution and unwind paths (unwind goes in backwards), this solves unwind ordering that is present in erigon.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be in this rustdoc, my question is more precisely: Why does it need to unwind before the hashing stages?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bcs of hashing of stages. Let see a example where plain state Acc1 has a balance1 and it gets updated to balance2

In Execution path this would be:
HashedState calculates hash2 of Acc1 with balance2 (balance2 as it is new state)
MerkleState uses hash2 and removed hash1 (previous hash).

Now for unwind order, MerkleStage comes first but it has only hash2 but it needs hash1. That is why the order needs to be:
HashingState reverts hash2 to hash1
MerkleState uses hash1 and removes hash2 (new hash that needs to be unwinded)

It is called Unwind MerkleStage as it is triggered only on unwind, on execution it does nothing.

Copy link
Member

@Rjected Rjected left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really cool, I like the StageSet abstraction and all the built in StageSets. Along with the PipelineBuilder it should make configuring and running the pipeline very easy, especially in #623

crates/net/downloaders/src/headers/linear.rs Outdated Show resolved Hide resolved
crates/net/downloaders/src/headers/linear.rs Outdated Show resolved Hide resolved
crates/stages/src/pipeline/event.rs Outdated Show resolved Hide resolved
}

/// Run the pipeline in an infinite loop. Will terminate early if the user has specified
/// a `max_block` in the pipeline.
pub async fn run(&mut self, db: Arc<DB>) -> Result<(), PipelineError> {
loop {
let mut state = PipelineState {
events_sender: self.events_sender.clone(),
listeners: self.listeners.clone(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit weird.

but nothing we can do in this PR about it.

looking at self.run_loop(&mut state, seems unnecessary to clone the listeners into the state, at least here, perhaps there are issues when it comes to the stages.

crates/stages/src/pipeline/set.rs Show resolved Hide resolved
crates/stages/src/sets.rs Outdated Show resolved Hide resolved
@onbjerg onbjerg force-pushed the onbjerg/pipeline-builder branch from 52eba2a to 8973d57 Compare January 27, 2023 15:15
@onbjerg onbjerg requested a review from mattsse January 27, 2023 15:22
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm,
the builder abstraction is very useful indeed

crates/stages/src/sets.rs Outdated Show resolved Hide resolved
These were introduced in #978
crates/stages/src/lib.rs Show resolved Hide resolved
crates/stages/src/stages/merkle.rs Outdated Show resolved Hide resolved
Co-authored-by: Georgios Konstantopoulos <[email protected]>
Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it

crates/stages/src/sets.rs Outdated Show resolved Hide resolved
crates/stages/src/sets.rs Outdated Show resolved Hide resolved
crates/stages/src/sets.rs Show resolved Hide resolved
crates/stages/src/sets.rs Show resolved Hide resolved
self
}

/// Disables the given stage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart! when do you think one would use this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For us, probably to disable specific indexing stages when in non-archive mode down the line.

For crate consumers, possibly to replace some stages with their own implementation

crates/stages/src/pipeline/set.rs Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

Codecov Report

Merging #1017 (6a80568) into main (8cfe240) will decrease coverage by 0.37%.
The diff coverage is 38.48%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #1017      +/-   ##
==========================================
- Coverage   75.14%   74.78%   -0.37%     
==========================================
  Files         313      317       +4     
  Lines       34299    34592     +293     
==========================================
+ Hits        25775    25869      +94     
- Misses       8524     8723     +199     
Flag Coverage Δ
unit-tests 74.78% <38.48%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
bin/reth/src/node/mod.rs 0.00% <0.00%> (ø)
bin/reth/src/stage/mod.rs 0.00% <ø> (ø)
bin/reth/src/test_eth_chain/runner.rs 0.00% <ø> (ø)
crates/stages/src/lib.rs 100.00% <ø> (ø)
crates/stages/src/pipeline/set.rs 0.00% <0.00%> (ø)
crates/stages/src/sets.rs 0.00% <0.00%> (ø)
crates/stages/src/stages/bodies.rs 91.42% <ø> (-1.54%) ⬇️
crates/stages/src/stages/execution.rs 92.32% <ø> (ø)
crates/stages/src/stages/hashing_account.rs 93.67% <0.00%> (-1.21%) ⬇️
crates/stages/src/stages/hashing_storage.rs 95.23% <0.00%> (-0.86%) ⬇️
... and 31 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@onbjerg onbjerg merged commit ba44c15 into main Jan 27, 2023
@onbjerg onbjerg deleted the onbjerg/pipeline-builder branch January 27, 2023 17:21
@onbjerg onbjerg mentioned this pull request Jan 27, 2023
@Rjected Rjected mentioned this pull request Jan 27, 2023
18 tasks
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 5, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 5, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 5, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 5, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 6, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
literallymarvellous pushed a commit to literallymarvellous/reth that referenced this pull request Feb 6, 2023
Co-authored-by: Georgios Konstantopoulos <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PipelineEvent channel should be unbounded
6 participants