ChainSync Pipelined: moar speed. #1615
Conversation
Force-pushed from c0a072f to 76c9f9a
```haskell
pipeline (Succ n) respond | natToInt (Succ n) == fromIntegral maxInFlight =
    P.CollectResponse Nothing $ collectResponses respond [] n
pipeline n respond =
    P.SendMsgRequestNextPipelined $ pipeline (Succ n) respond
```
I would use `CollectResponse (Just (SendMsgRequestNextPipelined ..)) ...`. This pokes the receiving queue, and if we already got something it will be available in the next step, so more promptly. You could experiment with some strategies for when to use `CollectResponse (Just ...) ..` and when to use `SendMsgRequestNextPipelined`. But maybe you're not doing so because of Jörmungandr support?
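For illustration, here is a rough sketch of that interleaved strategy, reusing the names from the snippet above (`pipeline`, `respond`, `collectResponses`, `natToInt`, `maxInFlight`; `Zero` is assumed as the zero case of the same counter). The exact shape of `CollectResponse`'s first argument differs between ouroboros-network versions, so treat this as pseudocode rather than a drop-in change:

```haskell
-- Sketch only: same local names as the snippet above; not compiled against
-- the real ouroboros-network types.
pipeline Zero respond =
    -- nothing in flight yet, so there is nothing to collect: just pipeline
    P.SendMsgRequestNextPipelined $ pipeline (Succ Zero) respond
pipeline (Succ n) respond
    | natToInt (Succ n) == fromIntegral maxInFlight =
        -- window saturated: stop sending and drain what is in flight
        P.CollectResponse Nothing $ collectResponses respond [] n
    | otherwise =
        -- keep pipelining, but collect a response right away if one has
        -- already arrived instead of waiting for the window to fill up
        P.CollectResponse
            (Just $ P.SendMsgRequestNextPipelined $ pipeline (Succ (Succ n)) respond)
            (collectResponses respond [] n)
```

Here the collect handler still drains the remaining responses as in the original; a fully interleaved variant would resume pipelining from inside the handler instead.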
> But maybe you're not doing so because of Jörmungandr support?
Not really. We process blocks in batches through the wallet, to avoid having to hit the database and create a checkpoint after processing every block. The restoration loop drives the networking side by asking for the next blocks; it is in essence similar to how ChainSync works, except that instead of requesting a single block, it requests a batch.

So here, we simply leverage pipelining to collect the next batch more rapidly: it queues `maxInFlight` requests and then collects them all. Before, we would simply call `RequestNext` several times, waiting for each request to complete before issuing the next one.
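As a back-of-the-envelope illustration of why the batched approach pays off, here is a toy latency model. All names and numbers below are illustrative assumptions, not measurements from this PR:

```haskell
-- Toy model: compare requesting blocks strictly one-by-one with queueing a
-- window of pipelined requests and then collecting them all.
module Main where

sequentialMs :: Double -> Double -> Int -> Double
sequentialMs rtt perBlock nBlocks =
    fromIntegral nBlocks * (rtt + perBlock)          -- one round trip per block

batchedMs :: Double -> Double -> Int -> Int -> Double
batchedMs rtt perBlock window nBlocks =
    windows * (rtt + fromIntegral window * perBlock) -- roughly one round trip per window
  where
    windows = fromIntegral nBlocks / fromIntegral window

main :: IO ()
main = do
    -- e.g. 50ms round trip, 0.05ms per block, windows of 1000, 4M blocks
    let rtt = 50; perBlock = 0.05; window = 1000; nBlocks = 4000000
    putStrLn $ "sequential: " <> show (sequentialMs rtt perBlock nBlocks / 60000) <> " min"
    putStrLn $ "batched:    " <> show (batchedMs rtt perBlock window nBlocks / 60000) <> " min"
```

With these made-up figures, the one-by-one strategy is dominated by round trips, while the batched strategy pays roughly one round trip per window.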
I see; then you could pipeline even more: collect and buffer things as they come from the network (on your side) and send batches for processing. If you do your own buffering you will be able to saturate the network.
Force-pushed from c1ef56b to 0c1a920
bors try

try: Build failed
Yes, I thought of that, but it requires a quite fundamental change in the way our networking layer currently operates. In theory, the network layer could indeed do its own buffering and, while the wallet engine is busy processing blocks and creating checkpoints, already be pipelining the next queries! Yet, the networking layer is currently driven by the engine through a pull-based mechanism, so it only requests the next blocks when asked, and this is a synchronous chain of events. I'll try to make this a bit more concurrent to see if the complexity is worth adding. Getting below 5 minutes restoration time would be pretty nice :D
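A minimal sketch of that more concurrent arrangement, assuming hypothetical `fetchNextBatch` and `processBatch` stand-ins (this is not the actual cardano-wallet networking layer): a bounded queue lets the network side keep pipelining ahead while the engine is busy creating checkpoints.

```haskell
import Control.Concurrent.Async (concurrently_)
import Control.Concurrent.STM
    ( atomically, newTBQueueIO, readTBQueue, writeTBQueue )
import Control.Monad (forever)

-- | Sketch: run the network side and the wallet engine concurrently,
-- connected by a bounded queue of batches. 'fetchNextBatch' and
-- 'processBatch' are hypothetical stand-ins for the real components.
bufferedSync
    :: IO [block]          -- ^ fetch the next batch from the node
    -> ([block] -> IO ())  -- ^ apply a batch and create a checkpoint
    -> IO ()
bufferedSync fetchNextBatch processBatch = do
    queue <- newTBQueueIO 4  -- at most 4 batches buffered ahead
    concurrently_
        -- producer: keeps the network saturated, blocks when the buffer is full
        (forever $ fetchNextBatch >>= atomically . writeTBQueue queue)
        -- consumer: the engine pulls batches at its own pace
        (forever $ atomically (readTBQueue queue) >>= processBatch)
```

The bound on the queue keeps memory use predictable: the producer simply blocks once it is a few batches ahead. A real version would also need to handle rollbacks and termination at the tip.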
lgtm, and tried restoring with Daedalus Mainnet
Force-pushed from 5f26e68 to 5172dcd
bors r+
1615: ChainSync Pipelined: moar speed. r=KtorZ a=KtorZ
Build failed

Interesting.
Force-pushed from 5172dcd to feb1ecf
bors r+
1615: ChainSync Pipelined: moar speed. r=KtorZ a=KtorZ

1635: Prevent bors from merging into any branch except master r=KtorZ a=rvl (another try at #1632)

1639: Post v2020-05-06 updates r=piotr-iohk a=piotr-iohk (update README compat table; update revisions for migration tests)
Hydra timing out again.

Build failed (retrying...)
This PR was included in a batch with a merge conflict, it will be automatically retried

22 similar comments
Issue Number
ADP-270
Overview
441eaf1
📍 replace the chain sync client with its pipelined version
No pipelining is done yet; this still uses the standard sequential way of sending requests, but now through the 'ChainSyncClientPipelined', which has extra constructors we can now use for pipelining.
178b0db
📍 implementing pipelining, pipeline by batches of 1000 requests. It's fast.
This is still work in progress, as the 'edge' cases aren't properly handled: rollbacks are just ignored, and the area near the tip shouldn't be pipelined but is. Yet, this is a starter that gives some idea. I haven't tweaked it much, nor tried different settings, but so far I've restored a wallet in ~6:30 minutes on Mainnet, so more than 10k blocks per second.
37cdeb5
📍 only pipeline when it's safe
This also makes sure to properly handle the starting and terminating conditions. This is a bit tricky because, ideally, to know whether we can safely pipeline, we want to check whether any of the pipelined requests would be made within `k` blocks from the tip. The tricky part is "blocks": during rollbacks, we have access to points on the chain which only reference slots, yet we need to compare block heights!

To solve this, I've adopted two strategies, pipelining and one-by-one, defaulting to the latter and only using the former when we have enough information to make a sound decision. This means that the client always starts by restoring a single block, from which it can figure out a block height and compare it with the node's tip. In case it's far enough, we go full speed, in steps, checking at each step whether it is still safe to proceed with the next one.

The last `k` blocks are therefore fetched one-by-one, which is quite slow, but it's `k` blocks out of millions. We _could_ possibly make this last part even faster by using the same batching strategy as before (where we would basically check after each request whether the node has rolled back or not), but it comes with some added complexity I am not necessarily fond of introducing. This new approach already cuts the overall restoration time by 2.5!
e.g. on Mainnet:

```
-- pipelined -----------------
- [2020-05-06 12:12:34.40 UTC] 87a2d760: syncProgress: still restoring (0.00%)
- [2020-05-06 12:15:12.80 UTC] 87a2d760: syncProgress: still restoring (50.01%)
-- one by one ----------------
- [2020-05-06 12:18:20.34 UTC] 87a2d760: syncProgress: still restoring (99.94%)
- [2020-05-06 12:18:37.38 UTC] 87a2d760: syncProgress: restored
------------------------------
6 minutes 3s / ~11366 block/s
```
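A simplified sketch of that decision (the names `SecurityParameter`, `Strategy`, and `chooseStrategy` are illustrative, not the actual cardano-wallet types): pipeline a window only when even its last request would still land more than `k` blocks below the node's tip, and otherwise fall back to one-by-one.

```haskell
import Numeric.Natural (Natural)

-- Illustrative types; the real code works with the wallet's own
-- block header and quantity types.
newtype SecurityParameter = SecurityParameter Natural  -- the protocol's k

data Strategy
    = PipelineBatch Natural  -- ^ safe to pipeline this many requests
    | OneByOne               -- ^ too close to the tip, or height unknown

-- | Decide how to fetch the next window of blocks, given our local block
-- height and the node tip's block height. We only pipeline when even the
-- last request of the window would still be more than k blocks from the tip.
chooseStrategy
    :: SecurityParameter
    -> Natural          -- ^ desired window size (e.g. 1000)
    -> Maybe Natural    -- ^ our current block height, if known
    -> Natural          -- ^ node tip block height
    -> Strategy
chooseStrategy (SecurityParameter k) window localHeight tipHeight =
    case localHeight of
        Nothing -> OneByOne  -- no block height yet: restore a single block first
        Just h
            | h + window + k < tipHeight -> PipelineBatch window
            | otherwise                  -> OneByOne
```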
485a324
📍 Regenerate nix
62d4707
📍 adjust log outputs for MsgNodeTip and MsgTxParameters
Going from:

```
[cardano-wallet.network:Debug:16] [2020-05-07 09:44:29.79 UTC] Network node tip block height is 4129973 at hash 3cf1a60a
[cardano-wallet.network:Info:17] [2020-05-07 09:31:03.66 UTC] TxParams for tip are: Fee policy: 155381.0 + 43.0x + 0.0y Tx max size: 8192
```

to:

```
[cardano-wallet.network:Debug:17] [2020-05-07 09:40:30.88 UTC] Network node tip is fe158920-[191.6466#4129961]
[cardano-wallet.network:Info:17] [2020-05-07 09:40:30.88 UTC] TxParams for tip are: [Fee policy: 155381.0 + 43.0x + 0.0y, Tx max size: 8192]
```
a) avoid multi-line log output with awkward structure for tx params
b) Show tip in a consistent way with other loggers (for instance, in the wallet engine).
feb1ecf
📍 fix asynchronous exception handling
There are two sides to this commit:
a) It fixes the asynchronous exception "catcher" in 'follow' that would fail to catch any exception occurring while the thread is sleeping. I noticed that sometimes the logs wouldn't display a proper error.

b) It also fixes the 'runQuery' combinator for our SQLite implementation to make sure that asynchronous exceptions are masked while executing SQL queries. I falsely thought this was already properly handled by 'runSqlConn' from persistent, but it seems to be the source of most of our 'ErrorBusy' / 'statementAlreadyFinalized' issues. :crossed_fingers:
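A minimal sketch of the idea behind (b), assuming a hypothetical connection lock and `runRawQuery` action (this is not the actual 'runQuery' from cardano-wallet): serialise access to the connection and keep asynchronous exceptions masked while the statement runs, so a cancelled worker cannot leave SQLite with a half-finalised statement.

```haskell
import Control.Concurrent.MVar (MVar, withMVar)
import Control.Exception (mask_)

-- | Sketch: one query at a time per connection, with asynchronous
-- exceptions masked for the duration of the statement. 'conn' and
-- 'runRawQuery' are hypothetical stand-ins for the real SQLite layer.
runQuery :: MVar conn -> (conn -> IO a) -> IO a
runQuery lock runRawQuery =
    withMVar lock $ \conn ->
        -- withMVar masks around taking and releasing the lock, but runs the
        -- action itself with exceptions restored; mask_ keeps the query body
        -- protected from asynchronous interruption as well.
        mask_ (runRawQuery conn)
```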
Comments
It's fast 🤷♂️