[CBR-471] Fix Restoration Performance Regression #3786
Conversation
Force-pushed from 1a89260 to 96965af
Looks good to me, though I've not fully checked everything. Some suggestions below.
@@ -379,54 +381,50 @@ restoreWalletHistoryAsync wallet rootId prefilter progress start (tgtHash, tgtSl
     startingPoint <- case start of
         Nothing -> withNode $ getFirstGenesisBlockHash genesisHash
         Just sh -> nextHistoricalHash sh >>= maybe (throwM $ RestorationSuccessorNotFound sh) pure
-    restore genesisHash startingPoint NoTimingData
+    restore genesisHash startingPoint 1000
   where
     wId :: WalletId
     wId = WalletIdHdRnd rootId

    -- Process the restoration of the block with the given 'HeaderHash'.
This comment would now be out of date, right? If I followed correctly, it now restores starting from the block with the given hash.
100% right.
Fixed 👍
    lift (nextHistoricalHash currentHash) >>= \case
        Nothing  -> throwM (RestorationFinishUnreachable tgtHash currentHash)
        Just hh' -> put (hh', False)
    lift $ getPrefilteredBlockOrThrow genesisHash currentHash
The use of a state monad here seems a bit odd to me. This is just keeping the currentHash as the running value, right? I think I'd just write this as a local recursive action with the currentHash as a local arg.
I've tried both approaches actually and found the State monad the more readable of the two. The "edge" case makes the recursive-function approach a bit ugly :/
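For illustration, the two shapes being discussed might look roughly like this; a minimal sketch with hypothetical names (walkState, walkRec, nextHash, step), not the actual restoration code. It contrasts threading the current hash through a state monad with passing it as an argument to a local recursive helper.

import Control.Monad.State (StateT, evalStateT, get, put, lift)

-- Keep the current hash as StateT state.
walkState :: Monad m => (h -> m (Maybe h)) -> (h -> m ()) -> h -> m ()
walkState nextHash step = evalStateT loop
  where
    loop = do
        h <- get
        lift (step h)
        lift (nextHash h) >>= maybe (pure ()) (\h' -> put h' >> loop)

-- Keep the current hash as an argument of a local recursive action.
walkRec :: Monad m => (h -> m (Maybe h)) -> (h -> m ()) -> h -> m ()
walkRec nextHash step = go
  where
    go h = do
        step h
        nextHash h >>= maybe (pure ()) go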
Tried running the Nix script myself and, first off, got a few locale-related errors:
but it did seem to sync:
but then during wallet restore I got:
Force-pushed from 96965af to 257f610
@erikd Yes indeed, my little patch yesterday just made it worse 😅 ... Now fixed properly.
Force-pushed from 257f610 to 6fdffdd
    -- Flush to Acid-State & SQLite
    k <- getSecurityParameter (wallet ^. walletNode)
    let (blocks, txMetas, slotIds) = unzip3 (catMaybes updates)
I think I'd rewrite this code slightly; after discussing this code with @KtorZ, the catMaybes here has a dual purpose: any EBBs will be Nothing, and once we reach the final block halfway through the batch, all the elements after that final block will also be Nothing. I find the style a bit confusing, and would probably prefer an explicit recursion style with an accumulator. That said, I don't think what's here is wrong; but I think it does at least need a comment.
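As a toy illustration of that dual purpose (hypothetical values, not the real block types): catMaybes drops both the EBB entries and the trailing Nothings past the target block, and unzip3 then splits the surviving triples.

import Data.Maybe (catMaybes)

example :: ([Int], String, [Bool])
example = unzip3 (catMaybes updates)
  where
    -- Nothing stands for an EBB, or for a slot past the final block of the batch.
    updates = [Just (1, 'a', True), Nothing, Just (2, 'b', False), Nothing, Nothing]
    -- example == ([1, 2], "ab", [True, False])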
LGTM -- I don't have anything major to add on top of what Edsko & Duncan already said. Nice work!
curl --silent --cacert ${stateDir}/tls/client/ca.crt --cert ${stateDir}/tls/client/client.pem -H "Content-Type: application/json" https://${wallet.walletListen}/api/v1/wallets \
  -d '{
    "operation": "restore",
    "backupPhrase": ["session", "ring", "phone", "arrange", "notice", "gap", "media", "olympic", "water", "road", "spider", "rate"],
Hopefully people won't try to use this one for production wallets 😉
(I always feel like we should have some mechanism for preventing mainnet from accepting any of the mnemonics we use in tests, etc.)
Haha, I thought the same thing 😶 ... I hope it's hidden enough in the Nix entrails that no one will dig this out... right? 😨
@@ -379,54 +381,51 @@ restoreWalletHistoryAsync wallet rootId prefilter progress start (tgtHash, tgtSl
     startingPoint <- case start of
         Nothing -> withNode $ getFirstGenesisBlockHash genesisHash
         Just sh -> nextHistoricalHash sh >>= maybe (throwM $ RestorationSuccessorNotFound sh) pure
-    restore genesisHash startingPoint NoTimingData
+    restore genesisHash startingPoint 1000
Minor one -- might be nice to explicitly have a where clause for this "magic number", to clarify its purpose.
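A schematic sketch of that suggestion (the names below are illustrative stand-ins, not the actual patch): bind the magic number in a where clause so its purpose is documented next to its definition.

main :: IO ()
main = restore batchSize
  where
    -- Number of blocks applied per database transaction during restoration.
    batchSize :: Int
    batchSize = 1000

    -- Stand-in for the real restoration action, which would take the batch
    -- size as an explicit argument.
    restore :: Int -> IO ()
    restore n = putStrLn ("restoring in batches of " ++ show n)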
getBatch genesisHash hh batchSize = go [] batchSize hh
  where
    -- Batch is full: hand back the collected updates and where we stopped.
    go !updates !n !currentHash | n <= 0 =
        return (updates, currentHash)
    -- Reached the target block: collect its update (if any) and stop.
    go !updates _ !currentHash | currentHash == tgtHash =
        getPrefilteredBlockOrThrow genesisHash currentHash >>= \case
            Nothing -> go updates 0 currentHash
            Just u  -> go (updates ++ [u]) 0 currentHash
    -- Regular step: look up the successor, collect the update (EBBs yield
    -- Nothing) and continue with the next hash.
    go !updates !n !currentHash = do
        nextHash <- nextHistoricalHash currentHash >>= \case
            Nothing  -> throwM (RestorationFinishUnreachable tgtHash currentHash)
            Just hh' -> return hh'
        getPrefilteredBlockOrThrow genesisHash currentHash >>= \case
            Nothing -> go updates (n - 1) nextHash
            Just u  -> go (updates ++ [u]) (n - 1) nextHash

Currently trying this out. I found a nice way to put this. A bit afraid about the boundary case, so I'll go through a full restoration once again and see :)
Force-pushed from 6fdffdd to 9cb1a58
Addressed comments above; went for the algorithm described above, with accumulators and recursion instead.
We used to perform a DB transaction per block, which turned out to slow down the process quite a lot. Instead, we can apply k (e.g. 1000) updates inside one transaction and move on to the next batch. There's a bit of fiddling to do to get things working, but it turns out to be pretty efficient. The main trick is located in the new `applyHistoricalBlocks`, which applies updates in sequence, stopping at the first error encountered.
We should have caught the performance regression earlier. We can't realistically perform restoration as part of our unit or main integration tests because of the time it takes. However, it is perfectly realistic to have it run nightly, and to monitor its output to stay within accepted boundaries.
Force-pushed from 9cb1a58 to 80c4b5b
LGTM now.
Looks good. Bash is bash.
The build timeouts are set in .buildkite/nightly.yml.
Integration tests still failing for an unrelated reason:
Merging anyway 👍
…ration-perf-regression [CBR-471] Fix Restoration Performance Regression
…hk/KtorZ/CBR-471/fix-restoration-perf-regression [CBR-471] Fix Restoration Performance Regression
Description
I am still testing this (I had an issue with the very last block application causing the checkpoints to diverge in an unexpected way). I believe it's now fixed, but I have to wait for a full restoration. Edit: now tested, everything seems fine (see comment below).
We used to perform a DB transaction per block, which turned out to slow down the process quite a lot. Instead, we can apply k (e.g. 1000) updates inside one transaction and move on to the next batch. There's a bit of fiddling to do to get things working, but it turns out to be pretty efficient (~600 blocks/s on my setup, vs. 80 blocks/s without the patch).
The main trick is located in the new `applyHistoricalBlocks`, which applies updates in sequence, stopping at the first error encountered.
Besides, we should have caught the performance regression earlier. We can't realistically perform restoration as part of our unit or main integration tests because of the time it takes. However, it is perfectly realistic to have it run nightly, and to monitor its output to stay within accepted boundaries.
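A minimal sketch of the "apply in sequence, stop at the first error" idea; the names here (applyInSequence, applyOne, the Either-based error) are placeholders and not the actual `applyHistoricalBlocks` signature.

-- Apply each update in order; short-circuit and report the first failure.
applyInSequence
    :: Monad m
    => (u -> m (Either e ()))  -- apply a single update
    -> [u]                     -- the whole batch
    -> m (Either e ())
applyInSequence applyOne = go
  where
    go []       = pure (Right ())
    go (u : us) = applyOne u >>= either (pure . Left) (const (go us))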
Linked issue
[CBR-471]
Type of change
Developer checklist
Testing checklist
QA Steps
... then go for a long coffee ☕
Screenshots (if available)
Note: There was a typo in the script when I ran it, causing it not to terminate properly once synced. I've verified the log file and there are no errors related to sync (some network lookup failures, but nothing unusual).
Plus, checking the wallet manually at the end: