Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many intermediate checkpoints created in one go #2035

Closed
KtorZ opened this issue Aug 13, 2020 · 3 comments
Closed

Too many intermediate checkpoints created in one go #2035

KtorZ opened this issue Aug 13, 2020 · 3 comments
Assignees
Labels
Bug PRIORITY:MEDIUM A bug that needs to be addressed after ongoing stories and tasks. SEVERITY:LOW Small defects which does not prevent any crucial functionality to work.

Comments

@KtorZ
Copy link
Member

KtorZ commented Aug 13, 2020

Context

Information -
Version v2020-08-03
Platform All
Installation

Steps to Reproduce

  1. Restore a wallet
  2. Wait until the wallet cross the hard-fork (or a protocol update)
  3. Observe checkpoints in the database (e.g. SELECT DISTINCT(slot) FROM checkpoints)

Expected behavior

  • The number of checkpoints is kept small, and the wallet creates them gradually.

Actual behavior

  • The number of checkpoints kept in the database depends on the k parameter. The larger k, the more checkpoints we store. In the Shelley era on Mainnet however, we can still see over 30+ checkpoints.

  • When reaching the area close to the tip of the chain, the wallet may create many checkpoints in a single database transaction. For large wallets, this would mean making the wallet inoperative for a long time. Possibly causing it to require another long time of creation of many small checkpoints.


Resolution


QA

  • Changed the strategy to compute and maintain checkpoints such that no more than 2 checkpoints (and most of the time, only one) would be created by each restoration step. Some parameters of the sparseCheckpoint function have also been made configurable instead of being hard-coded, allowing for tweaking them dynamically.

  • With that, we added two interesting properties:

    1. A property which shows that the size of the edgeSize parameter only influence the short-term checkpoints, and that long-term checkpoints are always the same regardless of the value of that parameter. This is useful to show because we now calculate acceptable in two places with different parameters: In the wallet restoration loop, where we set the edge size to 0, and in the database pruning function where we set it to 5. We do this to avoid the restoration loop to pause for too long to create the short term checkpoints. Yet, since we always store the last checkpoint of a batch of block, when the wallet reaches the tip, it'll start filling the short-term checkpoints automatically (since they won't be pruned by the database). Eventually, long-term and short-term checkpoints will be stored in the database.

    https://github.com/input-output-hk/cardano-wallet/blob/06b34e48408596ef49b7b183f0674a432c2a317f/lib/core/test/unit/Cardano/Wallet/DB/Properties.hs#L924-L937

    1. A more complicated property which mimics the song-and-dance done between the restoration loop and the database pruning. The property shows that the strategy we employ works and ensures that at any time, the wallet won't roll back more than k + gapSize blocks, where gapSize is a setting calculated from k and smaller than k. Knowing that k is about 108 in Shelley, and a hundred blocks are pretty must instantaneous to restore for the wallet, this seems like a fair compromise.

    https://github.com/input-output-hk/cardano-wallet/blob/06b34e48408596ef49b7b183f0674a432c2a317f/lib/core/test/unit/Cardano/Wallet/DB/Properties.hs#L939-L1028

@KtorZ KtorZ added Bug PRIORITY:MEDIUM A bug that needs to be addressed after ongoing stories and tasks. SEVERITY:LOW Small defects which does not prevent any crucial functionality to work. labels Aug 13, 2020
@Anviking
Copy link
Member

Anviking commented Sep 1, 2020

Remove protocol parameters from checkpoints altogether

I was about to in #1872, but had migration problems.

The security parameter k should be very static across all eras though...?

@KtorZ
Copy link
Member Author

KtorZ commented Sep 1, 2020

@Anviking yes indeed.

@hasufell hasufell assigned hasufell and unassigned hasufell Sep 2, 2020
@KtorZ KtorZ self-assigned this Sep 9, 2020
iohk-bors bot added a commit that referenced this issue Sep 16, 2020
2143: remove storage of intermediate checkpoints in restore blocks r=KtorZ a=KtorZ

# Issue Number

<!-- Put here a reference to the issue this PR relates to and which requirements it tackles -->

#2035

# Overview

<!-- Detail in a few bullet points the work accomplished in this PR -->

  This was a perhaps too-early optimization trying to reduce the time of
  rolling back. The `sparseCheckpoint` function return an empty list for
  pretty much the entire restoration, except when reaching the last k
  blocks where it'll return a list of checkpoints to save, sparse for
  older blocks and dense near the tip. With the current parameters,
  blocks are kept if:

  - Their blockheight is not older than (tip - k) or it is 0
  - And their blockheight is a multiple of 100 or, they are near within
    10 blocks from the last known block.

  We currently do this calculation and filtering in two places:

  1. In `restoreBlocks`, to pre-filter checkpoints to store in the database
  2. In `prune` from the wallet DBLayer, to garbage collect old checkpoints

  Yet, what (1) buys us is a very little gain on standard wallet, and a huge
  performance cost on large wallets. So let's analyze the two
  cases:

  A/ Small Wallets

  - The time to create a checkpoint is very small in front of the slot
    length.

  - Restoring blocks is fast, (up to 10K blocks per seconds on empty
    wallets).

  Therefore, rolling back of 2, 3 blocks or, 100 makes pretty much no
  difference. Being able to store more checkpoints near the tip adds
  very little benefits in terms of performances especially, for the
  first restoration.

  B/ Large Wallets

  - The time to create a checkpoint is important in front of the slot
    length (we've seen up to 4s).

  - Restoring blocks is still quite fast (the time needed for processing
    blocks remains quite small in front of the time needed to read and create
    new checkpoints).

  The main problem with large wallets occur when the wallet is almost
  synced and reaches the 10 last blocks of the chain. By trying to store
  intermediate checkpoints, not only does the wallet spent 10* more time
  in `restoreBlocks` than normally, but it also keep the database lock
  for all that duration. Consider the case where the wallet takes 4s to
  read, and 4s to create a checkpoint, plus some additional 2s to prune
  them (these are actual data from large exchanges), by default, 10s is
  spent for creating one checkpoint. And at least 40 more to create the
  intermediate ones. During this time, between 1 and 3 checkpoints have
  been created. So it already needs to prune out the last one it spends
  12s to create and needs already to create new checkpoints right away.

  As a consequence, a lot of other functionalities are made needlessly
  slower than they could be, because for the whole duration of the
  `restoreBlocks` function, the wallet is holding the database lock.

  Now, what happen if we avoid storing the "intermediate" checkpoints in
  restore blocks: blocks near the tip will eventually get stored, but
  one by one. So, when we _just_ reach the top for the first time, we'll
  only store the last checkpoint. But then, the next 9 checkpoints
  created won't be pruned out. So, the worse that can happen is that the
  wallet is asked to rollback right after we've reached the tip and
  haven't created many checkpoints yet. Still, We would have at least
  two checkpoints in the past that are at most 2K blocks from the tip
  (because we fetch blocks by batches of 1000). So it's important that
  the batch size remains smaller than `k` so that we can be sure that
  there's always one checkpoint in the database.


# Comments

<!-- Additional comments or screenshots to attach if any -->

<!-- 
Don't forget to:

 ✓ Self-review your changes to make sure nothing unexpected slipped through
 ✓ Assign yourself to the PR
 ✓ Assign one or several reviewer(s)
 ✓ Once created, link this PR to its corresponding ticket
 ✓ Assign the PR to a corresponding milestone
 ✓ Acknowledge any changes required to the Wiki
-->


Co-authored-by: KtorZ <[email protected]>
iohk-bors bot added a commit that referenced this issue Sep 18, 2020
2143: remove storage of intermediate checkpoints in restore blocks r=KtorZ a=KtorZ

# Issue Number

<!-- Put here a reference to the issue this PR relates to and which requirements it tackles -->

#2035

# Overview

<!-- Detail in a few bullet points the work accomplished in this PR -->

  This was a perhaps too-early optimization trying to reduce the time of
  rolling back. The `sparseCheckpoint` function return an empty list for
  pretty much the entire restoration, except when reaching the last k
  blocks where it'll return a list of checkpoints to save, sparse for
  older blocks and dense near the tip. With the current parameters,
  blocks are kept if:

  - Their blockheight is not older than (tip - k) or it is 0
  - And their blockheight is a multiple of 100 or, they are near within
    10 blocks from the last known block.

  We currently do this calculation and filtering in two places:

  1. In `restoreBlocks`, to pre-filter checkpoints to store in the database
  2. In `prune` from the wallet DBLayer, to garbage collect old checkpoints

  Yet, what (1) buys us is a very little gain on standard wallet, and a huge
  performance cost on large wallets. So let's analyze the two
  cases:

  A/ Small Wallets

  - The time to create a checkpoint is very small in front of the slot
    length.

  - Restoring blocks is fast, (up to 10K blocks per seconds on empty
    wallets).

  Therefore, rolling back of 2, 3 blocks or, 100 makes pretty much no
  difference. Being able to store more checkpoints near the tip adds
  very little benefits in terms of performances especially, for the
  first restoration.

  B/ Large Wallets

  - The time to create a checkpoint is important in front of the slot
    length (we've seen up to 4s).

  - Restoring blocks is still quite fast (the time needed for processing
    blocks remains quite small in front of the time needed to read and create
    new checkpoints).

  The main problem with large wallets occur when the wallet is almost
  synced and reaches the 10 last blocks of the chain. By trying to store
  intermediate checkpoints, not only does the wallet spent 10* more time
  in `restoreBlocks` than normally, but it also keep the database lock
  for all that duration. Consider the case where the wallet takes 4s to
  read, and 4s to create a checkpoint, plus some additional 2s to prune
  them (these are actual data from large exchanges), by default, 10s is
  spent for creating one checkpoint. And at least 40 more to create the
  intermediate ones. During this time, between 1 and 3 checkpoints have
  been created. So it already needs to prune out the last one it spends
  12s to create and needs already to create new checkpoints right away.

  As a consequence, a lot of other functionalities are made needlessly
  slower than they could be, because for the whole duration of the
  `restoreBlocks` function, the wallet is holding the database lock.

  Now, what happen if we avoid storing the "intermediate" checkpoints in
  restore blocks: blocks near the tip will eventually get stored, but
  one by one. So, when we _just_ reach the top for the first time, we'll
  only store the last checkpoint. But then, the next 9 checkpoints
  created won't be pruned out. So, the worse that can happen is that the
  wallet is asked to rollback right after we've reached the tip and
  haven't created many checkpoints yet. Still, We would have at least
  two checkpoints in the past that are at most 2K blocks from the tip
  (because we fetch blocks by batches of 1000). So it's important that
  the batch size remains smaller than `k` so that we can be sure that
  there's always one checkpoint in the database.


# Comments

<!-- Additional comments or screenshots to attach if any -->

<!-- 
Don't forget to:

 ✓ Self-review your changes to make sure nothing unexpected slipped through
 ✓ Assign yourself to the PR
 ✓ Assign one or several reviewer(s)
 ✓ Once created, link this PR to its corresponding ticket
 ✓ Assign the PR to a corresponding milestone
 ✓ Acknowledge any changes required to the Wiki
-->


Co-authored-by: KtorZ <[email protected]>
@KtorZ KtorZ changed the title Protocol parameters in checkpoint are never updated Too many intermediate checkpoints created in one go Sep 21, 2020
@piotr-iohk
Copy link
Contributor

lgtm.

I tested few wallets on testnet against v2020-09-11 and master. There are less checkpoints being created for master wallet.
E.g.

  • empty wallet on v2020-09-11 has 35 checkpoints after restoration.
  • empty wallet on master has 7 checkpoints after restoration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug PRIORITY:MEDIUM A bug that needs to be addressed after ongoing stories and tasks. SEVERITY:LOW Small defects which does not prevent any crucial functionality to work.
Projects
None yet
Development

No branches or pull requests

4 participants