Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase reorg cache timer limit #3489

Open
deevope opened this issue Nov 10, 2020 · 10 comments
Open

Increase reorg cache timer limit #3489

deevope opened this issue Nov 10, 2020 · 10 comments

Comments

@deevope
Copy link
Contributor

deevope commented Nov 10, 2020

We should increase for every node the reorg_cache timer limit which is set by default to 30 minutes if it affects the mempool, so in case of future large reorg, nodes will rebroadcast automatically all unconfirmed transactions which were confirmed in the stale blocks.

If it only affects our local txs pool, we should consider to add the mempool in the reorg cache for the reason mentioned above.

@deevope deevope changed the title Increase "reorg cache" timer limit? Increase reorg cache timer limit Nov 10, 2020
@antiochp
Copy link
Member

If it only affects our local txs pool, we should consider to add the mempool in the reorg cache for the reason mentioned above.

I don't understand what you mean here by "add the mempool in the reorg cache". Can you clarify?

@deevope
Copy link
Contributor Author

deevope commented Nov 11, 2020

I don't understand what you mean here by "add the mempool in the reorg cache". Can you clarify?

Sorry if this was unclear. I meant to add each transaction from the mempool in the reorg cache as they become confirmed on the blockchain. We will "age them out" after x minutes (x : timer limit). And in case of large reorg with empty block, nodes will rebroadcast them automatically if the cached txs were not contained in the reorg.

@antiochp
Copy link
Member

I meant to add each transaction from the mempool in the reorg cache as they become confirmed on the blockchain.

We actually add txs to the reorg cache at the same time as we add them to the mempool.
If a tx is received and accepted by the mempool, it goes in the reorg cache as well. Its effectively just a record of every tx that we successfully added to the mempool (capped at rolling recent 30 mins currently).
So there should be no entries in the mempool that are not also in the reorg cache.

A fork (or competing forks) will cause the mempool to be reconciled (every tx in the mempool is valid w.r.t current chain state). But txs remain in the reorg cache, independent of current chain state. This is by design, invalid txs may become valid, precisely so we can handle reorg like this.


nodes will rebroadcast them automatically

We should check the current impl. I'm not entirely confident this happens today.
We definitely use the reorg cache to repopulate the local mempool in a reorg scenario. I'm just not sure if the node also rebroadcasts these re-accepted txs.

Edit: Just checked and it does look like we simply add txs back to the mempool directly when reconciling the reorg cache. And this does not cause the node to (re)broadcast these txs to peers.

This is probably a gap in our logic. For every tx in the reorg cache that is successfully added back to the mempool during reconciliation should also be (re)broadcast out to our peers. We should treat these txs as "new" in this scenario.

@antiochp
Copy link
Member

antiochp commented Nov 11, 2020

A 24 hour period had been suggested in keybase for the reorg cache retention period (currently 30 minutes).
Do we have agreement that this is a reasonable value to use?
Bearing in mind we would then need to store (in memory) 24 hours worth of transactions to support this.

In a reorg scenario we would also need to then iterate over this full 24 period of txs as part of the reorg cache reconciliation process. Each tx needs to be reprocessed, attempting to add it back to the mempool.
This could potentially be expensive for 24 hours worth of txs.

@phyro
Copy link
Member

phyro commented Nov 11, 2020

I'm not sure we should set it to 24 hours by default. While it may seem fine now, it is likely because we have mostly empty blocks. A normal full block is 1.4MB so we have 1440 blocks per day which is 1440 * 1.4 = 2016 MB that is potentially just the reorg cache we would need to carry around. I suggest a lower limit and we make it configurable. If we rebroadcast them, then a few nodes having a high limit might be better in order to avoid possible network spam of GB of data?

Edit: actually, I think I was wrong above and it would be GBs of data of network traffic in any case

Note: We need to be careful of full blocks case or when the reorg has more txs than fit in a block - full blocks is the simplest to think about. There's another parameter https://github.com/mimblewimble/grin/blob/master/pool/src/transaction_pool.rs#L106 which regulates the size of the tx pool. If we receive 2 GBs of txs, can we even fit them in our mempool? These two parameters seem related

@deevope
Copy link
Contributor Author

deevope commented Nov 11, 2020

A 24 hour period had been suggested in keybase for the reorg cache retention period (currently 30 minutes).
Do we have agreement that this is a reasonable value to use?

In a reorg scenario we would also need to then iterate over this full 24 period of txs as part of the reorg cache reconciliation process. Each tx needs to be reprocessed, attempting to add it back to the mempool.
This could potentially be expensive for 24 hours worth of txs.

Then we should just increase it to 4-5 hours by default. Some Grin users will drastically increase the timer themselves. And nodes will (re)broadcast the transaction automatically once there's a fix for the txs when reorg cache is reconciled with the mempool . I believe also, as @phyro mentioned, we should also increase the default mempool size, which is set currently to 50 000 txs (~59 full blocks) to prevent especially reorg on stale full blocks in the future.

@antiochp
Copy link
Member

antiochp commented Nov 23, 2020

One thing to consider here is who should be responsible for rebroadcasting txs.
The "reorg_cache" was the simplest solution to the problem and took the position of "everyone is responsible for their local pool".

Common situation: A tx is relayed across majority of the network before being included in a block. On reorg, this tx is present in majority of reorg caches and will be replayed into majority of mempools (no rebroadcast required).
Assuming the tx exists in some number of mining pool reorg caches then it will be replayed into a mempool and included in a subsequent block.

This starts to change if we introduce a rebroadcast step - now the "reorg_cache" is being used to repopulate other pools. And in fact we now have a slightly undesirable situation where every node is now trying to repopulate every pool with everybody rebroadcasting every cached tx. This is likely to result in a lot of redundant network traffic.

Tx participants may want to be responsible for any rebroadcast. If I sent funds a few hours ago, maybe my wallet or my node is able to monitor for reorg and proactively rebroadcast my tx to ensure it up back in the wider mempool.
As a third-party I'm happy to relay txs that I receive, but maybe I don't want to necessarily maintain a massive reorg cache.

@deevope
Copy link
Contributor Author

deevope commented Nov 23, 2020

And in fact we now have a slightly undesirable situation where every node is now trying to repopulate every pool with everybody rebroadcasting every cached tx. This is likely to result in a lot of redundant network traffic.

Right, then I believe we should make the broadcast of transactions issued from reorg cache configurable, and set by default to false. So just nodes that would like to maintain a massive reorg will broadcast transactions again. There should be only few people that might want to enable it (particularly those who drastically increased their reorg cache period). So the situation would come back to few node is now trying to repopulate pool peers.

Edit : As you said everyone is responsible for their local pool and consequently we should just allow miners to increase reorg cache period in the config file if they want it.

@deevope
Copy link
Contributor Author

deevope commented Nov 25, 2020

If we want to still investigate, I propose 2 strategies to re-broadcast transactions after a reorg, for both of them we'll have the option set to false by default in order to minimize network traffic, and have the following rules :

  • reorg depth > X blocks
  • transaction sourced from reorg cache
  • txs more aged than the default reorg cache period

1st strategy :

  • txs from reorg cache will be re-broadcasted directly after the reorg:

    • pros :

      • we are almost sure that every transactions older than the default reorg cache period in stale blocks, will be confirmed after the reorg
    • cons :

      • will need useless network traffic as we don't know what the mining pool reorg cache period

2nd strategy :

  • In a time lapse of Y blocks (or minutes) after the reorg (as we want to let the chance to mining pools), we check if txs in cache have been mined (in a optimized way), and broadcast txs from the index where it was not mined and so on :

    • pros :

      • only essential network traffic
    • cons :

      • allow in a short time lapse to let people double spend if any mining pools don't have these txs in their cache
      • might be expensive for node owner

@phyro
Copy link
Member

phyro commented Jan 7, 2022

@deevope this one can be closed now, correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants