feat(sequencer): add ttl and invalid cache to app mempool by lobstergrindset · Pull Request #1138 · astriaorg/astria

lobstergrindset · 2024-06-03T13:41:34Z

Summary

This PR adds to the App's mempool a way to signal to CometBFT when transactions should be removed from the CometBFT mempool.

Background

Transactions that fail to execute currently get stuck in both the CometBFT and App mempools. When a transaction fails to execute in prepare_proposal(), it is not removed from the CometBFT mempool: CometBFT only clears out transactions that either fail handleCheckTx() or are included in a block. As a result, CometBFT re-adds the failed transaction to the App mempool during its handleCheckTx() maintenance, and the transaction is fed to prepare_proposal() again.

We also need a way for full nodes to clear out these invalid transactions. Since these nodes don't run prepare_proposal(), we need an additional way, like a tx TTL, to signal when transactions can be dropped.

Changes

Added a transaction cache to the App's mempool which signals to CometBFT when a transaction should be removed.
Tracks in the App's mempool when a transaction is first seen.
Transactions that fail in prepare_proposal() get added to the App's removal cache.
Transactions that are older than 10 minutes are added to the App's removal cache.
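
The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `RemovalCache`, `remove_queue`, and `max_size` names come from the review threads, while `RemovalReason`, its variants, `add`, `reason_for`, and the `check_tx` stand-in are hypothetical names chosen for the example.

```rust
use std::collections::{HashMap, VecDeque};

/// Why a transaction was scheduled for removal (hypothetical variant names).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RemovalReason {
    Expired,
    FailedPrepareProposal,
}

/// Bounded cache of transaction hashes that a later `CheckTx` should reject.
struct RemovalCache {
    cache: HashMap<[u8; 32], RemovalReason>,
    // Insertion order, used to evict the oldest entry when the cache is full.
    remove_queue: VecDeque<[u8; 32]>,
    max_size: usize,
}

impl RemovalCache {
    fn new(max_size: usize) -> Self {
        assert!(max_size > 0, "cache must hold at least one entry");
        Self { cache: HashMap::new(), remove_queue: VecDeque::new(), max_size }
    }

    /// Records `tx_hash`; an already cached hash keeps its original reason.
    fn add(&mut self, tx_hash: [u8; 32], reason: RemovalReason) {
        if self.cache.contains_key(&tx_hash) {
            return;
        }
        // The queue length doubles as the cache size, so no separate counter is needed.
        if self.remove_queue.len() == self.max_size {
            if let Some(oldest) = self.remove_queue.pop_front() {
                self.cache.remove(&oldest);
            }
        }
        self.remove_queue.push_back(tx_hash);
        self.cache.insert(tx_hash, reason);
    }

    /// Returns the removal reason if the hash is currently cached.
    fn reason_for(&self, tx_hash: [u8; 32]) -> Option<RemovalReason> {
        self.cache.get(&tx_hash).copied()
    }
}

/// Hypothetical stand-in for the `CheckTx` decision: an `Err` means the
/// transaction should be rejected so that CometBFT drops it.
fn check_tx(cache: &RemovalCache, tx_hash: [u8; 32]) -> Result<(), RemovalReason> {
    match cache.reason_for(tx_hash) {
        Some(reason) => Err(reason),
        None => Ok(()),
    }
}
```

In this sketch the cache is only consulted from `check_tx`, which matches the point raised in review that CometBFT can only be told to drop a transaction through its own `CheckTx` call.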

Testing

Unit tests and local testing of invalid transactions.

Metrics

Added CHECK_TX_REMOVED_FAILED_EXECUTION counter to the sequencer's metrics.
Added CHECK_TX_REMOVED_EXPIRED counter to the sequencer's metrics.

okay right, since this doesn't actually add it to the tx type, this is fine. in the future can add it to the actual tx!

Fraser999 · 2024-06-06T16:01:28Z

crates/astria-sequencer/src/app/mod.rs

                        txs_to_readd_to_mempool.push((enqueued_tx, priority));
+                    } else {
+                        // the transaction should be removed from the mempool
+                        self.mempool.track_invalid(enqueued_tx.tx_hash()).await;


Would there be any benefit to including the error (e.g. as format!("{e:#}")) in the InvalidCache along with the hash so it can be provided in the CheckTx response's log?

Fraser999 · 2024-06-06T20:59:09Z

crates/astria-sequencer/src/mempool.rs

    },
 };
 use priority_queue::PriorityQueue;
+use tendermint::Time;


I strongly think we should use tokio::time::{Instant, Duration} instead of this and replace our usage of bare integers with Instant and Duration accordingly.

They're aliases for the std::time equivalents, and with that we get the benefit of a guaranteed monotonic clock. (I'm not sure if tendermint::Time guarantees that or not). But we also get type safety (we can't accidentally mix up an instant with a duration) and in tests we can pause and advance time, meaning we can avoid sleeping in tests.
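
A minimal sketch of the type-safety point, written against `std::time` so that it runs without extra dependencies. The `is_expired` helper and its explicit `now` parameter are assumptions made for this example; passing `now` in is a dependency-free way to get deterministic tests, whereas with tokio's `test-util` feature the same effect comes from `tokio::time::pause` and `tokio::time::advance`.

```rust
use std::time::{Duration, Instant};

const TX_TTL: Duration = Duration::from_secs(600); // 10 minutes

/// Returns true once `ttl` has elapsed since `first_seen`.
/// `now` is passed in so tests can pick any point in time without sleeping.
fn is_expired(first_seen: Instant, ttl: Duration, now: Instant) -> bool {
    // The difference of two `Instant`s is a `Duration`, so the compiler
    // rejects comparing a point in time with a time span by mistake.
    now.saturating_duration_since(first_seen) >= ttl
}
```

With bare `i64` timestamps, swapping the arguments or adding two timestamps would still compile; here either mistake is a type error.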

Fraser999 · 2024-06-06T21:20:25Z

crates/astria-sequencer/src/mempool.rs

+            ttl: Time::now()
+                .unix_timestamp()
+                .checked_add(ttl)
+                .expect("overflow in enqueued transaction ttl"),


I can see the advantage of the improved UX of allowing the client to choose the expiry time, but OTOH I agree that if we support that, it should also be signed over... and that could be worse UX since resubmitting would require re-signing by all parties (I know we only have one signatory right now, but I guess that could change?)

We'd probably also need to reject txs which had expiry times too far in the future, since they'd effectively never expire.

noot · 2024-06-07T16:23:32Z

crates/astria-sequencer/src/mempool.rs

+    remove_queue: VecDeque<[u8; 32]>,
+    max_size: usize,
+    time_to_live: i64,
+    size: usize,


is this the current size of the cache? could you add a comment for this?

noot · 2024-06-07T16:25:08Z

crates/astria-sequencer/src/mempool.rs

+                <= self.time_added[&tx_hash]
+                    .checked_add(self.time_to_live)
+                    .expect("overflowed ttl add"))
+    }


it might be nicer to store expiry_time instead of time_added when a tx is inserted, that way you don't need to do the add + expect here each time

or maybe have separate functions for is_cached and is_expired?
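
Both suggestions can be illustrated together. The sketch below is hypothetical (the `ExpiringCache` type and its method names are not from the PR): it computes the expiry once at insertion and exposes `is_cached` and `is_expired` as separate queries.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache that stores each entry's expiry instead of its insertion time.
struct ExpiringCache {
    expires_at: HashMap<[u8; 32], Instant>,
    time_to_live: Duration,
}

impl ExpiringCache {
    fn new(time_to_live: Duration) -> Self {
        Self { expires_at: HashMap::new(), time_to_live }
    }

    /// Computes the expiry once, at insertion. Returns false if the addition overflows.
    fn insert(&mut self, tx_hash: [u8; 32], now: Instant) -> bool {
        match now.checked_add(self.time_to_live) {
            Some(expiry) => {
                self.expires_at.insert(tx_hash, expiry);
                true
            }
            None => false,
        }
    }

    /// True if the hash has been inserted, whether or not it has expired.
    fn is_cached(&self, tx_hash: &[u8; 32]) -> bool {
        self.expires_at.contains_key(tx_hash)
    }

    /// True if the hash is present and its expiry has been reached.
    fn is_expired(&self, tx_hash: &[u8; 32], now: Instant) -> bool {
        self.expires_at
            .get(tx_hash)
            .map_or(false, |expiry| now >= *expiry)
    }
}
```

The lookup path then contains a single comparison and no `expect`, since the only fallible arithmetic happens in `insert`.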

noot · 2024-06-07T16:26:20Z

crates/astria-sequencer/src/mempool.rs

+            self.cache.remove(&removed_tx);
+            self.time_added.remove(&removed_tx);
+        } else {
+            self.size = self.size.checked_add(1).expect("cache size overflowed");


do we need a self.size variable over just checking self.remove_queue.len() or len of the hashmap?

noot · 2024-06-07T16:27:53Z

crates/astria-sequencer/src/mempool.rs

+        }
+    }
+
+    fn cached(&self, tx_hash: [u8; 32]) -> bool {


this function returns true if the tx is invalidated but not expired right? could you add a comment for that or maybe change the function name to cached_and_not_expired for clarity?

is this the right behaviour? based on the logic for adding txs to the invalid cache and when to remove them from cometbft, should this return true if the tx is expired?

the comment was ambiguous, it was returning true if it was expired in the app's mempool but not expired in the cache. refactored to make it more clear

noot · 2024-06-07T16:29:15Z

crates/astria-sequencer/src/mempool.rs

+    queue: Arc<RwLock<MempoolQueue>>,
+    set: Arc<RwLock<HashSet<[u8; 32]>>>,
+    invalid_cache: Arc<RwLock<InvalidCache>>,
+    tx_ttl: i64,


do we need to store this if it's already a constant? it appears to always be set to the same value

noot · 2024-06-07T16:45:31Z

crates/astria-sequencer/src/mempool.rs

+            self.set.write().await.remove(&tx.tx_hash);
+            Some((tx, priority))
+        } else {
+            tx


Suggested change

-            tx
+            None

a bit more explicit

lobstergrindset · 2024-06-11T14:27:59Z

Changes made since last reviews:

switched time accounting to tokio::time::{Instant, Duration}
renamed InvalidCache to RemovalCache and renamed Mempool’s use of it to comet_bft_removal_cache to try to better document what the cache is and how it is being used
simplified the RemovalCache by removing the redundant HashSet and size struct members
removed the HashSet added to the mempool in the last write
added tracking to RemovalCache as to why the transaction was added and surface error in CheckTx to user
added AbciErrorCodes and metrics for expired transactions and failing to execute transactions

noot · 2024-06-11T15:00:42Z

crates/astria-sequencer/Cargo.toml

 tendermint-proto = { workspace = true }
 tendermint = { workspace = true }
-tokio = { workspace = true, features = ["rt", "tracing"] }
+tokio = { workspace = true, features = ["rt", "tracing", "test-util"] }


if the test-util feature is just for tests, move it under dev-dependencies

noot · 2024-06-11T15:03:32Z

crates/astria-sequencer/src/mempool.rs

+}
+
+const TX_TTL: Duration = Duration::from_secs(600); // 10 minutes 
+const REMOVAL_CACHE_SIZE: NonZeroUsize = unsafe { NonZeroUsize::new_unchecked(4096) }; // TODO make configuration variable 


why not just use usize?

Fraser recommended this instead I'm guessing for configuration safety purposes, but I can change back to just a usize
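
If `NonZeroUsize` is kept, the `unsafe` block in the quoted constant is not strictly needed. One possible alternative, assuming a toolchain where panicking in const contexts is stable (Rust 1.57 or later), is to match on the checked constructor; the panic message is made up for this example.

```rust
use std::num::NonZeroUsize;

const REMOVAL_CACHE_SIZE: NonZeroUsize = match NonZeroUsize::new(4096) {
    Some(size) => size,
    // Evaluated at compile time: a zero literal above would fail the build.
    None => panic!("REMOVAL_CACHE_SIZE must be non-zero"),
};
```

The type still documents that a zero-sized cache is invalid, and a bad value is caught by the compiler rather than at runtime.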

noot · 2024-06-11T15:04:43Z

crates/astria-sequencer/src/mempool.rs

+/// `RemovalCache` is a cache used for signaling to `CometBFT` when a
+/// transaction can be removed from the `CometBFT` mempool outside of the
+/// `CheckTx` checks.


this is checked in check_tx right? there isn't a way to tell cometbft to remove something from the mempool without cometbft calling check_tx on it

My b, yeah the comment should read 'other CheckTX checks'. Or, I'll just redo the comment

noot · 2024-06-11T15:06:46Z

crates/astria-sequencer/src/mempool.rs

+/// that failed to execute or to re-add a transaction that has expired.
+#[derive(Clone)]
+pub(crate) struct RemovalCache {
+    cache: HashMap<[u8; 32], (Instant, Arc<anyhow::Error>)>,


for the error, the only two options right now are expired and failed execution right? i'd prefer to change the Arc<anyhow::Error> to an enum, that would be clearer as to which removal reasons exist

I was trying to bubble up the reason for failure to the user, but it gets stuck at 'failed execution' or 'failed stateful checks'. I can make it an enum if you don't think that the CheckTx failure case should be more specific than 'failed execution'
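
One way to get both an explicit set of reasons and a more specific message for the user is an enum whose failure variant carries the formatted error. This is a sketch only; the enum name, variant names, and message wording are assumptions, not the PR's final code.

```rust
use std::fmt;

/// Hypothetical set of reasons a transaction is handed back to CometBFT for removal.
#[derive(Clone, Debug, PartialEq, Eq)]
enum RemovalReason {
    /// The transaction outlived its TTL in the app mempool.
    Expired,
    /// The transaction failed in `prepare_proposal()`; the string holds the formatted error.
    FailedPrepareProposal(String),
}

impl fmt::Display for RemovalReason {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RemovalReason::Expired => write!(f, "transaction expired in the app mempool"),
            RemovalReason::FailedPrepareProposal(err) => {
                write!(f, "transaction failed execution: {err}")
            }
        }
    }
}
```

The `Display` output could then be used as the log text of the `CheckTx` response, while the variant itself selects the error code and metric.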

noot · 2024-06-11T15:16:35Z

crates/astria-sequencer/src/mempool.rs

+    /// the reason why it was cached. None is returned if the transaction
+    /// is not cached


Suggested change

-    /// the reason why it was cached. None is returned if the transaction
-    /// is not cached
+    /// the reason why it was cached. None is returned if the transaction
+    /// is not cached or is not expired

I was considering a tx being 'cached' as being both not expired and in the cache, but I can change

sorry i got confused about the tx removal ttl vs the tx in mempool ttl. this makes sense!

noot · 2024-06-11T15:18:49Z

crates/astria-sequencer/src/mempool.rs

    /// inserts all the given transactions into the mempool
    pub(crate) async fn insert_all(&self, txs: Vec<(EnqueuedTransaction, TransactionPriority)>) {
-        self.inner.write().await.extend(txs);
+        self.queue.write().await.extend(txs);


this isn't the same behaviour as insert as it doesn't preserve the timestamp if it already exists

You're right. I was just relying on the behavior that we're currently only using this function to re-insert transactions that we just popped. Do you have a preference of me adding a comment that explains that or to do a similar process as in insert()?

i think doing a similar process as to insert() makes the most sense!
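
A rough sketch of what timestamp-preserving insertion could look like. A plain `HashMap` stands in for the `PriorityQueue` used in the PR, and the `Priority` struct with its `nonce_diff` field is an assumption made for illustration; only the idea of keeping the first-seen time on re-insert is taken from the thread.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Simplified stand-in for a transaction's priority (field names are assumptions).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Priority {
    nonce_diff: u32,
    time_first_seen: Instant,
}

/// Inserts the transaction, or updates its priority while keeping the
/// timestamp recorded when the hash was first seen.
fn update_or_insert(
    queue: &mut HashMap<[u8; 32], Priority>,
    tx_hash: [u8; 32],
    nonce_diff: u32,
    now: Instant,
) {
    queue
        .entry(tx_hash)
        .and_modify(|existing| existing.nonce_diff = nonce_diff)
        .or_insert(Priority { nonce_diff, time_first_seen: now });
}

/// Applies the same timestamp-preserving logic to a batch of transactions.
fn insert_all(
    queue: &mut HashMap<[u8; 32], Priority>,
    txs: Vec<([u8; 32], u32)>,
    now: Instant,
) {
    for (tx_hash, nonce_diff) in txs {
        update_or_insert(queue, tx_hash, nonce_diff, now);
    }
}
```

Because both paths go through `update_or_insert`, a re-inserted transaction cannot reset its own TTL clock.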

lobstergrindset · 2024-06-13T12:40:38Z

Logic changes since last review:

made insert_all() preserve timestamp like insert()
made RemovalCache errors into enum

lobstergrindset · 2024-06-14T12:52:23Z

crates/astria-sequencer/src/mempool.rs

 }

+/// inserts or updates the transaction in a timestamp preserving manner
+fn update_or_insert(


@Fraser999 is this an okay place to put a function like this? The rust CLI suggested I move it out of the Mempool struct's implementation because it wasn't using the self variable

Fraser999 · 2024-06-14T13:23:31Z

crates/astria-sequencer/src/mempool.rs

 }

+/// inserts or updates the transaction in a timestamp preserving manner
+fn update_or_insert(


Absolutely. Another choice would be to leave it inside the impl Mempool but just don't have self as a parameter. You have to call it like Self::update_or_insert in that case. It can make sense to do that in cases like this where you have more than one struct in scope which could use this function, but it's really only intended to be used by the Mempool struct. But for a private function like this, it's pretty much down to personal preference :)

You could simplify the signature though by just taking queue: &mut PriorityQueue and at callsites passing &mut *self.queue.write().await. Definitely not a big deal though :)
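
The two options described above can be sketched as follows. This example uses `std::sync::RwLock` and a `HashMap` in place of the tokio lock and `PriorityQueue` from the PR, so the call site has no `.await`; the `Queue` alias and the method bodies are assumptions for illustration.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified stand-in for the mempool's priority queue.
type Queue = HashMap<[u8; 32], u32>;

struct Mempool {
    queue: RwLock<Queue>,
}

impl Mempool {
    fn new() -> Self {
        Self { queue: RwLock::new(Queue::new()) }
    }

    /// Associated function: it takes no `self`, so it is called as
    /// `Self::update_or_insert(..)` and stays scoped to `Mempool`.
    fn update_or_insert(queue: &mut Queue, tx_hash: [u8; 32], priority: u32) {
        queue.insert(tx_hash, priority);
    }

    fn insert(&self, tx_hash: [u8; 32], priority: u32) {
        // `&mut *guard` reborrows the locked value as a plain `&mut Queue`,
        // so the helper's signature does not need to mention the lock.
        Self::update_or_insert(
            &mut *self.queue.write().expect("lock poisoned"),
            tx_hash,
            priority,
        );
    }

    fn len(&self) -> usize {
        self.queue.read().expect("lock poisoned").len()
    }
}
```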

lobstergrindset · 2024-06-17T08:21:26Z

Changes since last reviews:

Removed TTL from RemovalCache.
Moved time_first_seen out of the EnqueuedTransaction struct into the TransactionPriority struct.

* main:
  chore(bridge-withdrawer): add missing errors and clean up names (#1178)
  feat(sequencer): add ttl and invalid cache to app mempool (#1138)
  chore(astria-merkle): add benchmarks (#1179)
  chore(sequencer-relayer): add timeout to gRPCs to Celestia app (#1191)
  refactor(core): parse ics20 denoms as ibc or trace prefixed variants (#1181)
  Mycodecrafting/sequencer seed node (#1188)
  chore: register all metrics during startup (#1144)
  feat(charts): option to purge geth mempool (#1182)

github-actions bot added the sequencer label (pertaining to the astria-sequencer crate) Jun 3, 2024

lobstergrindset force-pushed the lilyjjo/sequencer_app_mempool_cache branch from 0aa41f2 to be74562 Compare June 4, 2024 06:55

lobstergrindset marked this pull request as ready for review June 4, 2024 06:55

lobstergrindset requested a review from a team as a code owner June 4, 2024 06:55

lobstergrindset requested review from Fraser999 and noot June 4, 2024 06:55

noot reviewed Jun 5, 2024

Fraser999 reviewed Jun 6, 2024

noot reviewed Jun 7, 2024

lobstergrindset requested review from Fraser999 and noot June 11, 2024 14:25

noot reviewed Jun 11, 2024

lobstergrindset requested a review from noot June 13, 2024 12:40

noot approved these changes Jun 13, 2024

lobstergrindset commented Jun 14, 2024

Fraser999 reviewed Jun 14, 2024

lobstergrindset requested review from Fraser999 and noot June 17, 2024 08:20

Fraser999 approved these changes Jun 18, 2024

feat(sequencer): add ttl and removal cache to app/cometbft mempools

b6e3fc7

lobstergrindset force-pushed the lilyjjo/sequencer_app_mempool_cache branch from 055b702 to b6e3fc7 Compare June 18, 2024 16:49

lobstergrindset added this pull request to the merge queue Jun 18, 2024

Merged via the queue into main with commit b6c625c Jun 18, 2024

lobstergrindset deleted the lilyjjo/sequencer_app_mempool_cache branch June 18, 2024 17:10

noot mentioned this pull request Oct 3, 2024

sequencer: ensure failed txs are removed from the mempool #715

Closed
