Incremental solvable orders cache update #2923

squadgazzz · 2024-08-23T09:38:28Z

Description

Updating solvable orders (i.e. creating a new auction) currently takes >2s, with some heavy outliers (logs).

This makes it hard to bring CoW Protocol's auction rate down to one batch per block, as simply creating up-to-date state would take >15% of the total time we have at hand. We should be able to at least halve this time, if not reduce it even further.
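For scale, here is the arithmetic behind the 15% figure. It assumes Ethereum mainnet's 12 s block time, which the PR does not state explicitly. A ~2 s update already exceeds 15% of the block, and halving it would bring it to roughly 8%:

```latex
\frac{t_{\text{update}}}{t_{\text{block}}} \approx \frac{2\,\text{s}}{12\,\text{s}} \approx 16.7\,\% > 15\,\%,
\qquad
\frac{1\,\text{s}}{12\,\text{s}} \approx 8.3\,\%
```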

In order to relieve the situation, it was proposed to introduce an incremental solvable orders cache update. It selects all solvable orders using the old heavy query only at startup, stores the creation timestamp of the latest received order in memory, and afterwards runs incremental, bounded queries against the orders and related tables, which select less data and execute much faster.

Changes

Since incremental fetching only retrieves orders created or cancelled after a specific timestamp, it is now also required to fetch orders that received any on-chain update after the last fetched block number. However, all of this data needs to be fetched within a single DB transaction, so the queries cannot run in parallel.

1. If the current solvable orders cache is empty, execute the original heavy SQL query to fetch all current solvable orders and store them in memory.
2. Otherwise, fetch the full orders that were created or cancelled after the last stored timestamp, and also find the UIDs of orders that have any on-chain data updated after the latest observed block number. This includes fetching data from the following tables: trades, ethflow_data, order_execution, invalidations, onchain_order_invalidations, onchain_placed_orders, presignature_events.
3. Fetch quotes for all the collected orders.
4. Add all the newly received orders to the cache.
5. Filter out all orders that have on-chain errors or are expired, fulfilled, or invalidated.
6. Calculate the latest observed order creation timestamp.
7. Continue with the regular auction creation process.
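The in-memory part of these steps (4 to 6, plus tracking the block bound) can be sketched as follows. This is a minimal sketch under stated assumptions, not the autopilot's actual code: `Order`, `Cache`, and `apply_update` are hypothetical names, timestamps are plain Unix seconds instead of `DateTime<Utc>`, UIDs are `u64`, and the database and quote fetching are omitted, so `fetched` stands for whatever the incremental queries returned.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified order record; the real autopilot types differ.
#[derive(Clone, Debug)]
pub struct Order {
    pub uid: u64,
    /// Creation time as a Unix timestamp (seconds).
    pub creation_timestamp: u64,
    pub valid_to: u64,
    pub fulfilled: bool,
    pub invalidated: bool,
    pub has_onchain_error: bool,
}

/// Hypothetical cache: solvable orders plus the bounds for the next
/// incremental query.
#[derive(Default)]
pub struct Cache {
    pub orders: HashMap<u64, Order>,
    pub latest_creation_timestamp: u64,
    pub latest_block: u64,
}

impl Cache {
    /// Applies one incremental update. `fetched` stands for the orders
    /// returned by the bounded queries (new, cancelled, or touched on chain).
    pub fn apply_update(&mut self, fetched: Vec<Order>, now: u64, block: u64) {
        // Remember the newest creation time seen in this batch, before any
        // filtering, so filtered-out orders are not fetched again next time.
        let batch_max = fetched.iter().map(|o| o.creation_timestamp).max();

        // Newly fetched versions replace cached ones with the same UID.
        for order in fetched {
            self.orders.insert(order.uid, order);
        }

        // Drop orders with on-chain errors or that are expired, fulfilled,
        // or invalidated.
        self.orders.retain(|_, o| {
            !o.has_onchain_error && o.valid_to >= now && !o.fulfilled && !o.invalidated
        });

        // Advance the bounds monotonically for the next incremental query.
        if let Some(ts) = batch_max {
            self.latest_creation_timestamp = self.latest_creation_timestamp.max(ts);
        }
        self.latest_block = block;
    }
}
```

In this sketch the new lower bound is taken from every fetched order, including ones that are filtered out immediately; the PR description does not say whether the real implementation does the same.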

As a result, we now have 3 SQL queries where each executes in ~50ms instead of a single one taking ~2s.
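Because the queries run one after another inside a single transaction, their latencies add up rather than overlap. With the figures quoted above, and assuming ~50 ms holds for all three queries:

```latex
t_{\text{incremental}} \approx 3 \times 50\,\text{ms} = 150\,\text{ms},
\qquad
\frac{2000\,\text{ms}}{150\,\text{ms}} \approx 13.3
```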

How to test

New DB tests. Existing e2e tests.

Related Issues

Fixes #2831


github-actions · 2024-08-23T09:57:04Z

Reminder: Please update the DB Readme.

Caused by:

database/sql/V069__create_indexes_for_solvable_orders_search.sql

crates/autopilot/src/solvable_orders.rs

crates/autopilot/src/infra/persistence/mod.rs

crates/autopilot/src/solvable_orders.rs

crates/autopilot/src/database/onchain_order_events/ethflow_events.rs

MartinquaXD · 2024-09-04T06:33:52Z

crates/autopilot/src/infra/persistence/mod.rs

+        // Fetch quotes for new orders and also update them for the cached ones since
+        // they could also be updated.


This is also needed because ethflow orders could theoretically be reorged?
We should really find a way to implement ethflow orders in a nicer way to remove all those ugly edge cases. :/

This is also needed because ethflow orders could theoretically be reorged?

Because of this code:

services/crates/autopilot/src/database/onchain_order_events.rs

Lines 329 to 342 in e7de6bf

// We only need to insert quotes for orders that will be included in an
// auction (they are needed to compute solver rewards). If placement
// failed, then the quote is not needed.
insert_quotes(
    transaction,
    quotes
        .clone()
        .into_iter()
        .flatten()
        .collect::<Vec<_>>()
        .as_slice(),
)
.await
.context("appending quotes for onchain orders failed")?;
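The point of this thread, that quotes must be reloaded for the cached orders too and not only for new ones, can be sketched as follows. The names `Quote`, `quote_uids`, and `refresh_quotes` are hypothetical, and UIDs are simplified to `u64`; the actual PR loads quotes from the database within the same transaction, which is omitted here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical quote record; fields are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct Quote {
    pub order_uid: u64,
    pub sell_amount: u128,
}

/// UIDs whose quotes should be (re)loaded: every newly fetched order plus
/// every cached order, since the quote of a cached order (e.g. an onchain
/// placed order whose quote is inserted by the event indexer) can change too.
pub fn quote_uids(new_orders: &[u64], cached: &HashSet<u64>) -> Vec<u64> {
    let mut uids: Vec<u64> = cached
        .iter()
        .copied()
        .chain(new_orders.iter().copied())
        .collect();
    uids.sort_unstable();
    uids.dedup();
    uids
}

/// Rebuilds the quote map from the freshly loaded quotes, so stale entries
/// for cached orders are replaced rather than kept.
pub fn refresh_quotes(fetched: Vec<Quote>) -> HashMap<u64, Quote> {
    fetched.into_iter().map(|q| (q.order_uid, q)).collect()
}
```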

MartinquaXD · 2024-09-04T06:46:37Z

crates/autopilot/src/solvable_orders.rs

@@ -168,31 +172,84 @@ impl SolvableOrdersCache {
    /// the case in unit tests, then concurrent calls might overwrite each
    /// other's results.
    pub async fn update(&self, block: u64) -> Result<()> {
+        const INITIAL_ORDER_CREATION_TIMESTAMP: DateTime<Utc> = DateTime::<Utc>::MIN_UTC;


After thinking more about it, do we even need this sentinel value?
The cached state can already be optional, and I would expect that once we populate the cache there also has to be a non-zero timestamp in it. So we can tell from the cache being None that we need to fetch ALL open orders instead of doing the incremental update.
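A minimal sketch of this suggestion, using hypothetical types (`CacheState`, `Fetch`, `next_fetch`) and `u64` timestamps in place of `DateTime<Utc>`: the absence of a cache, rather than a `MIN_UTC` sentinel, selects the full query.

```rust
/// Hypothetical cache bounds; `None` at the call site means "nothing loaded yet".
pub struct CacheState {
    pub latest_creation_timestamp: u64,
    pub latest_block: u64,
}

#[derive(Debug, PartialEq)]
pub enum Fetch {
    /// Run the heavy query that loads all currently solvable orders.
    Full,
    /// Run the bounded incremental queries.
    Incremental { after_timestamp: u64, after_block: u64 },
}

/// Chooses the next fetch mode purely from whether a cache exists.
pub fn next_fetch(cache: Option<&CacheState>) -> Fetch {
    match cache {
        None => Fetch::Full,
        Some(state) => Fetch::Incremental {
            after_timestamp: state.latest_creation_timestamp,
            after_block: state.latest_block,
        },
    }
}
```

Under this approach the cache would only become `Some` once the full query has actually returned orders, which is the condition the `MIN_UTC` sentinel encodes in the merged code.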

With None, some of the e2e tests fail. It's probably worth a separate PR to fix that.

Let's please understand at least what causes the tests to fail. I don't want us to check in code without understanding why it's actually needed.

That requires refactoring. Some of the code depends on an auction always being present in the DB, and this is relied on across many of the e2e tests.

services/crates/e2e/src/setup/services.rs

Lines 303 to 317 in fb8dad3

pub async fn get_auction(&self) -> dto::AuctionWithId {
    let response = self
        .http
        .get(format!("{API_HOST}{AUCTION_ENDPOINT}"))
        .send()
        .await
        .unwrap();

    let status = response.status();
    let body = response.text().await.unwrap();

    assert_eq!(status, StatusCode::OK, "{body}");

    serde_json::from_str(&body).unwrap()
}

Also:

services/crates/e2e/src/setup/services.rs

Lines 304 to 317 in db3a1b5

async fn wait_until_autopilot_ready(&self) {
    let is_up = || async {
        let mut db = self.db.acquire().await.unwrap();
        const QUERY: &str = "SELECT COUNT(*) FROM auctions";
        let count: i64 = sqlx::query_scalar(QUERY)
            .fetch_one(db.deref_mut())
            .await
            .unwrap();
        count > 0
    };
    wait_for_condition(TIMEOUT, is_up)
        .await
        .expect("waiting for autopilot timed out");
}

It would be great to refactor this, since the sentinel value really doesn't seem necessary, but IMO that can happen after this gets merged, so we can test it a bit on staging before it goes to prod on Tuesday.


m-lord-renkse · 2024-09-04T09:13:11Z

crates/autopilot/src/solvable_orders.rs

@@ -337,6 +340,71 @@ impl SolvableOrdersCache {
            .collect()
    }

+    /// Returns current solvable orders along with the latest order creation
+    /// timestamp.
+    async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> {


Getting the latest order creation timestamp can be done with the SolvableOrders. I believe it shouldn't be the responsibility of this function.

This function has to return either previous_creation_timestamp or latest_creation_timestamp. Since the former is not used anywhere else, I decided to keep it in this function.
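The reviewer's alternative, deriving the timestamp from the orders themselves instead of returning it from `get_solvable_orders`, could look roughly like this. `CachedOrder` and `latest_creation_timestamp` are hypothetical names, and timestamps are simplified to `u64`. The previous bound is kept when there are no orders or when it is already newer.

```rust
/// Hypothetical stand-in for an entry in the cached solvable orders.
pub struct CachedOrder {
    pub creation_timestamp: u64,
}

/// Derives the next lower bound for the incremental query from the orders,
/// never moving it backwards.
pub fn latest_creation_timestamp(orders: &[CachedOrder], previous: u64) -> u64 {
    orders
        .iter()
        .map(|o| o.creation_timestamp)
        .max()
        .map_or(previous, |max| max.max(previous))
}
```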


MartinquaXD

I'd like to simplify the code inside SolvableOrdersCache further if possible, but if my suggestion doesn't work I'm fine with merging now and refactoring later, so we already gain confidence in the change by running it on staging.

MartinquaXD · 2024-09-05T10:53:59Z

crates/autopilot/src/solvable_orders.rs

@@ -337,6 +340,72 @@ impl SolvableOrdersCache {
            .collect()
    }

+    /// Returns current solvable orders along with the latest order creation
+    /// timestamp.
+    async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> {


Is it possible to simplify this query a lot by always using solvable_orders_after()?
All you need for that is the list of known orders, latest timestamp and latest block.
If the cache is populated you just take those values and if it's not you should be able to just use defaults for everything, no?

Is it possible to simplify this query a lot by always using solvable_orders_after()?

solvable_orders_after doesn't filter orders in the SQL query, so on the first start it would return the whole table if the creation timestamp and block number hold default values. I tried to explain this in the comment:

services/crates/autopilot/src/solvable_orders.rs

Lines 348 to 358 in 9438fa0

// A new auction should be created regardless of whether new solvable orders are
// found. The incremental solvable orders cache updater should only be
// enabled after the initial full SQL query
// (`persistence::all_solvable_orders`) returned some orders. Until then,
// `MIN_UTC` is used to indicate that no orders have been found yet by
// (`persistence::all_solvable_orders`). This prevents situations where
// starting the service with a large existing DB would cause
// the incremental query to load all unfiltered orders into memory, potentially
// leading to OOM issues because incremental query doesn't filter out
// expired/invalid orders in the SQL query and basically can return the whole
// table when filters with default values are used.
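To illustrate the concern in that comment, here is a hypothetical in-memory analogue of an incremental filter. `Row` and `incremental_filter` are illustrative names only, and the real query spans several tables; the point is that with `i64::MIN` as the timestamp bound (playing the role of `MIN_UTC`), every row matches, expired or not.

```rust
/// Hypothetical row; the real data is spread over several tables.
pub struct Row {
    pub creation_timestamp: i64,
    pub cancellation_timestamp: Option<i64>,
    pub last_onchain_update_block: i64,
}

/// Counts rows an incremental filter would return: created or cancelled
/// after `after_ts`, or updated on chain after `after_block`. Expired or
/// invalid rows are deliberately NOT excluded here.
pub fn incremental_filter(rows: &[Row], after_ts: i64, after_block: i64) -> usize {
    rows.iter()
        .filter(|r| {
            r.creation_timestamp > after_ts
                || r.cancellation_timestamp.map_or(false, |c| c > after_ts)
                || r.last_onchain_update_block > after_block
        })
        .count()
}
```

With default bounds the filter degenerates into a full table scan, which is why the merged code runs the filtering full query first.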

MartinquaXD

Alright, let's get this merged to make strides on the roadmap but let's try to keep simplifying the code further.

squadgazzz added 18 commits August 20, 2024 11:45

Update the cache structure

bee782a

Define new struct

0b8c97d

Fetch functions

22abb5e

update_solvable_orders

ab2f8cb

Fetch in parallel

e6a4a64

Minor

6984418

Naming

ac2f36a

BoxStream

efb7389

Tests

c185443

Naming

3a12bac

Tests

5786d82

Trades test

e7165f6

Indexes

d630e9e

Naming

fe50a81

Redundant index

350191f

Remove expired orders

a2262cb

Merge branch 'main' into 2831/improve-auction-update

298e31a

# Conflicts: # crates/autopilot/src/infra/persistence/mod.rs # crates/autopilot/src/solvable_orders.rs

Fix after merge

cea8850

MartinquaXD reviewed Aug 23, 2024


squadgazzz added 7 commits August 23, 2024 13:06

Fix test

8eb3e55

Redundant tx

c883090

Minor

53969d5

Check eth flow timestamp

3dc9243

Update executed amounts properly

2d0813b

Revert back to FullOrder

6d012c5

Query fix

8551f81

squadgazzz force-pushed the 2831/improve-auction-update branch from d8b475c to efc0d58 August 23, 2024 14:03

squadgazzz added 2 commits August 23, 2024 17:03

Refactoring

efc0d58

Docs

c173a33

squadgazzz added 3 commits September 2, 2024 17:24

Minor formatting fix

7176646

Fetch quotes in a single tx

1afe8ed

Naming

dd8b362

MartinquaXD reviewed Sep 4, 2024


squadgazzz added 2 commits September 4, 2024 10:22

Leftovers

2c8a45c

Extract to a new function

ea84a28

squadgazzz force-pushed the 2831/improve-auction-update branch from bc42c01 to e661e66 September 4, 2024 08:43

squadgazzz added 2 commits September 4, 2024 12:02

Rename migration

72d8b7a

Merge branch 'main' into 2831/improve-auction-update

55eb516

# Conflicts: # crates/autopilot/src/infra/persistence/mod.rs

squadgazzz force-pushed the 2831/improve-auction-update branch from 35759d0 to 55eb516 September 4, 2024 09:03

m-lord-renkse reviewed Sep 4, 2024


crates/autopilot/src/solvable_orders.rs (outdated, resolved)

m-lord-renkse reviewed Sep 4, 2024


crates/database/src/orders.rs (outdated, resolved)

squadgazzz added 8 commits September 4, 2024 12:31

Use optional cache

459bf87

Remove autopilot e2e readiness check

fb8dad3

Revert "Remove autopilot e2e readiness check"

db3a1b5

This reverts commit fb8dad3.

Revert "Use optional cache"

4376b22

This reverts commit 459bf87.

Safe casting

bd95564

Test comments

b24d9cc

Unify query

1f94ebe

Merge branch 'main' into 2831/improve-auction-update

c724e35

MartinquaXD reviewed Sep 5, 2024


squadgazzz added 2 commits September 5, 2024 16:24

Updated comment

b012d45

Naming

9438fa0

MartinquaXD approved these changes Sep 5, 2024


Merge branch 'main' into 2831/improve-auction-update

ac250e7

squadgazzz enabled auto-merge (squash) September 5, 2024 14:17

squadgazzz merged commit ed09b91 into main Sep 5, 2024
10 checks passed

squadgazzz deleted the 2831/improve-auction-update branch September 5, 2024 14:23

github-actions bot locked and limited conversation to collaborators Sep 5, 2024
