-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incremental solvable orders cache update #2923
Conversation
# Conflicts: # crates/autopilot/src/infra/persistence/mod.rs # crates/autopilot/src/solvable_orders.rs
Reminder: Please update the DB Readme. Caused by: |
d8b475c
to
efc0d58
Compare
crates/autopilot/src/database/onchain_order_events/ethflow_events.rs
Outdated
Show resolved
Hide resolved
// Fetch quotes for new orders and also update them for the cached ones since | ||
// they could also be updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also needed because ethflow orders could theoretically be reorged?
We should really find a way to implement ethflow orders in a nicer way to remove all those ugly edge cases. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also needed because ethflow orders could theoretically be reorged?
Because of this code:
services/crates/autopilot/src/database/onchain_order_events.rs
Lines 329 to 342 in e7de6bf
// We only need to insert quotes for orders that will be included in an | |
// auction (they are needed to compute solver rewards). If placement | |
// failed, then the quote is not needed. | |
insert_quotes( | |
transaction, | |
quotes | |
.clone() | |
.into_iter() | |
.flatten() | |
.collect::<Vec<_>>() | |
.as_slice(), | |
) | |
.await | |
.context("appending quotes for onchain orders failed")?; |
@@ -168,31 +172,84 @@ impl SolvableOrdersCache { | |||
/// the case in unit tests, then concurrent calls might overwrite each | |||
/// other's results. | |||
pub async fn update(&self, block: u64) -> Result<()> { | |||
const INITIAL_ORDER_CREATION_TIMESTAMP: DateTime<Utc> = DateTime::<Utc>::MIN_UTC; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking more about it do we even need this sentinel value?
The cached order can already be optional and I would expect once we populate the cache there also has to be a non-zero timestamp in there so we can determine by the cache being None
that we need to fetch ALL open orders instead of the incremental update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With None
some of the e2e tests fail. It's probably worth a separate PR to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's please understand at least what causes the tests to fail. I don't want us to check in code without understanding why it's actually needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires refactoring. Some of the code depends on auction always present in the DB, which is used across many of the e2e tests.
services/crates/e2e/src/setup/services.rs
Lines 303 to 317 in fb8dad3
pub async fn get_auction(&self) -> dto::AuctionWithId { | |
let response = self | |
.http | |
.get(format!("{API_HOST}{AUCTION_ENDPOINT}")) | |
.send() | |
.await | |
.unwrap(); | |
let status = response.status(); | |
let body = response.text().await.unwrap(); | |
assert_eq!(status, StatusCode::OK, "{body}"); | |
serde_json::from_str(&body).unwrap() | |
} |
Also:
services/crates/e2e/src/setup/services.rs
Lines 304 to 317 in db3a1b5
async fn wait_until_autopilot_ready(&self) { | |
let is_up = || async { | |
let mut db = self.db.acquire().await.unwrap(); | |
const QUERY: &str = "SELECT COUNT(*) FROM auctions"; | |
let count: i64 = sqlx::query_scalar(QUERY) | |
.fetch_one(db.deref_mut()) | |
.await | |
.unwrap(); | |
count > 0 | |
}; | |
wait_for_condition(TIMEOUT, is_up) | |
.await | |
.expect("waiting for autopilot timed out"); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to refactor since the sentinel value really doesn't seem necessary but IMO that can happen after this gets merged so we can test this a bit on staging before merging to prod on Tuesday.
bc42c01
to
e661e66
Compare
# Conflicts: # crates/autopilot/src/infra/persistence/mod.rs
35759d0
to
55eb516
Compare
@@ -337,6 +340,71 @@ impl SolvableOrdersCache { | |||
.collect() | |||
} | |||
|
|||
/// Returns current solvable orders along with the latest order creation | |||
/// timestamp. | |||
async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting the latest order creation timestamp can be done with the SolvableOrders
. I believe it shouldn't be the responsibility of this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has to return either previous_creation_timestamp
or latest_creation_timestamp
. Since the former is not used anywhere else, I decided to keep it in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would like to simplify the code inside SolvableOrdersCache
further if possible but if my suggestion doesn't work I'm fine with merging now and refactoring later to already get confidence in the change by running it in staging.
@@ -337,6 +340,72 @@ impl SolvableOrdersCache { | |||
.collect() | |||
} | |||
|
|||
/// Returns current solvable orders along with the latest order creation | |||
/// timestamp. | |||
async fn get_solvable_orders(&self) -> Result<(SolvableOrders, DateTime<Utc>)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to simplify this query a lot by always using solvable_orders_after()
?
All you need for that is the list of known orders, latest timestamp and latest block.
If the cache is populated you just take those values and if it's not you should be able to just use defaults for everything, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to simplify this query a lot by always using solvable_orders_after()?
solvable_orders_after
doesn't filter orders in the SQL query, so it would return the whole table at the first start if the creation timestamp/block number contains default values. I tried to explain this in the comment:
services/crates/autopilot/src/solvable_orders.rs
Lines 348 to 358 in 9438fa0
// A new auction should be created regardless of whether new solvable orders are | |
// found. The incremental solvable orders cache updater should only be | |
// enabled after the initial full SQL query | |
// (`persistence::all_solvable_orders`) returned some orders. Until then, | |
// `MIN_UTC` is used to indicate that no orders have been found yet by | |
// (`persistence::all_solvable_orders`). This prevents situations where | |
// starting the service with a large existing DB would cause | |
// the incremental query to load all unfiltered orders into memory, potentially | |
// leading to OOM issues because incremental query doesn't filter out | |
// expired/invalid orders in the SQL query and basically can return the whole | |
// table when filters with default values are used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, let's get this merged to make strides on the roadmap but let's try to keep simplifying the code further.
Description
In order to relieve the situation, it was proposed to introduce incremental solvable orders cache update, which selects all the solvable orders using the old heavy query only at startup, stores the latest received order's creation timestamp in memory, and then makes much faster incremental bounded queries to the orders and additional tables that select fewer data and executes faster.
Changes
Since incremental fetching retrieves orders created/cancelled after the specific timestamps, it is also required now to fetch orders that have any onchain update based on the last fetched block number. Having said that, the data needs to be fetched within a single TX, so there is no way to run all the queries in parallel.
As a result, we now have 3 SQL queries where each executes in ~50ms instead of a single one taking ~2s.
How to test
New DB tests. Existing e2e tests.
Related Issues
Fixes #2831