
chore: Improve time to create Auction #2831

Closed · fleupold opened this issue Jul 24, 2024 · 4 comments · Fixed by #2923
Assignees: squadgazzz
Labels: E:6.1 One auction per block (see https://github.com/cowprotocol/pm/issues/35 for details)

Comments

@fleupold (Contributor)

Background

Updating solvable orders (i.e. creating a new auction) currently takes >2s, with some pretty heavy outliers (logs)

This makes it hard to bring CoW Protocol's auction rate to one batch per block, since simply creating up-to-date state would take >15% of the total time we have at hand. We should at least be able to halve this time (if not reduce it even further).

Details

The first step should be to get a better understanding of which parts of auction creation take the most time.

It's very likely that this is related to us always computing the auction "from scratch". We should see whether we can switch to a more incremental approach where only the necessary on-chain state is queried after the block for the next auction has been processed, while everything else is kept ready and warm throughout the 12s block time (e.g. whatever work we do when a new order is placed should happen as soon as we notice the new order).
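As a starting point for that breakdown, here is a minimal sketch of per-stage timing instrumentation; the stage names and the `timed` helper are invented for illustration, not the autopilot's actual code:

```rust
use std::time::Instant;

/// Wrap each stage of auction creation and log how long it takes.
/// Stage names below are hypothetical placeholders.
fn timed<T>(stage: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    tracing::info!(stage, elapsed_ms = start.elapsed().as_millis() as u64, "auction creation stage");
    result
}

fn create_auction() {
    // Hypothetical stages; the real pipeline would wrap its actual steps.
    let _orders = timed("fetch_solvable_orders", || { /* DB query */ });
    let _prices = timed("fetch_native_prices", || { /* price estimation */ });
    let _auction = timed("build_auction", || { /* assemble auction */ });
}
```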

Acceptance criteria

p95 time to create a new auction is <1s

fleupold added the E:6.1 One auction per block label on Jul 24, 2024
@sunce86 (Contributor) commented Jul 25, 2024

I looked into this. Here are some numbers from the production database.

The orders table has ~3.2M orders overall.
The OPEN_ORDERS query returns the currently solvable/open orders and takes around 3.5s to execute fully.

The two most costly filters in the OPEN_ORDERS query are on the orders table:

  1. the filter by valid_to,
  2. the filter by status, where a join with the trades table is needed to check whether the order is already fulfilled.

If we start with a simplified OPEN_ORDERS query that takes all 3.2M orders from orders and filters them only by valid_to, the query takes ~300ms (reducing 3.2M orders to ~140k).

If we add filtering by status on top of that, the query takes 2.2s (reducing ~140k orders to ~20k).
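For illustration, a sketch of the two measurements as sqlx queries; the column names and the exact fulfillment predicate are guesses based on the description above, not the real OPEN_ORDERS query:

```rust
use sqlx::PgPool;

/// Count orders surviving each filter, mirroring the numbers above.
async fn measure_filters(pool: &PgPool, now: i64) -> sqlx::Result<(i64, i64)> {
    // Filter 1: valid_to alone (~300ms, 3.2M -> ~140k rows).
    let (by_valid_to,): (i64,) =
        sqlx::query_as("SELECT COUNT(*) FROM orders o WHERE o.valid_to >= $1")
            .bind(now)
            .fetch_one(pool)
            .await?;

    // Filter 2: additionally drop fulfilled orders via the trades join
    // (~2.2s, ~140k -> ~20k rows). The predicate is a guess at the shape.
    let (open,): (i64,) = sqlx::query_as(
        "SELECT COUNT(*) FROM orders o \
         WHERE o.valid_to >= $1 \
           AND o.sell_amount > COALESCE( \
                 (SELECT SUM(t.sell_amount) FROM trades t \
                  WHERE t.order_uid = o.uid), 0)",
    )
    .bind(now)
    .fetch_one(pool)
    .await?;

    Ok((by_valid_to, open))
}
```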

This means we would have to eliminate these two conditions and cache their results somehow, so that we can work with the reduced list of ~20k orders. Possible solutions:

  1. Add a new field to the orders table that caches the status, i.e. whether the order IS DEFINITELY CLOSED. This field wouldn't have to be refreshed on each trade but only occasionally (once a day, for example), and together with valid_to (already part of the orders table) it would reduce the number of orders from 3.2M to ~20k.

  2. Create a new closed_orders table that caches the status information. This is the less performant option, as we still need to match ~140k orders by order id (orders table -> closed_orders table). AFAIS matching 140k rows across two tables adds 200-400ms, which is significant.

  3. Split the orders table into two tables: a closed_orders table (3.18M orders) and a maybe_open_orders 🤡 table (20k orders). A periodic maintenance function would move orders that have closed into closed_orders. The OPEN_ORDERS query would only look at maybe_open_orders, so it would be super fast.

  4. Simply create an additional open_orders table that serves solely to return solvable/open orders. Whenever a new order is posted, it's inserted into both the orders and open_orders tables. The open_orders table is periodically filtered and closed orders are simply deleted, so open_orders holds a (duplicated) subset of the orders table's data.

  5. Any other idea?

I think I would be in favor of (4) as the simplest and fastest solution (sketched below).
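To make (4) concrete, a minimal sketch of the maintenance side; the open_orders table, its columns, and the cleanup predicate are assumptions:

```rust
use sqlx::PgPool;

/// Periodically delete orders from the hypothetical `open_orders` table that
/// can no longer be solvable, so the hot OPEN_ORDERS query only ever scans
/// ~20k rows.
async fn prune_open_orders(pool: &PgPool, now: i64) -> sqlx::Result<u64> {
    // Expired orders can be deleted outright; fully executed orders would
    // need the trades-based status check as well, but over far fewer rows.
    let result = sqlx::query("DELETE FROM open_orders WHERE valid_to < $1")
        .bind(now)
        .execute(pool)
        .await?;
    Ok(result.rows_affected())
}
```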

@fleupold (Author)

I wonder if the btree index on the trades table is sufficient, or if, specifically for this join, we should add another order_uid index?
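If such an index turns out to be missing, it would be a one-line migration; this is a guess at the shape, not a statement about the live schema:

```rust
use sqlx::PgPool;

/// Hypothetical migration: a dedicated btree index so the per-order trades
/// lookup becomes an index scan. CONCURRENTLY avoids blocking writes but
/// cannot run inside a transaction block.
async fn add_trades_order_uid_index(pool: &PgPool) -> sqlx::Result<()> {
    sqlx::query("CREATE INDEX CONCURRENTLY IF NOT EXISTS trades_order_uid ON trades (order_uid)")
        .execute(pool)
        .await?;
    Ok(())
}
```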

I wouldn't persist any additional information but simply move to an incremental solvable-orders implementation that is managed in memory:

  1. Upon startup we fetch all orders and trades (separately) to build the book of open orders. It's OK if this takes a while since it only happens once.
  2. We store the last seen creation timestamp for orders and the last seen block number for trades.
  3. When refetching, we only query orders with a creation/cancellation timestamp > last seen and trades with a block number > last seen.
  4. From this information we build an updated solvable-orders set and advance the last seen timestamp/block number.

We might need some additional btree indices for efficient range queries, but I believe this should speed things up quite significantly (see the sketch below).
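A rough sketch of this bookkeeping; all types and query helpers are placeholders, not the actual autopilot API:

```rust
use std::collections::HashMap;

// Placeholder domain types for illustration only.
struct Order {
    uid: [u8; 56],
    created_at: i64,
}
struct Trade {
    order_uid: [u8; 56],
    block: u64,
}

// Stand-in for the database layer; the real queries would use the
// range filters described in steps 2 and 3.
struct Db;
impl Db {
    async fn orders_since(&self, _ts: i64) -> Vec<Order> {
        vec![]
    }
    async fn trades_since(&self, _block: u64) -> Vec<Trade> {
        vec![]
    }
}

struct SolvableOrdersCache {
    orders: HashMap<[u8; 56], Order>,
    last_order_timestamp: i64, // watermark for order creation/cancellation
    last_trade_block: u64,     // watermark for trade events
}

impl SolvableOrdersCache {
    /// One refresh cycle of the incremental scheme (steps 3 and 4 above).
    async fn update(&mut self, db: &Db) {
        let new_orders = db.orders_since(self.last_order_timestamp).await;
        let new_trades = db.trades_since(self.last_trade_block).await;

        for order in new_orders {
            self.last_order_timestamp = self.last_order_timestamp.max(order.created_at);
            self.orders.insert(order.uid, order);
        }
        for trade in new_trades {
            self.last_trade_block = self.last_trade_block.max(trade.block);
            // Simplification: fill-or-kill semantics. Partially fillable
            // orders would need executed-amount accounting instead.
            self.orders.remove(&trade.order_uid);
        }
    }
}
```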

squadgazzz self-assigned this Aug 19, 2024
@squadgazzz (Contributor)

Before submitting a PR, I just wanted to clarify the idea.

> Upon startup we fetch all orders and trades (separately)

I assume only the solvable orders are implied here, since the orders table is about 4 GB. Also, I don't think trades need to be fetched on startup; the last block number used from the trades table could be added to the original query. With all of that, we have a solvable-orders cache; we then fetch new orders and new trades and, based on the new trade data, remove no-longer-relevant orders from the cache and add the new ones. Would that work?
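For the startup part, a sketch of seeding the cache and the trades watermark in one transaction so they stay consistent; `fetch_all_solvable_orders` stands in for the existing heavy OPEN_ORDERS query, and column names are assumptions:

```rust
use sqlx::{PgPool, Postgres, Transaction};

struct Order; // placeholder

// Stand-in for the existing heavy OPEN_ORDERS query.
async fn fetch_all_solvable_orders(
    _tx: &mut Transaction<'_, Postgres>,
) -> sqlx::Result<Vec<Order>> {
    Ok(vec![])
}

/// Run the heavy query once at startup and record the trades watermark in
/// the same transaction, so cache contents and watermark are consistent.
async fn seed_cache(pool: &PgPool) -> sqlx::Result<(Vec<Order>, i64)> {
    let mut tx = pool.begin().await?;
    let orders = fetch_all_solvable_orders(&mut tx).await?;
    let (last_block,): (i64,) =
        sqlx::query_as("SELECT COALESCE(MAX(block_number), 0) FROM trades")
            .fetch_one(&mut *tx)
            .await?;
    tx.commit().await?;
    Ok((orders, last_block))
}
```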

@fleupold (Author)

You could potentially even use the trade events that we index in the autopilot directly to update the cache (instead of first writing to and then reading from disk), but I'd be surprised if this is necessary. Generally I think your idea sounds good.

squadgazzz added a commit that referenced this issue Sep 5, 2024
# Description
> Updating solvable orders (i.e. creating a new auction) currently takes >2s, with some pretty heavy outliers ([logs](https://production-6de61f.kb.eu-central-1.aws.cloud.es.io/app/r/s/ALsEK))
>
> This makes it hard to bring CoW Protocol's auction rate to one batch per block, since simply creating up-to-date state would take >15% of the total time we have at hand. We should at least be able to halve this time (if not reduce it even further).

To relieve the situation, it was proposed to introduce an incremental solvable-orders cache update: all solvable orders are selected with the old heavy query only at startup, the latest received order's creation timestamp is stored in memory, and from then on much faster, bounded incremental queries are made against the orders table and the auxiliary tables, selecting less data and executing quickly.

# Changes

Since incremental fetching retrieves orders created/cancelled after a specific timestamp, it is now also required to fetch orders that have had any onchain update after the last fetched block number. Moreover, the data needs to be fetched within a single transaction, so there is no way to run all the queries in parallel.

1. If the current solvable orders cache is empty, execute the original
heavy SQL query to fetch all current solvable orders and store them in
memory.
2. Otherwise, fetch full orders created or cancelled after the last stored timestamp, and also find the UIDs of orders that have had any onchain data updated after the latest observed block number. This includes fetching data from the following tables: trades, ethflow_data, order_execution, invalidations, onchain_order_invalidations, onchain_placed_orders, presignature_events.
3. Fetch quotes for all the collected orders.
4. Add all the newly received orders to the cache.
5. Filter out all orders that contain onchain errors or are expired, fulfilled, or invalidated.
6. Calculate the latest observed order creation timestamp.
7. Continue with the regular auction creation process.

As a result, we now have 3 SQL queries, each executing in ~50ms, instead of a single query taking ~2s.
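A condensed sketch of steps 1-2; the helper SQL, the column names, the epoch-seconds timestamp representation, and the subset of tables shown are illustrative, not the merged code, which covers more tables:

```rust
use sqlx::PgPool;

/// Fetch the incremental delta inside one transaction: full orders created
/// or cancelled after the timestamp watermark, plus UIDs of orders with
/// onchain updates after the block watermark.
async fn fetch_delta(
    pool: &PgPool,
    last_timestamp: i64,
    last_block: i64,
) -> sqlx::Result<(Vec<Vec<u8>>, Vec<Vec<u8>>)> {
    // One transaction keeps the two watermarks consistent with each other,
    // which is why the queries cannot run in parallel.
    let mut tx = pool.begin().await?;

    let new_orders: Vec<(Vec<u8>,)> = sqlx::query_as(
        "SELECT uid FROM orders \
         WHERE creation_timestamp > $1 OR cancellation_timestamp > $1",
    )
    .bind(last_timestamp)
    .fetch_all(&mut *tx)
    .await?;

    // Only two of the tables listed in step 2, for brevity.
    let updated_uids: Vec<(Vec<u8>,)> = sqlx::query_as(
        "SELECT order_uid FROM trades WHERE block_number > $1 \
         UNION \
         SELECT order_uid FROM invalidations WHERE block_number > $1",
    )
    .bind(last_block)
    .fetch_all(&mut *tx)
    .await?;

    tx.commit().await?;
    Ok((
        new_orders.into_iter().map(|(uid,)| uid).collect(),
        updated_uids.into_iter().map(|(uid,)| uid).collect(),
    ))
}
```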

## How to test
New DB tests. Existing e2e tests.

## Related Issues

Fixes #2831