Solver+Driver Co-Location #230

Closed
nlordell opened this issue May 25, 2022 · 11 comments


@nlordell
Contributor

nlordell commented May 25, 2022

Issue for discussing design for solver and driver co-location.

Intro

Currently, we have a single "driver" process called solver which is responsible for cutting new auctions, invoking each solver implementation, and then submitting the transaction on-chain. There are, however, a few issues with this model. The main one is that solvers are themselves responsible for paying failed transaction costs, even though they aren't in control of the logic that actually gets these transactions on-chain. Furthermore, there is some "burden of history" with the current HTTP solver API, as it was designed ad-hoc around our internal optimization solvers.

Overview

The main design idea came from discussions in #175 where we would split the solver into an "api driver" (henceforth referred to as the autopilot) which has protocol responsibilities (cutting auctions, invoking configured solvers, performing and recording settlement competition) and a driver (collects liquidity if needed, invokes internal or HTTP solver, mines transaction if it is selected as the winner).

With this design, we would continue to operate the autopilot as a protocol infrastructure piece, but would be able to co-locate drivers with external solvers so that they can make operational decisions based on the Ethereum network conditions (do they filter out "volatile" tokens, what private network submission do they use, do they use the mempool, how much additional priority fee do they use, do they turn off their solver, etc.).

Another bonus of this design is that it allows us to shift all "logic" that needs to be done by a single actor into the autopilot, allowing the orderbook API to scale horizontally and spin up as many instances as it needs for handling incoming traffic (quoting, order placement, etc.).

Preparing an Auction

Building the auction is no longer done in the orderbook. This avoids inconsistencies from differing token price estimates, which are used for computing the objective value, when multiple orderbook instances are running at the same time. We should still build the auction on a background fiber. Basically, the SolvableOrdersCache should move into the autopilot.

sequenceDiagram
  participant autopilot
  participant database
  participant driver0
  participant driver1

  loop Background fiber
    Note over autopilot,database: Read open orders
    autopilot->>database: select open orders
    database->>autopilot: orders[]

    Note over autopilot,driver1: Request native token prices for open orders
    par foreach token
      par driver0
        autopilot->>driver0: POST quote
        activate driver0
        driver0->>autopilot: quote
        deactivate driver0
      and driver1
        autopilot->>driver1: POST quote
        activate driver1
        driver1->>autopilot: quote
        deactivate driver1
      end
      
      autopilot->>autopilot: compute native token price
    end

    Note over autopilot: Store the prepared<br/>auction so solving<br/>can start instantly
    autopilot->>autopilot: in-memory cache for current auction
  end
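
The "compute native token price" step in the loop above could look roughly like the sketch below. The `Quote` shape mirrors a subset of the draft `/quote` response; the `native_price` helper and, in particular, the choice of the median across drivers are assumptions made for illustration, since the issue does not say how the per-driver quotes are combined.

```python
from dataclasses import dataclass
from fractions import Fraction
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Quote:
    """Subset of the draft /quote response (amounts in atoms)."""
    sell_amount: int  # amount of the token being priced
    buy_amount: int   # amount of native token received for it


def native_price(quotes: list[Optional[Quote]]) -> Optional[Fraction]:
    """Combine per-driver quotes into one native price for a token.

    Drivers that failed to quote are passed as None and ignored.
    ASSUMPTION: the median is used; the real rule is not specified.
    """
    prices = [
        Fraction(q.buy_amount, q.sell_amount)
        for q in quotes
        if q is not None and q.sell_amount > 0
    ]
    return median(prices) if prices else None
```

Using exact fractions avoids rounding differences between runs, which matters if the resulting prices feed into the objective value.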

Auction Run-Loop

Since auctions are being prepared in the background, once it's time to solve, the autopilot would cut an auction. This "cut auction" needs to be recorded in the database. Each driver is requested to solve for the auction and submit a proposal. Note that proposals don't contain calldata, only the objective value that the driver computed, so that we can run the competition. This allows solvers using RFQ systems (like 0x) to participate as well. This would, currently, rely on accurate gas estimates, but since we plan on removing gas costs from the objective in the near future, this will become less critical.

Once all the settlement proposals are collected and stored in the database (in order to provide a historical record of the auction as well as status information for clients to consume), a winner is selected. The autopilot then gives a "green-light" to the winner to start executing the settlement on-chain. The driver then signs an attestation, agreeing that it received the request to execute a settlement on-chain by a certain block deadline. It can also return an HTTP error code indicating that it can no longer execute its proposed settlement on-chain, or that it doesn't agree with the deadline. In this case, we would fall through to the next runner-up. If a driver refuses to attest too many settlements, we can manually remove it from the list of participating drivers. As a future improvement, we can also implement "penalty" periods where drivers are excluded from subsequent batches for misbehaving this way. The reason for this attestation, and why it's important, is that it's a "bonding contract" indicating intent to execute a settlement. If the transaction at the attested account nonce isn't a settlement with the attested settlement details by the block deadline, then the driver will receive a penalty in the next solver payout.
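
A minimal sketch of the "select winner, fall through to the runner-up" step, under stated assumptions: `Proposal`, `pick_executor` and the `request_execution` callback are hypothetical names, the callback stands in for `POST /execute` (returning `False` where the driver would answer with an HTTP error), and a higher objective value is assumed to be better.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Proposal:
    driver: str
    objective: int


def pick_executor(
    proposals: list[Proposal],
    request_execution: Callable[[Proposal], bool],
) -> Optional[Proposal]:
    """Return the best proposal whose driver attests to executing it.

    `request_execution` stands in for POST /execute: True means the driver
    attested, False means it declined (e.g. it rejected the deadline).
    """
    # ASSUMPTION: proposals are ranked by objective value, highest first.
    ranked = sorted(proposals, key=lambda p: p.objective, reverse=True)
    for proposal in ranked:
        if request_execution(proposal):
            return proposal
    return None  # nobody attested; the auction is not settled
```

Counting how often each driver declines inside `request_execution` would be one place to hang the manual removal or "penalty" periods mentioned above.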

In theory, as a future improvement, we can also start the next run-loop optimistically excluding the orders that are in-flight to increase settlement throughput.

sequenceDiagram
  participant autopilot
  participant database
  participant driver0
  participant driver1
  participant ethereum

  Note over autopilot,database: Cut and commit the auction to<br/>the database, this way the api<br/>can provide auction status<br/>information
  autopilot->>autopilot: cut auction
  autopilot->>database: commit auction
  
  Note over autopilot,driver1: Gather settlement proposals from each driver
  par driver0
    autopilot->>driver0: POST solve(auction)
    activate driver0
    driver0->>autopilot: proposal
    deactivate driver0
  and driver1
    autopilot->>driver1: POST solve(auction)
    activate driver1
    driver1->>autopilot: proposal
    deactivate driver1

    autopilot->>database: commit proposal
  end
  
  autopilot->>autopilot: select winner
  
  Note over autopilot,ethereum: tell the winner to execute settlement on-chain
  opt e.g. driver1 is the winner
    autopilot->>driver1: POST execute(proposal_id, objective, deadline)
    activate driver1
    
    driver1->>autopilot: attestation
    autopilot->>database: commit attestation

    par driver executes transaction
      driver1->>ethereum: eth_sendTransaction
      ethereum->>driver1: transaction_hash
      loop wait for transaction to mine
        driver1->>ethereum: eth_getTransactionReceipt(transaction_hash)
        ethereum->>driver1: transaction_receipt
      end
    and autopilot waits for account+nonce to increment
      loop wait for nonce to increment
        autopilot->>ethereum: eth_getTransactionCount(driver_account)
        ethereum->>autopilot: nonce
      end
    end
  end

Co-located Driver API

Some very first drafts for the API of the co-located drivers. These can evolve ad-hoc as we implement the co-location.

Quoting

This basically puts our current PriceEstimating interface behind an API - so each co-located driver can provide price estimates based on their solving strategy. Hopefully this makes it so we get price estimators "for free" when we add new solvers. Meaning solvers for specialized liquidity or with PMMs will be able to advertise their prices.

POST /quote

>> {
>>   sellToken: "0xDEf1CA1fb7FBcDC777520aa7f396b4E015F497aB",
>>   buyToken: "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2",
>>   buyAmount: "1000000000000000000",
>>   // Or, alternatively, for sell quotes:
>>   //sellAmount: "200000000000000000",
>> }

<< {
<<   sellToken: "0xDEf1CA1fb7FBcDC777520aa7f396b4E015F497aB",
<<   buyToken: "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2",
<<   sellAmount: "198107012937109273",
<<   buyAmount: "1000000000000000000",
<<   gas: "200000"
<< }
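
To make the request shape concrete, here is a rough sketch of how a driver might parse the draft `POST /quote` body. The `QuoteRequest` type and `parse_quote_request` helper are hypothetical; the only rule encoded is the one implied by the draft, namely that a request carries either `buyAmount` or `sellAmount`, and rejecting anything else is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuoteRequest:
    sell_token: str
    buy_token: str
    buy_amount: Optional[int] = None   # set for buy quotes
    sell_amount: Optional[int] = None  # set for sell quotes

    def __post_init__(self) -> None:
        # Exactly one of the two amounts fixes the side of the quote.
        if (self.buy_amount is None) == (self.sell_amount is None):
            raise ValueError("specify exactly one of buyAmount or sellAmount")


def parse_quote_request(body: dict) -> QuoteRequest:
    """Parse the decoded JSON body of a draft POST /quote request."""
    def amount(key: str) -> Optional[int]:
        # Amounts are decimal strings in the draft, so convert to int.
        return int(body[key]) if key in body else None

    return QuoteRequest(
        sell_token=body["sellToken"],
        buy_token=body["buyToken"],
        buy_amount=amount("buyAmount"),
        sell_amount=amount("sellAmount"),
    )
```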

Solution Proposal

Solving is very "private". You are given a batch and compute the objective value you promise to make. This allows solvers to keep private liquidity and other privileged information hidden until the transaction makes it on-chain.

Note that the solution proposal needs to be signed. This allows the protocol to prove that a solution was proposed for a specific auction.

POST /solve

>> {
>>   orders: [...],
>>   externalPrices: {
>>     "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2": "1000000000000000000",
>>     "0xDEf1CA1fb7FBcDC777520aa7f396b4E015F497aB": "198107012937109273",
>>   },
>> }

<< {
<<   auction: "1337",
<<   objective: "10000000000000000000000000000000000000000000000000000",
<<   signature: "0x...",
<< }

Execution and Attestation

POST /execute

>> {
>>   auction: "1337",
>>   objective: "10000000000000000000000000000000000000000000000000000",
>>   deadline: 1653496272, // block timestamp
>> }

<< {
<<   auction: "1337",
<<   objective: "10000000000000000000000000000000000000000000000000000",
<<   deadline: 1653496272,
<<   account: "0x4242424242424242424242424242424242424242",
<<   nonce: "1337",
<<   clearingPrices: {
<<     "0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2": "1000000000000000000",
<<     "0xDEf1CA1fb7FBcDC777520aa7f396b4E015F497aB": "198107012937109273",
<<   },
<<   trades: [
<<     { uid: "0x..." },
<<     { uid: "0x...", executedAmount: "..." }, // partially fillable
<<   ],
<<   internalizedInteractions: [
<<     {
<<       calldata: "0x...",
<<       inputs: [
<<         { tokens: "0x...", amount: "..." },
<<         { tokens: "0x...", amount: "..." },
<<       ],
<<       outputs: [
<<         { tokens: "0x...", amount: "..." },
<<       ],
<<     },
<<   ],
<<   calldata: "0x...",
<<   signature: "0x....", // signed attestations - proves that execution was accepted
<< }
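
On the autopilot side, the attested `account`, `nonce` and `deadline` could be watched roughly as sketched below. All names are hypothetical: `get_nonce` stands in for `eth_getTransactionCount`, `get_block_timestamp` for reading the latest block timestamp, and `sleep` for waiting between polls. The sketch only detects that the nonce was used; checking that the mined transaction really is the attested settlement would be a separate step.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Attestation:
    account: str
    nonce: int
    deadline: int  # block timestamp, as in the draft above


def wait_for_nonce(
    attestation: Attestation,
    get_nonce: Callable[[str], int],
    get_block_timestamp: Callable[[], int],
    sleep: Callable[[], None],
) -> bool:
    """Poll until the attested nonce is used or the deadline has passed.

    Returns True if the account's transaction count moved past the
    attested nonce in time, False if the deadline was missed.
    """
    while True:
        # The transaction count is the next unused nonce, so a count
        # greater than the attested nonce means that nonce was consumed.
        if get_nonce(attestation.account) > attestation.nonce:
            return True
        if get_block_timestamp() > attestation.deadline:
            return False
        sleep()
```

Passing the chain accessors in as callables keeps the sketch independent of any particular Ethereum client library.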

Implementation

I believe that most of the autopilot and driver implementation will just be moving code around, as most of the logic we need is already there:

  • Quoting is just our existing PriceEstimating interface behind an HTTP endpoint
  • Objective value computation already exists
  • Drivers will need to validate their solutions. They can be penalized for not following the rules
    • Logic for validation already exists!
    • Do we need to set up automated checking? I imagine we are already working on this.
  • Driver can internalize our existing "built-in" solvers (Baseline, Naive, 1Inch, ParaSwap, 0x) and just put them behind an API.
    • What's cool is this allows our DexAgg solver to start solving right away without waiting for liquidity to be fetched - since it doesn't use it.
  • Driver uses the existing HTTP API to call out to Quasimodo, MIP and external solvers
    • Again, we can forego including liquidity for external solvers that don't care about it.

The biggest change, IMO is the attestation. We do need some sort of system if we co-locate the driver with the solver.

@MartinquaXD
Contributor

Role of the current driver implementation

It's unclear to me what future driver development should look like. You mentioned that integrators would want to tweak/optimize how they submit settlements (because they are paying for it) but that means that each driver would probably become unique eventually.
With that in mind would our work on the driver focus on what quasimodo and other in-house solvers need?
Would we keep the driver open source as a very opinionated starting point for external integrators to use?

Convenience

Drivers will need to validate their solutions. They can be penalized for not following the rules

It would be cool to publish a crate which implements the checks for the latest rules of the game so external integrators don't have to reinvent the wheel every time (in case they want to build their own driver).

Security

If we make external parties responsible for submitting their solutions we should be able to enforce that they submit exactly what they signed. Otherwise they could just win the competition and shift the surplus in their favor afterwards.
It seems crucial that we get that right. I imagine that would require the smart contract to verify a signature of some sort.
Maybe /execute could additionally return the smart contract call data signed by the autopilot. Now we know the autopilot agreed because of the signature and the external driver agreed because it tries to bring the transaction on-chain.

[autopilot_signature, ...normal_call_data]

Either we could redeploy an updated settlement contract or place a new validation contract in front and make the settlement contract only accept solutions coming from the validation contract.
There are probably better ways to accomplish the same. I just want to get the discussion going before we design a system which doesn't have the security properties we want/need.

@sunce86
Contributor

sunce86 commented May 30, 2022

Will have more comments on this, but just to drop a quick thought:

One negative consequence of giving solvers more freedom to decide when and how to submit txs and when to start/stop could mean that, during highly volatile network periods all solvers decide to stop working (because the costs are way higher than the incentives), while on the other hand, we as a protocol, are probably ready to take the temporary loss on ourselves in order to avoid downtime.

Maybe we can fix this with dynamically adjusted (COW) incentives to cover the losses during highly volatile network periods.

@nlordell
Contributor Author

One negative consequence of giving solvers more freedom to decide when and how to submit txs and when to start/stop could mean that, during highly volatile network periods all solvers decide to stop working

In the new rules of the game, solvers that propose a settlement, win, but don't execute the transaction will be penalized.

are probably ready to take the temporary loss on ourselves in order to avoid downtime.

I think Gnosis-run solvers could still do this.

Maybe we can fix this with dynamically adjusted (COW) incentives to cover the losses during highly volatile network periods.

🤔 - that is a neat idea.

@tbosman

tbosman commented May 30, 2022

Will have more comments on this, but just to drop a quick thought:

One negative consequence of giving solvers more freedom to decide when and how to submit txs and when to start/stop could mean that, during highly volatile network periods all solvers decide to stop working (because the costs are way higher than the incentives), while on the other hand, we as a protocol, are probably ready to take the temporary loss on ourselves in order to avoid downtime.

Maybe we can fix this with dynamically adjusted (COW) incentives to cover the losses during highly volatile network periods.

Just a quick comment as solver builder: this is definitely true empirically right now. Typically high gas fees, high slippage and therefore high failure rates are correlated. Sometimes you see brief periods where the gas cost per transaction might be 10 times the solver reward, while 10-20% of the transactions are failing.

One way to deal with this would be to dynamically change the cow rewards.

Another approach would be to let solvers risk manage on their own. They could increase the slippage parameters for the DEX transactions, while also clearing at a lower surplus for the users, to make sure they don't get penalized for the slippage that's bound to occur. This decentralizes the logic for adjusting to market conditions, and it doesn't affect transactions that are settled in a cow.

In the current setup, there is no incentive for solvers to implement this though; the gnosis solvers don't comply and hence will win all the batches at a loss.

@nlordell
Contributor Author

With that in mind would our work on the driver focus on what quasimodo and other in-house solvers need?

Yes. Also, it's GPL3, so if they make improvements to transaction submission, external teams should share those improvements back with us (at which point, we can choose to integrate them or not).

It would be cool to publish a crate which implements the checks for the latest rules of the game so external integrators don't have to reinvent the wheel every time (in case they want to build their own driver).

Yes. It is the plan to include this with the reference driver implementation.

Otherwise they could just win the competition and shift the surplus in their favor afterwards.

We plan on slashing this kind of behaviour. Proposing a solution should bind you to executing it if it wins. Solvers that don't want to participate in a specific batch (because of network conditions like in @tbosman's example) shouldn't propose a solution.

@nlordell
Contributor Author

Another approach would be to let solvers risk manage on their own.

This is the main motivation for co-locating drivers and solvers. It gives them more control to do risk management themselves.

@MartinquaXD
Contributor

We plan on slashing this kind of behaviour.

I assume it would give people more trust in the protocol if we could enforce the correct behavior/prevent the bad behavior by design. Monitoring each settlement to detect bad behavior seems like a lot of hassle.
Maybe @fedgiac could give us an estimation on how much it would cost enforcing that on a smart contract level so we can weigh the pros and cons?

@fedgiac
Contributor

fedgiac commented May 30, 2022

If we make external parties responsible for submitting their solutions we should be able to enforce that they submit exactly what they signed. Otherwise they could just win the competition and shift the surplus in their favor afterwards.

an estimation on how much it would cost enforcing that on a smart contract level

Just to sketch what it could look like:

  1. When the drivers respond to the autopilot on POST solve, the solver commits to a salted hash of the calldata.
  2. When the autopilot calls POST execute, it includes a signature of the hash by the autopilot.
  3. On execution, the driver calls a solver contract that takes the execution calldata, the salt and the signature. This contract computes the hash and then verifies the signature. (We also probably want to add a validTo timestamp for the signature.)
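
The three steps above can be sketched off-chain as follows. This is an illustration only, with loud assumptions: SHA-256 and HMAC from the Python standard library are stand-ins for whatever hash and signature scheme the solver contract would actually verify, and the function names, the fixed-length salt and the `valid_to` encoding are hypothetical.

```python
import hashlib
import hmac


def commit(calldata: bytes, salt: bytes) -> bytes:
    """Step 1: salted commitment to the settlement calldata.

    ASSUMPTION: the salt has a fixed length (e.g. 32 bytes), so that
    concatenating salt and calldata is unambiguous.
    """
    return hashlib.sha256(salt + calldata).digest()


def sign(key: bytes, commitment: bytes, valid_to: int) -> bytes:
    """Step 2: autopilot approval of a commitment (HMAC as a stand-in)."""
    message = commitment + valid_to.to_bytes(8, "big")
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(
    key: bytes,
    calldata: bytes,
    salt: bytes,
    valid_to: int,
    signature: bytes,
    now: int,
) -> bool:
    """Step 3: recompute the hash and check the signature on execution."""
    if now > valid_to:
        return False  # the approval has expired
    expected = sign(key, commit(calldata, salt), valid_to)
    return hmac.compare_digest(expected, signature)
```

Because the salt is only revealed at execution time, the commitment by itself should not leak the calldata to the autopilot or to other solvers.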

Then the execution overhead would be overall more calldata, an extra contract call, hash computation and signature verification. I think the latter is the most gas expensive operation at 3k gas, but probably the extra call could cost another 1.5k. So my first guess would be about 5k gas (that is 1$/settlement @ 100Gwei/gas, 2k$/ETH).
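
The dollar figure follows directly from the numbers quoted above; a small worked check (the helper name `overhead_usd` is just for illustration):

```python
from fractions import Fraction

GWEI_PER_ETH = 10**9


def overhead_usd(gas: int, gas_price_gwei: int, eth_price_usd: int) -> Fraction:
    """Cost in USD of `gas` units at the given gas price and ETH price."""
    return Fraction(gas * gas_price_gwei * eth_price_usd, GWEI_PER_ETH)


# 5k gas at 100 Gwei per gas is 0.0005 ETH, which is $1 at $2k per ETH.
cost = overhead_usd(5_000, 100, 2_000)
```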

The drawback of this approach is its centralization (it requires a trusted authority to prepare the signature). This is in a way intrinsic to the fact that an authority (the autopilot) decides what the best solution is.

@vkgnosis
Contributor

vkgnosis commented Jun 1, 2022

Good writeup. Still mulling it over in my head. So far it makes sense to me.

@nlordell
Contributor Author

Monitoring each settlement to detect bad behavior seems like a lot of hassle.

I think we will have to do this regardless.

  1. When the drivers respond to the autopilot on POST solve, the solver commits to a salted hash of the calldata.

The issue with this is that solvers themselves might not have the calldata at this point (if they use an RFQ system for example, they would only build the calldata right before submitting).


This issue has been marked as stale because it has been inactive for a while. Please update this issue or it will be automatically closed.


7 participants