Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827
Good idea, thanks @HeinrichApfelmus!
I think the lock variable should be in `WalletLayer`, not `ApiLayer`, though.
Ideally, our database layer could also enforce constraints to prevent marking pending UTxO as pending, but I think that would first require a renovation of the DBLayer.
-- Back in the old days, hotel concierges used to give out keys.
-- But after the cryptocurrency revolution, they give out locks. :)
-- (The term /lock/ is standard terminology in concurrent programming.)
Sounds like you're drawing a long bow with this Concierge name. 😛
😂 I have a soft spot for figurative names. I briefly considered "Locksmith", but that suggested that the locks were being created on the fly instead of being rented out. The drawback of figurative names is that they take a moment of explanation, but the drawback of generic names like "lock collection", "lock factory", "lock registry", … is that it's easy to forget what they do.
@@ -1073,14 +1077,21 @@ data ApiLayer s (k :: Depth -> Type -> Type)
(TransactionLayer k)
(DBFactory IO s k)
(WorkerRegistry WalletId (DBLayer IO s k))
(Concierge IO WalletLock)
We should be able to put this lock variable into the `WalletLayer` type, right?
Then we wouldn't need a `Concierge`.
As I understand the problem, the lock we need is a spending coin selection lock. While there is a coin selection running, if the resulting TxIns will be marked as spent (pending), then we must prevent other threads from selecting and spending any UTxO from the same wallet.
Unfortunately, as you previously pointed out, much of `postTransactionOld` should be in the wallet layer, not the API server layer. But it should still be possible to put this lock variable in the wallet layer, where it belongs.
Also, a future version of `constructTransaction` (the new transaction API) will have a query parameter to mark the selected UTxO as spent (pending). So we will need exactly the same coin selection locking there.
> We should be able to put this lock variable into the `WalletLayer` type, right?
> Then we wouldn't need a `Concierge`.
Actually, it's not a single lock for the entire layer, but one lock for each wallet ID. 😅 The first approach that comes to mind for managing these locks would be to create each lock at the beginning of the lifecycle of its wallet ID. However, I think that it's less intrusive to use this `Concierge` thing instead, because it doesn't need to know about the lifecycle.
> Unfortunately, as you previously pointed out, much of `postTransactionOld` should be in the Wallet layer not API server layer. But it should still be possible to put this lock variable in the Wallet layer where it belongs.
My current thinking is that these locks around `postTransactionOld` are supposed to be a quick fix for an issue that Binance is having, but not more. Imposing an ordering on the `POST transactions` endpoint seemed like an adequate solution for that, and that is something I would attribute to the `ApiLayer`, even though I (perhaps paradoxically) agree that `postTransactionOld` should be part of the wallet layer.
On the wallet layer, the possibility of calling `selectAssets` and `submitTx` separately makes it possible to circumvent the lock. Hence, on the wallet layer, I think that `postTransactionOld` should come without a lock. I think that for this layer, we need a more principled solution, especially because it is possible to do a coin selection and then discard the result without ever submitting to the chain, e.g. as `estimateFee` does. I would prefer a solution in terms of pure data dependencies: a user can do as many coin selections as they like, but each of these will reference the wallet state from which the selection was made. Submitting one selection to the wallet will render the others invalid, because they no longer match the new wallet state; this could result in a helpful error message.
TL;DR: My preference would be more pure code (later), fewer locks. 😅 Hence this "throwaway" solution to locking the `POST transactions` endpoint.
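To make the "pure data dependencies" idea a bit more concrete, here is a minimal sketch of how a selection could carry a reference to the wallet state it was made from. Everything in it is hypothetical and not taken from cardano-wallet: the `Wallet`, `Selection`, and `ErrStaleSelection` types, the integer `version` stamp, and the `select` and `submit` functions are illustrative assumptions only.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for a transaction input.
type TxIn = Int

-- Hypothetical wallet state: a version stamp plus the spendable inputs.
data Wallet = Wallet
    { version   :: Int
    , available :: Set.Set TxIn
    } deriving (Eq, Show)

-- A selection remembers the wallet version it was computed from.
data Selection = Selection
    { basedOn :: Int
    , inputs  :: Set.Set TxIn
    } deriving (Eq, Show)

-- Error reported when a selection refers to an outdated wallet state.
data ErrStaleSelection = ErrStaleSelection
    { selectionVersion :: Int
    , walletVersion    :: Int
    } deriving (Eq, Show)

-- Pure coin selection: pick the n smallest inputs, change nothing.
select :: Int -> Wallet -> Maybe Selection
select n w
    | n < 0 || n > Set.size (available w) = Nothing
    | otherwise = Just (Selection (version w) (Set.take n (available w)))

-- Submitting consumes the inputs and bumps the version, so every other
-- selection made from the old state is rejected afterwards.
submit :: Selection -> Wallet -> Either ErrStaleSelection Wallet
submit sel w
    | basedOn sel /= version w =
        Left (ErrStaleSelection (basedOn sel) (version w))
    | otherwise =
        Right w
            { version   = version w + 1
            , available = available w `Set.difference` inputs sel
            }
```

With this shape, any number of selections can be computed and discarded freely (as a fee estimate would do), and only `submit` decides which one wins; the losers get an explicit error instead of silently double-spending.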
OK - I appreciate the quick fix approach, because as the name suggests, `postTransactionOld` is not going to be around for long.
I probably should have said it explicitly - there is one `WalletLayer` per `WalletId`, so it seems to me like the natural home for this lock variable. The lock would be needed in the medium term, even after `postTransactionOld` has gone.
I think we pretty much agree but perhaps don't have a common vocabulary for this thing. I differentiate between "coin selections" and "spending coin selections" - the latter is where the TxIns to be spent must be immediately marked as Used with a Pending transaction. The former don't need a lock, the latter do.
It would be preferable for the wallet to run spending coin selections sequentially, rather than potentially return error messages about conflicts, invalid txins, etc.
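The distinction between the two kinds of selection can be sketched as follows. This is only an illustration under simplifying assumptions: the wallet state lives in a single in-memory `MVar` rather than in the database, and the names `WalletSketch`, `WalletState`, `pick`, `selectCoins`, and `selectAndMarkPending` are hypothetical, not part of the wallet code.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar, readMVar)
import qualified Data.Set as Set

-- Hypothetical stand-in for a transaction input.
type TxIn = Int

-- Hypothetical in-memory wallet state.
data WalletState = WalletState
    { unspent :: Set.Set TxIn
    , pending :: Set.Set TxIn
    }

-- One MVar per wallet; it doubles as the per-wallet lock.
newtype WalletSketch = WalletSketch (MVar WalletState)

newWalletSketch :: [TxIn] -> IO WalletSketch
newWalletSketch ins =
    WalletSketch <$> newMVar (WalletState (Set.fromList ins) Set.empty)

-- Toy selection algorithm: take the n smallest unspent inputs.
pick :: Int -> WalletState -> Maybe (Set.Set TxIn)
pick n st
    | n < 0 || n > Set.size (unspent st) = Nothing
    | otherwise = Just (Set.take n (unspent st))

-- Plain coin selection: works on a snapshot and changes nothing.
selectCoins :: WalletSketch -> Int -> IO (Maybe (Set.Set TxIn))
selectCoins (WalletSketch var) n = pick n <$> readMVar var

-- Spending coin selection: selecting and marking as pending happen in
-- one critical section, so concurrent callers are run one after another.
selectAndMarkPending :: WalletSketch -> Int -> IO (Maybe (Set.Set TxIn))
selectAndMarkPending (WalletSketch var) n = modifyMVar var $ \st ->
    pure $ case pick n st of
        Nothing  -> (st, Nothing)
        Just ins ->
            ( st { unspent = unspent st `Set.difference` ins
                 , pending = pending st `Set.union` ins
                 }
            , Just ins
            )
```

Because `modifyMVar` holds the `MVar` for the whole select-and-mark step, two spending selections on the same wallet cannot pick the same inputs, while `selectCoins` only takes a snapshot.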
> I probably should have said it explicitly - there is one `WalletLayer` per `WalletId`, so it seems to me like the natural home for this lock variable.
Oh. I did not know that. This fact was not apparent to me, because all the functions in `Cardano.Wallet` take a `WalletId` and go to great pains to throw an `ErrNoSuchWallet` exception if that ID is invalid, hence I assumed that the `WalletLayer` cannot be tailored to a specific `WalletId`, for then there would be no reason to throw this exception. Oh my. Can I vote to put this on the "to renovate" wishlist? 😅
Hm. For the sake of argument, let me try to find a way to get a lock into the `WalletLayer`. A `WalletLayer` is created only with `hoistResource` applied to an `ApiLayer`:
This is used in `withWorkerCtx`, which is called in pretty much every REST API endpoint:
Oh my. This means that in order to create a lock in the `WalletLayer`, I would have to create it in the `ApiLayer` first, and then percolate it to the right wallet ID. In other words, the collection of locks (the equivalent of the `Concierge` thing) will not go away; instead, it now also has to be propagated to each `WalletLayer`. And I would have to think about the lifecycle: when do I create the lock, e.g. call `newMVar`? I suppose in `registerWorker`, but I'm not entirely sure, actually.
Hm. I think the easiest solution is the following: if you feel that the `WalletLayer` should at least know about the locks, then I can add a `Concierge` field to the `WalletLayer` and propagate a vanilla copy from the `ApiLayer`. But adding a lifecycle for the lock seems to be more trouble than it's worth at the moment; I would rather work on the DB layer renovation, which will allow us to make the code much more pure, which in turn makes most of these problems disappear. 😅
c9d0f28 to ff7d311
I still think my suggestion is possible and better, but this looks like a fix, and it's good quality too. Thanks 👍
I rebased and squashed...
bors r+
2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus

### Issue Number

ADP-780

### Overview

This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what its name suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`.

### Progress

- [x] Introduce `Concierge` utility for managing a collection of locks.
- [x] Make critical section atomic.
- [x] Unit tests for `Concierge`
- [x] The `Concierge` actually makes things atomic
- [x] The lock will be released upon an exception.

### Comments

Co-authored-by: Heinrich Apfelmus <[email protected]>
We use the monad classes from `io-classes` in order to:
* Make concurrency testable with QuickCheck
* Better integrate with the monad transformers where `atomically` is used
We use a `Concierge` for locks of type `WalletLock` to enforce sequential ordering.
ff7d311 to 5b3e7ba
I'm not sure what happened to the bors job here. It seems to have been cancelled or something. Rebased again...
bors r+
bors r+
Build failed: Cached failure, oops. After restarting the failed builds, the jobset has succeeded. Also, weeder failed later on Buildkite, but it's not related to this PR. #duplicate
### Issue Number

ADP-780

### Overview

This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what its name suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`.

### Progress

- [x] Introduce `Concierge` utility for managing a collection of locks.
- [x] Make critical section atomic.
- [x] Unit tests for `Concierge`
- [x] The `Concierge` actually makes things atomic
- [x] The lock will be released upon an exception.

### Comments
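For readers who want a feel for the "collection of locks" idea, here is a minimal sketch. It is not the code from this pull request: it is specialised to `IO` instead of being polymorphic over the monad, it uses the `stm` and `containers` packages directly rather than `io-classes`, and the representation (a `TVar` holding the set of locks currently in use) as well as the `newConcierge` helper are assumptions made for illustration. Only the names `Concierge` and `atomicallyWith` come from the description above.

```haskell
import Control.Concurrent.STM
    ( TVar, atomically, modifyTVar', newTVarIO, readTVar, retry, writeTVar )
import Control.Exception (bracket_)
import qualified Data.Set as Set

-- Assumed representation: the set of locks that are currently taken.
-- A lock that is not in the set is free; nothing has to be created
-- when a wallet appears or cleaned up when it disappears.
newtype Concierge lock = Concierge (TVar (Set.Set lock))

-- Hypothetical constructor: initially, no lock is taken.
newConcierge :: IO (Concierge lock)
newConcierge = Concierge <$> newTVarIO Set.empty

-- Run an action while holding the given lock. Actions with the same
-- lock run one after another; actions with different locks can overlap.
atomicallyWith :: Ord lock => Concierge lock -> lock -> IO a -> IO a
atomicallyWith (Concierge held) lock =
    bracket_ acquire release
  where
    -- Block (via STM retry) until the lock is free, then take it.
    acquire = atomically $ do
        inUse <- readTVar held
        if lock `Set.member` inUse
            then retry
            else writeTVar held (Set.insert lock inUse)
    -- bracket_ runs this even if the action throws an exception.
    release = atomically $ modifyTVar' held (Set.delete lock)
```

A call such as `atomicallyWith concierge walletId criticalSection` then serialises the critical section per wallet ID, and because locks are identified by value, the concierge never needs to know about the wallet lifecycle.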