Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827

HeinrichApfelmus · 2021-08-16T15:56:29Z

Issue Number

Overview

This pull request removes a race condition in postTransactionOld where concurrent calls to the POST transactions endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in postTransactionOld. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility Control.Concurrent.Concierge that keeps track of a collection of locks. It provides a function atomicallyWith that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad io-sim.

Progress

Introduce Concierge utility for managing a collection of locks.
Make critical section atomic.
Unit tests for Concierge
- The Concierge actually makes things atomic
- The lock will be release upon an exception.

Comments

rvl

Good idea, thanks @HeinrichApfelmus!

I think the lock variable should be in WalletLayer not ApiLayer though.

Ideally, our database layer could also enforce constraints to prevent marking pending UTxO as pending, but I think that would first require a renovation of the DBLayer.

rvl · 2021-08-17T01:01:11Z

lib/core/src/Control/Concurrent/Concierge.hs

+-- Back in the old days, hotel concierges used to give out keys.
+-- But after the cryptocurrency revolution, they give out locks. :)
+-- (The term /lock/ is standard terminology in concurrent programming.)


Sounds like you're drawing a long bow with this Concierge name. 😛

😂 I have a sweet spot for figurative names. I briefly considered "Locksmith", but that suggested that the locks were being created on the fly instead of being rented out. The drawback of figurative names is that they take a moment of explanation, but the drawback of generic names like "lock collection", "lock factory", "lock registry", … is that it's easy to forget what they do.

rvl · 2021-08-17T01:21:24Z

lib/core/src/Cardano/Wallet/Api.hs

@@ -1073,14 +1077,21 @@ data ApiLayer s (k :: Depth -> Type -> Type)
        (TransactionLayer k)
        (DBFactory IO s k)
        (WorkerRegistry WalletId (DBLayer IO s k))
+        (Concierge IO WalletLock)


We should be able to put this lock variable into the WalletLayer type, right?
Then we wouldn't need a Concierge.

As I understand the problem, the lock we need is a spending coin selection lock. While there is a coin selection running, if the resulting TxIns will be marked as spent (pending), then we must prevent other threads from selecting and spending any UTxO from the same wallet.

Unfortunately, as you previously pointed out, much of postTransactionOld should be in the Wallet layer not API server layer. But it should still be possible to put this lock variable in the Wallet layer where it belongs.

Also, a future version of constructTransaction (the new transaction API) will have a query parameter to mark the UTxO which were selected as spent (pending). So we will need exactly the same coin selection locking there.

We should be able to put this lock variable into the WalletLayer type, right?
Then we wouldn't need a Concierge.

Actually, it's not a single lock for the entire layer, but one lock for each wallet id. 😅 The first approach that comes to mind for managing these locks would be to create each one lock at the beginning of the lifecycle of each wallet id. However, I think that it's less intrusive to use this Concierge thing instead, because it doesn't need to know about the lifecycle.

Unfortunately, as you previously pointed out, much of postTransactionOld should be in the Wallet layer not API server layer. But it should still be possible to put this lock variable in the Wallet layer where it belongs.

My current thinking is that these locks around postTransactionOld are supposed to be a quick fix for an issue that Binance is having, but not more. Imposing an ordering on the POST transactions endpoint seemed like an adequate solution for that, and that is something I would attribute to the ApiLayer, even though I (perhaps paradoxically) agree that postTransactionOld should be part of the wallet layer.

On the Wallet layer, the possibility to do selectAssets and submitTx separately makes it possible to circumvent the lock. Hence, on the wallet layer, I think that postTransactionOld should come without a lock. I think that for this layer, we need a more principled solution, especially because it is possible to do a coin selection and then discard the result without ever submitting to the chain, e.g. as estimateFee does. I would prefer a solution in terms of pure data dependencies: A user can do as many coin selections as they like, but each of these will reference the wallet state from which the selection was made. Submitting one selection to the wallet will render the others invalid, because they do not fit to the new wallet state; this could result in a helpful error message.

TL;DR: My preference would be more pure code (later), less locks. 😅 Hence this "throwaway" solution to locking the POST transactions endpoint.

OK - I appreciate the quick fix approach, because as the name suggests, postTransactionOld is not going to be around for long.

I probably should have said it explicitly - there is one WalletLayer per WalletId, so it seems to me like the natural home for this lock variable. The lock would be needed in the medium term, even after postTransactionOld has gone.

I think we pretty much agree but perhaps don't have a common vocabulary for this thing. I differentiate between "coin selections" and "spending coin selections" - the latter is where the TxIns to be spent must be immediately marked as Used with a Pending transaction. The former don't need a lock, the latter do.

It would be preferable for the wallet to run spending coin selections sequentially, rather than potentially return error messages about conflicts, invalid txins, etc.

I probably should have said it explicitly - there is one WalletLayer per WalletId, so it seems to me like the natural home for this lock variable.

Oh. I did not know that. This fact was not apparent to me, because all the functions in Cardano.Wallet take a WalletId and go through great pains to throw an ErrNoSuchWallet exception if that Id is invalid, hence I assumed that the WalletLayer cannot be tailored to be a specific WalletId, for then there would be no reason to throw this exception. Oh my. Can I vote to put this on the "to renovate" wishlist? 😅

Hm. For the sake of argument, let me try to find a way to get a lock into the WalletLayer. A WalletLayer is created only with hoistResource applied to an ApiLayer:

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api.hs#L1090-L1095

This is used in withWorkerCtx which is called in pretty much every a REST API endpoint:

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api/Server.hs#L3109-L3140

Oh my. This means that in order to create a lock in the WalletLayer, I would have to create it in the ApiLayer first, and then percolate it to the right wallet Id. In other words, the collection of locks (the equivalent of the Concierge thing) will not go away, instead, now it also has to be propagated to each WalletLayer. And I would have to think about the lifecycle: When do I create the lock, e.g. call newMVar? I suppose in registerWorker, but I'm not entirely sure actually.

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api/Server.hs#L3071-L3102

Hm. I think the easiest solution is the following: If you feel that the WalletLayer should at least know about the locks, then I can add a Concierge field to the WalletLayer and propagate a vanilla copy from the ApiLayer. But adding a lifecycle for the lock seems to be more trouble than it's worth at the moment; I would rather work on the DB layer renovation which will allow us to make the code much more pure, which in turn makes most of these problems disappear. 😅

rvl

I still think my suggestion is possible and better, but this looks like a fix, and it's good quality too. Thanks 👍

rvl · 2021-08-19T06:09:19Z

I rebased and squashed...

bors r+

2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus ### Issue Number ADP-780 ### Overview This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend. To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints. The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`. ### Progress - [x] Introduce `Concierge` utility for managing a collection of locks. - [x] Make critical section atomic. - [x] Unit tests for `Concierge` - [x] The `Concierge` actually makes things atomic - [x] The lock will be release upon an exception. ### Comments Co-authored-by: Heinrich Apfelmus <[email protected]>

We use the monad classes from `io-classes` in order to: * Make concurrency testable with QuickCheck * Better integrate with the monad transformers where `atomically` is used

We use a`Concierge` for locks of type `WalletLock` to enforce sequential ordering.

rvl · 2021-08-19T07:14:24Z

I'm not sure what happened to the bors job here. It seems to have been cancelled or something. Rebased again...

bors r+

2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus ### Issue Number ADP-780 ### Overview This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend. To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints. The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`. ### Progress - [x] Introduce `Concierge` utility for managing a collection of locks. - [x] Make critical section atomic. - [x] Unit tests for `Concierge` - [x] The `Concierge` actually makes things atomic - [x] The lock will be release upon an exception. ### Comments Co-authored-by: Heinrich Apfelmus <[email protected]>

iohk-bors · 2021-08-19T08:29:18Z

Build failed:

ci/hydra-build:required

[ 5 of 62] Compiling Cardano.Wallet.Api.Server.TlsSpec ( test/unit/Cardano/Wallet/Api/Server/TlsSpec.hs, dist/build/unit/unit-tmp/Cardano/Wallet/Api/Server/TlsSpec.o )
[ 6 of 62] Compiling Cardano.Wallet.ApiSpec ( test/unit/Cardano/Wallet/ApiSpec.hs, dist/build/unit/unit-tmp/Cardano/Wallet/ApiSpec.o )
iserv-proxy: {handle: <file descriptor: 27>}: GHCi.Message.remoteCall: end of file
remote-iserv.exe: {handle: <socket: 152>}: GHCi.Message.remoteCall: end of file
/nix/store/szr7mz98k78c1ivdhvnn0b393bhh5bk7-stdenv-linux/setup: line 1313:   104 Killed                  $SETUP_HS build test:unit -j$(($NIX_BUILD_CORES > 4 ? 4 : $NIX_BUILD_CORES)) --ghc-option=-fexternal-interpreter --ghc-option=-pgmi --ghc-option=/nix/store/8fsmnh6757axnmiiv5bx6bbqgsy2mhwz-iserv-wrapper/bin/iserv-wrapper --ghc-option=-L/nix/store/m4vpyw0cc16hnndaj78lwz154xdamr7b-mingw-w64-x86_64-w64-mingw32-6.0.0-pthreads-x86_64-w64-mingw32/lib --ghc-option=-L/nix/store/m4vpyw0cc16hnndaj78lwz154xdamr7b-mingw-w64-x86_64-w64-mingw32-6.0.0-pthreads-x86_64-w64-mingw32/bin --ghc-option=-L/nix/store/nkzf7225nhqhpljnh73r20wvg3qjxz9y-gmp-6.2.1-x86_64-w64-mingw32/lib
builder for '/nix/store/7y53xpksyrjybzafv25slfdi15sm5h17-cardano-wallet-core-test-unit-x86_64-w64-mingw32-2021.8.11.drv' failed with exit code 137

Seems like a build process was killed, not sure why. Monitoring shows that it wasn't OOM.

#2703

rvl · 2021-08-20T01:46:30Z

bors r+

2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus ### Issue Number ADP-780 ### Overview This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend. To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints. The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`. ### Progress - [x] Introduce `Concierge` utility for managing a collection of locks. - [x] Make critical section atomic. - [x] Unit tests for `Concierge` - [x] The `Concierge` actually makes things atomic - [x] The lock will be release upon an exception. ### Comments Co-authored-by: Heinrich Apfelmus <[email protected]>

iohk-bors · 2021-08-20T02:29:27Z

Build failed:

ci/hydra-build:required

Cached failure, oops. After restarting the failed builds, the jobset has succeeded

Also weeder failed later on Buildkite, but it's not related to this PR.

#duplicate

HeinrichApfelmus marked this pull request as draft August 16, 2021 15:56

rvl reviewed Aug 17, 2021

View reviewed changes

HeinrichApfelmus marked this pull request as ready for review August 17, 2021 13:00

rvl assigned HeinrichApfelmus Aug 18, 2021

rvl force-pushed the HeinrichApfelmus/ADP-780/coin-selection-lock branch from c9d0f28 to ff7d311 Compare August 19, 2021 06:06

rvl approved these changes Aug 19, 2021

View reviewed changes

HeinrichApfelmus added 3 commits August 19, 2021 15:13

Add Control.Concurrent.Concierge for locks

1f5f3d0

We use the monad classes from `io-classes` in order to: * Make concurrency testable with QuickCheck * Better integrate with the monad transformers where `atomically` is used

Make postTransactionOld atomic for each wallet

fc717b0

We use a`Concierge` for locks of type `WalletLock` to enforce sequential ordering.

Add unit tests for Control.Concurrent.Concierge

5b3e7ba

rvl force-pushed the HeinrichApfelmus/ADP-780/coin-selection-lock branch from ff7d311 to 5b3e7ba Compare August 19, 2021 07:14

rvl merged commit 0306078 into master Aug 20, 2021

rvl deleted the HeinrichApfelmus/ADP-780/coin-selection-lock branch August 20, 2021 07:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827

Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827

HeinrichApfelmus commented Aug 16, 2021 •

edited

Loading

rvl left a comment

rvl Aug 17, 2021

HeinrichApfelmus Aug 17, 2021

rvl Aug 17, 2021

HeinrichApfelmus Aug 17, 2021

rvl Aug 18, 2021

HeinrichApfelmus Aug 18, 2021

rvl left a comment

rvl commented Aug 19, 2021

rvl commented Aug 19, 2021

iohk-bors bot commented Aug 19, 2021 •

edited by Anviking

Loading

rvl commented Aug 20, 2021

iohk-bors bot commented Aug 20, 2021 •

edited by Anviking

Loading

Remove race condition in postTransactionOld where pending (spent) UTxO could be selected as inputs #2827

Remove race condition in postTransactionOld where pending (spent) UTxO could be selected as inputs #2827

Conversation

HeinrichApfelmus commented Aug 16, 2021 • edited Loading

Issue Number

Overview

Progress

Comments

rvl left a comment

Choose a reason for hiding this comment

rvl Aug 17, 2021

Choose a reason for hiding this comment

HeinrichApfelmus Aug 17, 2021

Choose a reason for hiding this comment

rvl Aug 17, 2021

Choose a reason for hiding this comment

HeinrichApfelmus Aug 17, 2021

Choose a reason for hiding this comment

rvl Aug 18, 2021

Choose a reason for hiding this comment

HeinrichApfelmus Aug 18, 2021

Choose a reason for hiding this comment

rvl left a comment

Choose a reason for hiding this comment

rvl commented Aug 19, 2021

rvl commented Aug 19, 2021

iohk-bors bot commented Aug 19, 2021 • edited by Anviking Loading

rvl commented Aug 20, 2021

iohk-bors bot commented Aug 20, 2021 • edited by Anviking Loading

Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827

Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs #2827

HeinrichApfelmus commented Aug 16, 2021 •

edited

Loading

iohk-bors bot commented Aug 19, 2021 •

edited by Anviking

Loading

iohk-bors bot commented Aug 20, 2021 •

edited by Anviking

Loading