Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove race condition in postTransactionOld where pending (spent) UTxO could be selected as inputs #2827

Merged
merged 3 commits into from
Aug 20, 2021

Conversation

HeinrichApfelmus
Copy link
Contributor

@HeinrichApfelmus HeinrichApfelmus commented Aug 16, 2021

Issue Number

ADP-780

Overview

This pull request removes a race condition in postTransactionOld where concurrent calls to the POST transactions endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in postTransactionOld. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility Control.Concurrent.Concierge that keeps track of a collection of locks. It provides a function atomicallyWith that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad io-sim.

Progress

  • Introduce Concierge utility for managing a collection of locks.
  • Make critical section atomic.
  • Unit tests for Concierge
    • The Concierge actually makes things atomic
    • The lock will be release upon an exception.

Comments

@HeinrichApfelmus HeinrichApfelmus marked this pull request as draft August 16, 2021 15:56
Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, thanks @HeinrichApfelmus!

I think the lock variable should be in WalletLayer not ApiLayer though.

Ideally, our database layer could also enforce constraints to prevent marking pending UTxO as pending, but I think that would first require a renovation of the DBLayer.

Comment on lines +46 to +51
-- Back in the old days, hotel concierges used to give out keys.
-- But after the cryptocurrency revolution, they give out locks. :)
-- (The term /lock/ is standard terminology in concurrent programming.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like you're drawing a long bow with this Concierge name. 😛

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😂 I have a sweet spot for figurative names. I briefly considered "Locksmith", but that suggested that the locks were being created on the fly instead of being rented out. The drawback of figurative names is that they take a moment of explanation, but the drawback of generic names like "lock collection", "lock factory", "lock registry", … is that it's easy to forget what they do.

@@ -1073,14 +1077,21 @@ data ApiLayer s (k :: Depth -> Type -> Type)
(TransactionLayer k)
(DBFactory IO s k)
(WorkerRegistry WalletId (DBLayer IO s k))
(Concierge IO WalletLock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to put this lock variable into the WalletLayer type, right?
Then we wouldn't need a Concierge.

As I understand the problem, the lock we need is a spending coin selection lock. While there is a coin selection running, if the resulting TxIns will be marked as spent (pending), then we must prevent other threads from selecting and spending any UTxO from the same wallet.

Unfortunately, as you previously pointed out, much of postTransactionOld should be in the Wallet layer not API server layer. But it should still be possible to put this lock variable in the Wallet layer where it belongs.

Also, a future version of constructTransaction (the new transaction API) will have a query parameter to mark the UTxO which were selected as spent (pending). So we will need exactly the same coin selection locking there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to put this lock variable into the WalletLayer type, right?
Then we wouldn't need a Concierge.

Actually, it's not a single lock for the entire layer, but one lock for each wallet id. 😅 The first approach that comes to mind for managing these locks would be to create each one lock at the beginning of the lifecycle of each wallet id. However, I think that it's less intrusive to use this Concierge thing instead, because it doesn't need to know about the lifecycle.

Unfortunately, as you previously pointed out, much of postTransactionOld should be in the Wallet layer not API server layer. But it should still be possible to put this lock variable in the Wallet layer where it belongs.

My current thinking is that these locks around postTransactionOld are supposed to be a quick fix for an issue that Binance is having, but not more. Imposing an ordering on the POST transactions endpoint seemed like an adequate solution for that, and that is something I would attribute to the ApiLayer, even though I (perhaps paradoxically) agree that postTransactionOld should be part of the wallet layer.

On the Wallet layer, the possibility to do selectAssets and submitTx separately makes it possible to circumvent the lock. Hence, on the wallet layer, I think that postTransactionOld should come without a lock. I think that for this layer, we need a more principled solution, especially because it is possible to do a coin selection and then discard the result without ever submitting to the chain, e.g. as estimateFee does. I would prefer a solution in terms of pure data dependencies: A user can do as many coin selections as they like, but each of these will reference the wallet state from which the selection was made. Submitting one selection to the wallet will render the others invalid, because they do not fit to the new wallet state; this could result in a helpful error message.

TL;DR: My preference would be more pure code (later), less locks. 😅 Hence this "throwaway" solution to locking the POST transactions endpoint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I appreciate the quick fix approach, because as the name suggests, postTransactionOld is not going to be around for long.

I probably should have said it explicitly - there is one WalletLayer per WalletId, so it seems to me like the natural home for this lock variable. The lock would be needed in the medium term, even after postTransactionOld has gone.

I think we pretty much agree but perhaps don't have a common vocabulary for this thing. I differentiate between "coin selections" and "spending coin selections" - the latter is where the TxIns to be spent must be immediately marked as Used with a Pending transaction. The former don't need a lock, the latter do.

It would be preferable for the wallet to run spending coin selections sequentially, rather than potentially return error messages about conflicts, invalid txins, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have said it explicitly - there is one WalletLayer per WalletId, so it seems to me like the natural home for this lock variable.

Oh. I did not know that. This fact was not apparent to me, because all the functions in Cardano.Wallet take a WalletId and go through great pains to throw an ErrNoSuchWallet exception if that Id is invalid, hence I assumed that the WalletLayer cannot be tailored to be a specific WalletId, for then there would be no reason to throw this exception. Oh my. Can I vote to put this on the "to renovate" wishlist? 😅

Hm. For the sake of argument, let me try to find a way to get a lock into the WalletLayer. A WalletLayer is created only with hoistResource applied to an ApiLayer:

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api.hs#L1090-L1095

This is used in withWorkerCtx which is called in pretty much every a REST API endpoint:

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api/Server.hs#L3109-L3140

Oh my. This means that in order to create a lock in the WalletLayer, I would have to create it in the ApiLayer first, and then percolate it to the right wallet Id. In other words, the collection of locks (the equivalent of the Concierge thing) will not go away, instead, now it also has to be propagated to each WalletLayer. And I would have to think about the lifecycle: When do I create the lock, e.g. call newMVar? I suppose in registerWorker, but I'm not entirely sure actually.

https://github.com/input-output-hk/cardano-wallet/blob/3f9513b926f0c9ebb7884da386e7419e8da56e27/lib/core/src/Cardano/Wallet/Api/Server.hs#L3071-L3102

Hm. I think the easiest solution is the following: If you feel that the WalletLayer should at least know about the locks, then I can add a Concierge field to the WalletLayer and propagate a vanilla copy from the ApiLayer. But adding a lifecycle for the lock seems to be more trouble than it's worth at the moment; I would rather work on the DB layer renovation which will allow us to make the code much more pure, which in turn makes most of these problems disappear. 😅

@HeinrichApfelmus HeinrichApfelmus marked this pull request as ready for review August 17, 2021 13:00
@rvl rvl force-pushed the HeinrichApfelmus/ADP-780/coin-selection-lock branch from c9d0f28 to ff7d311 Compare August 19, 2021 06:06
Copy link
Contributor

@rvl rvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think my suggestion is possible and better, but this looks like a fix, and it's good quality too. Thanks 👍

@rvl
Copy link
Contributor

rvl commented Aug 19, 2021

I rebased and squashed...

bors r+

iohk-bors bot added a commit that referenced this pull request Aug 19, 2021
2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus

### Issue Number

ADP-780

### Overview

This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`.

### Progress

- [x] Introduce `Concierge` utility for managing a collection of locks.
- [x] Make critical section atomic.
- [x] Unit tests for `Concierge`
  - [x] The `Concierge` actually makes things atomic
  - [x] The lock will be release upon an exception.

### Comments


Co-authored-by: Heinrich Apfelmus <[email protected]>
We use the monad classes from `io-classes` in order to:
* Make concurrency testable with QuickCheck
* Better integrate with the monad transformers where `atomically` is used
We use a`Concierge` for locks of type `WalletLock` to enforce sequential ordering.
@rvl rvl force-pushed the HeinrichApfelmus/ADP-780/coin-selection-lock branch from ff7d311 to 5b3e7ba Compare August 19, 2021 07:14
@rvl
Copy link
Contributor

rvl commented Aug 19, 2021

I'm not sure what happened to the bors job here. It seems to have been cancelled or something. Rebased again...

bors r+

iohk-bors bot added a commit that referenced this pull request Aug 19, 2021
2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus

### Issue Number

ADP-780

### Overview

This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`.

### Progress

- [x] Introduce `Concierge` utility for managing a collection of locks.
- [x] Make critical section atomic.
- [x] Unit tests for `Concierge`
  - [x] The `Concierge` actually makes things atomic
  - [x] The lock will be release upon an exception.

### Comments


Co-authored-by: Heinrich Apfelmus <[email protected]>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Aug 19, 2021

Build failed:

[ 5 of 62] Compiling Cardano.Wallet.Api.Server.TlsSpec ( test/unit/Cardano/Wallet/Api/Server/TlsSpec.hs, dist/build/unit/unit-tmp/Cardano/Wallet/Api/Server/TlsSpec.o )
[ 6 of 62] Compiling Cardano.Wallet.ApiSpec ( test/unit/Cardano/Wallet/ApiSpec.hs, dist/build/unit/unit-tmp/Cardano/Wallet/ApiSpec.o )
iserv-proxy: {handle: <file descriptor: 27>}: GHCi.Message.remoteCall: end of file
remote-iserv.exe: {handle: <socket: 152>}: GHCi.Message.remoteCall: end of file
/nix/store/szr7mz98k78c1ivdhvnn0b393bhh5bk7-stdenv-linux/setup: line 1313:   104 Killed                  $SETUP_HS build test:unit -j$(($NIX_BUILD_CORES > 4 ? 4 : $NIX_BUILD_CORES)) --ghc-option=-fexternal-interpreter --ghc-option=-pgmi --ghc-option=/nix/store/8fsmnh6757axnmiiv5bx6bbqgsy2mhwz-iserv-wrapper/bin/iserv-wrapper --ghc-option=-L/nix/store/m4vpyw0cc16hnndaj78lwz154xdamr7b-mingw-w64-x86_64-w64-mingw32-6.0.0-pthreads-x86_64-w64-mingw32/lib --ghc-option=-L/nix/store/m4vpyw0cc16hnndaj78lwz154xdamr7b-mingw-w64-x86_64-w64-mingw32-6.0.0-pthreads-x86_64-w64-mingw32/bin --ghc-option=-L/nix/store/nkzf7225nhqhpljnh73r20wvg3qjxz9y-gmp-6.2.1-x86_64-w64-mingw32/lib
builder for '/nix/store/7y53xpksyrjybzafv25slfdi15sm5h17-cardano-wallet-core-test-unit-x86_64-w64-mingw32-2021.8.11.drv' failed with exit code 137

Seems like a build process was killed, not sure why. Monitoring shows that it wasn't OOM.
image

#2703

@rvl
Copy link
Contributor

rvl commented Aug 20, 2021

bors r+

iohk-bors bot added a commit that referenced this pull request Aug 20, 2021
2827: Remove race condition in `postTransactionOld` where pending (spent) UTxO could be selected as inputs r=rvl a=HeinrichApfelmus

### Issue Number

ADP-780

### Overview

This pull request removes a race condition in `postTransactionOld` where concurrent calls to the `POST transactions` endpoint could result in UTxOs being selected twice as inputs, resulting in an attempted double-spend.

To remove the race condition, for each wallet ID we enforce sequential execution of the critical section in `postTransactionOld`. Calls with different wallet IDs still run concurrently, and so do all other endpoints.

The main idea of this pull request is to introduce a small utility `Control.Concurrent.Concierge` that keeps track of a collection of locks. It provides a function `atomicallyWith` that does what it names suggests. Being polymorphic, this utility can be tested with unit tests using the IO simulation monad `io-sim`.

### Progress

- [x] Introduce `Concierge` utility for managing a collection of locks.
- [x] Make critical section atomic.
- [x] Unit tests for `Concierge`
  - [x] The `Concierge` actually makes things atomic
  - [x] The lock will be release upon an exception.

### Comments


Co-authored-by: Heinrich Apfelmus <[email protected]>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Aug 20, 2021

Build failed:

Cached failure, oops. After restarting the failed builds, the jobset has succeeded

Also weeder failed later on Buildkite, but it's not related to this PR.

#duplicate

@rvl rvl merged commit 0306078 into master Aug 20, 2021
@rvl rvl deleted the HeinrichApfelmus/ADP-780/coin-selection-lock branch August 20, 2021 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants