Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesigning bdk_chain around a "ChainOracle" #895

Closed
Tracked by #50
LLFourn opened this issue Mar 16, 2023 · 7 comments
Closed
Tracked by #50

Redesigning bdk_chain around a "ChainOracle" #895

LLFourn opened this issue Mar 16, 2023 · 7 comments
Assignees
Milestone

Comments

@LLFourn
Copy link
Contributor

LLFourn commented Mar 16, 2023

BDK currently depends heavily on a SparseChain to tell it whether transactions are in the chain or not. This proposal is to redesign it around a more generic ChainOracle trait which can tell you whether a (height, block_hash) is in the chain or not and use this information to decide whether a transaction is in the chain. We can do this because for each transaction we can always know a block header and hash that implies that it is in the chain.

A SparseChain can also fill the role of a ChainOracle but this change would allow us use other sources of chain data to fulfil the need. For example with our pending nakamoto cbf implementation we could use the on-chain block headers as the ChainOracle.

This proposal is to remove SparseChain as a core mechanism and instead pass in a impl ChainOracle into methods that need to know about the current state of the chain.

trait ChainOracle {
    type Error;
    fn is_in_best_chain(&self, block_height: u32, block_hash: BlockHash) -> Result<bool, Self::Error>;
}

Removing transactions from SparseChain

I think we should remove transactions from the sparse chain. Without needing to "re-org" the chain there is no real need to have transaction location so closely coupled to the blocks.

Instead of storing the "position" of a transaction within SparseChain we could easily store it in TxGraph in a monotone way. Imagine a field like this in TxGraph:

positions: HashSet<(Txid, (u32, BlockHash))>

Then when you want to figure out if txid is in the chain you can query the heights and hashes that it has been seen in and see if any of those blocks are in the canonical chain.

Note that we don't need this position to be the blockhash the transaction is in. We could call this the "anchor block" -- if it's in the chain then this txid is in the chain. This means we don't need to make additional queries to electrum/esplora to figure out the block hash that it's actually in.

So maybe it could be more like this:

positions: HashSet<(Txid, A)>

where A does the Anchor trait which has one method to return the block hash and height whose presence implies the existence -- it may be the height the transaction is actually at or it may be simple the tip that implies its existence. This allows different applications to store different data about their transaction's positions.

Indexing the chain data

Now that the chain is out of the way and outside of the transaction system we can put the indexing of transactions next to the graph.

pub struct IndexedTxGraph<I> {
    graph: TxGraph<T>,
    indexer: I
}

// Where I does a trait like this
pub trait TxIndexer {
    // index a whole tx
    fn index_tx(&mut self, tx: &Transaction);
    fn index_txout(&mut self, outpoint: OutPoint, txout: &TxOut);
}

Getting a UTXO set

How do we get a a UTXO set from a set of TxOuts (and outpoints) owned by the wallet, a transaction graph and information about the blocks in the longest chain. The first thing to note is that we can built a predicate is_in_chain(chain: &_, tx_graph: &TxGraph, txid) from the information about the longest chain. We simply iterate through all positions of the txid (stored in TxGraph) and return true if one of
those blocks exist in the longest chain. Keep in mind that the longest chain can be modelled a SparseChain or could be a header chain etc.

The naive algorithm

Given is_in_chain we can easily create a predicate is_unspent for any OutPoint. With is_unspent we can just call it on every OutPoint in that we own to find a utxo set.

fn is_unspent(chain, tx_graph, txout) {
     if !is_in_chain(chain, tx_graph, txid_of_txout) {
           return false;
     }
     for spending_txid in tx_graph.spends(outpoint_of_txout) {
         if is_in_chain(chain, tx_graph, spending_txid) {
             return false;
         }
     }
    true
}

Note that this algorithm has complexity of O(T) where T is the number of txouts you've ever owned. It also makes multiple calls to is_in_chain per outpoint. This might be slow especially if you have to do disk access or query an external service. To make it a bit faster you could cache the result of is_in_chain and apply positive results to each of the transaction's parents (a child being in chain implies a parent).

Optimized algorithms for later development

There should be faster algorithms for discovering this that don't require iterating over the entire txout set. For example, the TxGraph could index the set of UTXOs. From there you can visit each one and check if the tx containing it is in the chain. If so, add it to the UTXO set, if not add the parent outputs of the transaction to the candidates list.

How to handle unconfirmed transactions

We can model unconfirmed transactions as those with no anchors to blocks that are in the chain. We of course need to distinguish between those unconfirmed transactions that are invalid because they double spend (or their parents double spend) a tx that is in the chain and those that could be in the mempool.

To the naive algorithm above, when you encounter a output that is not in the chain you can add it to "unconfirmed" list and then from the unconfirmed list decide which of these outputs can exist in the mempool by resolving conflicts in the graph.

Work plan

While writing this document I discovered how I felt about priorities:

  1. Create a type that incorporates a TxGraph but also has the "anchor" positions. It could just be done in TxGraph itself (e.g. TxGraph<Anchor>).
  2. Create a ChainOracle trait which has a single method to return whether a tx is confirmed given a (block hash, height).
  3. Add *_in_chain functions that exist on ChainGraph to TxGraph but take a something implementing the ChainOracle trait.
  4. Create a type that has a TxGraph and something implementing the TxIndexer trait.
    Every time a tx is inserted into this type it will add it to the underlying graph and the TxIndexer (e.g. IndexedTxGraph<Anchor, Indexer>). Implement TxIndexer for KeychainTxOutIndex and SpkTxOutIndex.
  5. Move the functionality from KeychainTracker to an impl of IndexedTxGraph<Anchor, KeychainTxOutIndex>. wd
    For things that need chain data like full_utxos you'll need to pass in a ChainOracle.
  6. Create a new type that is SparseChain without the transactions (SparseChain2). Implement ChainOracle for SparseChain2.
  7. Make esplora and electrum return a TxGraph<A> and a SparseChain2 as their update
  8. Design bdk's Wallet around this
  9. Remove SparseChain (or rather rename SparseChain2 to SparseChain), ChainGraph and KeychainTracker

Downsides

The downsides of this idea are:

  • It's harder to use the information about what's in the mempool from esplora and electrum. Right now we are able to evict transactions that conflict with things that we find in the mempool from the SparseChain. With this change we don't really have a way to use this information so which of the conflicting unconfirmed txs you choose as canonical will be somewhat arbitrary. To fix this we could consider having the "last-time-seen" in mempool as part of the TxGraph position for that. This can be monotone so whenever you see it again the in the mempool you replace the existing value if it's larger. Then when you are trying to figure out who's in the mempool and who isn't (from your limited view) you can choose the one that was seen most recently between two conflicts.

  • It's hard to see how this improves Wallet. This gives much greater flexibility to users of bdk_chain but I'm not sure if we'll be able to realize this improvements for users of Wallet. To do so we'd have to not model the canonical chain inside wallet and leave it to the user to provide a impl ChainOracle. The biggest problem here is that if the user was using a SparseChain they'd have to persist changes to it separately which muddies the "batteries included" design goal of Wallet .

  • Some ChainOracle's will have to return Result<bool, _> rather than just bool for whether the block hash and height are in the chain. Some of them might even want to be async! So we're kind of back with offering multiple method for each method that needs to know what's in chain e.g. balance, try_balance and try_balance_async!

@LLFourn LLFourn changed the title Redesigning bdk around a "ChainOracle" Redesigning bdk_chain around a "ChainOracle" Mar 16, 2023
@RCasatta
Copy link
Member

RCasatta commented Mar 16, 2023

I am not following closely lately so I may miss some pieces.

I would like to understand if the following "sync mode" could be supported by this design.

I have a completely offline device, I physically bring the full utxo-set at some height (eg. the file produced by bitcoin-cli dumptxoutset utxo.bin). I sync my wallet against the utxo set, and I get the balance and my utxos but txs history and utxo confirmation height are unknown. I can create/sign a PSBT and bring it to an online device to broadcast.
Obviously, the txs created may be invalid if utxos have been reorged or spent later than when the utxo snapshot has been made, but that's out of scope.

I don't think not knowing the height is a big issue, but wanted to share it so that could be considered at this design stage.

@LLFourn
Copy link
Contributor Author

LLFourn commented Mar 16, 2023

Hey @RCasatta. So in this case you already have a utxo set so all of this tooling is irrelevant. The thing you want is to be able to build transactions without having to use bdk::Wallet or anything from bdk_chain. This is something on the main mental roadmap we have to do before releasing bdk 1.0.0 properly.

This issue which will soon be ported to the bdk repo is probably the best description of how this will work: LLFourn/bdk_core_staging#40

You'll probably need to provide a descriptor for each UTXO though. I think the coin selection stuff will just require you to know the witness weight.

@evanlinjin
Copy link
Member

evanlinjin commented Mar 20, 2023

I have another usecase, similar to what @RCasatta has mentioned, but more tailored for the bdk_chain structures.

If we are running a pruned node, the initial wallet scan will need to depend on bitcoin-cli scantxoutset. The output of this provides heights of unspents, but not blockhashes in which those unspents reside. We can just use getblockhash, but what if a reorg happens just in time so that we get the wrong blockhash associated with our unspent?

Does bitcoind have a way where we can get the blockhash of height, given the "best block hash" (a.k.a. block hash at a previous tip)?


I jsut thought of the solution. The solution to this is to get the tip's blockhash before the scan, and ensure the blockhash still exists after our scan + after we have all the blockhashes that we need.

Is there a better way to do this?

@LLFourn
Copy link
Contributor Author

LLFourn commented Mar 21, 2023

I jsut thought of the solution. The solution to this is to get the tip's blockhash before the scan, and ensure the blockhash still exists after our scan + after we have all the blockhashes that we need.

Is there a better way to do this?

Nope I think this is what we have to do. The correct long term solution IMO is to fix bitcoind and electrum because height by itself is very ambiguous.

danielabrozzoni added a commit that referenced this issue Apr 28, 2023
b799a57 [bdk_chain_redesign] Add tests for `IndexedTxGraph` with `LocalChain` (rajarshimaitra)
8cd0328 [bdk_chain_redesign] Implement `OwnedIndexer` for indexers (rajarshimaitra)
911af34 [bdk_chain_redesign] Fix calculation bugs. (rajarshimaitra)
e536307 [bdk_chain_redesign] Fix `tx_graph::Additions::append` logic (志宇)
f101dde [bdk_chain_redesign] Fix `tx_graph::Additions::append` logic (志宇)
1b15264 [bdk_chain_redesign] Change `insert_relevant_txs` method (志宇)
ecc74ce [bdk_chain_redesign] Docs for `is_mature` and `is_confirmed_and_spendable` (志宇)
ac336aa [bdk_chain_redesign] Make `insert_relevant_txs` topologically-agnostic (志宇)
165b874 [bdk_chain_redesign] Add test for `insert_relevant_txs` (志宇)
f3e7b67 [bdk_chain_redesign] Various tweaks and fixes (志宇)
03c1283 [bdk_chain_redesign] Revert changes to `SparseChain` (志宇)
34a7bf5 [bdk_chain_redesign] Rm unnecessary code and premature optimisation (志宇)
6c49570 [bdk_chain_redesign] Rm `HashSet` from `TxGraph::relevant_heights` (志宇)
1003fe2 [bdk_chain_redesign] Test `LocalChain` (志宇)
7175a82 [bdk_chain_redesign] Add tests for `TxGraph::relevant_heights` (志宇)
8e36a2e [bdk_chain_redesign] Remove incomplete logic (志宇)
81436fc [bdk_chain_redesign] Fix `Anchor` definition + docs (志宇)
001efdd Include tests for new updates of TxGraph (rajarshimaitra)
10ab77c [bdk_chain_redesign] MOVE `TxIndex` into `indexed_chain_graph.rs` (志宇)
7d92337 [bdk_chain_redesign] Remove `IndexedTxGraph::last_height` (志宇)
a7fbe0a [bdk_chain_redesign] Documentation improvements (志宇)
ee1060f [bdk_chain_redesign] Simplify `LocalChain` (志宇)
611d2e3 [bdk_chain_redesign] Consistent `ChainOracle` (志宇)
bff80ec [bdk_chain_redesign] Improve `BlockAnchor` docs (志宇)
24cd8c5 [bdk_chain_redesign] More tweaks and renamings (志宇)
ddd5e95 [bdk_chain_redesign] Modify signature of `TxIndex` (志宇)
da4cef0 [bdk_chain_redesign] Introduce `Append` trait for additions (志宇)
89cfa4d [bdk_chain_redesign] Better names, comments and generic bounds (志宇)
6e59dce [bdk_chain_redesign] `chain_oracle::Cache` (志宇)
a7eaebb [bdk_chain_redesign] Add serde support for `IndexedAdditions` (志宇)
c09cd2a [bdk_chain_redesign] Added methods to `LocalChain` (志宇)
7810059 [bdk_chain_redesign] `TxGraph` tweaks (志宇)
a63ffe9 [bdk_chain_redesign] Simplify `TxIndex` (志宇)
a1172de [bdk_chain_redesign] Revert some API changes (志宇)
8c90617 [bdk_chain_redesign] Make default anchor for `TxGraph` as `()` (志宇)
468701a [bdk_chain_redesign] Initial work on `LocalChain`. (志宇)
34d0277 [bdk_chain_redesign] Rm anchor type param for structs that don't use it (志宇)
3440a05 [bdk_chain_redesign] Add docs (志宇)
236c50f [bdk_chain_redesign] `IndexedTxGraph` keeps track of the last synced height (志宇)
e902c10 [bdk_chain_redesign] Fix `apply_additions` logic for `IndexedTxGraph`. (志宇)
313965d [bdk_chain_redesign] `mut_index` should be `index_mut` (志宇)
db7883d [bdk_chain_redesign] Add balance methods to `IndexedTxGraph` (志宇)
d0a2aa8 [bdk_chain_redesign] Add `apply_additions` to `IndexedTxGraph` (志宇)
6cbb18d [bdk_chain_redesign] MOVE: `IndexedTxGraph` into submodule (志宇)
784cd34 [bdk_chain_redesign] List chain data methods can be try/non-try (志宇)
43b648f [bdk_chain_redesign] Add `..in_chain` methods (志宇)
61a8606 [bdk_chain_redesign] Introduce `ChainOracle` and `TxIndex` traits (志宇)
5ae5fe3 [bdk_chain_redesign] Introduce `BlockAnchor` trait (志宇)

Pull request description:

  ### Description

  This is part of #895

  The initial `bdk_chain` structures allowed updating to be done without blocking IO (alongside many other benefits). However, the requirement to have transactions "perfectly positioned" in our `SparseChain` increased complexity of the syncing API. Updates needed to be meticulously crafted to properly connect with the original `SparseChain`. Additionally, it forced us to keep a local copy of the "best chain" data (which may not always be needed depending on the chain source).

  The redesigned structures, as introduced by this PR, fixes these shortcomings.

  1. Instead of `SparseChain`, we introduce the ability to `Anchor` a transaction to multiple blocks that may or may not be in the same chain history. We expand `TxGraph` to records these anchors (while still maintaining the *monotone* nature of `TxGraph`). When updating our new `TxGraph`, we don't need to complicated *are-we-connected* checks that we need for `SparseChain`.
  2. The chain source, in combination with our new`TxGraph` is all that we need to determine the "chain position" of a transaction. The chain source only needs to implement a single trait `ChainOracle`. This typically does not need to be updated (as it is remote), although we can have a special `ChainOracle` implementation that is stored locally. This only needs block height and hash information, reducing the scope of *non-monotine* structures that need to be updated.

  **What is done:**

  * `TxGraph` includes anchors (ids of blocks that the tx is seen in) and last-seem unix timestamp (for determining which non-confirmed tx we should consider as part of "best chain" if there are conflicts). This structure continues to be fully "monotone"; we can introduce data in any manner and not risk resulting in an inconsistent state.
  * `LocalChain` includes the "checkpoint" logic of `SparseChain` but removes the `txid` data. `LocalChain` implements the `ChainOracle` trait. Any blockchain-source can also implement the `ChainOracle` trait.
  * `IndexedTxGraph` is introduced and contains two fields; an internal `TxGraph` struct and a `TxIndex` implementation. These two fields will be updated atomically and can replace the functionality of `keychain::Tracker`.

  **What is in-progress:**

  * ~@evanlinjin: The `TxIndex` trait should not have `is_{}_relevant` methods as we cannot guarantee this across all transaction indexes. We should introduce extension traits for these (#926 (comment)
  * ~@evanlinjin: `BlockAnchor` should be defined as "if this block is in chain, then this tx must be in chain". Therefore, the anchor does not provide us with the exact confirmation height of the tx. We need to introduce an extension trait `ExactConfirmationHeightAnchor` for certain operations (#926 (comment)

  **What will be done external to this PR:**

  * @rajarshimaitra: Persisting `indexed_tx_graph::Additions` (#937).
  * @notmandatory: Update examples to use redesigned structures.
  * Update `bdk::Wallet` to use the redesigned structures.

  ### Changelog notice

  * Initial implementation of the `bdk_chain` redesign (as mentioned in #895). Currently, only the `bdk_chain` core structures are implemented. Persistence and updates to the current examples and `bdk::Wallet` will be done in separate PRs.

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### New Features:

  * [x] I've added tests for the new feature
  * [x] I've added docs for the new feature

ACKs for top commit:
  rajarshimaitra:
    ACK b799a57
  danielabrozzoni:
    Partial ACK b799a57 - good job! I haven't looked at the tests or at the methods implementations in depth, but I have reviewed the architecture and it looks good. I have a few non-blocking questions.

Tree-SHA512: 8c386354cbd02f0701b5134991b65d206a54d19a2e78ab204e6ff1fa78a18f16299051bc0bf4ff4d2f5a0adab9b15658fa53cd0de2ca16969f4bf6a485225082
@notmandatory notmandatory added this to BDK May 4, 2023
@notmandatory notmandatory removed this from the 1.0.0 milestone May 4, 2023
evanlinjin added a commit that referenced this issue May 10, 2023
4963240 Add more `impl`s for `Append` and docs for file store `magic` (志宇)
2aa08a5 [persist_redesign] Introduce redesigned `persist` types (志宇)

Pull request description:

  ### Description

  This is part of #895 and #971

  * Introduce a more generic version of the `keychain::persist::*` structures that only needs a single generic for the changeset type.

  Additional changes:

  * The `Append` trait has a new method `is_empty`.
  * Introduce `Store` structure for `bdk_file_store` (which implements `PersistBackend`).

  ### Changelog notice

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### New Features:

  * [x] I've added tests for the new feature
  * [x] I've added docs for the new feature

Top commit has no ACKs.

Tree-SHA512: 0211fbe7d7e27805d3ed3a80b42f184cdff1cebb32fd559aa9838e4a7f7c7e47b6c366b6ef68e299f876bafed549b8d1d8b8cc0366bf5b61db079504a565b9b4
@notmandatory notmandatory moved this to In Progress in BDK May 23, 2023
@notmandatory notmandatory added this to the 1.0.0-alpha.1 milestone May 23, 2023
@notmandatory notmandatory moved this from In Progress to Needs Review in BDK Jun 2, 2023
@notmandatory
Copy link
Member

Should be fixed in #976

@notmandatory
Copy link
Member

@LLFourn can we close this issue and/or move remaining redesign into separate issues?

@LLFourn
Copy link
Contributor Author

LLFourn commented Jun 20, 2023

fixed in #976

@LLFourn LLFourn closed this as completed Jun 20, 2023
@github-project-automation github-project-automation bot moved this from Needs Review to Done in BDK Jun 20, 2023
fanquake added a commit to bitcoin/bitcoin that referenced this issue Jul 28, 2024
…o scantxoutset output

17845e7 rpc: add utxo's blockhash and number of confirmations to scantxoutset output (Luis Schwab)

Pull request description:

  This PR resolves #30478 by adding two fields to the `scantxoutset` RPC:
  - blockhash: the blockhash that an UTXO was created
  - confirmations: the number of confirmations an UTXO has relative to the chaintip.

  The rationale for the first field is that a blockhash is a much more reliable identifier than the height:
  > When using the scantxoutset RPC, the current behaviour is to show the block height of the UTXO. This is not optimal, as block height is ambiguous, especially in the case of a block reorganization happening at the same instant of the query. In this case, an UTXO that does not exist would be assumed to exist, unless the chain's tip hash is recorded before the scan, and make sure it still exists after, as per bitcoindevkit/bdk#895 (comment) comment by evanlinjin.

  The second one was suggested by maflcko, and I agree it's useful for human users:
  > While touching this, another thing to add could be the number of confirmations? I understand that this wouldn't help machine consumers of the interface, but human callers may find it useful?

  This will yield an RPC output like so:

  ```diff
  bitcoin-cli scantxoutset start "[\"addr(bc1q5q9344vdyjkcgv79ve3tldz4jmx4lf7knmnx6r)\"]"
  {
    "success": true,
    "txouts": 185259116,
    "height": 853622,
    "bestblock": "00000000000000000002e97d9be8f0ddf31829cf873061b938c10b0f80f708b2",
    "unspents": [
      {
        "txid": "fae435084345fe26e464994aebc6544875bca0b897bf4ce52a65901ae28ace92",
        "vout": 0,
        "scriptPubKey": "0014a00b1ad58d24ad8433c56662bfb45596cd5fa7d6",
        "desc": "addr(bc1q5q9344vdyjkcgv79ve3tldz4jmx4lf7knmnx6r)#smk4xmt7",
        "amount": 0.00091190,
        "coinbase": false,
        "height": 852741,
  +     "blockhash": "00000000000000000002eefe7e7db44d5619c3dace4c65f3fdcd2913d4945c13",
  +     "confirmations": 882
      }
    ],
    "total_amount": 0.00091190
  }
  ```

ACKs for top commit:
  sipa:
    utACK 17845e7
  Eunovo:
    ACK 17845e7
  tdb3:
    ACK 17845e7

Tree-SHA512: 02366d0004e5d547522115ef0efe6794a35978db53dda12c675cfae38197bf43f0bf89ca99a3d79e3d2cff95186015fe1ab764abb8ab82bda440ae9302ad973b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants