Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddressIndex improvements: LastUnused, FirstUnused, and get_batch_unused_addresses() #546

Closed
wants to merge 5 commits into from

Conversation

nickfarrow
Copy link
Contributor

@nickfarrow nickfarrow commented Feb 16, 2022

Description

  • Change AddressIndex::LastUnused to look back further than current_index
  • Add AddressIndex::FirstUnused
  • Add get_batch_unused_addresses

Notes to the reviewers

Builds upon #522

Currently BDK supports address indexing via LastUnused, which will return the address with current_index if it is unused, otherwise it will return a New address.

With this current logic, if you get two new addresses A1 and A2 and use A2, then LastUnused will give you a New address rather than the unused A1.

In order to more consistently utilize unused addresses i've added a new function get_unused_key_indexes(keychain) which returns a vector of indexes for the unused addresses in that keychain. Making use of this function, LastUnused now returns the most recent address that has not yet been used, and New if all addresses have been used.

In some cases it may be desirable to utilize unused addresses that reside earlier in the keychain. i.e. AddressIndex::FirstUnused in this PR.

FirstUnused has the same caveat as LastUnused: that if the wallet has not yet detected an address has been used, it could return a used address.

Additionally a new public function get_batch_unused_addresses allows for retrieval of N unused addresses at once. Prioritizing unused addresses first, then populating the remaining with New addresses (like FirstUnused).

For example: if a wallet builds a transaction involving many New internal addresses but that transaction is never broadcast, then all of these addresses can now easily be used in a later transaction via get_batch_unused_addresses.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature
  • I've updated CHANGELOG.md

@LLFourn
Copy link
Contributor

LLFourn commented Mar 15, 2022

approach ACK. Needs rebase.

@nickfarrow nickfarrow force-pushed the first-unused branch 2 times, most recently from 1f4da05 to 8bf61de Compare March 17, 2022 07:33
@notmandatory notmandatory self-requested a review April 4, 2022 05:38
@notmandatory notmandatory added the new feature New feature or request label Apr 4, 2022
@notmandatory
Copy link
Member

Code changes look great but you'll need to do another rebase and can you add a signing key to Github and sign your commits when you do the rebase also?

@nickfarrow nickfarrow force-pushed the first-unused branch 4 times, most recently from f8875d8 to 05aeb23 Compare April 10, 2022 07:06
@nickfarrow
Copy link
Contributor Author

force push updated CI

Copy link
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK 05aeb23

Below are some suggested modification..

src/wallet/mod.rs Outdated Show resolved Hide resolved
src/wallet/mod.rs Show resolved Hide resolved
src/wallet/mod.rs Show resolved Hide resolved
src/wallet/mod.rs Show resolved Hide resolved
Comment on lines +3993 to +3956
// use the first address
crate::populate_test_db!(
wallet.database.borrow_mut(),
testutils! (@tx ( (@external descriptors, 0) => 25_000 ) (@confirmations 1)),
Some(100),
);

assert_eq!(
wallet.get_address(FirstUnused).unwrap().to_string(),
"tb1q4er7kxx6sssz3q7qp7zsqsdx4erceahhax77d7"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see the test situation here where we extract multiple addresses, use some of them and get back a previous unused one when called again.. That would correctly test the intended behavior.. Right now its just testing the vanilla situation..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I actually don't know a better test to write than this one? With the batch unused you can write a more complicated test but with FristUnused there's not much you can do.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like derive a bunch of address.. Only use some of them selectively so the address gaps are simulated.. Then check if the first unused is returned correctly.. Am I missing some details why that can't work??

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can work but I don't get why the gaps would effect the algorithm that finds the first unused. I mean I don't think that this will likely find a problem with the algorithm that this test wouldn't find.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not that the gaps would affect the algorithm, but to confirm that the behavior we are intending here is actually happening.. And this can be checked in single test for both first and last unused.. Once the behavior is pinned, we can decide later which one to use when or to keep both..

Copy link
Contributor

@LLFourn LLFourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nickfarrow. Tests LGTM. See comments.

src/wallet/mod.rs Outdated Show resolved Hide resolved
src/wallet/mod.rs Outdated Show resolved Hide resolved
src/wallet/mod.rs Outdated Show resolved Hide resolved
src/wallet/mod.rs Outdated Show resolved Hide resolved
Comment on lines +3993 to +3956
// use the first address
crate::populate_test_db!(
wallet.database.borrow_mut(),
testutils! (@tx ( (@external descriptors, 0) => 25_000 ) (@confirmations 1)),
Some(100),
);

assert_eq!(
wallet.get_address(FirstUnused).unwrap().to_string(),
"tb1q4er7kxx6sssz3q7qp7zsqsdx4erceahhax77d7"
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I actually don't know a better test to write than this one? With the batch unused you can write a more complicated test but with FristUnused there's not much you can do.

@notmandatory
Copy link
Member

Hi, please rebase to pickup changes in #596. Thanks!

* get_batch_unused_addresses loops through address indexes `from_front = true` (for firstUnused) or `false` (for lastUnused).
* Relies on database having up to date script_pubkeys in such a manner
  that script_pks.len() == self.fetch_index(keychain)
* 1 script pubkey per address index?
* Must work with current_address_index = 0

Signed-off-by: nickfarrow <[email protected]>
.list_transactions(true)?
.iter()
// Return whether this address has been used in a transaction
fn has_address_been_used(&self, script_pk: &Script) -> bool {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually checking a Script..

let check_indexes = if from_front {
(0..=current_address_index).collect::<Vec<_>>()
} else {
(0..=current_address_index).rev().collect::<Vec<_>>()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better way to do this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return a impl DoubleEndedIterator from the method instead.

Copy link
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not comfortable with the current state and approach of the PR. Some behaviour are not maintained, like get_batch(n, false, keychain) will give the list of addresses in reverse order. Not in ascending order of indexes.

Also I am thinking isn't it better to mark used addresses directly in the database? Knowing which one is used and not and saving the data seems to me more useful than figuring it out by transaction matching with the entire tx list, everytime we ask for an unused..

This will also simplify the LastUnused and FirstUnused fetching logic..

@LLFourn @afilini do you have any thought on this??

@@ -389,6 +387,62 @@ where
Ok(new_addresses_cached)
}

/// Return vector of n unused addresses from the [`KeychainKind`].
/// If less than n unused addresses are returned, the rest will be populated by new addresses.
/// The unused addresses returned are in order of oldest in keychain first, with increasing index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not as per impl right now.. if from_front is set false, the addresses are returned in reverse order..

.list_transactions(true)?
.iter()
// Return whether this address has been used in a transaction
fn has_address_been_used(&self, script_pk: &Script) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guy is better named as is_scriptpubkey_used..

Comment on lines 393 to 398
pub fn get_batch_unused_addresses(
&self,
n: usize,
from_front: bool,
keychain: KeychainKind,
) -> Result<Vec<AddressInfo>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not feeling comfortable with the API. get_batched_unused should not be concerned with fornt or back. Thats an impl detail for LastUnused or FirstUnused. And should not be exposed in public API..

This is also breaking the doc above. The order is not maintained anymore..

Better to handle the handle the first or last logic in in their respective functions itself than to handle in the batch function which is more generic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct thing is to return a impl DoubleEndedIterator over unused addresses I think.

Comment on lines 413 to 438
for i in check_indexes {
// if we have made a pubkey at this index, check whether the address has been used.
if i < script_pubkeys.len() {
let script_pk = &script_pubkeys[i];
if self.has_address_been_used(script_pk) {
continue;
}
}
if let Ok(unused_address) = self
.get_descriptor_for_keychain(keychain)
.as_derived(i as u32, &self.secp)
.address(self.network)
.map(|address| AddressInfo {
address,
index: i as u32,
keychain,
})
.map_err(|_| Error::ScriptDoesntHaveAddressForm)
{
unused_addresses.push(unused_address);
}

if unused_addresses.len() >= n {
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try using some rust list comprehensions with iters and maps. Much of this code can be simplified..

Comment on lines 3983 to 4008
assert_eq!(
wallet
.get_batch_unused_addresses(3, true, KeychainKind::External)
.unwrap(),
vec![
AddressInfo {
index: 0,
address: Address::from_str("tb1q6yn66vajcctph75pvylgkksgpp6nq04ppwct9a")
.unwrap(),
keychain: KeychainKind::External,
},
AddressInfo {
index: 2,
address: Address::from_str("tb1qzntf2mqex4ehwkjlfdyy3ewdlk08qkvkvrz7x2")
.unwrap(),
keychain: KeychainKind::External,
},
AddressInfo {
index: 3,
address: Address::from_str("tb1q32a23q6u3yy89l8svrt80a54h06qvn7gnuvsen")
.unwrap(),
keychain: KeychainKind::External,
}
]
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to assert that FirstUnused and LastUnused are working as intended..

  • Get 5 new addresses
  • Use only index 0 and 3
  • get_batch(3) should return index 0, 2, 4. current index should still be at 4.
  • get_first_unused() should return 0
  • get_last_unused() should return 4
  • get_batch(4) should return 0,2,4,5, and current index should be at 5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a check of the derivation index here would be good.

// if we have made a pubkey at this index, check whether the address has been used.
if i < script_pubkeys.len() {
let script_pk = &script_pubkeys[i];
if self.has_address_been_used(script_pk) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here for each spk we are iterating over the entire transaction list. For wallets with large transaction this will can cause massive overhead.

Instead a better way would be to handle Vec<Script> in the has_address_been_used function. Call list_transactions only once, filter out all the spks that haven't been used and return then as a Vec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep good idea, note this would then also check more addresses than necessary (not breaking early when finding n).
If this was stored in the database it could also break early if it's just a fast read

@rajarshimaitra
Copy link
Contributor

@nickfarrow also try to rebase on top of master instead of fetching and merging specific commit next time.. :)
That makes the commit history much cleaner and also applies your changes on top of current master..
just do git rebase master from the PR branch..

@notmandatory notmandatory removed this from the Release 0.19.0 Feature Freeze milestone May 11, 2022
@notmandatory
Copy link
Member

I had to push this PR to the next release so the team can focus on #593, and after that one you'll probably need to rebase again. But then I promise we'll work on getting this one in. :-)

@nickfarrow
Copy link
Contributor Author

Yep I think this one needs some bigger discussion first around whether it is worthwhile to mark used addresses directly in the database as @rajarshimaitra suggested.

If this were the case, this could be simplified to a function get_unused_addresses that (quickly) gets an iterator over unused addresses in the database up to the current addressindex. With FirstUnused and LastUnused addresses at either ends.

This PR's get_batch_unused_addresses (currently fetches n) would be superseded by get_unused_addresses where the user gets all the unused addresses and handles them how they desire.

Not sure what changed with old commits, possible I added signoff lines to the previous commits by mistake which may have updated them sorry. Will take care with next.

@rajarshimaitra
Copy link
Contributor

Upon further thoughts on this I have this rough idea of how it can be done:

  • Add a new column in the script_pubkey table named used. Which will have binary value 0/1, defaulting to 0.
  • At each sync when we add utxos to the Utxo table, mark the pubkey as used too in the script_pubkey table.
  • Make a generic get_batch_unused() that will return a list of unused addresses, checking the used flag in the database.
  • Make FirstUnused as the front pop of the list, and LastUnused as the back pop of the list..

Pro:
Much more scalable than transaction list scanning for wallets with large list of transactions.

Cons:
This is going to change the DB structure and the BatchDatabase API.

I am willing to work on fleshing an impl out if this has Approach Acks..

@afilini
Copy link
Member

afilini commented Jun 28, 2022

Keep in mind we also have the key/value db, we don't have tables and columns there. We can add a flag to mark a script as used (similarly to how I suggested adding a flag for scripts that we've already setup rather than relying on just the derivation index), but getting the list of unused addresses will still require scanning

Copy link
Contributor

@LLFourn LLFourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is almost ready and provides something that is pretty useful. The main bits of work here are:

  1. To restore the previous (someone nonsensical) behaviour of LastUnused
  2. To add tests to check the wallets derivation index is correct after calling batch unused (can just try and get a new address after and check its index)

As @rajarshimaitra mentions the best way to implement is to index things properly which in bdk is currently done in the database backend. In bdk_core I've done indexing of unused addresses. Since I was the one who requested this feature and I'm focused on bdk_core we could simply close this PR and wait until it lands. Does anyone else want this feature presently? @nickfarrow what do you think?

src/wallet/mod.rs Show resolved Hide resolved
Comment on lines 393 to 398
pub fn get_batch_unused_addresses(
&self,
n: usize,
from_front: bool,
keychain: KeychainKind,
) -> Result<Vec<AddressInfo>, Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correct thing is to return a impl DoubleEndedIterator over unused addresses I think.

let check_indexes = if from_front {
(0..=current_address_index).collect::<Vec<_>>()
} else {
(0..=current_address_index).rev().collect::<Vec<_>>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return a impl DoubleEndedIterator from the method instead.

Comment on lines 3983 to 4008
assert_eq!(
wallet
.get_batch_unused_addresses(3, true, KeychainKind::External)
.unwrap(),
vec![
AddressInfo {
index: 0,
address: Address::from_str("tb1q6yn66vajcctph75pvylgkksgpp6nq04ppwct9a")
.unwrap(),
keychain: KeychainKind::External,
},
AddressInfo {
index: 2,
address: Address::from_str("tb1qzntf2mqex4ehwkjlfdyy3ewdlk08qkvkvrz7x2")
.unwrap(),
keychain: KeychainKind::External,
},
AddressInfo {
index: 3,
address: Address::from_str("tb1q32a23q6u3yy89l8svrt80a54h06qvn7gnuvsen")
.unwrap(),
keychain: KeychainKind::External,
}
]
);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a check of the derivation index here would be good.

@notmandatory notmandatory removed this from the Release 0.20.0 Feature Freeze milestone Jul 6, 2022
@rajarshimaitra
Copy link
Contributor

ACK on @LLFourn that this can go in as it is without much further changes.. I will check the behavior once again.. If this lands through bdk_core eventually then we might not wanna do the DataBase way for now, and just keep things simple..

* remove batch getting `n` unused addresses, just get them all (much
  simpler)
* use next_back() and next() for last and first unused
* test more cases for get_unused_address_indexes
* inline functions and simplified next addr
* create HashSet of txn scripts before checking unused
* add firstunused testcase for repeated unused
@notmandatory
Copy link
Member

Is this one OK to add this to the 0.22.0 milestone? looks like it's about ready and I don't want it to be overlooked.

@LLFourn
Copy link
Contributor

LLFourn commented Aug 10, 2022

IMO this PR is suboptimal because of #701. I think it should be fixed first to make the code in this PR make sense. @nickfarrow?

@nickfarrow
Copy link
Contributor Author

Yep agree it may as well wait for improvements to fetch_index, that function is relied upon a few times here

@danielabrozzoni
Copy link
Member

Hey, we are in the process of releasing BDK 1.0, which will under the hood work quite differently from the current BDK. For this reason, I'm closing all the PRs that don't really apply anymore. If you think this is a mistake, feel free to rebase on master and re-open!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants