Mark addresses as used while discovering them. #2033
Conversation
Force-pushed from 8ba5a3f to eed460c
I wholly approve of the change to mark sequential addresses as used once they are discovered, so that this information is easily known when listing addresses.
I also like the idea of streaming responses, but the topic needs further discussion in a separate PR.
```haskell
-- separator), and decoding them as a JSON list.
decodeOne = Aeson.decodeStrict . mconcat
decodeStream chunks = do
    xs <- traverse (Aeson.decodeStrict @Aeson.Value . B8.init) chunks
```
I don't think you can necessarily assume that the chunks are split nicely at the newline.
The OS might interrupt the socket read early, and you will have a line in two pieces, neither of which are well-formed json.
This streaming change is worthy of its own PR, where the details can be sorted out.
I like the idea of streaming in general, because it removes one case of unbounded memory usage.
But on the other hand, if you have 1 million addresses, that would be perhaps 80MB of encoded json, as an indication of size. So the memory cost of not streaming is smallish. Without streaming there is a delay while the server is reading the list from sqlite, but unless the clients also have stream processing, so what?
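The split-chunk concern above could be handled by buffering the trailing partial line and only emitting complete newline-terminated pieces. A minimal sketch (hypothetical; `rechunk` and its type are illustrative, not the PR's actual code):

```haskell
import qualified Data.ByteString.Char8 as B8

-- Re-group raw socket chunks on '\n' so that every emitted piece is a
-- complete line, no matter where the OS interrupted the read.
-- Any unterminated trailing piece is flushed at end of stream.
rechunk :: [B8.ByteString] -> [B8.ByteString]
rechunk = go B8.empty
  where
    go buf []     = [buf | not (B8.null buf)]
    go buf (c:cs) = case reverse (B8.split '\n' (buf <> c)) of
      []              -> go buf cs          -- empty input, nothing to emit
      (rest:complete) -> reverse complete ++ go rest cs
```

Each emitted piece is then a well-formed JSON line that can be fed to `Aeson.decodeStrict` without the `B8.init` trick.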
@rvl actually.. there's one test failing because of this, which shows that sometimes there are chunks that only contain half of an address (it fails on the "import 15,000 addresses and list them" scenario; I haven't noticed this below 100 addresses).
So yeah, this may require some additional thought. I'll split the streaming approach into a separate PR.
Force-pushed from 02b126e to b94f9f8
@rvl I ended up adding a simple migration for this. Reviewing the wallet's overall migration approach can be done as a separate task / debt. I've tested the query against a hand-crafted database with 100k arbitrarily generated address and txout rows (preserving the existing relationships between tables), and the query executes almost instantly.
In the end, migrating an existing database was quite easy given the right SQL incantation. We need to mark any 'known' address as used, where a known address is simply an address that appears in one of the wallet's known txouts. We don't care whether the transaction was outgoing or incoming; just seeing an address there immediately qualifies it as 'used'.
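The migration described here might look roughly like the following (a sketch only; `seq_state_address`, `tx_out`, and the column names are illustrative, not the actual cardano-wallet schema):

```sql
-- Mark every sequential-state address as 'used' if it appears in any
-- known transaction output, incoming or outgoing alike.
UPDATE seq_state_address
   SET status = 'used'
 WHERE address IN (SELECT address FROM tx_out);
```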
Force-pushed from 29ed552 to dc6db76
Migration looks OK
Force-pushed from 1486920 to 3a01713
bors r+
Build succeeded
Issue Number
#2032
Overview
dd3cff5
📍 mark addresses as used as they are discovered
The main issue of the current list address handler is that it loads in memory and crawls the ENTIRE transaction history of the wallet. This is done merely to know whether addresses have been used or not. This is unnecessary for random addresses because used and unused addresses are already separated in two different Maps. For sequential wallets it's a bit more subtle but, we can mark addresses as "Used" during discovery and store this information in the database.
61bbd7b
📍 add two properties for Random & Sequential wallets showing the effect of IsOurs on the address state
eed460c
📍 fix random address import to not mark every imported address as 'Used'
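The per-address marking described in the first commit can be sketched roughly as follows (hypothetical names, assuming a simple Map-based pool; the actual wallet state and `IsOurs` class are more involved):

```haskell
import qualified Data.Map.Strict as Map

data AddressStatus = Unused | Used deriving (Eq, Show)

-- On discovery, decide whether an address is ours and, if so, flip its
-- status to Used right away, so that listing addresses never needs to
-- crawl the wallet's transaction history.
discover :: Ord addr => addr
         -> Map.Map addr AddressStatus
         -> (Bool, Map.Map addr AddressStatus)
discover addr pool = case Map.lookup addr pool of
  Nothing -> (False, pool)                      -- not one of ours
  Just _  -> (True, Map.insert addr Used pool)  -- ours: mark as used
```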
Comments
NOTE 1: This change will require a database migration forcing wallets to re-synchronize from scratch.
NOTE 2: I haven't yet fixed the unit tests, which need some adjustment since some of the interfaces have changed.
NOTE 3: I'll try measuring and comparing the behavior with a stress-wallet that owns many addresses.