Add wtxmgr package #217

Merged · 4 commits · Apr 25, 2015
Conversation

@jrick (Member) commented Mar 31, 2015

This change implements a scalable wallet transaction database with integrated spend tracking, based on walletdb.

Integration is being tracked in another PR: #234

This is not the final database change related to transactions. In the future, wtxmgr will be an internal package used by waddrmgr to store transactions and manage output spend tracking per account. However, in the interest of improving the current code on master in a timely manner, wtxmgr is currently being given its own namespace. This matches with how the old transaction store was managed independent of addresses and accounts.

Known issues are marked with TODO comments. I believe all blockers have been implemented or fixed at this point, and the only remaining TODOs deal with performance issues which can be improved later (such as creating more new byte slices than necessary or parsing more details than needed).

Checklist for merge:

  • Add equivalent of txstore.Store.FindPreviousCredits (needed for votingpool to fetch PkScripts)
  • Make upgrade logic match that of newer waddrmgr (or just nuke the bucket when this moves to waddrmgr)
  • Increase test coverage
  • Add testable, runnable examples

@jrick (Member Author) commented Apr 1, 2015

I'm currently adding APIs to query for all recorded details for a single transaction. This is done under a single view. At the moment the API looks like this:

type TxDetails struct {
        TxRecord
        Block   BlockMeta
        Credits []Credit
        Debits  []Debit
}

// TxDetails returns all saved details regarding a transaction.  In case of a
// hash collision, the most recent transaction with a matching hash is returned.
func (s *Store) TxDetails(txHash *wire.ShaHash) (*TxDetails, error) {
        var details *TxDetails
        err := scopedView(s.namespace, func(ns walletdb.Bucket) error {
                var err error
                details, err = s.txDetails(ns, txHash)
                return err
        })
        return details, err
}

func (s *Store) txDetails(ns walletdb.Bucket, txHash *wire.ShaHash) (*TxDetails, error) {
        // First, check whether there exists an unmined transaction with this
        // hash.  Use it if found.
        v := existsRawUnmined(ns, txHash[:])
        if v != nil {
                return s.unminedTxDetails(ns, txHash, v)
        }

        // Otherwise, if there exists a mined transaction with this matching
        // hash, skip over to the newest and begin fetching all details.
        k, v := latestTxRecord(ns, txHash[:])
        if v == nil {
                // not found
                return nil, nil
        }

        // Read k/v, lookup all matching credits, debits.

        return nil, nil
}

func (s *Store) unminedTxDetails(ns walletdb.Bucket, txHash *wire.ShaHash, v []byte) (*TxDetails, error) {
        // ...
}

This will only fetch details for one transaction at a time and is not suitable for fetching entire ranges of transactions, but perhaps some of this code can be reused for that.

@jrick (Member Author) commented Apr 1, 2015

Since the already-implemented Credit type includes details that duplicate the TxRecord, I'm going to add some new types:

type CreditRecord struct {
        Index  uint32
        Spent  bool
        Change bool
}

type DebitRecord struct {
        Amount btcutil.Amount
        Index  uint32
}

type TxDetails struct {
        TxRecord
        Block   BlockMeta
        Credits []CreditRecord
        Debits  []DebitRecord
}

Further details about the transaction inputs and outputs can then be accessed by indexing the wire.MsgTx slices.
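A tiny standalone sketch of that lookup, using stand-in types (the real wire.MsgTx and btcutil.Amount live in the btcsuite packages; the layout below just mirrors the snippet above):

```go
package main

import "fmt"

// Minimal stand-ins for the wire types referenced above.
type TxOut struct {
	Value    int64
	PkScript []byte
}

type MsgTx struct {
	TxOut []*TxOut
}

// CreditRecord stores only the output index plus flags; everything
// else is recovered by indexing the transaction itself.
type CreditRecord struct {
	Index  uint32
	Spent  bool
	Change bool
}

// creditValue recovers the credited amount by indexing the MsgTx
// output slice with the record's Index.
func creditValue(tx *MsgTx, c CreditRecord) int64 {
	return tx.TxOut[c.Index].Value
}

func main() {
	tx := &MsgTx{TxOut: []*TxOut{{Value: 5000}, {Value: 125000}}}
	c := CreditRecord{Index: 1, Change: true}
	fmt.Println(creditValue(tx, c)) // prints 125000
}
```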

@jrick (Member Author) commented Apr 1, 2015

API added to lookup rich details regarding a single transaction. I'll update the integration branch to use this for the gettransaction RPC.

Longer term, we'll want to move ToJSON out of this package and use these types instead (but not the new method, since we know the block or if it's unmined). In fact, it may make sense to do that now, considering ToJSON is known broken when passed an unconfirmed transaction.

@jrick (Member Author) commented Apr 2, 2015

All of the remaining parts in the integration branch deal with the RPC server returning details about transaction history over ranges of transactions. I am thinking of reusing the TxDetails type for an API to handle these situations as well, but instead of requesting one transaction at a time, a []TxDetails will be returned, one block at a time, over some specified range of blocks.

I have something like this in mind:

func (s *Store) RangeTransactions(begin, end int32, f func([]TxDetails)) error

err := w.TxStore.RangeTransactions(0, 300000, func(details []TxDetails) {
        // ...
})

This will allow iteration over transactions one block at a time. The height range is inclusive, and the special height -1 may be used as a high bound to include unmined transactions. The db view will be held for the entire duration of the call to RangeTransactions. If necessary, I may also pass a context type to the function so it can return early without error.
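A standalone sketch of that calling pattern, using stand-in types and an in-memory map in place of the real db view (the -1 unmined-height special case and the early-return context are omitted):

```go
package main

import "fmt"

// TxDetails is a stand-in for the type defined earlier in the thread.
type TxDetails struct{ Height int32 }

// Store stands in for the wtxmgr store; blocks maps height to the
// details of every transaction mined in that block.
type Store struct {
	blocks map[int32][]TxDetails
}

// RangeTransactions calls f once per block, lowest height first, for
// every block in the inclusive range [begin, end] that holds
// transactions, mirroring the proposed API above.
func (s *Store) RangeTransactions(begin, end int32, f func([]TxDetails)) error {
	for h := begin; h <= end; h++ {
		if details, ok := s.blocks[h]; ok {
			f(details)
		}
	}
	return nil
}

func main() {
	s := &Store{blocks: map[int32][]TxDetails{
		100: {{Height: 100}, {Height: 100}},
		250: {{Height: 250}},
	}}
	var total int
	s.RangeTransactions(0, 300000, func(d []TxDetails) { total += len(d) })
	fmt.Println(total) // prints 3
}
```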

One downside here is that TxDetails includes a BlockMeta, which will be the same for every detail returned at a time. This is inefficient and cumbersome to use, so maybe I'll split the BlockMeta out of TxDetails and also return a BlockMeta from the Store.TxDetails method. Another downside is that the view may be held open for a long time, as almost every detail for every transaction in this range is returned and then processed by the caller. While multiple views can run concurrently, writes will block, and this will stop all syncing until all the views return. However, the alternative of returning details from both before and after a change is probably worse.

@davecgh (Member) commented Apr 2, 2015

I'll take a look at this tomorrow.

@jrick (Member Author) commented Apr 2, 2015

@davecgh informed me that, with bolt, a single Update may begin while Views are active. This is because pages are copy-on-write. So a long-running view isn't as much of a problem as I explained above.

@jrick (Member Author) commented Apr 3, 2015

APIs settled (for now :)) and integration branch is compiling.

@davecgh (Member) commented Apr 3, 2015

Probably not for the PR, but we did discuss moving the whole legacy folder under internal so it's no longer externally accessible.

EDIT: Actually, I'll just make a separate issue for it now.

// key/value pairs and nested buckets in forward or backward order.
//
// This function is part of the walletdb.Bucket interface implementation.
func (b *bucket) Cursor() walletdb.Cursor {
Member:
Can we break all of this Cursor functionality into a separate PR?

Member Author:

Yep, I'll split it out; let's get that merged first.

@davecgh (Member) commented Apr 4, 2015

One thing I noticed in the db code is that it's using a ton of magic numbers. They are commented, so it's fairly easy to follow, but I suspect it could become a pain to upgrade if a new field gets added and, say, the 44 in the block records needs to become 48. I'd make a few constants like const blkRecTxOffset = 44.

@jrick (Member Author) commented Apr 4, 2015

I considered that approach as well, but then I have to keep a thousand different names in my head instead of just consulting a single comment and calculating offsets (which I don't find hard; perhaps others disagree).

edit: I'll also mention that those magic numbers are only used in the code directly under the comment, and not off in other files, so when there needs to be an upgrade, only one section of the code needs to be considered for the new serializations.

seek := make([]byte, 4)
byteOrder.PutUint32(seek, ^uint32(0))
c := ns.Bucket(bucketBlocks).Cursor()
return blockIterator{c: c}
Member:
return blockIterator{c: c, seek: seek}

@davecgh (Member) commented Apr 4, 2015

Since the code is making assumptions about the size of the hash everywhere, I think it might be worthwhile to add an init panic if it's not the expected size. Something like:

func init() {
    if wire.HashSize != 32 {
        panic("hash size is not 32 bytes as expected.")
    }
}

That way the code can assume 32-byte hashes (which aren't likely to change any time soon) without having any mysterious failures if the hash size ever changes in the future.

@jrick (Member Author) commented Apr 4, 2015

I prefer static assertions for these cases, but sure, an init check is fine.

@davecgh (Member) commented Apr 4, 2015

Agreed a static assertion would be better. This should do the trick:

var _ [32]byte = wire.ShaHash{}

var v []byte
if rec.SerializedTx == nil {
txSize := rec.MsgTx.SerializeSize()
v = make([]byte, 8, 8+txSize)
Member:
👍 Nice way here to avoid an extra copy.

@davecgh (Member) commented Apr 4, 2015

I've finished reviewing the database code and it looks really solid overall. As mentioned before, I think it's a little heavy on the magic numbers, but the comments regarding the indices are good enough to follow what's going on.

From an implementation perspective, it provides really nice properties for highly scalable access. In particular, I like how the transaction, credit, and debit record keys share the same prefix, so prefix scans can be done to efficiently find all relevant records across buckets.
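The prefix-scan property can be sketched without a database: because every record key for a transaction starts with the same hash bytes, related records sort adjacently, and a cursor-style seek-then-walk finds them all in one pass (the key layout below is a stand-in, not the actual wtxmgr keys):

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// key builds a hypothetical record key: the 32-byte transaction hash
// followed by a record-type suffix, so related records sort adjacently.
func key(txHash []byte, suffix string) []byte {
	return append(append([]byte{}, txHash...), suffix...)
}

// prefixScan mimics a db cursor: seek to the first key >= prefix,
// then walk forward while keys still share the prefix.  keys must be
// sorted; it returns the number of matching records.
func prefixScan(keys [][]byte, prefix []byte) int {
	i := sort.Search(len(keys), func(i int) bool {
		return bytes.Compare(keys[i], prefix) >= 0
	})
	n := 0
	for ; i < len(keys) && bytes.HasPrefix(keys[i], prefix); i++ {
		n++
	}
	return n
}

func main() {
	hashA := bytes.Repeat([]byte{0xaa}, 32)
	hashB := bytes.Repeat([]byte{0xbb}, 32)
	keys := [][]byte{
		key(hashB, "credit0"),
		key(hashA, "record"),
		key(hashA, "debit0"),
		key(hashA, "credit1"),
	}
	sort.Slice(keys, func(i, j int) bool { return bytes.Compare(keys[i], keys[j]) < 0 })
	fmt.Println(prefixScan(keys, hashA)) // prints 3: all of hashA's records, one scan
}
```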

// wallet's spendable balance) and are modeled using the Debit structure.
//
// Besides just saving transactions, bidirectional spend tracking is also
// performed on each credit and debit. Unlike packages such as btcdb, which
Member:
This is true now, but, as you know, btcdb is currently being rewritten where it doesn't do any spend tracking on its own. Might want to avoid mentioning it here since this will no longer be true in a relatively short while.

Member Author:

This entire file was lifted from the old implementation and needs to be rewritten.

@davecgh (Member) commented Apr 4, 2015

I'd personally prefer the JSON bits to live outside of the txstore. That is specific to the RPC server.

@davecgh (Member) commented Apr 4, 2015

This will need a README.md before we merge it as well.

@jrick (Member Author) commented Apr 4, 2015

The JSON stuff will move. It's here now because that's how txstore did it. The code in the PR currently needed it in this package so additional details could be queried out of the db as necessary, but in my current tree I've switched this to be a method of TxDetails, and it no longer needs the Store. Since the fields in TxDetails are all exported, I plan on moving this entirely out of this package and into rpcserver.go, or perhaps the wallet package if it's needed there (that package also includes too much stuff specific to the RPC server).

@davecgh (Member) commented Apr 4, 2015

I noticed the same thing about wallet yesterday while looking through it again as well (too much RPC server stuff).

@jrick (Member Author) commented Apr 6, 2015

@gsalgado What's the plan for txstore and wtxmgr in votingpool?

@jrick (Member Author) commented Apr 17, 2015

Rebased over hashing API changes.

// // Handle error
// }
//
// The elem's Spent field is not set to true if the credits is spent by an
Member:
credits -> credit

@davecgh (Member) commented Apr 25, 2015

This looks ready to me. OK.

@conformal-deploy conformal-deploy merged commit 0087d38 into btcsuite:master Apr 25, 2015
@jrick jrick deleted the jrick_wtxmgr branch April 25, 2015 05:04
bucketBlocks, 44, len(v))
return nil, storeError(ErrData, str, nil)
}
newv := append(v[:len(v):len(v)], txHash[:]...)
Contributor:
Can't this be simplified to append(v, ...?

Member:

It would not be guaranteed to be new in that case.

http://play.golang.org/p/aWoBXnZ-em

Contributor:

Thanks, got it.

Member Author:

This is necessary because otherwise you may be writing to invalid memory. The v parameter in this case will often be the value bytes directly returned from bolt. And since slices returned from bolt have caps greater than their lengths, by not limiting the cap during an append, you would end up writing to who-knows-where. Best case scenario here would be a segfault. Worst case would be silently continuing while corrupting the DB.
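The aliasing hazard described here can be reproduced with plain slices (a standalone sketch; the db-owned buffer is simulated with a local byte slice):

```go
package main

import "fmt"

// appendShared appends without capping: if v has spare capacity, the
// new bytes are written into v's backing array in place.
func appendShared(v, extra []byte) []byte {
	return append(v, extra...)
}

// appendFresh uses a full (three-index) slice expression to set cap
// equal to len, which forces append to allocate a new backing array
// and leave the original memory untouched.
func appendFresh(v, extra []byte) []byte {
	return append(v[:len(v):len(v)], extra...)
}

func main() {
	backing := make([]byte, 8) // stands in for a buffer owned by the db
	copy(backing, []byte{1, 2, 3, 4})
	v := backing[:4] // the value slice handed out, with spare capacity

	_ = appendFresh(v, []byte{9, 9})
	fmt.Println(backing) // unchanged: [1 2 3 4 0 0 0 0]

	_ = appendShared(v, []byte{9, 9})
	fmt.Println(backing) // clobbered: [1 2 3 4 9 9 0 0]
}
```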

Contributor:

Thanks, yeah I realize now the third index is for limiting the cap.

alexlyp added a commit to alexlyp/btcwallet that referenced this pull request May 27, 2016
alexlyp added a commit to alexlyp/btcwallet that referenced this pull request Jun 17, 2016
alexlyp added a commit to alexlyp/btcwallet that referenced this pull request Jun 24, 2016