Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor/fix: use sort_unstable instead of sort, more efficient with_* methods for TransactionBody #2564

Merged
merged 1 commit into from
Feb 14, 2019

Conversation

JeremyRubin
Copy link
Contributor

Previously, the codebase uses sort. In rust, sort performs a stable sort, which is slow and allocates memory. On the other hand, sort_unstable is supposedly faster and does not require the allocation of memory, so I switch to using unstable sort as we don't need sort stability (our vecs should already be or will be made unique).

According to RFC documentation, the ordinary sort may be faster in the following case:

A: Yes, there are a few cases where it is faster.
For example, if the slice consists of several pre-sorted sequences concatenated one after another, then slice::sort will most probably be faster. 
Another case is when using costly comparison functions, e.g. when sorting strings. slice::sort optimizes the number of comparisons very well, while pdqsort optimizes for fewer writes to memory at expense of slightly larger number of comparisons.
But other than that, slice::sort should be generally slower than pdqsort.

As we're generally not sorting inputs from remote peers (we usually are validating they sent it sorted) this should be ok -- and in the worst case, it's not much slower.

I also switch the with_* functions to insert after a failed binary search for the kernel, input, or output rather than unconditionally insert. This has a improved runtime of O(n + log(n)) v.s. O(n log n), but I don't think we use it in any performance sensitive context, so the major change is the behavioral change. This is a minor breaking change -- previously, adding two of the same output, input, or kernel would result in two of that item being inserted. I think this is almost always a bug, so it may be worth handling this case differently (e.g., not returning anything on duplicate add). In theory, this is also a little bit less safe if a user passes in a TransactionBody with an unsorted vec, the with_* will do something unspecified. We'll get an error at verify_sorted_and_unique in such a case.

@antiochp
Copy link
Member

I think this is almost always a bug.

Agreed. Not so much a bug as undefined behavior. But should never happen.
either way its going to fail at one layer or another. Here at verify_sorted_and_unique, but if we did end up with a duplicate in there it would not sum as expected.

Copy link
Member

@antiochp antiochp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Some minor comments but looks good. Thanks for this!

self.out_full.sort();
self.kern_full.sort();
self.kern_ids.sort();
self.out_full.sort_unstable();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 These are good.

@@ -501,7 +501,7 @@ impl TransactionBody {
kernels: Vec<TxKernel>,
verify_sorted: bool,
) -> Result<TransactionBody, Error> {
let body = TransactionBody {
let mut body = TransactionBody {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
Originally wrote this to avoid making the body mut unless we needed it to be (i.e. if we needed to sort it).
But it definitely reads better like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was torn on this one -- technically doesn't need to be a part of this PR, but I felt it was a bit less noisy.

generally agree though that only mut binding where absolutely needed is better.

inputs: new_ins,
..self
}
pub fn with_input(mut self, input: Input) -> TransactionBody {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are actually (potentially) breaking changes.
The original intent was for these to be factories that left the original body unchanged, creating new instances each time.
I'm not sure if we actually needed this guarantee though (to leave the original one unchanged).

👍 If you are confident this is ok to change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm -- I don't think that's quite true -- the contract of these functions is a pass by ownership (I'm only touching the mutability).

Because the self passed in is an owned self, the passed in instance is consumed so I figured it's OK to trash it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you're right - we're consuming the instance here each time. 👍

self.inputs
.binary_search(&input)
.err()
.map(|e| self.inputs.insert(e, input));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really torn here - it feels wrong to use the err() here as control flow, but it does seem to be a pattern used (and recommended) in Rust...

Result<usize, usize> looks and feels like a nasty hack to me...

One suggestion for readability - can we use idx instead of e? Its technically an "error" but in practice this is really just the idx to insert at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! I agree 100% -- one of my least favorite things in the rust API.

Haskell's result type is the value-neutral 'Either'
http://hackage.haskell.org/package/base-4.12.0.0/docs/Data-Either.html where Left and Right don't really convey anything.

In Rust, we have a hatred/annoyance for the Right Err type, and using it in a normal control flow feels gross. Rust also used to have Either for this purpose rust-lang/rust#11149

In this case, this is the exact intended use of binary_search as per the docs:

If the value is found then Result::Ok is returned, containing the index of the matching element.
If there are multiple matches, then any one of the matches could be returned.
If the value is not found then Result::Err is returned,
containing the index where a matching element could be inserted while maintaining sorted order.

We could also use map_err if you prefer, but I prefer to use err() because it actively discards the Ok result, which feels safer in general (map_err leaves it untouched which might cause you to later forget that the err() was the desired type)

@antiochp antiochp added this to the 1.0.2 milestone Feb 14, 2019
@antiochp antiochp merged commit d4540f3 into mimblewimble:master Feb 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants