
refactor: allocation free cut through algorithms #2567

Merged

Conversation

JeremyRubin
Contributor

In our current algorithms for cut_through and verify_cut_through, we allocate memory and use some relatively inefficient APIs (e.g., retain).

The algorithms presented should be more time-efficient (this is untested), but are certainly more space-efficient.

Opening a PR now to get feedback on the approach.

I'm also unclear whether the sort_unstable_by_key I do at the beginning is correct -- is sorting by the commitments here the same order that the sort at the end of the function previously produced? It seems like it isn't, but nothing came up running tests (perhaps a good test case to add). I think it just turns out that everywhere we call cut_through, we also pass false for verify_sort to TransactionBody::init, so it doesn't matter if we're mis-sorted.

let mut outs_insert_idx = 0..;

let mut ncut = 0;
while let (Some(i), Some(i2)) = (inps.peek(), outs.peek()) {
Member

I've read through this a few times and I still don't fully understand what we're doing here...

Let's say -

  • inputs: [A,B,F,H]
  • outputs: [B,C,H,J]

After cut-through we should see -

  • inputs: [A,F]
  • outputs: [C,J]
  • (removed: [B,H])

Reading through the impl here -

  • We sort inputs and outputs (and check outputs are unique)
  • then we iterate over peekable inputs and outputs
    • if they are equal we next() both and increment ncut
    • if not equal we next() one or the other
  • Then we drain slices of inputs and outputs using ncut

Given outputs [B,C,H,J], how do we end up with [C,J] using a single drain of a single slice based on ncut?
What does ncut do here? I'm not following this (and missing something obvious).

Contributor Author

@JeremyRubin JeremyRubin Feb 13, 2019

Yup! Let's go through step by step:
^ means the _insert_idx will next yield this location,
* means the inps/outs will next yield this location

Step 1:
[^*A,B,F,H]
[^*B,C,H,J]
ncut = 0
A < B

Step 2:
[A,^*B,F,H]
[^*B,C,H,J]
ncut = 0
B == B

Step 3:
[A,^B,*F,H]
[^B,*C,H,J]
ncut = 1
F > C

Step 4:
[A,^B,*F,H]
[C,^C,*H,J]
ncut = 1
F < H


Step 5:
[A,F,^F,*H]
[C,^C,*H,J]
ncut = 1
H == H


Step 6:
[A,F,^F,H] *
[C,^C,H,*J]
ncut = 2
vec 1 has None to compare

Step 7:
vec1.drain(^..^+ncut)
[A,F,_,_]
[A,F]
vec2.drain(^..^+ncut)
[C,_,_,*J]
[C,J]

I think I wrote this out correctly... if there are any errors, let me know.

Essentially what we are doing is walking both lists in sorted order with four fingers, backshifting anything that is not in the other list (the sort order guarantees we can detect this). When we finish, the drain of that particular range cuts out only the elements we've already copied earlier into the list. It leaves (and then backshifts) the tail we haven't looked at yet -- which we don't need to, because everything in it is greater than the greatest element in the other list.

Technically I think you can compute ncut as (*-^), but I think there is some edge condition, so tracking ncut seemed more straightforward.
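The four-finger walk above can be sketched generically. This is a minimal, hedged illustration on plain sorted integers (hypothetical stand-ins for the real Input/Output types, with hypothetical names), not the PR's actual code:

```rust
use std::cmp::Ordering;

// Minimal sketch of the four-finger cut-through walk on plain sorted
// integers (hypothetical stand-ins for the real Input/Output types).
fn cut_through_sketch(a: &mut Vec<u32>, b: &mut Vec<u32>) {
    let (mut ar, mut br) = (0, 0); // read fingers (the `*` markers)
    let (mut aw, mut bw) = (0, 0); // write fingers (the `^` markers)
    let mut ncut = 0;
    while ar < a.len() && br < b.len() {
        match a[ar].cmp(&b[br]) {
            Ordering::Less => {
                a[aw] = a[ar]; // backshift: not present in `b`
                ar += 1;
                aw += 1;
            }
            Ordering::Greater => {
                b[bw] = b[br]; // backshift: not present in `a`
                br += 1;
                bw += 1;
            }
            Ordering::Equal => {
                // matched pair: skip both, remember one more cut slot
                ncut += 1;
                ar += 1;
                br += 1;
            }
        }
    }
    // drain the ncut stale slots between the write finger and the unread tail
    a.drain(aw..aw + ncut);
    b.drain(bw..bw + ncut);
}
```

With a = [1, 2, 6, 8] and b = [2, 3, 8, 10] (mirroring the A,B,F,H / B,C,H,J example above), this leaves [1, 6] and [3, 10].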

Member

👍 Thanks for the explanation. Looks good.

Member

@antiochp antiochp left a comment

I'm also unclear whether the sort_unstable_by_key I do at the beginning is correct -- is sorting by the commitments here the same order that the sort at the end of the function previously produced? It seems like it isn't, but nothing came up running tests (perhaps a good test case to add). I think it just turns out that everywhere we call cut_through, we also pass false for verify_sort to TransactionBody::init, so it doesn't matter if we're mis-sorted.

I think we are good here.
You added an explicit sort to both inputs and outputs at the start of cut_through right?

The magic (and somewhat obscure) hashable_ord! macro on Inputs/Outputs/TxKernels ensures we sort them consistently by their hash().

Edit: So actually, you're right, the sort is different here.

outputs.sort_unstable_by(|v1, v2| v1.commitment().0.cmp(&v2.commitment().0));

should really be -

outputs.sort_unstable() (and let our hashable_ord! handle the cmp logic).

Same for inputs.

So actually similar to what we're doing in #2564.

@JeremyRubin
Contributor Author

I'm not sure that's right -- I think they need to be sorted by the commitment because we need to cut the commitments, and the order of the inputs.hash() will be uncorrelated with the order of the outputs.hash()?

This is a problem for the verify_cut_through method, which was assuming sort based on commitment...

@JeremyRubin
Contributor Author

Actually it seems that the hash of an input and the hash of an output are indeed the same, not just the commitment -- the only additional field is the OutputFeatures flag!

Therefore sorting on either the commitment or the hash should be correct -- which explains why I wasn't able to get errors in either case.

That the OutputFeatures flags match for matching commitments is checked in verify_features.

@JeremyRubin JeremyRubin force-pushed the allocation-free-cut-through branch from f650b9c to 64d658b Compare February 15, 2019 01:57
@JeremyRubin
Contributor Author

JeremyRubin commented Feb 15, 2019

I pushed up an updated version of the algorithm.

In this version I:

  1. Use the hashable_ord! implementation of sort
  2. Got rid of the previous use of peekable iterator in favor of regular indexes (bit more verbose, but much easier to read IMO)
  3. Added hash caching to the main comparison loop (should use approximately half the hashing, as we don't recompute the hash for both entries at every step).
  4. edit: I also added a copy trait to Input -- there didn't seem to be a reason not to, as Output already implements it, and Input implements Clone anyway

@antiochp antiochp added this to the 1.0.2 milestone Feb 15, 2019
@antiochp
Member

Just testing this out locally - but looks good so far. 👍

added a copy trait to Input

Yes. Makes sense.

Added hash caching to the main comparison loop (should use approximately half the hashing, as we don't recompute the hash for both entries at every step).

We've actually discussed how useful it would be to have some kind of generic solution to this. There are a lot of places where we do a lot of hashing (appending to MMRs, for example). This is good here, but we should revisit it at some point when we have a better idea of how to do this in a single place (if we even can).
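As a hedged sketch of what such a generic solution might look like (the type and method names here are hypothetical, and it uses std's DefaultHasher rather than grin's Hashed trait), a lazily-cached hash wrapper could be:

```rust
use std::cell::OnceCell;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical lazily-cached hash wrapper: the hash is computed at most
// once and reused on every later call.
struct HashCached<T> {
    inner: T,
    cached: OnceCell<u64>,
}

impl<T: Hash> HashCached<T> {
    fn new(inner: T) -> Self {
        HashCached { inner, cached: OnceCell::new() }
    }

    fn hash_value(&self) -> u64 {
        *self.cached.get_or_init(|| {
            let mut h = DefaultHasher::new();
            self.inner.hash(&mut h);
            h.finish()
        })
    }
}
```

Repeated comparisons in a merge loop would then reuse the cached value instead of rehashing both sides at every step.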

Member

@antiochp antiochp left a comment

Merge conflict from the other PR but 👍 once it builds and the tests all pass.

Tested it locally and against mainnet and it looks fine.

@JeremyRubin JeremyRubin force-pushed the allocation-free-cut-through branch 3 times, most recently from ae2a68a to ea18c1a Compare February 15, 2019 19:36
@JeremyRubin
Contributor Author

Rebased

@JeremyRubin JeremyRubin changed the title [WIP] refactor: allocation free cut through algorithms refactor: allocation free cut through algorithms Feb 15, 2019
@ignopeverell
Contributor

To be honest I'm a little torn. I do agree this will likely do less copying and hence use less memory. But we're not sure. And we have no idea by how much. LLVM compiler optimizations can be pretty good too.

Now this replaces 25 lines of somewhat idiomatic functional Rust that's easy to read by 75 lines of iterative-looking, much harder to follow code.

I agree these functions are likely hot code. But I can't tell if it's a premature optimization or not and it bothers me.

@JeremyRubin
Contributor Author

I could introduce a PR that tries to bench the existing function -- it's hard to get a 'whole system' picture though. It's unclear if it is faster, but I think it is a pretty safe assumption that it's not slower. I think the main draw of an algorithm like this over the previous one is predictability from a DoS perspective: a specially crafted set of transactions should perform roughly equivalently to any other, regardless of cut-through. Whereas in the prior code, bigger cut-through can yield O(N^2) behavior (n^2 retain). I don't really think that's the kind of thing a compiler can optimize.
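To illustrate the retain concern with a hedged sketch (hypothetical names, not the previous implementation): removing k matched elements one retain() call at a time rescans the whole vector on every call, which is O(n·k) in the worst case, whereas a single merge pass over two sorted lists is O(n + m).

```rust
// Hypothetical illustration of the quadratic pattern: each retain() call
// scans the entire vector, so one call per removed element is O(n * k).
fn remove_each_with_retain(v: &mut Vec<u32>, to_cut: &[u32]) {
    for c in to_cut {
        v.retain(|x| x != c); // full O(n) scan per element removed
    }
}
```

A single sorted merge pass touches each element once instead, regardless of how many elements get cut.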

It's possible to make this code more idiomatic Rust and still get a pretty good win, I think, and give our compiler a much better chance of optimizing it (the clone is only needed here to make the borrow checker happy...). I don't particularly find the below version less offensive though (iterators with peekable are tidy, but a bit harder to reason about IMO), and it has the extra copy. (I haven't tested the below.)

fn verify_cut_through(&self) -> Result<(), Error> {
    let mut inputs = self.inputs.iter().map(|x| x.hash()).peekable();
    let mut outputs = self.outputs.iter().map(|x| x.hash()).peekable();
    loop {
        match (inputs.peek(), outputs.peek()) {
            (Some(ih), Some(oh)) => match ih.cmp(oh) {
                Ordering::Less => {
                    inputs.next();
                }
                Ordering::Greater => {
                    outputs.next();
                }
                Ordering::Equal => {
                    return Err(Error::CutThrough);
                }
            },
            _ => {
                return Ok(());
            }
        }
    }
}

pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let inputs_cpy = inputs.clone();
    let outputs_cpy = outputs.clone();
    let mut input_hashes = inputs_cpy.iter().map(|x| (x, x.hash())).peekable();
    let mut output_hashes = outputs_cpy.iter().map(|x| (x, x.hash())).peekable();
    let mut insert_input = inputs.iter_mut().enumerate();
    let mut insert_output = outputs.iter_mut().enumerate();
    let mut ncut = 0;
    loop {
        match (input_hashes.peek(), output_hashes.peek()) {
            (Some((&i, ih)), Some((&o, oh))) => match ih.cmp(oh) {
                Ordering::Less => {
                    *insert_input.next().unwrap().1 = i;
                    input_hashes.next();
                }
                Ordering::Greater => {
                    *insert_output.next().unwrap().1 = o;
                    output_hashes.next();
                }
                Ordering::Equal => {
                    ncut += 1;
                    output_hashes.next();
                    input_hashes.next();
                }
            },
            _ => {
                // Cut elements that have already been copied
                if let Some((drain_idx, _)) = insert_output.next() {
                    outputs.drain(drain_idx..drain_idx + ncut);
                }
                if let Some((drain_idx, _)) = insert_input.next() {
                    inputs.drain(drain_idx..drain_idx + ncut);
                }
                return Ok(());
            }
        }
    }
}

@JeremyRubin
Contributor Author

p.s. do we have any benchmarking/profiling tools currently?

@antiochp
Member

antiochp commented Feb 16, 2019

I wonder if we punted on the hash caching for now (we know it's a more general issue) - would this make the changes in this PR cleaner?

Related - #2584.

@JeremyRubin
Contributor Author

It's a little shorter -- if we think we'll solve hash caching elsewhere, then the below is very tolerable

pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let mut inputs_idx = 0;
    let mut outputs_idx = 0;
    let mut inputs_insert_idx = 0;
    let mut outputs_insert_idx = 0;
    let mut ncut = 0;
    while inputs_idx < inputs.len() && outputs_idx < outputs.len() {
        match inputs[inputs_idx].hash().cmp(&outputs[outputs_idx].hash()) {
            Ordering::Less => {
                inputs[inputs_insert_idx] = inputs[inputs_idx];
                inputs_idx += 1;
                inputs_insert_idx += 1;
            }
            Ordering::Greater => {
                outputs[outputs_insert_idx] = outputs[outputs_idx];
                outputs_idx += 1;
                outputs_insert_idx += 1;
            }
            Ordering::Equal => {
                ncut += 1;
                inputs_idx += 1;
                outputs_idx += 1;
            }
        }
    }
    // Cut elements that have already been copied
    outputs.drain(outputs_insert_idx..outputs_insert_idx + ncut);
    inputs.drain(inputs_insert_idx..inputs_insert_idx + ncut);
    Ok(())
}

For posterity and completeness, here's what a caching version looks like with iterators and no copying, using RefCell to fight the borrowchecker (can probably be cleaned up further).

pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let mut inputs_r = RefCell::new(vec![]);
    let mut outputs_r = RefCell::new(vec![]);
    std::mem::swap(inputs_r.get_mut(), inputs);
    std::mem::swap(outputs_r.get_mut(), outputs);
    assert!(inputs.len() == 0);
    let (drain_idx_inputs, drain_idx_outputs) = {
        let mut ncut = 0;
        let mut drain_idx_inputs = 0;
        let mut drain_idx_outputs = 0;
        let inputs = &inputs_r;
        let outputs = &outputs_r;
        let mut input_hashes = (0..inputs.borrow().len())
            .map(|x| (x, inputs.borrow()[x].hash()))
            .peekable();
        let mut insert_input = (0..).map(|x| {
            move |y| {
                let mut inputs = inputs.borrow_mut();
                inputs[x] = inputs[y];
                x
            }
        });

        let mut output_hashes = (0..outputs.borrow().len())
            .map(|x| (x, outputs.borrow()[x].hash()))
            .peekable();
        let mut insert_output = (0..).map(|x| {
            move |y| {
                let mut outputs = outputs.borrow_mut();
                outputs[x] = outputs[y];
                x
            }
        });
        while let (Some((get_i, ih)), Some((get_o, oh))) =
            (input_hashes.peek(), output_hashes.peek())
        {
            match ih.cmp(oh) {
                Ordering::Less => {
                    drain_idx_inputs = insert_input.next().unwrap()(*get_i);
                    input_hashes.next();
                }
                Ordering::Greater => {
                    drain_idx_outputs = insert_output.next().unwrap()(*get_o);
                    output_hashes.next();
                }
                Ordering::Equal => {
                    ncut += 1;
                    output_hashes.next();
                    input_hashes.next();
                }
            }
        }
        (
            drain_idx_inputs..(drain_idx_inputs + ncut),
            drain_idx_outputs..(drain_idx_outputs + ncut),
        )
    };
    // Cut elements that have already been copied
    std::mem::swap(outputs_r.get_mut(), outputs);
    std::mem::swap(inputs_r.get_mut(), inputs);
    outputs.drain(drain_idx_outputs);
    inputs.drain(drain_idx_inputs);
    Ok(())
}

@antiochp
Member

antiochp commented Feb 18, 2019

I say we go with that most recent example, the "the below is very tolerable" one.

Thanks for doing this exploratory work to look into different options here.
Premature optimization or not - I think this reads pretty cleanly, and it's short and does one thing.

👍

And it's not like the "convert them into sets and find the intersection" approach was particularly intuitive in terms of knowing what it was doing at a glance anyway.

@JeremyRubin JeremyRubin force-pushed the allocation-free-cut-through branch from ea18c1a to b809be0 Compare February 18, 2019 20:32
@JeremyRubin
Contributor Author

Ok. I pushed up new code for both, which I think is even a little simpler.

  1. for verify_cut_through, I kept the iterator version because it is the easiest to write/read, and there is no ownership issue with the inputs and outputs vecs. This version caches inherently because of peekable, and it is the idiomatic way to write it.
  2. for cut_through, I got rid of the insert_idx variables and just compute the offsets using ncut. This is a bit simpler to comprehend because there are fewer variables to think about, and it reads a little better (rather than an arbitrary insert_idx, we insert back by the number we've cut).
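The peekable shape described in point 1 can be sketched generically; this is a hedged stand-in on plain integers (hypothetical function name), not the actual hash-based implementation:

```rust
use std::cmp::Ordering;

// Generic sketch of the peekable duplicate check between two sorted
// sequences (stand-ins for the sorted input/output hashes).
fn has_common_element(xs: &[u64], ys: &[u64]) -> bool {
    let mut xs = xs.iter().peekable();
    let mut ys = ys.iter().peekable();
    while let (Some(x), Some(y)) = (xs.peek(), ys.peek()) {
        match x.cmp(y) {
            Ordering::Less => {
                xs.next();
            }
            Ordering::Greater => {
                ys.next();
            }
            // in verify_cut_through this would be the Error::CutThrough case
            Ordering::Equal => return true,
        }
    }
    false
}
```

Each element is peeked before being consumed, so nothing is hashed or compared twice along either sequence.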

Member

@antiochp antiochp left a comment

👍

@antiochp antiochp merged commit eafe730 into mimblewimble:master Feb 19, 2019