refactor: allocation free cut through algorithms #2567
Conversation
core/src/core/transaction.rs (Outdated)
```rust
let mut outs_insert_idx = 0..;
let mut ncut = 0;
while let (Some(i), Some(i2)) = (inps.peek(), outs.peek()) {
```
I've read through this a few times and I still don't fully understand what we're doing here...

Let's say -

- inputs: [A,B,F,H]
- outputs: [B,C,H,J]

After cut-through we should see -

- inputs: [A,F]
- outputs: [C,J]
- removed: [B,H]
Reading through the impl here -

- We sort inputs and outputs (and check outputs are unique)
- then we iterate over peekable inputs and outputs
- if they are equal we `next()` both and increment `ncut`
- if not equal we `next()` one or the other
- then we `drain` slices of inputs and outputs using `ncut`

Given outputs [B,C,H,J], how do we end up with [C,J] using a single drain with a single slice using `ncut`? What does `ncut` do here? I'm not following this (and missing something obvious).
Yup! Let's go through it step by step:

`^` means the `_insert_idx` iterator will next yield this location,
`*` means the `inps`/`outs` iterator will next yield this location.
```
Step 1:
[^*A,B,F,H]
[^*B,C,H,J]
ncut = 0
A < B

Step 2:
[A,^*B,F,H]
[^*B,C,H,J]
ncut = 0
B == B

Step 3:
[A,^B,*F,H]
[^B,*C,H,J]
ncut = 1
F > C

Step 4:
[A,^B,*F,H]
[C,^C,*H,J]
ncut = 1
F < H

Step 5:
[A,F,^F,*H]
[C,^C,*H,J]
ncut = 1
H == H

Step 6:
[A,F,^F,H] *
[C,^C,H,*J]
ncut = 2
vec 1 has None to compare

Step 7:
vec1.drain(^..^+ncut)
[A,F,_,_]
[A,F]
vec2.drain(^..^+ncut)
[C,_,_,*J]
[C,J]
```
I think I wrote this out correctly... if there are any errors, let me know.
Essentially what we are doing is walking both lists in sorted order with four fingers, and backshifting anything that is not in the other list (the sort order guarantees we can tell). When we finish, draining that particular range cuts out only the stale slots whose values we already inserted earlier in the list. It leaves (and then backshifts) the tail range we've yet to look at (but don't need to, because everything there is greater than the greatest element in the other list).
Technically I think you can compute ncut as (*-^), but I think there is some edge condition, so tracking ncut seemed more straightforward.
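To make that concrete, here's a minimal runnable sketch of the same walk over plain integers (the integers, function name, and variable names are my stand-ins for the real Input/Output types and their hashes); it reproduces the [A,B,F,H] / [B,C,H,J] example above:

```rust
use std::cmp::Ordering;

fn cut_through_sketch(inputs: &mut Vec<u32>, outputs: &mut Vec<u32>) {
    inputs.sort_unstable();
    outputs.sort_unstable();
    let (mut i, mut o) = (0, 0); // read fingers (the `*` markers above)
    let (mut ii, mut oi) = (0, 0); // insert fingers (the `^` markers above)
    let mut ncut = 0;
    while i < inputs.len() && o < outputs.len() {
        match inputs[i].cmp(&outputs[o]) {
            Ordering::Less => {
                // unique to inputs: backshift it to the insert finger
                inputs[ii] = inputs[i];
                i += 1;
                ii += 1;
            }
            Ordering::Greater => {
                // unique to outputs: backshift it to the insert finger
                outputs[oi] = outputs[o];
                o += 1;
                oi += 1;
            }
            Ordering::Equal => {
                // present in both: cut it (skip without backshifting)
                ncut += 1;
                i += 1;
                o += 1;
            }
        }
    }
    // drain the ncut stale slots; the unvisited tail backshifts intact
    inputs.drain(ii..ii + ncut);
    outputs.drain(oi..oi + ncut);
}

fn main() {
    // the walkthrough above, with A..J mapped to integers
    let (a, b, c, f, h, j) = (1u32, 2, 3, 6, 8, 10);
    let mut inputs = vec![a, b, f, h];
    let mut outputs = vec![b, c, h, j];
    cut_through_sketch(&mut inputs, &mut outputs);
    assert_eq!(inputs, vec![a, f]); // [A,F]
    assert_eq!(outputs, vec![c, j]); // [C,J]
}
```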
👍 Thanks for the explanation. Looks good.
I am also unclear whether the `sort_unstable_by_key` I am doing at the beginning is correct -- is sorting by the commitments here the same order as the sort at the end of the function previously produced? It seems like that's not accurate, but nothing came up running tests (perhaps a good test case to add). I think it just turns out that everywhere we call `cut_through`, we also pass `false` for `verify_sort` to `TransactionBody::init`, so it doesn't matter if we're mis-sorted.
I think we are good here. You added an explicit sort to both inputs and outputs at the start of `cut_through`, right? The magic (and somewhat obscure) `hashable_ord!` macro on Inputs/Outputs/TxKernels ensures we sort them consistently by their `hash()`.

Edit: So actually, you're right, the sort is different here.

`outputs.sort_unstable_by(|v1, v2| v1.commitment().0.cmp(&v2.commitment().0));`

should really be -

`outputs.sort_unstable()`

(and let our `hashable_ord!` handle the `cmp` logic). Same for `inputs`. So actually similar to what we're doing in #2564.
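For reference, here's a rough sketch of what a `hashable_ord!`-style ordering gives you (illustrative only -- this is not grin's actual macro, and `ToyOutput` and its `hash()` are made-up stand-ins): `Ord` simply delegates to the hash, so a plain `sort_unstable()` sorts by hash.

```rust
use std::cmp::Ordering;

// Illustrative stand-in, not grin's macro: a toy output whose ordering
// delegates to its (fake) hash, so sort_unstable() sorts by hash.
#[derive(Debug, Clone, Copy)]
struct ToyOutput {
    commitment: [u8; 4], // toy stand-in for a Pedersen commitment
}

impl ToyOutput {
    // stand-in hash; the real type hashes its serialized form
    fn hash(&self) -> u64 {
        self.commitment
            .iter()
            .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
    }
}

impl Ord for ToyOutput {
    fn cmp(&self, other: &Self) -> Ordering {
        self.hash().cmp(&other.hash())
    }
}

impl PartialOrd for ToyOutput {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for ToyOutput {
    fn eq(&self, other: &Self) -> bool {
        self.hash() == other.hash()
    }
}

impl Eq for ToyOutput {}
```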
I'm not sure that's right -- I think they need to be sorted by the commitment because we need to cut the commitments, and the order of `inputs.hash()` will be uncorrelated with the order of `outputs.hash()`? This is a problem for the `verify_cut_through` method, which was assuming a sort based on commitment...
Actually it seems that the hash of an input and the hash of an output are indeed the same, not just the commitment -- the only additional field is the OutputFeatures flag! Therefore sorting the inbound set on either the commitment or the hash should be correct -- which explains why I wasn't able to get errors in either case. That the OutputFeatures flags match for matching commitments is checked in `verify_features`.
Force-pushed from f650b9c to 64d658b.
I pushed up an updated version of the algorithm. In this version I:
Just testing this out locally - but looks good so far. 👍
Yes. Makes sense.
We've actually discussed how useful it would be to have some kind of generic solution to this. There are a lot of places where we do a lot of hashing (appending to MMRs for example). This is good here but we should revisit this at some point when we have a better idea of how to do this in a single place (if we even can).
Merge conflict from the other PR but 👍 once it builds and the tests all pass.
Tested it locally and against mainnet and it looks fine.
Force-pushed from ae2a68a to ea18c1a.
Rebased
To be honest I'm a little torn. I do agree this will likely do less copying and hence use less memory. But we're not sure, and we have no idea by how much; LLVM compiler optimizations can be pretty good too. Now this replaces 25 lines of somewhat idiomatic functional Rust that's easy to read with 75 lines of iterative-looking, much harder to follow code. I agree these functions are likely hot code. But I can't tell if it's a premature optimization or not and it bothers me.
I could introduce a PR that tries to bench the existing function -- it's hard to get a 'whole system' picture though. It's unclear if it is faster, but I think it is a pretty safe assumption that it's not slower. I think the main draw of an algorithm like this over the previous one is predictability from a DoS perspective: a specially crafted set of transactions should perform roughly the same as any other, regardless of cut-through. Whereas in the prior code, bigger cut-through can yield O(N^2) behavior (the quadratic retain; see the scan-based sketch after the code below). I don't really think that's the kind of thing a compiler can optimize.

It's possible to make this code more idiomatic Rust and still get a pretty good win, and give the compiler a much better chance of optimizing it (the clone is only needed here to make the borrow checker happy...). I don't particularly find the version below less offensive though (iterators with `peekable` are tidy, but a bit harder to reason about IMO), and it has the extra copy. (I haven't tested the below.)

```rust
fn verify_cut_through(&self) -> Result<(), Error> {
    let mut inputs = self.inputs.iter().map(|x| x.hash()).peekable();
    let mut outputs = self.outputs.iter().map(|x| x.hash()).peekable();
    loop {
        match (inputs.peek(), outputs.peek()) {
            (Some(ih), Some(oh)) => match ih.cmp(oh) {
                Ordering::Less => {
                    inputs.next();
                }
                Ordering::Greater => {
                    outputs.next();
                }
                Ordering::Equal => {
                    return Err(Error::CutThrough);
                }
            },
            _ => {
                return Ok(());
            }
        }
    }
}
```
```rust
pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let inputs_cpy = inputs.clone();
    let outputs_cpy = outputs.clone();
    let mut input_hashes = inputs_cpy.iter().map(|x| (x, x.hash())).peekable();
    let mut output_hashes = outputs_cpy.iter().map(|x| (x, x.hash())).peekable();
    let mut insert_input = inputs.iter_mut().enumerate();
    let mut insert_output = outputs.iter_mut().enumerate();
    let mut ncut = 0;
    loop {
        match (input_hashes.peek(), output_hashes.peek()) {
            (Some((&i, ih)), Some((&o, oh))) => match ih.cmp(oh) {
                Ordering::Less => {
                    *insert_input.next().unwrap().1 = i;
                    input_hashes.next();
                }
                Ordering::Greater => {
                    *insert_output.next().unwrap().1 = o;
                    output_hashes.next();
                }
                Ordering::Equal => {
                    ncut += 1;
                    output_hashes.next();
                    input_hashes.next();
                }
            },
            _ => {
                // Cut elements that have already been copied
                if let Some((drain_idx, _)) = insert_output.next() {
                    outputs.drain(drain_idx..drain_idx + ncut);
                }
                if let Some((drain_idx, _)) = insert_input.next() {
                    inputs.drain(drain_idx..drain_idx + ncut);
                }
                return Ok(());
            }
        }
    }
}
```
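For contrast, here's a hypothetical sketch (not the actual prior grin code -- the names and the linear `contains` are my assumptions, and plain integers stand in for the real types) of the scan-based retain shape mentioned above:

```rust
// Hypothetical illustration only: retain everything not present in a
// scanned copy of the other side. Each `contains` is a linear scan over
// a Vec, so each retain pass is O(n * m) -- the quadratic behavior a
// crafted transaction set could trigger.
fn cut_through_scan(inputs: &mut Vec<u32>, outputs: &mut Vec<u32>) {
    let inputs_cpy = inputs.clone();
    let outputs_cpy = outputs.clone();
    inputs.retain(|i| !outputs_cpy.contains(i));
    outputs.retain(|o| !inputs_cpy.contains(o));
}
```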
p.s. do we have any benchmarking/profiling tools currently?
I wonder if we punted on the hash caching for now (we know it's a more general issue) - would this make the changes in this PR cleaner? Related - #2584.
It's a little shorter -- if we think we'll solve hash caching elsewhere, then the below is very tolerable:

```rust
pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let mut inputs_idx = 0;
    let mut outputs_idx = 0;
    let mut inputs_insert_idx = 0;
    let mut outputs_insert_idx = 0;
    let mut ncut = 0;
    while inputs_idx < inputs.len() && outputs_idx < outputs.len() {
        // note: hashes are recomputed on every comparison (no caching here)
        match inputs[inputs_idx].hash().cmp(&outputs[outputs_idx].hash()) {
            Ordering::Less => {
                inputs[inputs_insert_idx] = inputs[inputs_idx];
                inputs_idx += 1;
                inputs_insert_idx += 1;
            }
            Ordering::Greater => {
                outputs[outputs_insert_idx] = outputs[outputs_idx];
                outputs_idx += 1;
                outputs_insert_idx += 1;
            }
            Ordering::Equal => {
                ncut += 1;
                inputs_idx += 1;
                outputs_idx += 1;
            }
        }
    }
    // Cut elements that have already been copied
    outputs.drain(outputs_insert_idx..outputs_insert_idx + ncut);
    inputs.drain(inputs_insert_idx..inputs_insert_idx + ncut);
    Ok(())
}
```

For posterity and completeness, here's what a caching version looks like with iterators and no copying, using RefCell to fight the borrow checker (can probably be cleaned up further):

```rust
pub fn cut_through(inputs: &mut Vec<Input>, outputs: &mut Vec<Output>) -> Result<(), Error> {
    // assemble output commitments set, checking they're all unique
    outputs.sort_unstable();
    if outputs.windows(2).any(|pair| pair[0] == pair[1]) {
        return Err(Error::AggregationError);
    }
    inputs.sort_unstable();
    let mut inputs_r = RefCell::new(vec![]);
    let mut outputs_r = RefCell::new(vec![]);
    std::mem::swap(inputs_r.get_mut(), inputs);
    std::mem::swap(outputs_r.get_mut(), outputs);
    // `inputs`/`outputs` now hold the empty vecs we swapped in
    assert!(inputs.len() == 0);
    let (drain_idx_inputs, drain_idx_outputs) = {
        let mut ncut = 0;
        let mut drain_idx_inputs = 0;
        let mut drain_idx_outputs = 0;
        let inputs = &inputs_r;
        let outputs = &outputs_r;
        let mut input_hashes = (0..inputs.borrow().len())
            .map(|x| (x, inputs.borrow()[x].hash()))
            .peekable();
        let mut insert_input = (0..).map(|x| {
            move |y| {
                let mut inputs = inputs.borrow_mut();
                inputs[x] = inputs[y];
                x
            }
        });
        let mut output_hashes = (0..outputs.borrow().len())
            .map(|x| (x, outputs.borrow()[x].hash()))
            .peekable();
        let mut insert_output = (0..).map(|x| {
            move |y| {
                let mut outputs = outputs.borrow_mut();
                outputs[x] = outputs[y];
                x
            }
        });
        while let (Some((get_i, ih)), Some((get_o, oh))) =
            (input_hashes.peek(), output_hashes.peek())
        {
            match ih.cmp(oh) {
                Ordering::Less => {
                    // the drain must start one past the last slot written to
                    drain_idx_inputs = insert_input.next().unwrap()(*get_i) + 1;
                    input_hashes.next();
                }
                Ordering::Greater => {
                    drain_idx_outputs = insert_output.next().unwrap()(*get_o) + 1;
                    output_hashes.next();
                }
                Ordering::Equal => {
                    ncut += 1;
                    output_hashes.next();
                    input_hashes.next();
                }
            }
        }
        (
            drain_idx_inputs..(drain_idx_inputs + ncut),
            drain_idx_outputs..(drain_idx_outputs + ncut),
        )
    };
    // Cut elements that have already been copied
    std::mem::swap(outputs_r.get_mut(), outputs);
    std::mem::swap(inputs_r.get_mut(), inputs);
    outputs.drain(drain_idx_outputs);
    inputs.drain(drain_idx_inputs);
    Ok(())
}
```
I say we go with that most recent example.

Thanks for doing this exploratory work to look into different options here. 👍 And it's not like the "convert them into sets and find the intersection" approach was particularly intuitive in terms of knowing what it was doing at a glance anyway.
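(For reference, that set-based approach reads roughly like the sketch below -- illustrative only, not the exact previous code, again over plain integers. Unlike the linear-scan sketch earlier, lookups here are O(1), but it still allocates.)

```rust
use std::collections::HashSet;

// Illustrative only: build sets, take the intersection, then retain
// everything outside it. Easy to read at the call site, but it
// allocates three sets -- the allocation this PR removes.
fn cut_through_sets(inputs: &mut Vec<u32>, outputs: &mut Vec<u32>) {
    let in_set: HashSet<u32> = inputs.iter().copied().collect();
    let out_set: HashSet<u32> = outputs.iter().copied().collect();
    let to_cut: HashSet<u32> = in_set.intersection(&out_set).copied().collect();
    inputs.retain(|i| !to_cut.contains(i));
    outputs.retain(|o| !to_cut.contains(o));
}
```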
Force-pushed from ea18c1a to b809be0.
Ok. I pushed up new code for both, which I think is even a little simpler.
👍
In our current algorithms for `cut_through` and `verify_cut_through`, we allocate memory and use some relatively inefficient APIs (e.g., `retain`). The algorithms presented should be more time-efficient (this is untested), but are certainly more space-efficient.

Opening a PR now to get feedback on the approach.