Reduce deserialization allocations/copies#1197
Conversation
Calling data_mut().reserve(additional) used to result in _two_ allocs and memcpys: the first to unshare the underlying vector, and the second upon calling reserve since Arc::make_mut clones so it uses capacity == len. With this fix we manually "unshare" allocating with capacity = len + additional, therefore saving on extra alloc and memcpy.
We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). With this change we don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.
| // Note that we intentionally don't call self.make_data_mut() here. make_data_mut() will | ||
| // allocate + memcpy the current data if self.account is shared. We don't need the memcpy | ||
| // here tho because account.set_data_from_slice(data) is going to replace the content | ||
| // anyway. |
There was a problem hiding this comment.
Very nice catch finding that self.make_data_mut() is redundant. Totally makes sense.
| // Note that we intentionally don't call self.make_data_mut() here. make_data_mut() will | |
| // allocate + memcpy the current data if self.account is shared. We don't need the memcpy | |
| // here tho because account.set_data_from_slice(data) is going to replace the content | |
| // anyway. | |
| // Note that we intentionally don't call self.make_data_mut() here, since we are replacing | |
| // the contents transaction wide anyway with account.set_data_from_slice(data) |
There was a problem hiding this comment.
I'm +0 on the edit, so I think I'm going to keep my comment. I think it's worth being explicit about make_data_mut() doing a copy, since the method name itself doesn't immediately suggest that.
|
much like #1192 (comment), i did similar benmarking. i confirmed this improves favorably unified scheduler performance as well yet again. before(at 206a87a) (love bureaucratic work? lol. this result almost same as the result seen in after(at f180b08) blockstore-processor sees Now unified scheduler is basically faster twice than the blockstore processor. :) |
This removes an allocation and a copy in AccountSharedData::reserve(). Calling data_mut().reserve(additional) used to result in two allocs and memcpys: the first to unshare the underlying vector, and the second upon calling reserve since Arc::make_mut clones so it uses capacity == len. With this change we now manually "unshare" allocating with capacity = len + additional, therefore saving on extra alloc and memcpy.
Additionally we skip another extra copy in AccountSharedData::set_data_from_slice(). We used call make_data_mut() from set_data_from_slice() from the days when direct mapping couldn't deal with accounts getting shrunk. That changed in solana-labs#32649 (see the
if callee_account.capacity() < min_capacity check in cpi.rs:update_caller_account()). We now don't call make_data_mut() anymore before set_data_from_slice(), saving the cost of a memcpy since set_data_from_slice() overrides the whole account content anyway.These two changes improve deserialization perf.
Before:

After:

On my mnb node this reduces cost of deserialization from 16% to 5%. It improves overall replay perf by 6%, although I expect the saving to be even larger on nodes with a lot of stake which seem to struggle more with memory allocations.