Conversation
…per memory allocation
|
cc @DaniPopes who may also have thoughts |
|
In general very supportive of this; tbh I would have expected the same or better performance. I'll try to play with it a little bit to check it out. I would like to see if we can remove `Rc<RefCell<>>`; maybe the overhead that we see is related to the dynamic borrow checks inside `RefCell`. Snailtracer should be a good benchmark to check, as the init of the evm is small in comparison to the work that the interpreter is doing. |
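For context on the `Rc<RefCell<>>` point: `RefCell` moves borrow checking from compile time to runtime, so every borrow pays a flag check (and can panic on conflicts), unlike a plain `&mut`. A minimal illustration of that dynamic checking (not revm code):

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // Shared, mutable buffer behind runtime borrow checking.
    let mem = Rc::new(RefCell::new(vec![0u8; 32]));
    let mem2 = Rc::clone(&mem);

    // Each borrow_mut()/borrow() performs a dynamic borrow-state check.
    mem.borrow_mut()[0] = 1;
    assert_eq!(mem2.borrow()[0], 1);

    // Conflicting borrows are only caught at runtime, not compile time.
    let guard = mem.borrow();
    assert!(mem2.try_borrow_mut().is_err());
    drop(guard);
    println!("ok");
}
```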
I'm very happy you like it! After this PR I can try to do a
I've found a way to remove the |
Same as #660 (comment) |
|
#582 was merged, can you rebase this PR? @lorenzofero |
I can't really perform a rebase because I have conflicts in multiple commits and it's really cumbersome to work through. I'm working on a merge with conflict resolution 👍. I hope that's not a problem! It will take some time though, as I would like to stay as close as possible to how you managed some of the memory functions |
|
Hey @rakita @DaniPopes @gakonst, the tests are passing now if you want to check it out again. I managed to keep all of Dani's updates to the memory-related functions inside. Lastly, is there a new way to run the benches now? I've seen in the readme that something has changed, but it seems out of date. Thanks! |
|
You can run |
|
This is great btw, I think we can get perf regression down to neutral or positive |
Force-pushed from 6947afc to 0e349b5
Thanks and thank you for the very detailed feedback you provided; I'll try to resolve all the comments as soon as possible |
Force-pushed from 572e15b to 07ece23
DaniPopes
left a comment
I think this is what's causing tests to fail
|
Today, if I have time, I'll take a look at the tests failing with:

```rust
fn set_ptr(&mut self, checkpoint: usize) {
    if self.data.len() > 0 {
        assume!(checkpoint < self.data.len());
        self.current_ptr = unsafe { self.data.as_mut_ptr().add(checkpoint) };
    }
}
``` |
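To make the empty-buffer case concrete: with an empty buffer, `checkpoint < self.data.len()` cannot hold (0 < 0 is false), so the pointer update must be skipped. A small self-contained sketch of the guard, with `assume!` modeled as a `debug_assert!` and illustrative names:

```rust
struct Mem {
    data: Vec<u8>,
    current_ptr: *mut u8,
}

impl Mem {
    fn set_ptr(&mut self, checkpoint: usize) {
        // Guard: on an empty buffer there is no valid checkpoint,
        // so leave the pointer untouched.
        if !self.data.is_empty() {
            debug_assert!(checkpoint < self.data.len());
            self.current_ptr = unsafe { self.data.as_mut_ptr().add(checkpoint) };
        }
    }
}

fn main() {
    let mut m = Mem { data: vec![0u8; 4], current_ptr: std::ptr::null_mut() };
    m.set_ptr(2);
    assert!(!m.current_ptr.is_null());

    let mut empty = Mem { data: Vec::new(), current_ptr: std::ptr::null_mut() };
    empty.set_ptr(0); // the guard keeps the pointer untouched
    assert!(empty.current_ptr.is_null());
    println!("ok");
}
```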
Force-pushed from e94d183 to 23a1191
|
Hi @lorenzofero, how is the performance on this? In essence, I expected this to be more impactful, but if that is not the case, unfortunately there is no good reason to include it. |
|
Hey @rakita, I tried to run Cachegrind again after Dani's first measurements, which were around -3.5% in performance. After the current changes, this is what I get:
which is around -2.7%. @DaniPopes, maybe I can try with this now to see if we can squeeze out a little bit more.
Yeah, I get that the more complex the transaction, the better the performance. For simple stuff, on average I expect it to be somewhat worse, as the benches suggest. |
|
Hey, I hope you don't find all of these comments pedantic. I tried to use a different model for the shared memory, similar to ipsilon/evmone#481 (comment), which you can see here on my fork: thedevbirb#2. This model has some benefits imo:
Here is the result of running it. If you like it, I can bring the changes to this branch or open a new PR |
It is fine; this is a tough decision, as you invested a lot of time into this and the benefits are unfortunately small. I have a few examples where an idea didn't pan out as expected (gas blocks, I'm looking at you: we had a performance boost of 5-7% and the code was merged, but it was very hard to use, so after a few months it was reverted), so in the end it was more like a research effort to get data.
This seems better; not all transactions are going to be 1024 calls deep or use 30M gas, so this is more reasonable. But with this,
And spilling context (in this case memory) from one Interpreter to another would be fine if we gained something significant; on the other hand, having just one place for memory opens things up for new ideas. This will probably look better if we switch from recursive calls to loop calls (if the loop approach turns out okay). I am on the fence here, but let's include it; I will review it in detail in the next few days (@DaniPopes already did an amazing job there). |
|
Ok, I brought the changes here for review. However, I was wondering if it would be worth opening a new PR that supersedes this one with the new model, since both the commit history and the GitHub conversation here are becoming a little messy imo. Let me know what you think! |
It is messy, but it is fine for it to be in one place, so people can follow what is happening. |
rakita
left a comment
We should reintroduce the memory limit.
Other parts look good!
```rust
crate::interpreter::next_multiple_of_32($offset.saturating_add($len))
{
    #[cfg(feature = "memory_limit")]
    if new_size > ($interp.memory_limit as usize) {
```
Should we reintroduce the memory limit?
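For reference, the expansion size being checked here rounds up to the next 32-byte word, since EVM memory grows in word-sized increments. A hedged sketch of that rounding plus a memory-limit guard (illustrative free functions, not the actual revm macro):

```rust
/// Round `size` up to the next multiple of 32 (EVM word size),
/// saturating instead of overflowing.
fn next_multiple_of_32(size: usize) -> usize {
    let rem = size % 32;
    if rem == 0 { size } else { size.saturating_add(32 - rem) }
}

/// Illustrative memory-limit check: reject an expansion beyond `limit`.
fn within_limit(offset: usize, len: usize, limit: usize) -> bool {
    next_multiple_of_32(offset.saturating_add(len)) <= limit
}

fn main() {
    assert_eq!(next_multiple_of_32(0), 0);
    assert_eq!(next_multiple_of_32(1), 32);
    assert_eq!(next_multiple_of_32(33), 64);
    assert!(within_limit(0, 32, 1024));
    assert!(!within_limit(1000, 100, 1024));
    println!("ok");
}
```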
```rust
/// Memory checkpoints for each depth
checkpoints: Vec<usize>,
/// How much memory has been used in the current context
current_len: usize,
```
We can probably put memory limit here, it feels like a better place
```rust
/// Get the last memory checkpoint
#[inline(always)]
fn last_checkpoint(&self) -> usize {
    *self.checkpoints.last().unwrap_or(&0)
```
```diff
- *self.checkpoints.last().unwrap_or(&0)
+ self.checkpoints.last().cloned().unwrap_or_default()
```

`cloned` would remove the reference from the `Option`.
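As a quick check of the suggestion: `last()` yields an `Option<&usize>`, and `.cloned()` converts it to `Option<usize>`, so no dereference is needed at the call site. A small standalone example (hypothetical free-function form of the method):

```rust
fn last_checkpoint(checkpoints: &[usize]) -> usize {
    // Option<&usize> -> Option<usize> -> 0 if the slice is empty.
    checkpoints.last().cloned().unwrap_or_default()
}

fn main() {
    assert_eq!(last_checkpoint(&[]), 0);
    assert_eq!(last_checkpoint(&[3, 7, 42]), 42);
    println!("ok");
}
```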
It should be good now! |
|
I'm very happy we've found the right approach for this and got some good performance improvements. Thanks a lot @rakita and @DaniPopes for all the support over the last month; it has been a pleasure working with you. |
I tried to give #445 a shot with this PR, just for fun and as a challenge.
This introduces a new struct called `SharedMemory`, which is indeed shared between calls.

About the implementation

- It is similar to the `Memory` implementation, with a `current_slice` which is internally used to refer to the portion of data reserved for the current context. This requires two `unsafe` methods, `get_current_slice` and `get_current_slice_mut`, which dereference the raw pointer.
- The `current_slice` pointer is updated when entering a new context (which is when a new `Interpreter` instance is created) using the `new_context_memory` method, and when exiting it, which happens when the `return_value` method of `Interpreter` is called.

Performance
I'd like feedback on this because:
Anyway, this is the result of running `cargo bench --all` on `main` and then on my branch:

I also tried to run cachegrind as explained in #582, and it is indeed slower, by about 10% or less. The gas limit required is between 2^22 and 2^23. Maybe here too I'm allocating memory which is not used very much; it depends on the bytecode of the snailtracer.
Thanks in advance for any feedback!
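To summarize the model discussed in this thread, here is a hedged, simplified sketch of a shared buffer with one checkpoint per call depth, in safe Rust with illustrative names (the actual PR uses raw-pointer slices and more machinery):

```rust
struct SharedMemory {
    data: Vec<u8>,
    /// Memory checkpoints, one per call depth.
    checkpoints: Vec<usize>,
    /// How much memory the current context uses.
    current_len: usize,
}

impl SharedMemory {
    fn new() -> Self {
        Self { data: Vec::new(), checkpoints: Vec::new(), current_len: 0 }
    }

    fn last_checkpoint(&self) -> usize {
        self.checkpoints.last().cloned().unwrap_or_default()
    }

    /// Enter a new call context: the child's memory starts where the
    /// parent's currently ends.
    fn new_context_memory(&mut self) {
        let checkpoint = self.last_checkpoint() + self.current_len;
        self.checkpoints.push(checkpoint);
        self.current_len = 0;
    }

    /// Leave the current context and hand the buffer back to the parent.
    fn free_context_memory(&mut self) {
        if let Some(checkpoint) = self.checkpoints.pop() {
            self.data.truncate(checkpoint);
            self.current_len = checkpoint - self.last_checkpoint();
        }
    }

    /// Grow (or shrink) the current context's memory, zero-filling new bytes.
    fn resize(&mut self, new_len: usize) {
        let start = self.last_checkpoint();
        self.data.resize(start + new_len, 0);
        self.current_len = new_len;
    }

    /// The slice of the buffer belonging to the current context.
    fn current_slice(&self) -> &[u8] {
        let start = self.last_checkpoint();
        &self.data[start..start + self.current_len]
    }
}

fn main() {
    let mut m = SharedMemory::new();
    m.new_context_memory();
    m.resize(32);
    assert_eq!(m.current_slice().len(), 32);

    m.new_context_memory(); // nested call
    m.resize(64);
    assert_eq!(m.current_slice().len(), 64);

    m.free_context_memory(); // back to the parent context
    assert_eq!(m.current_slice().len(), 32);
    println!("ok");
}
```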