SIMD-0268: Raise CPI Nesting Limit #268
## Drawbacks

The maximum amount of VMs stack and heap memory, which needs to be reserved and zeroed out, would double.
Do we know what the hardware implications are here? Negligible?
We would only activate this after #219, which should free up a lot of memory and zeroing bandwidth.
I feel like we should probably gather some kind of benchmark to make sure. In theory this means the upper bound of entire execution memory throughput is doubled across the entire node. We don't want anyone getting DoS'ed.
This does not allow one to execute more instructions per transaction, the limit there stays 64. And the processing power and memory bandwidth needed are the same, no matter if instructions in a transaction are sequential or nested.
Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.
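A minimal sketch of that distinction, for anyone who wants to scratchpad it. The stack and heap sizes are assumptions that do not come from this thread (64 stack frames of 4 KiB each, and a 256 KiB heap, the largest a transaction can request); the function names are hypothetical. It shows that the bytes zeroed per transaction depend only on the instruction count, while the bytes held simultaneously depend on the nesting depth.

```python
# Assumed sizes, not stated in this thread: 64 stack frames of 4 KiB each,
# and the largest heap a transaction can request (256 KiB).
STACK_BYTES = 64 * 4096
HEAP_BYTES = 256 * 1024
PER_VM_BYTES = STACK_BYTES + HEAP_BYTES

# From the thread: at most 64 instructions per transaction, nested or not.
MAX_INSTRUCTIONS_PER_TX = 64


def total_zeroed_bytes(num_instructions: int) -> int:
    """Bytes reserved and zeroed over a whole transaction (one VM per instruction)."""
    return num_instructions * PER_VM_BYTES


def peak_live_bytes(nesting_depth: int) -> int:
    """Bytes held at the same time while `nesting_depth` VMs sit on the CPI stack."""
    return nesting_depth * PER_VM_BYTES


if __name__ == "__main__":
    mib = 1024 * 1024
    total = total_zeroed_bytes(MAX_INSTRUCTIONS_PER_TX)
    print(f"zeroed per transaction: {total / mib} MiB")
    for depth in (1, 4, 8):
        print(f"peak at depth {depth}: {peak_live_bytes(depth) / mib} MiB")
```

Under these assumed sizes, going from depth 4 to depth 8 doubles only the peak figure; the per-transaction total is unchanged because the 64-instruction limit is unchanged.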
@Lichtso can you update the SIMD to reflect the requirement that it only goes in after #219 (direct mapping) is activated?
While you are at it, can you update the metadata to status "accepted" and add a feature flag if possible?
> Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.
But do we know the actual specifications of this? In other words, can we feasibly state that if someone does 64 instructions in a single transaction through 8 instructions (nested to 8 frames each) to some super large program every time, they won't blast a node's memory?
I suppose we can scratchpad it a bit. We have a stack, heap, input region, and program section, and you'd just multiply those by 8, right?
Well, you can already have 4 VMs simultaneously (via CPI nesting) so a step up to 8 is only doubling it. The program section is allocated per executable, not per VM. And the input region will be a whole lot smaller once #219 (direct mapping) is active. Thus, only the stack and heap memory allocation would double.
> a step up to 8 is only doubling it
When it comes to memory, doubling anything is always something to consider carefully.
I ran a little experiment, and put my findings in the root discussion for anyone who wants to check this SIMD out in the future.
I threw together a script to estimate the sBPF VM load a node could possibly incur per slot. I ran it across 50 mainnet-beta blocks. I counted instructions (not transactions) that invoke sBPF programs (not builtins). Each instruction could initiate a CPI chain.

Instruction Stats Across 50 Blocks

Upper Bound on VM Memory Per Instruction (Post-Direct-Mapping)

Each CPI frame duplicates two memory regions:

Upper Bound on Memory Use by CPI Depth

Per-Block Memory Upper Bounds

Average Case (1300 instructions):
Max Case (3122 instructions):

Summary

So, raising the CPI limit from 4 to 8 means that nodes could face the following increases in theoretical upper bounds for total memory usage per block:

These are obviously worst-case projections assuming every instruction spawns the maximum CPI depth.
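The per-block arithmetic can be approximated as follows. This is a rough sketch, not the script used for the experiment and not a reproduction of its figures: only the instruction counts (1300 average, 3122 maximum) come from the comment above, while the per-frame size (256 KiB of stack plus 256 KiB of heap) and the function name are assumptions made for illustration.

```python
# From the comment above: sBPF instructions per block across the sampled blocks.
AVG_INSTRUCTIONS = 1300
MAX_INSTRUCTIONS = 3122

# Assumption, not from the thread: each CPI frame gets 256 KiB of stack
# and up to 256 KiB of heap.
ASSUMED_FRAME_BYTES = 256 * 1024 + 256 * 1024

MIB = 1024 * 1024


def block_upper_bound_bytes(instructions: int, cpi_depth: int,
                            frame_bytes: int = ASSUMED_FRAME_BYTES) -> int:
    """Worst case: every instruction spawns a CPI chain of `cpi_depth` frames."""
    return instructions * cpi_depth * frame_bytes


if __name__ == "__main__":
    for label, count in (("average", AVG_INSTRUCTIONS), ("max", MAX_INSTRUCTIONS)):
        for depth in (4, 8):
            bound = block_upper_bound_bytes(count, depth)
            print(f"{label} case, depth {depth}: {bound / MIB:.0f} MiB")
```

Because the bound is linear in depth, moving from 4 to 8 doubles it in both cases, whatever per-frame size is plugged in. The result is a sum over the whole block, so it describes memory that has to be zeroed rather than memory resident at once.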
Just out of curiosity, @jacobcreech, how necessary is a nesting depth of 8? Could we get away with 6? 8 seems excessive.
Nice work!
These numbers are only relevant for memory bandwidth (as that much needs to be zeroed out).
Right, yep. This is a very far-fetched, near-impossible worst-case upper bound.
You need at least 5 in order to work with most smart wallet use cases. 8 gives a bit more room as people add additional CPI depth to their programs so as to not break the smart wallet use cases. Could we get away with 6? Potentially as a first step.
Is this still the case after direct mapping? I thought this change was held until direct mapping landed first.
buffalojoec left a comment
We're going with 8! 🚀
ripatel-fd left a comment
Approving, considering that the peak amount of memory allocated rises by only a small amount (a few megabytes). This is probably mostly painful to API services and block explorers, but not so much for validator developers.