SIMD-0268: Raise CPI Nesting Limit #268

Merged: jacobcreech merged 2 commits into solana-foundation:main from Lichtso:raise-cpi-nesting-limit on Aug 1, 2025
Conversation

@Lichtso
Contributor

@Lichtso Lichtso commented Mar 26, 2025

No description provided.

buffalojoec
buffalojoec previously approved these changes Jun 11, 2025
Contributor

@buffalojoec buffalojoec left a comment

Lgtm

Comment on lines +45 to +48
## Drawbacks

The maximum amount of VM stack and heap memory, which needs to be reserved and
zeroed out, would double.
Contributor

Do we know what the hardware implications are here? Negligible?

Contributor Author

We would only activate this after #219, which should free up a lot of memory and zeroing bandwidth.

Contributor

@buffalojoec buffalojoec Jun 11, 2025

I feel like we should probably gather some kind of benchmark to make sure. In theory, this doubles the upper bound of total execution-memory throughput across the entire node. We don't want anyone getting DoS'ed.

Contributor Author

This does not allow one to execute more instructions per transaction; the limit there stays 64. And the processing power and memory bandwidth needed are the same whether instructions in a transaction are sequential or nested.

Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.

Contributor

@Lichtso can you update the SIMD to reflect the requirement that this goes in only after #219 (direct mapping) is activated?

While you are at it, can you update the metadata to status "accepted" and add a feature flag if possible?

Contributor

Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.

But do we know the actual specifications of this? In other words, can we feasibly state that if someone always submits 64 instructions in a single transaction via 8 top-level instructions (each nested 8 frames deep) to some super large program, they won't blast a node's memory?

I suppose we can scratchpad it a bit. We have a stack, heap, input region, and program section, and you'd just multiply those by 8, right?

Contributor Author

Well, you can already have 4 VMs simultaneously (via CPI nesting) so a step up to 8 is only doubling it. The program section is allocated per executable, not per VM. And the input region will be a whole lot smaller once #219 (direct mapping) is active. Thus, only the stack and heap memory allocation would double.

Contributor

a step up to 8 is only doubling it

When it comes to memory, doubling anything is always something to consider carefully.

I ran a little experiment, and put my findings in the root discussion for anyone who wants to check this SIMD out in the future.

@buffalojoec
Contributor

I threw together a script to estimate the sBPF VM load a node could possibly incur per slot. I ran it across 50 mainnet-beta blocks.

I counted instructions (not transactions) that invoke sBPF programs (not builtins). Each instruction could initiate a CPI chain.

Instruction Stats Across 50 Blocks

  • Total sBPF instructions: 64_996
  • Average per block: 1_300
  • Maximum in a block: 3_122

Upper Bound on VM Memory Per Instruction (Post-Direct-Mapping)

Each CPI frame duplicates two memory regions: the VM stack and the VM heap, estimated here at 512 KiB combined per frame.

Upper Bound on Memory Use by CPI Depth

  • 4 nested CPIs: 512 KiB * 4 = 2.0 MiB per instruction
  • 8 nested CPIs: 512 KiB * 8 = 4.0 MiB per instruction

Per-Block Memory Upper Bounds

Average Case (1300 instructions):

  • 4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
  • 8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

  • 4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
  • 8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

Summary

So, raising the CPI limit from 4 to 8 means that nodes could face the following increases in theoretical upper bounds for total memory usage per block:

  • Average: 2.54 GiB → 5.08 GiB (∆ +2.54 GiB, or +2.73 GB)
  • Maximum: 6.10 GiB → 12.20 GiB (∆ +6.10 GiB, or +6.55 GB)

These are obviously worst-case projections assuming every instruction spawns the maximum CPI depth.
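For anyone re-checking these figures, the arithmetic can be sketched in a few lines. The 512 KiB per-frame estimate and the instruction counts are the ones quoted above; nothing here is measured from a node.

```python
# Sketch of the worst-case memory arithmetic from this thread.
# Assumes 512 KiB of VM stack + heap reserved per CPI frame (thread's estimate).
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

FRAME_BYTES = 512 * KIB  # stack + heap per nested CPI frame

def per_instruction_bytes(cpi_depth: int) -> int:
    """Upper bound on VM memory one instruction can pin at full nesting."""
    return FRAME_BYTES * cpi_depth

def per_block_gib(instructions: int, cpi_depth: int) -> float:
    """Theoretical per-block upper bound, assuming every instruction nests fully."""
    return per_instruction_bytes(cpi_depth) * instructions / GIB

for label, n in (("average", 1_300), ("max", 3_122)):
    for depth in (4, 8):
        print(f"{label} block, depth {depth}: {per_block_gib(n, depth):.2f} GiB")
```

Running this reproduces the 2.54 / 5.08 GiB average-case and 6.10 / 12.20 GiB max-case bounds stated above.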

@buffalojoec
Contributor

Just out of curiosity @jacobcreech how necessary is a nested depth of 8? Could we get away with 6? 8 seems excessive.

@Lichtso
Contributor Author

Lichtso commented Jun 23, 2025

Nice work!

Average Case (1300 instructions):

4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

These numbers are only relevant for memory bandwidth (that is how much needs to be zeroed out).
They are not representative of peak memory allocation, because not all transactions coexist in memory at the same time. Yes, we have multiple threads, but each one processes transactions serially. Thus for peak allocation you would have to multiply the number of transaction-processing threads by the per-instruction maximum at full CPI nesting (2.0 MiB or 4.0 MiB).
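This peak-allocation point can be sketched numerically. The thread count below is a hypothetical example for illustration, not a measured validator configuration; the per-frame figure is the 512 KiB estimate used earlier in the thread.

```python
MIB = 1024 * 1024
FRAME_BYTES = 512 * 1024  # stack + heap per CPI frame (thread's estimate)

# Hypothetical: number of transaction-processing threads on a node.
PROCESSOR_THREADS = 8

def peak_allocation_mib(cpi_depth: int) -> float:
    # Each thread processes transactions serially, so at most one fully
    # nested instruction per thread can hold VM memory at any moment.
    return PROCESSOR_THREADS * FRAME_BYTES * cpi_depth / MIB

print(peak_allocation_mib(4))  # 16.0 (MiB, old limit of 4 nested CPIs)
print(peak_allocation_mib(8))  # 32.0 (MiB, raised limit of 8 nested CPIs)
```

Under these assumptions the simultaneous allocation grows by tens of mebibytes, not gibibytes; the per-block GiB figures above bound zeroing bandwidth instead.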

@buffalojoec
Contributor

Thus for peak allocation you would have to multiply the number of transaction processor threads by the maximum CPI nesting (2.0 MiB or 4.0 MiB).

Right, yep. This is a very far-fetched, near-impossible worst-case upper bound.

@jacobcreech
Contributor

how necessary is a nested depth of 8? Could we get away with 6? 8 seems excessive.

You need at least 5 in order to support most smart-wallet use cases. 8 gives a bit more room as people add additional CPI depth to their programs, so as not to break the smart-wallet use cases.

Could we get away with 6? Potentially as a first step.

Average Case (1300 instructions):

4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

Is this still the case after direct mapping? I thought this change was held until direct mapping landed first.

Contributor

@buffalojoec buffalojoec left a comment

We're going with 8! 🚀

Contributor

@ripatel-fd ripatel-fd left a comment

Approving, considering the peak amount of memory allocated rises only by a small amount (a few megabytes). This is probably mostly painful for API services and block explorers, but not so much for validator developers.

@jacobcreech jacobcreech merged commit 7280829 into solana-foundation:main Aug 1, 2025
2 checks passed
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jan 12, 2026