SIMD-0268: Raise CPI Nesting Limit #268

Merged: jacobcreech merged 2 commits into solana-foundation:main from Lichtso:raise-cpi-nesting-limit on Aug 1, 2025
Conversation

@Lichtso
Contributor

@Lichtso Lichtso commented Mar 26, 2025

No description provided.

buffalojoec
buffalojoec previously approved these changes Jun 11, 2025
Contributor

@buffalojoec buffalojoec left a comment

Lgtm

Comment on lines +45 to +48
## Drawbacks

The maximum amount of VM stack and heap memory, which needs to be reserved and
zeroed out, would double.
Contributor

Do we know what the hardware implications are here? Negligible?

Contributor Author

We would only activate this after #219, which should free up a lot of memory and zeroing bandwidth.

Contributor

@buffalojoec buffalojoec Jun 11, 2025

I feel like we should probably gather some kind of benchmark to make sure. In theory, this doubles the upper bound of total execution-memory throughput across the entire node. We don't want anyone getting DoS'ed.

Contributor Author

This does not allow one to execute more instructions per transaction; the limit there stays 64. And the processing power and memory bandwidth needed are the same whether instructions in a transaction are sequential or nested.

Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.

Contributor

@Lichtso can you update the SIMD to reflect the requirement that this goes in only after #219 (direct mapping) is activated?

While you are at it, can you update the metadata to status "accepted" and add a feature flag if possible?

Contributor

Thus, the only thing this really raises is the amount of total memory allocated simultaneously (peak can be higher if instructions are nested). But the interaction with that memory stays the same; the VM just needs to "remember" more context.

But do we know the actual specifications of this? In other words, can we feasibly state that if someone always submits 64 instructions in a single transaction via 8 top-level instructions (each nested 8 frames deep) to some super large program, they won't blast a node's memory?

I suppose we can scratchpad it a bit. We have a stack, heap, input region, and program section, and you'd just multiply those by 8, right?

Contributor Author

Well, you can already have 4 VMs simultaneously (via CPI nesting) so a step up to 8 is only doubling it. The program section is allocated per executable, not per VM. And the input region will be a whole lot smaller once #219 (direct mapping) is active. Thus, only the stack and heap memory allocation would double.

Contributor

a step up to 8 is only doubling it

When it comes to memory, doubling anything is always something to consider carefully.

I ran a little experiment, and put my findings in the root discussion for anyone who wants to check this SIMD out in the future.

@buffalojoec
Contributor

I threw together a script to estimate the sBPF VM load a node could possibly incur per slot. I ran it across 50 mainnet-beta blocks.

I counted instructions (not transactions) that invoke sBPF programs (not builtins). Each instruction could initiate a CPI chain.

Instruction Stats Across 50 Blocks

  • Total sBPF instructions: 64_996
  • Average per block: 1_300
  • Maximum in a block: 3_122

Upper Bound on VM Memory Per Instruction (Post-Direct-Mapping)

Each CPI frame duplicates two memory regions: the VM stack and the VM heap, estimated here at 512 KiB combined per frame.

Upper Bound on Memory Use by CPI Depth

  • 4 nested CPIs: 512 KiB * 4 = 2.0 MiB per instruction
  • 8 nested CPIs: 512 KiB * 8 = 4.0 MiB per instruction

Per-Block Memory Upper Bounds

Average Case (1300 instructions):

  • 4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
  • 8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

  • 4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
  • 8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

Summary

So, raising the CPI limit from 4 to 8 means that nodes could face the following increases in theoretical upper bounds for total memory usage per block:

  • Average: 2.54 GiB → 5.08 GiB (∆ +2.54 GiB, or +2.73 GB)
  • Maximum: 6.10 GiB → 12.20 GiB (∆ +6.10 GiB, or +6.55 GB)

These are obviously worst-case projections assuming every instruction spawns the maximum CPI depth.
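For anyone re-checking these figures, the arithmetic can be sketched in a few lines. The 512 KiB per-frame estimate and the instruction counts are the ones quoted above; nothing here is measured from a node.

```python
# Sketch of the worst-case memory arithmetic from this thread.
# Assumes 512 KiB of VM stack + heap reserved per CPI frame (thread's estimate).
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

FRAME_BYTES = 512 * KIB  # stack + heap per nested CPI frame

def per_instruction_bytes(cpi_depth: int) -> int:
    """Upper bound on VM memory one instruction can pin at full nesting."""
    return FRAME_BYTES * cpi_depth

def per_block_gib(instructions: int, cpi_depth: int) -> float:
    """Theoretical per-block upper bound, assuming every instruction nests fully."""
    return per_instruction_bytes(cpi_depth) * instructions / GIB

for label, n in (("average", 1_300), ("max", 3_122)):
    for depth in (4, 8):
        print(f"{label} block, depth {depth}: {per_block_gib(n, depth):.2f} GiB")
```

Running this reproduces the 2.54 / 5.08 GiB average-case and 6.10 / 12.20 GiB max-case bounds stated above.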

@buffalojoec
Contributor

Just out of curiosity @jacobcreech how necessary is a nested depth of 8? Could we get away with 6? 8 seems excessive.

@Lichtso
Contributor Author

Lichtso commented Jun 23, 2025

Nice work!

Average Case (1300 instructions):

4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

These numbers are only relevant for memory bandwidth (that is how much needs to be zeroed out).
They are not representative of peak memory allocation, because not all transactions coexist in memory at the same time. Yes, we have multiple threads, but each one processes transactions serially. Thus for peak allocation you would have to multiply the number of transaction-processing threads by the per-instruction maximum at full CPI nesting (2.0 MiB or 4.0 MiB).
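This peak-allocation point can be sketched numerically. The thread count below is a hypothetical example for illustration, not a measured validator configuration; the per-frame figure is the 512 KiB estimate used earlier in the thread.

```python
MIB = 1024 * 1024
FRAME_BYTES = 512 * 1024  # stack + heap per CPI frame (thread's estimate)

# Hypothetical: number of transaction-processing threads on a node.
PROCESSOR_THREADS = 8

def peak_allocation_mib(cpi_depth: int) -> float:
    # Each thread processes transactions serially, so at most one fully
    # nested instruction per thread can hold VM memory at any moment.
    return PROCESSOR_THREADS * FRAME_BYTES * cpi_depth / MIB

print(peak_allocation_mib(4))  # 16.0 (MiB, old limit of 4 nested CPIs)
print(peak_allocation_mib(8))  # 32.0 (MiB, raised limit of 8 nested CPIs)
```

Under these assumptions the simultaneous allocation grows by tens of mebibytes, not gibibytes; the per-block GiB figures above bound zeroing bandwidth instead.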

@buffalojoec
Contributor

Thus for peak allocation you would have to multiply the number of transaction processor threads by the maximum CPI nesting (2.0 MiB or 4.0 MiB).

Right, yep. This is a very far-fetched, near-impossible worst-case upper bound.

@jacobcreech
Contributor

how necessary is a nested depth of 8? Could we get away with 6? 8 seems excessive.

You need at least 5 in order to support most smart-wallet use cases. 8 gives a bit more room as people add additional CPI depth to their programs, so as not to break the smart-wallet use cases.

Could we get away with 6? Potentially as a first step.

Average Case (1300 instructions):

4 CPIs: 2.0 MiB * 1_300 = 2.54 GiB
8 CPIs: 4.0 MiB * 1_300 = 5.08 GiB

Max Case (3122 instructions):

4 CPIs: 2.0 MiB * 3_122 = 6.10 GiB
8 CPIs: 4.0 MiB * 3_122 = 12.20 GiB

Is this still the case after direct mapping? I thought this change was held until direct mapping landed first.

Contributor

@buffalojoec buffalojoec left a comment

We're going with 8! 🚀

Contributor

@ripatel-fd ripatel-fd left a comment

Approving, considering the peak amount of memory allocated rises only by a small amount (a few megabytes). This is probably mostly painful for API services and block explorers, but not so much for validator developers.

@jacobcreech jacobcreech merged commit 7280829 into solana-foundation:main Aug 1, 2025
2 checks passed
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jan 12, 2026