Skip to content

SIMD-0219: Stricter ABI and Runtime Constraints#219

Merged
joncinque merged 9 commits into
solana-foundation:mainfrom
Lichtso:stricter-vm-verification-constraints
Sep 17, 2025
Merged

SIMD-0219: Stricter ABI and Runtime Constraints#219
joncinque merged 9 commits into
solana-foundation:mainfrom
Lichtso:stricter-vm-verification-constraints

Conversation

@Lichtso
Copy link
Copy Markdown
Contributor

@Lichtso Lichtso commented Jan 6, 2025

No description provided.

abrahem79

This comment was marked as spam.

@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from 9c1578c to eaf4fe6 Compare May 29, 2025 14:12
@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from eaf4fe6 to 1f3dda9 Compare June 6, 2025 15:37
Copy link
Copy Markdown
Contributor

@buffalojoec buffalojoec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I think the fixes for the frame gaps and the memory accesses across regions make sense. However, I pushed back on the stack/heap pointers for CPI verification.

Also, I'm wondering if it makes sense to break this into three SIMDs? It seems like we could do each in isolation, which might help speed up the approval & implementation process.

Comment on lines +64 to +78
- The following pointers must be on the stack or heap,
meaning their virtual address is inside `0x200000000..0x400000000`,
otherwise `SyscallError::InvalidPointer` must be thrown:
- The pointer in the array of `&[AccountInfo]` / `SolAccountInfo*`
- The `AccountInfo::data` field,
which is a `RefCell<&[u8]>` in `sol_invoke_signed_rust`
- The `AccountInfo::lamports` field,
which is a `RefCell<&u64>` in `sol_invoke_signed_rust`
- The following pointers must point to what was originally serialized in the
input regions by the program runtime,
otherwise `SyscallError::InvalidPointer` must be thrown:
- `AccountInfo::key` / `SolAccountInfo::key`
- `AccountInfo::owner` / `SolAccountInfo::owner`
- `AccountInfo::lamports` / `SolAccountInfo::lamports`
- `AccountInfo::data::ptr` / `SolAccountInfo::data`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, this constraint forces programs to waste limited stack/heap space just to hold these pointer structures, when we can probably validate these in a different way, and keep them in the input region.

On Agave, when we serialize all of the accounts into the input region, we actually hang onto a bunch of the pointers in SyscallContext/SerializedAccountMetadata. We can just store more pointer information in the syscall context, and perform a very quick pointer analysis:

  • Before each CPI frame is created
  • When the program execution finishes

We can catch the violations then, and throw the SyscallError::InvalidPointer. This way, instead of imposing location constraints, we simply validate that all AccountInfo pointers match their original pointers when the VM was created.

As mentioned in the proposal, when direct mapping is enabled, these kinds of pointer violations will throw immediately, which means pushing the pointers to the stack/heap would only be necessary as a temporary measure, until the DM feature is enabled. Afterwards, programs are stuck with this constraint for no reason.

Copy link
Copy Markdown
Contributor Author

@Lichtso Lichtso Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are misunderstanding. This SIMD says that the AccountInfo structs need to be on stack or heap, not that the data they point to has to be there. The data they point to remains in the account serialization. This is compatible with the SDK entrypoint as is, and has to be so that we don't break existing programs.

Think part of the confusion comes from the fact that AccountInfo::data and AccountInfo::lamports have two levels of indirection and both outer and inner pointer need to not be in the account serialization (the final deref however must be), what no program ever does anyway.

We can just store more pointer information in the syscall context, and perform a very quick pointer analysis

Yes exactly, that is how direct mapping is implemented and what the SIMD is ought to describe. Seems I should reformulate it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are misunderstanding. This SIMD says that the AccountInfo structs need to be on stack or heap, not that the data they point to has to be there.

I never considered this SIMD to be implying we should move any serialized data to the stack or heap. I believe I'm understanding that part correctly.

What I'm maybe misunderstanding is what the goal is by limiting where AccountInfo pointers can be created. The sol_invoke_signed syscall accepts only pointers.

fn sol_invoke_signed_c(
    instruction_addr: *const u8,
    account_infos_addr: *const u8, // <-- Pointer to slice of `SolAccountInfo`
    account_infos_len: u64,
    signers_seeds_addr: *const u8,
    signers_seeds_len: u64,
) -> u64

struct SolAccountInfo {
    key_addr: u64,
    lamports_addr: u64,
    data_len: u64,
    data_addr: u64,
    owner_addr: u64,
    rent_epoch: u64,
    is_signer: bool,
    is_writable: bool,
    executable: bool,
}

Since translation of accounts just deref's out of the RefCell trackers, we can consider the two translations (Rust and C) identical for serialized accounts.

Enforcing that each account pointer lives on stack or heap doesn't seem to actually solve the problem, which is the ability to pass a pointer to an invalid input-region SolAccountInfo into CPI. Furthermore, there are multiple perfectly valid reasons you'd pass a pointer to already-serialized legitimate accounts, such as avoiding copies.

My initial point of just evaluating all pointers against the SerializedAccountMetadata would solve this problem at the fundamental level. It can happen during translate_accounts or translate_account_infos, where you've already got the stack/heap check implemented.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial point of just evaluating all pointers against the SerializedAccountMetadata would solve this problem at the fundamental level.

That is what the SIMD already proposes anyway: "The following pointers must point to what was originally serialized in the input regions by the program runtime" which refers to the SerializedAccountMetadata.

But, that alone is insufficient, CPI also writes to AccountInfo as it returns, it should never be writing to an account during the returning (returning takes multiple steps after all) as that can result in iterate-while-modifying issues and violates Rust borrow checker rules.

@Lichtso
Copy link
Copy Markdown
Contributor Author

Lichtso commented Jun 24, 2025

Also, I'm wondering if it makes sense to break this into three SIMDs? It seems like we could do each in isolation, which might help speed up the approval & implementation process.

This entire SIMD is describing the account data direct mapping feature, which is already implemented, just not feature gated. I know from a dApp perspective they look like a bunch thrown together, but in the program runtime they are all interconnected. So I don't think that splitting them would speed things up, but rather make it much more complex.

@buffalojoec
Copy link
Copy Markdown
Contributor

I know from a dApp perspective they look like a bunch thrown together, but in the program runtime they are all interconnected.

Actually I am talking about the program runtime!

Right now, the plan is to just switch on direct mapping all at once with a feature gate, which will enable all of these constraints - as well as the implementation itself - immediately. That's a lot of change area in one feature gate, and I think some contributors are getting a bit nervous about the "all at once" approach.

What I'm instead suggesting is that we introduce these constraints piecemeal. You can break VM constraints into three separate SIMDs, with three separate feature gates, and activate them one by one:

  1. Frame gaps
  2. Memory access violations
  3. AccountInfo pointer regions

This approach would allow us to more easily address any security issues that might arise from just one "phase" of direct mapping constraints.

Later, once all three are activated, you can make direct mapping a validator startup flag for a while, before we just remove the flag altogether and make it the de facto hot path.

agave validator <...> --direct-memory-mapping

Overall I think this seems like a much safer approach to getting this all in. What do you think?

@Lichtso
Copy link
Copy Markdown
Contributor Author

Lichtso commented Jul 8, 2025

I think the change to the stack frame gaps is risky and not necessary anymore for the current implementation of direct mapping so we could revert it.

The restrictions to the AccountInfo pointer regions could be done first in a separate SIMD, but we know that it affects next to no programs, and it seems to have little risk associated.

Finally, the tough nut is the memory access violations. These are interwoven with the implementation switching to direct mapping. It is hard to correctly emulate these restrictions without implementing direct mapping. Decoupling them into a feature gate of pure restrictions which still uses the copy based serialization path is complex.

@Lichtso Lichtso changed the title SIMD-0219: Stricter VM verification constraints SIMD-0219: Stricter VM constraints Jul 17, 2025
@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from 40f9989 to 56f2cc5 Compare July 18, 2025 19:33
@Lichtso Lichtso changed the title SIMD-0219: Stricter VM constraints SIMD-0219: Stricter ABI and Runtime Constraints Jul 18, 2025
@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from 56f2cc5 to 25933db Compare July 18, 2025 19:39
- The access is completely within the rest of the account growth budget of the
transaction, otherwise `InstructionError::InvalidRealloc` must be thrown.
- The access is completely within the current length of the account,
otherwise extend the the account with zeros to the maximum allowed by the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
otherwise extend the the account with zeros to the maximum allowed by the
otherwise extend the account with zeros to the maximum allowed by the

@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from 25933db to 2c6fc74 Compare August 1, 2025 21:30
- Heap (`0x300000000..0x400000000`)
- Instruction meta data
- Account meta data
- Account payload address space
Copy link
Copy Markdown
Contributor

@topointon-jump topointon-jump Aug 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are each of the mapped metadata/data address ranges for each account considered separate regions, in the sense that you can't have a single access spanning multiple accounts? It would greatly simplify the implementation if this was the case.

Copy link
Copy Markdown
Contributor Author

@Lichtso Lichtso Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we dropped support for multi / cross region accesses from this SIMD, as in: We don't allow that anymore.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just to be clear - each account's data and metadata region will be considered a separate region?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, meta data and account payload a separate, meaning there are two regions per account

runtime never serialized
- `AccountInfo` structures can be overwritten by CPI during CPI, causing
complex side effects
- VM write access
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- VM write access
- VM memory access

says "write" but first sub bullet point says "read"


## Security Considerations

None.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are surely some security considerations (additional validation logic risks introducing more places where clients can diverge), but most of it is already implied by above.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the security risks come from the implementation of direct mapping, not so much from imposing these additional restrictions. We decided to split the behavior changes (constituting this SIMD) from the direct mapping implementation, see: #219 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the implementation of direct mapping itself have a separate feature gate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely yes, though it is kind of an implementation detail, as in we can't make a SIMD for it as that would be empty.

@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from 2c6fc74 to ebc45da Compare August 6, 2025 18:12
- VM memory access
- Bad read accesses go unnoticed as long as they stay within the reserved
address space, even if they leave the actual account payload
- Bad write accesses go unnoticed as long as the original value is restored
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't bad read/write accesses throw a segfault and have the VM return an error code?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is describing the current state: You can write to readonly accounts, as long as you write the existing value before the instruction ends. But, I can make it more clear that this only applies to account payload data.

@Lichtso Lichtso force-pushed the stricter-vm-verification-constraints branch from ebc45da to 74e59a8 Compare August 6, 2025 19:26

- The account is flagged as writable,
otherwise `InstructionError::ReadonlyDataModified` must be thrown
- The account is owned by the currently executed program,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- The access is completely within the current length of the account,
otherwise `InstructionError::AccountDataTooSmall` must be thrown.

shouldn't this also apply here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because the way reallocations / growing of accounts currently works in ABIv1 is that a program first writes beyond the end of the account, and then at the next CPI or the end of the instruction communicates the change of the account length to the program runtime.

@yufeng-jump
Copy link
Copy Markdown

The access is completely within the current length of the account, otherwise extend the account with zeros to the maximum allowed by the previous two checks.

Just to clarify, the way things are gonna be after this SIMD, assuming 10KB of realloc growth is allowed under the budget:

  • Loads into the realloc growth region without a preceding Store into said region is disallowed, and will result in an AccountDataTooSmall.
  • A Store into anywhere in the realloc growth region will result in logical zero-filling of the entire 10KB of realloc growth region, and a subsequent Load into anywhere in the entire 10KB of realloc growth region, whether lower or higher than the preceding Store in VM address space, will be allowed, so long as the Load does not go out of bounds of the original account payload+10KB. In short, a single Store anywhere in the realloc region initializes the entire realloc region.

Is that correct?

@Lichtso
Copy link
Copy Markdown
Contributor Author

Lichtso commented Aug 6, 2025

Yes that is correct. It essentially switches from eager to lazy initialization of the realloc padding. Also, the realloc padding will not have its own region but be part of the account payload region.

@joncinque
Copy link
Copy Markdown
Contributor

It looks like there's wide approval for this. I will merge the SIMD next Friday to give time for any final objections.

@HHoldingCrypto

This comment was marked as spam.

@joncinque joncinque merged commit 44d0610 into solana-foundation:main Sep 17, 2025
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.