Skip to content

[SIMD-0321]: VM Register 2 Instruction Data Pointer#321

Merged
jacobcreech merged 3 commits intosolana-foundation:mainfrom
buffalojoec:r2-instruction-data-ptr
Sep 18, 2025
Merged

[SIMD-0321]: VM Register 2 Instruction Data Pointer#321
jacobcreech merged 3 commits intosolana-foundation:mainfrom
buffalojoec:r2-instruction-data-ptr

Conversation

@buffalojoec
Copy link
Copy Markdown
Contributor

Provide a pointer to instruction data in VM register 2 (r2) at program
entrypoint, enabling direct access to instruction data without parsing the
serialized input region.

@buffalojoec buffalojoec changed the title [SIMD-0320]: VM Register 2 Instruction Data Pointer [SIMD-0321]: VM Register 2 Instruction Data Pointer Jul 11, 2025
@buffalojoec buffalojoec force-pushed the r2-instruction-data-ptr branch from 61ac6c6 to a6a212a Compare July 11, 2025 00:58
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md
Comment thread proposals/0321-vm-r2-instruction-data-pointer.md
@buffalojoec
Copy link
Copy Markdown
Contributor Author

Thanks for the review @deanmlittle! I've added a commit with your suggestions, adapted in some places.

Comment thread proposals/0321-vm-r2-instruction-data-pointer.md Outdated
Comment on lines +66 to +67
* The pointer in `r2` points to the first byte of the actual instruction data,
NOT the length field.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not point it to the length?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it should be pointing to the length field and that could then be offset to get the actual data pointer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a bad idea that complicates everything downstream from it. The major benefit of the r2 upgrade is the ability to statically derive instruction data from fixed offsets. Length is typically only accessed once and can be derived from an offset of -8. Conversely, having to add 8 all of your static offsets or burn an additional register/cu to store another pointer makes no sense.

Copy link
Copy Markdown
Contributor Author

@buffalojoec buffalojoec Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, originally I had length as well, but a few people in the discussion requested we point to the data. There are also areas where we subtract 8 to obtain a length in the SDK/tooling already, so this pattern isn't too unusual.

https://github.com/anza-xyz/solana-sdk/blob/df57ce256ba2b427c9e480d7e5e5993a2b7414bf/account-info/src/lib.rs#L170

Ultimately, it makes no difference to me. I think it should be whatever the people who will use it think is best.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conversely, having to add 8 all of your static offsets or burn an additional register/cu to store another pointer makes no sense.

I mentioned this on the discussion @buffalojoec pointed. I'm not confident the gains will be noticeable. r2 stores function call arguments, so you'll lose the reference to the pointer as soon as you call a function. In that case, it won't make a difference if you have r2 pointing directly to the data or you've already offset it and stored it elsewhere. Either you'll have a spill or you'll use another register.

If you design an entry point without any function call, you may see a difference. But again, if this entry point has enough operations, you'll need to spill the value to the stack.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my proof-of-concept I demonstrated how a program can simply define its entrypoint to receive the r2 parameter, which would turn it into a stack variable, no?

#[no_mangle]
pub unsafe extern "C" fn entrypoint(
    input: *mut u8,
    instruction_data_offset: u64,
) -> u64 {
    let _ = instruction_data_offset;
    /* ... */
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. It will only be stored in the stack when needed. In a very simple function that needs no stack space, it will stay in a register. These cases aren't so common, though.

I saw your example mentions instruction_data_offset. It is supposed to be the pointer, right? not the offset from the starting pointer.


1. **Provide a pointer to instruction data length**: Store a pointer to the
instruction data length field in `r2`. However, providing a direct pointer to
the start of instruction data is more ergonomic.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

providing a direct pointer to the start of instruction data is more ergonomic.

Why is that? If I were to deserialize, I would expect to see the length first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the point for discussion is whether it is wrong or right to subtract the pointer. The account info point you just mentioned needs the length first: https://github.com/anza-xyz/solana-sdk/blob/d708b8a829743013accc5f537709b5dd7b7fbc91/program-entrypoint/src/lib.rs#L429-L438.

Since in most cases, you'll need to validate the data pointer to ensure you are not reading garbage or doing an out of bounds read, you'll fetch the length first.

I believe the design should incentivize the correct management of pointers and memory regions, so my suggestion to keep pointing to the length first is based on that.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the design should incentivize the correct management of pointers and memory regions, so my suggestion to keep pointing to the length first is based on that.

This is a moot argument imo, I can apply the same logic and reach the opposite conclusion: since we should incentivize correct management of pointers and memory regions, my suggestion is to make it so that people who're hand writing assembly don't have to offset all accesses to instruction data by 8, but only do it once for the data length.

I'm not actually making that argument to be clear. The real argument is that the people who are actually hand writing assembly are telling us that their code would be nicer and less error prone if they could offset just once. Sure they can offset every ix data access and materially it doesn't change anything, but API wise - if you consider bytecode an API, which it is - which one is better and more ergonomic?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since when are we optimizing for hand writing assembly over compiled code?

My personal preference is to go with positive offsets (point to the length field) as that essentially makes it a Flexible Array Member, a well established concept.

The whole discussion is bike shedding, the truth is that neither direction has a significant impact on the outcome. Because memory ops have a built-in offset field, it does not even cost more or less CUs.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since when are we optimizing for hand writing assembly over compiled code?

since people started preferring hand writing assembly to using our tool chain. we can either start listening to them or be usurped by them

our goal should be to get people to stop hand writing assembly, not by subjective roadblocks, but by providing tooling that makes the effort undesirable

if a decision results in indifference to compiled code, but an improvement to the assembly, always take the improvement. even for our compiled code, one should be able to disassemble it with minimal cognitive overhead

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have substantially improved the output quality of our toolchain in the last months. I agree that we should not try to hinder assembly / disassembly. However, we should also not prioritize it over compiled code. There are multiple dimensions to every decision: Here the performance is indifferent, but the convenience / readability of the code is either better for the assembly or better for the higher level language. And I will choose better for higher level languages any day.

Copy link
Copy Markdown
Contributor

@deanmlittle deanmlittle Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad that aspirationally you want to prioritize compiled languages over assembly. That is a sensible goal. However, here you have an opportunity to prioritize both without any tradeoffs. It has now been explained to you three times by three different people that, in this case, you can prioritize both asm and compiled languages with zero cost to performance. Does "prioritizing compiled languages" actually just mean "intentionally designing apis that are hostile to asm to discourage its use" to you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I said: "we should not try to hinder assembly / disassembly"
Yet, you ask: "Does "prioritizing compiled languages" actually just mean "intentionally designing apis that are hostile to asm to discourage its use" to you?"

It has now been explained to you three times by three different people that, in this case, you can prioritize both asm and compiled languages with zero cost to performance.

Yep, again let me repeat myself, as you still have not gotten my point, I agree that "here the performance is indifferent", but the part you still have not understood is "the convenience / readability of the code is either better for the assembly or better for the higher level language".

Let's continue this discussion in the other thread #321 (comment) as you have now finally, for the first time, considered the other side and thought about how one would write this in a higher level language.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine thinking someone with 100 open source Rust repos where roughly ~1/4 of the Solana ecosystem learned how to write onchain programs only just now considered how this would work in Rust. There is already an example attached to the SIMD. You just didn't bother to read it.

Comment on lines +107 to +109
Programs should read and validate the instruction data length (stored at `r2 - 8`)
before accessing data via the `r2` pointer. Failing to check the length could
result in reading unintended memory contents or out-of-bounds access attempts.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly because of this, I would expect to read the length first. So pointing to the length would be more straightforward.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point! I can go either way. My original PoC was pointing to length, until a few devs requested the actual data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for a pointer to the length.

Copy link
Copy Markdown
Contributor

@deanmlittle deanmlittle Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pushback is a telling sign that neither of you understand what people who actually write bytecode or high performance libraries for this chain want or need. The proposal to point to the start of the data is the right choice. Adding an offset of 8 to every single offset of every field in the instruction data, or holding a second pointer to re-zero the offsets to the start of the instruction data instead of subtracting an offset of 8 from the pointer to get the length of the IX data one time if needed is an unergonomic, moronic idea.

Copy link
Copy Markdown
Contributor

@LucasSte LucasSte Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to say that you don't need to add 8 at every single offset since you can offset r2 after reading the length add64 r2, 8, and work with that throughout your entire code.

Then, I recalled that you don't need to use additions to offset the pointer, and instead rely on the load from register + offset. That way you can save CUs by avoiding separate addition and subtractions. If your object is to handwrite assembly, then having the pointer directly to the data helps with that. If you are writing Rust, the compiler will combine the pointer offsets and the loads, so the initial pointing won't make any difference (i.e. it will propagate the +8 for you through constant folding).

Copy link
Copy Markdown
Contributor

@deanmlittle deanmlittle Aug 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recommendation:

ldxdw r3, [r2-8]   ; load ix data length
jne r3, 1, err     ; if ix data length ≠ 1, error
ldxdb r3, [r2+0]   ; load ix discriminator
jne r3, 1, ix_one  ; jump to ix one

Your recommendation:

ldxdw r3, [r2+0]  ; load ix data length
jne r3, 1, err    ; if ix data length ≠ 1, error
add64 r2, 8       ; add 8 to r2
ldxdb r3, [r2+0]  ; load ix discriminator
jne r3, 1, ix_one ; jump to ix one

Also your recommendation:

ldxdw r3, [r2+0]  ; load ix data length
jne r3, 1, err    ; if ix data length ≠ 1, error
ldxdb r3, [r2+8]  ; load ix discriminator
jne r3, 1, ix_one ; jump to ix one

Your add64 recommendation is objectively less performant, and your +8 offset recommendation is objectively less ergonomic. Just point to the data. It solves everything. It's how our rust libraries already treat account length/account data. This is not rocket science.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's how our rust libraries already treat account length/account data.

If you mean the SDK, that is a horrible artifact that happened because the account realloc feature was shoehorned in as we have to retain both lengths: The original length and the current length. The entrypoint shuffles the original length into some unused padding space: https://github.com/anza-xyz/solana-sdk/blob/94b5923e5572963d90572a4dfd47a4c77c817eaa/program-entrypoint/src/lib.rs#L465 . And that is then later it is accessed again by some obscure pointer arithmetic: https://github.com/anza-xyz/solana-sdk/blob/94b5923e5572963d90572a4dfd47a4c77c817eaa/account-info/src/lib.rs#L94 so that it can overwrite the original place with the actual length: https://github.com/anza-xyz/solana-sdk/blob/94b5923e5572963d90572a4dfd47a4c77c817eaa/account-info/src/lib.rs#L170 (the line you linked earlier, I think).

It is not a good design by any stretch and I would not use it as inspiration of how we should do things in the future.

and your +8 offset recommendation is objectively less ergonomic

What about the macro suggestion (in my other comment) if you think that all these +8 are too ugly?

It solves everything

From your perspective of writing assembler. You have not illuminated how this would look like for higher level languages.

Copy link
Copy Markdown
Contributor

@deanmlittle deanmlittle Aug 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it has been illuminated multiple times, but if you would like me to repeat it once more to help you understand why this costs nothing and is really no big deal, here is an extremely unimaginative, completely untested version I wrote in ~30 seconds:

pub unsafe extern "C" fn e(input: *mut u8, instruction_data: *const u8) -> ProgramResult {
    /// Safety:
    /// This is sound due to several conditions of the VM:
    /// 1. Instruction length always immediately precedes the data in the input region
    /// 2. VM always rigidly defines the length of our instruction data
    /// 3. Data is always properly aligned due to account data alignment padding.
    let instruction_data = unsafe { 
        core::slice::from_raw_parts(
            instruction_data, 
            *(instruction_data.sub(8) as u64) as usize
        )
    };

    let (discriminator, instruction_data) = instruction_data.split_first()
        .ok_or(ProgramError::InvalidInstructionData)?;
    match discriminator {
        0 => instruction_zero_entrypoint(input, instruction_data),
        1 => instruction_one_entrypoint(input, instruction_data),
        2 => instruction_two_entrypoint(input, instruction_data),
        3 => instruction_three_entrypoint(input, instruction_data),
        _ => Err(ProgramError::InvalidInstructionData)
    }
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly as I feared, quoting myself again: "How do you express a structure with a negative offset? That only works with manual pointer arithmetic I think."

Now, imagine it was the other way around with a positive offset, then you could define a structure, have the parameter be a reference (instead of a raw pointer) to that structure type and you might not need any unsafe at all.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, the status quo is very scary. 🙃 https://docs.rs/solana-program-entrypoint/3.0.0/src/solana_program_entrypoint/lib.rs.html#370

I see no point in wasting everyone's time any further trying to convince you to see reason.

@t-nelson I recommend merging this SIMD as is.

Comment on lines +123 to +124
This feature is NOT backwards compatible for any programs that depend on the
uninitialized/garbage data previously in `r2`.
Copy link
Copy Markdown
Contributor

@topointon-jump topointon-jump Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of this, this change should be feature-gated to reduce the risk of a divergence. It's possible that, for a given client implementation, the uninitialized data is the same across all the nodes. If a program is taking advantage of this (accidental or otherwise) and we change this value, then we could see a divergence.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only uninitialized data from the PoV of the program, the runtime should always have set it to zero so far. Yes, the change needs a feature gate, but we have tested that it would work for all currently used programs on MNB.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did plan to feature-gate this change. I thought all SIMDs were feature gates. If I need to call that out explicitly in the proposal, I can.

Comment on lines +40 to +42
When the feature is activated, the VM shall set register 2 (`r2`) to contain a
pointer to the beginning of the instruction data section within the input
region. The instruction data format remains unchanged:
Copy link
Copy Markdown
Contributor

@topointon-jump topointon-jump Jul 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long must the pointer retain this value for? The whole execution, or only for a certain period of time/number of instructions? What is considered "at program entrypoint" - just the first instruction? Do you mind clarifying in the SIMD? Thank-you!

Apologies if I missed this anywhere.

Copy link
Copy Markdown
Contributor

@Lichtso Lichtso Jul 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the first instruction, r2 is the second argument register, thus a parameter of the entrypoint function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add this to the proposal!

@Lichtso
Copy link
Copy Markdown
Contributor

Lichtso commented Aug 23, 2025

@deanmlittle we already had a PR for a while and will implement it once FD approves, maybe even backport it to v3.0, if that is what you are worried about.

However, I do consider your defamation and swearing campaign over at X / Twitter absolutely inappropriate.

@Lichtso
Copy link
Copy Markdown
Contributor

Lichtso commented Aug 24, 2025

I see no point in wasting everyone's time any further trying to convince you to see reason.

Approved as continuing this discussion is indeed a waste of time, letting the "reason" part pass without further comment.

@jacobcreech
Copy link
Copy Markdown
Contributor

I appreciate all the discussion here and the passion everyone has for pushing Solana in the right direction. That said, I'd like to avoid any personal attacks in the future of other core contributors. It has no place in this process and only distracts us.

Merging this PR as we've received all approvals necessary to move forward.

@jacobcreech jacobcreech merged commit 38c0e57 into solana-foundation:main Sep 18, 2025
2 checks passed
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators Jan 12, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants