Improve program entrypoint #176

Merged: febo merged 33 commits into main from febo/improve-entrypoint on Jul 15, 2025
Conversation

@febo
Collaborator

@febo febo commented Jun 9, 2025

Problem

The current program entrypoint does not translate to very efficient bytecode, as a comparison with an assembly implementation of the entrypoint shows (e.g., cavey's asmr).

Solution

Tweak the implementation to improve its efficiency, borrowing ideas from the assembly implementation (credits to @cavemanloverboy).

One key difference is that the entrypoint includes inlined code to parse accounts, which reduces the number of jumps required and therefore reduces CUs.

Results

image

@febo febo force-pushed the febo/improve-entrypoint branch from c235845 to aa4201c Compare June 9, 2025 13:13
@cavemanloverboy

i just found two more 1 CU/account optimizations, so hold your horses here. One of them might be unusable unfortunately but we will see.

@febo
Collaborator Author

febo commented Jun 9, 2025

> i just found two more 1 CU/account optimizations, so hold your horses here. One of them might be unusable unfortunately but we will see.

Nice – this one will go after #166 goes in.

@febo febo mentioned this pull request Jun 10, 2025
Collaborator

@joncinque joncinque left a comment

Looks good overall! Just some small comments

Comment on lines +166 to +214
/// Align a pointer to the BPF alignment of `u128`.
macro_rules! align_pointer {
    ($ptr:ident) => {
        (($ptr as usize + (BPF_ALIGN_OF_U128 - 1)) & !(BPF_ALIGN_OF_U128 - 1)) as *mut u8
    };
}
Collaborator

Why does this align to u128? Shouldn't it align to u64 per the serialization spec?

Collaborator Author

@febo febo Jun 11, 2025

That is the name of the constant in the SDK. 😊 I think it is meant to represent the alignment of a u128 in BPF, which is 8. We could rename it if that makes it clearer – agree that the name is a bit confusing.

Collaborator

I mainly want to make sure I'm not missing anything, but I guess you could use any usize whose value is equal to 8 😅
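
To make the point of this thread concrete, here is a minimal sketch of the rounding that the macro performs, written as a plain function over addresses. The value 8 comes from the reply above; the function name `align_up` is made up for illustration and is not part of the SDK.

```rust
/// Alignment of `u128` on BPF, as stated in the thread (8 bytes).
const BPF_ALIGN_OF_U128: usize = 8;

/// Rounds `addr` up to the next multiple of `BPF_ALIGN_OF_U128`.
/// Hypothetical helper; the mask trick relies on the alignment being a
/// power of two.
fn align_up(addr: usize) -> usize {
    (addr + (BPF_ALIGN_OF_U128 - 1)) & !(BPF_ALIGN_OF_U128 - 1)
}
```

An address that is already a multiple of 8 is returned unchanged, and any other address moves forward to the next multiple, which is the same result any constant equal to 8 would give.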

Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
@febo febo force-pushed the febo/entrypoint-cleanup branch from 9361601 to fad2c71 Compare June 11, 2025 13:32
@febo
Collaborator Author

febo commented Jun 12, 2025

@cavemanloverboy Found a few more savings with a small tweak in the code.

@febo febo force-pushed the febo/entrypoint-cleanup branch from f6f4596 to e8b191a Compare June 13, 2025 11:06
Base automatically changed from febo/entrypoint-cleanup to main June 14, 2025 00:13
@febo febo force-pushed the febo/improve-entrypoint branch 2 times, most recently from ad37c78 to a9285b4 Compare June 15, 2025 00:00
@febo febo requested a review from joncinque June 15, 2025 09:50
@febo febo marked this pull request as ready for review June 15, 2025 09:50
@febo febo marked this pull request as draft June 19, 2025 15:17
@febo
Collaborator Author

febo commented Jun 19, 2025

@joncinque Put the PR back to draft to test a suggestion from @cavemanloverboy

@febo
Collaborator Author

febo commented Jun 23, 2025

Current benchmark:

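| Name         | CUs |
|--------------|-----|
| Account (0)  |   9 |
| Account (1)  |  13 |
| Account (2)  |  22 |
| Account (3)  |  36 |
| Account (4)  |  45 |
| Account (5)  |  52 |
| Account (6)  |  72 |
| Account (7)  |  75 |
| Account (8)  |  80 |
| Account (16) | 154 |
| Account (32) | 280 |
| Account (64) | 541 |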

@febo febo force-pushed the febo/improve-entrypoint branch from 0f21920 to 01a09f4 Compare June 26, 2025 14:17
@febo febo marked this pull request as ready for review June 26, 2025 14:23
@nlgripto

let him cook

@febo
Collaborator Author

febo commented Jun 30, 2025

@joncinque PR updated and ready for another review. 😊

Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
Comment on lines +340 to +355
// There might be remaining accounts to process.
if to_process > 3 {
    // 4..3 accounts left to process.
    if to_process > 4 {
        process_accounts!(4 => (input, accounts, accounts_slice));
    } else {
        process_accounts!(3 => (input, accounts, accounts_slice));
    }
} else {
    // 2..1 accounts left to process.
    if to_process > 2 {
        process_accounts!(2 => (input, accounts, accounts_slice));
    } else if to_process > 1 {
        process_accounts!(1 => (input, accounts, accounts_slice));
    }
}
Contributor

minor

Have you considered a match for this?

Suggested change
// There might be remaining accounts to process.
match to_process {
    5 => process_accounts!(4 => (input, accounts, accounts_slice)),
    4 => process_accounts!(3 => (input, accounts, accounts_slice)),
    3 => process_accounts!(2 => (input, accounts, accounts_slice)),
    2 => process_accounts!(1 => (input, accounts, accounts_slice)),
    1 => (),
    _ => {
        // SAFETY: `while` loop above makes sure that `to_process` has 1 to 5
        // entries left.
        unsafe { core::hint::unreachable_unchecked() }
    }
};


@cavemanloverboy cavemanloverboy Jun 30, 2025

the point of this was to manually unroll the binary search. did you verify that this match statement does not increase cus? if it doesn't increase it, i'm in favor of this so long as we add a comment that the compiler can figure out the binary search (in case someone changes in the future)

Contributor

I must admit, I didn't realize that.
Maybe a short note explaining the optimization could help future readers.

Collaborator Author

@febo febo Jul 1, 2025

A match statement increases CUs – in the end it generates a "standard" jump table, which in the worst case will do 4 comparisons. The "manual" one uses 3 comparisons at most, and 2 for most of the values.

I will add a comment explaining the rationale of the nested if statements.
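
The comparison counts quoted above can be checked with a small model. The sketch below mirrors the nested `if` statements from the PR and returns, for each value of `to_process`, how many accounts would be processed and how many comparisons were evaluated. The second function is only an assumption about how a `match` might be lowered (a chain of equality tests); the code actually generated for SBF depends on the compiler version. Both function names are made up for illustration.

```rust
/// Mirrors the nested `if` statements from the PR: returns the number of
/// accounts processed and the number of comparisons evaluated to get there.
fn nested_ifs(to_process: usize) -> (usize, u32) {
    if to_process > 3 {
        // Two comparisons: `> 3`, then `> 4`.
        if to_process > 4 { (4, 2) } else { (3, 2) }
    } else if to_process > 2 {
        // Two comparisons: `> 3`, then `> 2`.
        (2, 2)
    } else if to_process > 1 {
        // Three comparisons: `> 3`, `> 2`, `> 1`.
        (1, 3)
    } else {
        (0, 3)
    }
}

/// ASSUMPTION: models a `match` lowered to sequential equality tests against
/// 5, 4, 3 and 2. This is an illustration, not the real compiler output.
fn sequential_tests(to_process: usize) -> (usize, u32) {
    let mut comparisons = 0;
    for candidate in [5usize, 4, 3, 2] {
        comparisons += 1;
        if to_process == candidate {
            return (candidate - 1, comparisons);
        }
    }
    (0, comparisons)
}
```

Under this model the nested version needs 2 comparisons for the values 3, 4 and 5 and 3 comparisons for 1 and 2, while the sequential chain needs up to 4.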

Contributor

This is probably out of scope for the PR by now.

I started wondering if there are some alternatives to the "binary search" that have similarly good properties.
In particular, I noticed that as written we produce 16 identical code blocks for the account processing, in total.
Not sure if any of them gets optimized away, but if you are saying that the binary search tree makes it into the final code, then all the blocks are probably there as well.

The very first code block is an optimization for the case when there is only one account.
But the remaining 15 are used to process accounts.
And the longest "uninterrupted" sequence of accounts we process is 5 accounts at a time.
I guess, the size of the program is less important, but it still costs something to deploy it.

On x86, jump table is a single jump: https://godbolt.org/z/f7E5P3YKs

This:

    match iterations {
        3 => { process(); process(); process(); }
        2 => { process(); process(); }
        1 => { process(); }
        0 => {}
        _ => unreachable!(),
    }

Turns into this:

.LBB1_1:
        lea     rax, [rip + .LJTI1_0]
        movsxd  rcx, dword ptr [rax + 4*rbx]
        add     rcx, rax
        jmp     rcx
.LBB1_3:
        call    example::process::h15029326abde9722
.LBB1_4:
        call    example::process::h15029326abde9722
.LBB1_5:
        call    example::process::h15029326abde9722
.LBB1_6:
        add     rsp, 16
        pop     rbx
        ret
.LJTI1_0:
        .long   .LBB1_6-.LJTI1_0
        .long   .LBB1_5-.LJTI1_0
        .long   .LBB1_4-.LJTI1_0
        .long   .LBB1_3-.LJTI1_0

There are no comparisons.
But maybe the SBF backend cannot produce computed jumps here?
The instruction set seems to have the necessary instruction.

If I pass the index into process(), then it gets a bit more confusing.
Though, process_n_accounts!(@process_account => (input, accounts, accounts_slice)) calls are identical. All the state change is a side effect of the call.
Though, there is a lot of code that is inlined, so the compiler might miss the fact that they are indeed identical.

Contributor

What version of platform tools are you using? I merged the switch simplify pass recently anza-xyz/llvm-project#153. It is available on platform tools v1.49 onwards.

Contributor

I'd encourage you to test with v1.50, because it enables a pass that simplifies branches, so you may see different results again.

Collaborator Author

Using platform-tools v1.50 we get even more improvements:

| Name         | CUs | Delta |
|--------------|-----|-------|
| Account (0)  |   9 |   --  |
| Account (1)  |  13 |   --  |
| Account (2)  |  21 |   -1  |
| Account (3)  |  34 |   --  |
| Account (4)  |  42 |   -1  |
| Account (5)  |  49 |   -1  |
| Account (6)  |  64 |   -6  |
| Account (7)  |  68 |   -5  |
| Account (8)  |  75 |   -3  |
| Account (16) | 140 |  -12  |
| Account (32) | 258 |  -20  |
| Account (64) | 501 |  -37  |

Contributor

Yeah, the compiler is a bit unpredictable – sometimes you change a single line and things are significantly different. 😅

Fernando The Compiler Whisperer.

Contributor

@cavemanloverboy example generates different code in v1.50:

This is for the case with 16 match arms. The compiler builds a lookup table:

entrypoint:
	ldxdw r1, [r1 + 0]
	jgt r1, 16, LBB0_3
	mov64 r2, r1
	lsh64 r2, 32
	rsh64 r2, 32
	mov64 r3, 129023
	rsh64 r3, r2
	and64 r3, 1
	jeq r3, 0, LBB0_3
	lsh64 r1, 3
	lddw r2, .Lswitch.table.entrypoint
	add64 r2, r1
	ldxdw r1, [r2 + 0]
	mov64 r2, 1
	call sol_log
LBB0_3:
	mov64 r0, 0
	exit
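
One way to read the assembly above is as a range check, a bitmask test, and an indexed load. The Rust sketch below follows that reading. The constant 129023 is taken from the listing; it equals 0x1F7FF, so bits 0 to 16 are set except bit 11, which suggests that 16 of the 17 possible values have a match arm. The table contents and the function name `lookup` are placeholders, since the real table is not shown in the thread.

```rust
/// Bitmask from the listing (`mov64 r3, 129023`): bit `n` is set when the
/// value `n` has a match arm. 129023 == 0x1F7FF.
const ARM_MASK: u64 = 129023;

/// ASSUMPTION: placeholder entries; index 11 is never read.
const TABLE: [&str; 17] = [
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "-", "m", "n", "o",
    "p", "q",
];

/// Hypothetical model of the lookup performed by the generated code.
fn lookup(n: u64) -> Option<&'static str> {
    // `jgt r1, 16, LBB0_3`: values above 16 have no arm.
    if n > 16 {
        return None;
    }
    // `rsh64 r3, r2` followed by `and64 r3, 1`: test bit `n` of the mask.
    if (ARM_MASK >> n) & 1 == 0 {
        return None;
    }
    // `lsh64 r1, 3` plus `add64`: index into a table of 8-byte entries.
    Some(TABLE[n as usize])
}
```

In this model the cost of selecting an arm does not depend on which arm is taken, unlike a chain of comparisons.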

Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
Contributor

@illia-bobyr illia-bobyr left a comment

Out of scope for this PR

I find it a bit hard to quickly spot all the differences between parse() and parse_into<MAX_TX_ACCOUNTS>().

I can see that the skip extra accounts logic is only present in the parse_into() version.
Is this the only difference?

I think if this is the case, there is probably a way to remove the skip logic from the parse_into<MAX_TX_ACCOUNTS>() case.
Specifically, only when MAX_ACCOUNTS equals MAX_TX_ACCOUNTS.
As MAX_ACCOUNTS is a generic constant, the compiler will create a new instance of this function for every specific value of MAX_ACCOUNTS.
Plus, it is marked as #[inline(always)] allowing further optimizations.

I think if

        while to_skip > 0 {

is augmented with a compile time check, maybe like this:

        if MAX_ACCOUNTS < MAX_TX_ACCOUNTS {
            while to_skip > 0 {

the compiler should remove this code completely when MAX_ACCOUNTS equals MAX_TX_ACCOUNTS.
And it should also remove the to_skip calculation, as it becomes dead code.

Also, would it make sense to add a compile time assertion, to make sure that MAX_ACCOUNTS is not above MAX_TX_ACCOUNTS?

I wonder if a change like that would be enough to end up with just a single version of the parse() function.
It looks like it has a considerable overlap with parse_into() and you are making changes in both functions in parallel.
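
The two ideas in this comment (a compile-time guard around the skip loop and a compile-time assertion on the bound) can be sketched as follows. This is not the SDK's code: the value of `MAX_TX_ACCOUNTS`, the helper type `AssertMax`, and the function `skip_extra` are all hypothetical, and the loop body only counts iterations instead of advancing an input pointer.

```rust
/// ASSUMPTION: placeholder value for illustration, not the SDK's constant.
const MAX_TX_ACCOUNTS: usize = 64;

/// Hypothetical helper carrying a per-instantiation compile-time assertion.
struct AssertMax<const MAX_ACCOUNTS: usize>;

impl<const MAX_ACCOUNTS: usize> AssertMax<MAX_ACCOUNTS> {
    /// Evaluated at compile time; the build fails if the bound is violated.
    const OK: () = assert!(MAX_ACCOUNTS <= MAX_TX_ACCOUNTS);
}

/// Hypothetical stand-in for the skip step of `parse_into`: returns how many
/// accounts are skipped for an input holding `total` accounts.
#[inline(always)]
fn skip_extra<const MAX_ACCOUNTS: usize>(total: usize) -> usize {
    // Forces evaluation of the assertion for this value of `MAX_ACCOUNTS`.
    let () = AssertMax::<MAX_ACCOUNTS>::OK;

    let mut skipped = 0;
    // The condition is a constant in each instantiation, so the compiler can
    // drop the whole block when `MAX_ACCOUNTS == MAX_TX_ACCOUNTS`.
    if MAX_ACCOUNTS < MAX_TX_ACCOUNTS {
        let mut to_skip = total.saturating_sub(MAX_ACCOUNTS);
        while to_skip > 0 {
            skipped += 1;
            to_skip -= 1;
        }
    }
    skipped
}
```

With this shape, `skip_extra::<4>(6)` skips two accounts, while an instantiation with `MAX_ACCOUNTS` equal to `MAX_TX_ACCOUNTS` contains no skip logic at all, and an instantiation above the limit fails to compile.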

Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
@febo febo requested a review from illia-bobyr July 6, 2025 17:01
illia-bobyr previously approved these changes Jul 8, 2025
Contributor

@illia-bobyr illia-bobyr left a comment

Did one more pass and found a few minor things.
But overall it is good :)

Comment thread sdk/pinocchio/src/entrypoint/mod.rs Outdated
Comment thread sdk/pinocchio/src/entrypoint/mod.rs
Collaborator

@joncinque joncinque left a comment

Great work! This might be one of the few cases where a macro is uniquely suited to do the job, and it's really well factored.

@febo febo merged commit bd28a5f into main Jul 15, 2025
9 checks passed
@febo febo deleted the febo/improve-entrypoint branch July 15, 2025 20:48