SIMD-0326: Alpenglow #326
| and *safe-to-skip*, explained in the white paper) that cause the validators to | ||
| vote *notarize-fallback* or *skip-fallback*. | ||
|
|
||
| Votes are distributed by broadcasting them directly to all other validators. |
There was a problem hiding this comment.
nit: should we say all other staked validators?
There was a problem hiding this comment.
In Alpenglow terminology every validator is staked.
| In this proposal we make sure that nodes that do not participate in the protocol | ||
| will not be rewarded. Towards this end, all nodes prove that they are voting | ||
| actively. In slot *s*+8 (and only in that slot), the corresponding leader can | ||
| post up to two vote aggregates (a notarization aggregate and/or skip aggregate) |
There was a problem hiding this comment.
dumb question: would the notarization aggregate include all notarization-fallback votes with the same block-id?
I assume a notarization with wrong block-id will be ignored?
Do we reward notarization with the wrong block-id? What if someone cast skip and notar-fallback, but the skip vote got lost and, unfortunately, the notar-fallback is for the wrong block-id?
There was a problem hiding this comment.
Only skip and notarization, no fallbacks. Everybody just gets one point at most.
There was a problem hiding this comment.
These funny scenarios could arise on successful block equivocation by the leader. In the future, if we want to be nerdy about this possibility, we could do something to count multiple block-ids (or just issue the one-vote reward to everyone while we slash the equivocating leader lol).
| per slot. The submitter (leader) gets the same amount of SOL as each of the | ||
| voters included in the aggregate. Nodes receiving 0 SOL at the end of the epoch | ||
| are removed from the active set of nodes. This scheme will practically eliminate | ||
| today’s voting transaction overhead while still rewarding voting. |
There was a problem hiding this comment.
No reward for someone who voted both Notarization and Skip right?
There was a problem hiding this comment.
If we had slashing, it would be slashable. That's even stronger than just not getting rewards. But since we don't have slashing, we might consider punishing it by not giving any rewards.
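As a rough illustration of the per-slot accounting discussed in this thread, the sketch below awards at most one point per validator from the notarization and skip aggregates. The function name `slot_points`, the `penalize_double_vote` flag, and the choice to credit the submitting leader one point per slot are assumptions made for illustration; the SIMD text only says that the submitter gets the same amount as each included voter, and the double-vote penalty is merely something the reply above says could be considered.

```python
from collections import Counter


def slot_points(notar_voters, skip_voters, leader, penalize_double_vote=False):
    """Return a Counter of points earned for one slot (hypothetical helper).

    notar_voters / skip_voters: validator ids in the two aggregates the
    leader may post (either may be empty). Fallback votes are not counted.
    """
    notar, skip = set(notar_voters), set(skip_voters)
    double_voters = notar & skip

    # Everybody gets one point at most, even if present in both aggregates.
    rewarded = notar | skip
    if penalize_double_vote:
        # Assumption: voting both notarize and skip forfeits the reward.
        rewarded -= double_voters

    points = Counter({validator: 1 for validator in rewarded})

    # Assumption: the submitter earns one point for posting any aggregate.
    if notar or skip:
        points[leader] += 1
    return points


if __name__ == "__main__":
    print(slot_points({"a", "b"}, {"b", "c"}, leader="L"))
    print(slot_points({"a", "b"}, {"b", "c"}, leader="L", penalize_double_vote=True))
```

With the flag enabled, validator `b` (present in both aggregates) earns nothing for the slot, while `a` and `c` still earn one point each.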
| ## Impact | ||
|
|
||
| The most visible change will be that optimistic confirmation is superseded by | ||
| faster (actual) finality. |
There was a problem hiding this comment.
So the proposal is to make Confirmed the same as Finalized in Alpenglow?
There was a problem hiding this comment.
Optimistic Confirmation is a concept from TowerBFT, and Finality is a concept from Alpenglow. In this sentence we want to argue that the second is strictly better than the first (faster and actually final). Maybe we need to reformulate to make this clear?
There was a problem hiding this comment.
Okay that's fine. I just wasn't sure whether we are announcing an API change here.
There was a problem hiding this comment.
Should we write TowerBFT's Optimistic Confirmation and Alpenglow's (actual) finality to make it 100% clear, or do you think it's okay like this?
There was a problem hiding this comment.
I think it's okay like this.
| ## Drawbacks | ||
|
|
||
| The main drawback is the risk related to implementing a big protocol change. | ||
| Migrating to Alpenglow will be challenging. |
There was a problem hiding this comment.
nit: should we mention that migration will be designed and proposed in a following SIMD?
There was a problem hiding this comment.
I'm not an expert, but I don't think migration should have a SIMD. At least I would not announce it here.
There was a problem hiding this comment.
Wouldn't the migration also require a specific implementation for any client that would want to be part of the network during the switchover? Seems like it would need a SIMD then to me?
There was a problem hiding this comment.
We would like to get a general consensus about Alpenglow, so that the engineers working on it know that it will come. The switching mechanism is not really a votable issue (as long as it's done in the best possible way). But of course we have to find an agreement between all the clients which will be involved in the switch. That could be covered by a separate (technical) SIMD if necessary.
There was a problem hiding this comment.
Anything that is a consensus-critical change should have a SIMD, even if it does not go through governance. The purpose of SIMDs is to communicate and agree upon breaking (consensus-critical) changes that are coming to different validator teams.
|
Rotor not actually defined anywhere btw |
lidatong
left a comment
There was a problem hiding this comment.
thanks to the anza research team for doing the research to create and propose this consensus protocol. approving on behalf of firedancer -- excited for the improvement this will bring to the network
Rotor is not part of this SIMD. |
|
The Alpenglow whitepaper states:
Why would we need transactions to record this information? The stake value is already stored on-chain in stake accounts. The public keys are stored on-chain (validator identity key, vote account key). The IP address and port number are ephemeral values that operators might change at any time. Why would we want or need transactions to "record" these things? And also I assume that these transactions update accounts state, so where are the additional values going to be stored? |
|
The Alpenglow whitepaper states (in section 1.5 under Broadcast):
Since all messages fit into one UDP packet, all messages take the same amount of time to transmit. What is meant here by voting messages "need even less time" due to being "shorter"? |
|
The Alpenglow whitepaper states (in section 1.5 under Time):
Given that clock drift is very likely to exist for all real-world machines, I think it would really be worthwhile to incorporate some notion of expected drift (likely minimum, likely maximum) in the timeout periods used to derive the expected block completion times as proposed by this whitepaper. Time synchronization is done totally differently under Alpenglow than under Proof of History and extra care and "proving out" of the likely expected values is warranted given that we'd be switching from a known and proven method, to a theoretical method. EDIT: I see that this is more directly addressed in the Timeout section. I would like to see some evidence of typical clock drift for known data center level hardware and for that value to be incorporated into any Timeouts used in this paper. Thank you. |
|
With regards to the 20% threshold for liveness vs. Solana's current 33%: With the current stake distribution, this takes the "halt line" from 22 nodes down to 9 nodes. And since two of those nodes are operated by the same entity (Figment and Ledger by Figment) this is actually 8 operating entities. Does this seem troubling? I may be misunderstanding the "halt line" analogy when considering Alpenglow. Is it the case that this 20% is actually just the fraction of stake that could prevent "fast finalization" (80%) but that the cluster would just fall back to the slower finalization (two 60% rounds), meaning that to truly halt Alpenglow would require 40% of stake to refuse/fail to vote? But that 20% could "slow down" consensus by preventing fast single-vote consensus? |
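The reading proposed in the comment above (80% for the fast path, two 60% rounds for the slow path) can be written down as a small classifier. This is only a sketch of the commenter's interpretation, not of the white paper's exact rules; the function name `finalization_path` and its percentage-based inputs are assumptions.

```python
FAST_THRESHOLD_PCT = 80.0  # single-round ("fast") finalization
SLOW_THRESHOLD_PCT = 60.0  # needed in each of two rounds ("slow") finalization


def finalization_path(round1_stake_pct, round2_stake_pct=0.0):
    """Classify the outcome for a slot given the voting stake in each round.

    Returns "fast", "slow", or "none" (hypothetical labels).
    """
    if round1_stake_pct >= FAST_THRESHOLD_PCT:
        return "fast"
    if (round1_stake_pct >= SLOW_THRESHOLD_PCT
            and round2_stake_pct >= SLOW_THRESHOLD_PCT):
        return "slow"
    return "none"


if __name__ == "__main__":
    # Just over 20% of stake withholds votes: fast path lost, slow path remains.
    print(finalization_path(79.99, 79.99))
    # Just over 40% of stake withholds votes: no finalization in this model.
    print(finalization_path(59.99, 59.99))
```

Under this model, withholding just over 20% of stake only slows finalization, while stopping it altogether takes just over 40%.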
|
The Alpenglow whitepaper states (in section 2.6):
What happens if a node or subset of nodes observe enough notarization votes, but others do not? Some nodes will observe the block as finalized, and others will not. This is a transient situation that could result in the subset of nodes that believed the block was finalized, finding out later that it was in fact not finalized because some malicious nodes produced (presumably slashable) double-voting that sent Notarization Votes to some nodes, and Skip Votes to other nodes. While it is true that slashing would deter such malicious behavior, the fact that "finalization" is so easily foiled seems problematic. In the existing Solana consensus model, "optimistically confirmed" blocks are subject to the same weak finalization criteria that Alpenglow "finalized" blocks are. And presumably, each block subsequently chained decreases the likelihood of "rollback" in Alpenglow just as it did for classic Solana. The only significant difference I can see is that classic Solana defines a slashing schedule that makes it exponentially more costly for malicious nodes to cause a rollback of older and older confirmed blocks, but Alpenglow doesn't define its slashing mechanism so has effectively worse guarantee on finalization than classic Solana. I probably am misunderstanding, but why isn't Alpenglow "finalized" the same as classic Solana "optimistic confirmation", with the same chances of being rolled back? |
|
Thank you for the reply. Is the analysis the same for the situation in which group A (19.99%, malicious) sends Notarize to group B and Skip to group C? Repeat this for two rounds, and then group B sees the slot as finalized while group C sees the slot as skipped. EDIT: Oops, I just saw the flaw. Group C will only see 59.99% of votes for Skip, so it will not conclude that the slot was skipped. |
|
Doesn't allowing multiple ParentReady(s,...) to proceed, as illustrated in Figure 8, allow a leader to extend its slot time? The ParentReady "function" calls SetTimeouts(), which uses the current clock to set a new set of timeouts for the slot. Doesn't this then allow a leader to increase its total "window time"? In Figure 8 this is illustrated: when the second ParentReady occurs, the timeouts reset, so the leader can effectively get a much longer slot time for its remaining blocks. I see that a further rule is "In this case, slices 1, . . . , t − 1 are ignored for the purpose of execution." So the leader gives up the first t slices it emitted; but since the timeouts have been extended, it can emit that same number of slices again, possibly in an advantageous way, given that the expanded duration of its slot has allowed it to survey many more transactions and build more profitable blocks. |
|
For efficiency, shouldn't there be a "getShreds" instead of "getShred", which must specify a large number of parameters that are duplicated if a contiguous range of shreds is desired? |
True, clocks are not perfectly accurate. However, system clocks drift by roughly 1 second per day only. Per leader window (and that's the maximum timeouts we have) this drift is only 0.2 ms. If you want to wait 399.8 or 400.2 ms instead of 400 ms before you skip, that's perfectly fine. In fact, Alpenglow could tolerate much bigger deviations by all nodes. Note that clocks are naturally synced with every new timeout, so drifts don't accumulate at all. |
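The drift argument can be checked with quick arithmetic. The sketch below takes the quoted rate of roughly 1 second per day and scales it to a single timeout interval; the helper name `drift_ms` and the 1.6 s window in the usage lines are assumptions for illustration. At that rate the drift over a 400 ms wait comes out at a few microseconds, comfortably below the 0.2 ms figure mentioned in the reply.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_ms(interval_ms, drift_s_per_day=1.0):
    """Expected clock drift (in ms) accumulated over one interval.

    drift_s_per_day is the assumed drift rate; 1.0 matches the figure
    quoted in the discussion.
    """
    drift_rate = drift_s_per_day / SECONDS_PER_DAY  # dimensionless fraction
    return interval_ms * drift_rate


if __name__ == "__main__":
    print(f"drift over 400 ms:  {drift_ms(400):.5f} ms")
    # Assumption: a longer 1.6 s window, used only as a second data point.
    print(f"drift over 1600 ms: {drift_ms(1600):.5f} ms")
```

Because each timeout starts from a fresh local clock reading, the per-interval figure is the relevant one; it does not add up across slots.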
If you send 1000 messages with 1500 bytes each it will take more time than sending 1000 messages with 200 bytes each. |
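The point about message size can be made concrete by computing how long it takes to push the bytes onto a link. This is a back-of-the-envelope sketch: the 1 Gbit/s link speed and the helper name `transmit_ms` are assumptions, and per-packet header overhead is ignored.

```python
def transmit_ms(num_messages, bytes_per_message, link_bits_per_s=1e9):
    """Time (ms) to serialize the given messages onto a link.

    Ignores packet headers, propagation delay, and queuing; only the raw
    payload bits are counted.
    """
    total_bits = num_messages * bytes_per_message * 8
    return total_bits / link_bits_per_s * 1000.0


if __name__ == "__main__":
    full = transmit_ms(1000, 1500)   # near-MTU messages
    short = transmit_ms(1000, 200)   # short messages such as votes
    print(f"1000 x 1500 B: {full:.2f} ms, 1000 x 200 B: {short:.2f} ms")
```

Even though each message fits into a single UDP packet, a sender broadcasting to many peers spends time proportional to the total number of bytes, so shorter messages finish sooner.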
Maybe the term "transaction" is a bit misleading here? What we meant to say is exactly this: You can update your information by updating (for instance) your accounts state. It's just important that all your information is present. So "transaction" does not need to be a financial transaction. |
|
(All comments so far should be answered?) |
AshwinSekar
left a comment
There was a problem hiding this comment.
Overall a great improvement for the network. Will bolster and formalize the security of Solana's consensus protocol while greatly decreasing finalization times.
Allows us to clean up a lot of tech debt, and makes it easier to reason about further consensus improvements like Asynchronous Execution and MCP.
But it is an on-chain transaction right? Presumably one that updates some kind of accounts state with the details that you have listed? Is that what the docs mean when they say "transaction"? |
There are still comments on the changed file that haven't been addressed. |
Yes. But we're open for an alternative term. |
Indeed, there was one more comment. Thanks for noticing. |
|
@bji some more clarifications:
|
topointon-jump
left a comment
There was a problem hiding this comment.
Excited for this change!! 🚀
| **Vote** is an existing term already, but votes are different in Alpenglow. In | ||
| Alpenglow, votes are not transactions on chain anymore, but just sent directly | ||
| between validators. Also, votes do not include lockouts. |
There was a problem hiding this comment.
Will there still be a vote processor program? If so, will this be a bpf program? Would we migrate over from the current vote processing program to a new alpenglow bpf vote program? It might be worth specifying that in this SIMD.
There was a problem hiding this comment.
Engineers should have the word here, but I would assume the accounting for vote aggregations (and first certificates) can be done in the same way as accounting for votes now (with a different program though).
There was a problem hiding this comment.
For Alpenglow v1 we do not plan to create a new vote program: because the votes are not transactions any more, there is not much for a vote program to do. We will add bls_pubkey in the current vote program for verification and pubkey management. It has been added to Vote Account v4 #185
For the rewards scheme described in the SIMD, I suppose we need to add the two BLS certs to the block footer #307, and we need to add the parent bankhash to the block footer as well, as agreed in NYC. The rewards calculation will probably be similar to now, done at the beginning of the epoch.
Does that answer your question?
There was a problem hiding this comment.
Thanks for clarifying!! Few more questions - will we need to keep some data on-chain for reward computation/distribution? How will that data be updated?
There was a problem hiding this comment.
That's a very good question. I think what needs to be done for the block footer is still up in the air. But I would imagine the block footer will be kept in the big table as it contains important information. It's not "on-chain" as a normal transaction, but at least the raw data is kept around.
There was a problem hiding this comment.
That's another good question, the conclusion from last meeting was:
- BLS certs in block footer
- but credits are updated in vote accounts
- Non leader checks the BLS certs -> credits conclusion during replay
- rest of rewards calculation same as now
There was a problem hiding this comment.
> but credits are updated in vote accounts
this is the part I'm not clear about - how exactly will this happen? will there need to be some kind of bpf program updating the vote account state? if this is documented anywhere feel free to point me to it 😄
There was a problem hiding this comment.
Sure, shared the meeting notes. The current plan is to "do it outside the VM before all transactions", so we have to implement it in validator code.
We can discuss pros and cons about this though, this is just the current proposal.
There was a problem hiding this comment.
> but credits are updated in vote accounts
Does it mean that reading the amount of vote credits can remain unchanged, e.g. when stake pools want to understand the performance of validators?
There was a problem hiding this comment.
Vote credits are currently closely related to timely vote credits, which is a concept we won't need anymore with Alpenglow because there is no more incentive to hold back votes. While vote credits can still be used unchanged, this metric is less useful to measure performance. We will be introducing a new way to measure performance, but at least initially this measure will not go on chain.
| ### Rewards | ||
|
|
||
| In this SIMD we focus on the consensus-related benefits of Alpenglow. Below, we | ||
| translate the existing vote rewards as they are, while removing some harmful |
There was a problem hiding this comment.
Implementation-wise, are the rewards still computed and distributed using the same mechanisms (via state stored in the on-chain stake and vote accounts)? Might be good to specify that here.
There was a problem hiding this comment.
Yes, also here, same mechanisms.
There are more, but perhaps my comments don't merit response? |
| higher resilience and better performance. | ||
|
|
||
| This SIMD comes with an extensive companion paper. The Alpenglow White Paper | ||
| v1.1 is available at https://www.anza.xyz/alpenglow-1-1. |
There was a problem hiding this comment.
I’d prefer to see the whitepaper here as part of the PR, so that it’s clear the document has changed (via commits). Even now, when I follow the link in the document title, I see Solana Alpenglow White Paper 2025-05-19 v1.1.pdf, while in the document header it says White Paper v1.1, July 22, 2025.
There was a problem hiding this comment.
the pdf title is wrong, but if you see v1.1 you are in the correct place
There was a problem hiding this comment.
Nothing prevents someone from changing something in the document without updating its version, but uploading the revised version to Google Drive. If the document is part of the PR, at least it will be visible.
Alright, I’ll leave this here then:
shasum "Solana Alpenglow White Paper 2025-05-19 v1.1.pdf"
07788c2ba9528f526c1514d18f1f5496a1fcdc50 Solana Alpenglow White Paper 2025-05-19 v1.1.pdf
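For readers without the `shasum` tool, the same check can be reproduced with Python's standard library. By default `shasum` computes SHA-1, which matches the 40-character digest above. The helper name `sha1_of_file` is an assumption, and the file path in the usage comment is simply the file name quoted in this comment.

```python
import hashlib

# Digest quoted in the comment above.
EXPECTED_SHA1 = "07788c2ba9528f526c1514d18f1f5496a1fcdc50"


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the PDF has been downloaded to the working directory):
#   path = "Solana Alpenglow White Paper 2025-05-19 v1.1.pdf"
#   print(sha1_of_file(path) == EXPECTED_SHA1)
```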
There was a problem hiding this comment.
This is a good point. We are going to upload the PDF directly to github to make sure that we refer to the same version.
Can you please give me a hint where to find your other comments? I don't see anything without response anymore. |
| In this SIMD we focus on the consensus-related benefits of Alpenglow. Below, we | ||
| translate the existing vote rewards as they are (same mechanisms, just different | ||
| programs), while removing some harmful incentives (such as the incentive to wait | ||
| before casting a vote). Economic changes are left to future economics-focused |
There was a problem hiding this comment.
I think here a reference would be great
There was a problem hiding this comment.
Sure, but first we have to do the research, so we cannot have a reference at this time.
There was a problem hiding this comment.
There is research done around "incentive to wait before casting a vote".
There was a problem hiding this comment.
Ah, in this case I misunderstood your original comment. I thought you talked about the last sentence. In my opinion, there is no need to cite anything here, since we actually removed the incentive. I'm sure this kind of research was cited in the TVC era.
|
|
||
| Currently, validators must post their vote on the blockchain for every slot, and | ||
| they pay about 1 SOL per day in vote transaction fees. With Alpenglow, votes | ||
| will not go on chain. However, to maintain the present economic equilibrium, |
There was a problem hiding this comment.
What is the meaning of "present economic equilibrium"? You do want to limit nodes, but this is done by adding a cap. Is this because you want to prevent stake from splitting too much and saturating this limit earlier?
There was a problem hiding this comment.
"Present economic equilibrium" means "keep the set of validators steady, at similar economic conditions". Why do we do this? First, the VAT indeed disincentivizes stake splitting, and second (more importantly): We want to be below the hard limit. If demand was above the hard limit, then Alpenglow would have to exclude some validators. No matter how inclusion is decided (randomly, by stake), this would be a bad experience for those excluded. In a future SIMD we might propose to adapt the VAT based on demand.
There was a problem hiding this comment.
We only want to specify things we actually know. We don't know the future.
There was a problem hiding this comment.
I meant it is worth specifying the why:
- First: the VAT indeed disincentivizes stake splitting
- Second: we want to be below the hard limit
So, it's not that you want to know the future, it's a feature you want to give to the system.
There was a problem hiding this comment.
Meh, it's not deep enough in my opinion.
There was a problem hiding this comment.
If it's not deep enough, why do you have a VAT?
There was a problem hiding this comment.
We have the VAT because we want "to maintain the present economic equilibrium."
removed this paragraph (see discussion with ashwin)
Would you be so kind and point out what was not answered? |
bw-solana
left a comment
There was a problem hiding this comment.
Approving from Anza side
|
Here are concrete rule text suggestions to harden SIMD-0326 against latency griefing and vote manipulation:
|
I answered most of those in the forum. I hope that's okay with you? |
Benhawkins18
left a comment
There was a problem hiding this comment.
I see all the required approvals. Merging
