SIMD-0307: Add Block Footer#307
Conversation
4b45bde to
5558f57
Compare
911787c to
459b5a9
Compare
459b5a9 to
5100ff7
Compare
|
💯 |
|
I can understand the motivation, but I'm not convinced this will be used properly. As an operator, I'm competing with every other node on the network for stakers, if I'm generating higher returns by producing better blocks, why would I willingly share any information about how I'm doing that? In fact, I'd say I'm incentivized to lie! If I can generate higher returns by using jito-agave I will make my blocks state that I'm using native-firedancer as an attempt to trick other operators into using a different client than me. |
| - `version: u64` is a positive integer which changes anytime a change is made to | ||
| the header. The initial version will be 1. | ||
|
|
||
| - `header_length: u64` is the length of the rest of the header in bytes (i.e. |
There was a problem hiding this comment.
Do we need such large lengths for the header? Could this be a u16?
| - `block_producer_time_nanos: u64` is a nanosecond UNIX timestamp representing | ||
| the time when the block producer became leader and started constructing the | ||
| block. |
There was a problem hiding this comment.
Let's be specific that this is the time they started constructing the block. Current description to me had some ambiguity on if it's when I became leader (for my entire allocation) or separately for each block. I think the intent here is a separate timestamp for each block.
There was a problem hiding this comment.
I can't think of an incentivized reason to lie about this, but it also cannot be verified. Is there a concern about having incorrect values on this?
There was a problem hiding this comment.
There is a slight incentive for block-time cheaters to lie, but this can probably be detected if its too aggregious. Eventually would be nice to make this timestamp consensus-based like the vote timestamp but I figured that would be its own future SIMD.
| following fields in the header | ||
|
|
||
| - `block_producer_time_nanos`: u64 | ||
| - `block_user_agent`: [u8; 256] |
There was a problem hiding this comment.
Can you explain thoughts here on having a static size here for the block_user_agent? Why not u8 length with trailing bytes?
There was a problem hiding this comment.
I figured the overhead was minimal and the extra unused space could be used as a scratch pad whatever the validator wants (e.g. non-string binary data). Happy to change this though if you feel you strongly prefer a variable-length string.
There was a problem hiding this comment.
Just specify if >256 the block is invalid
At some point pal validators were not reporting that they were pal in gossip but I believe they are reporting now. Either way I think that specific field may or may not be useful but many of the other fields seem very useful. |
Yea I agree some validators will most likely lie, intentionally or even unintentionally (e.g. a software fork that doesn't see updating this field as a huge priority and just never gets to it). Unfortunately there isn't a straightforward way to fix this problem, which exists for gossip-sourced metrics already. The community will use this metric whether its fully trustworthy or not, and if we don't include it in the block header we'll just keep getting it from gossip. The real benefit here is the metric persisted on the chain, and we get a bunch of extra info that isn't currently in gossip. Also, I'd say persisting the info on the chain actually improves accountability, since there is now a public, easily accessible history of what you claimed to be. There are real world incentives not to be caught lying (e.g. stake from pools, reputation), and there are ways to catch liers (sometimes) by inferring the client (e.g. suble differences in the way jito bundles are sent to agave vs frankdancer. Patterns in transaction ordering that differ in different scheduler implementations, etc.) |
Thanks for this clarification, I had overlooked we publish this information over gossip already, but its' done on a less granular & trackable way than what this proposal provides. I'd need to constantly be listening gossip in order to get rough estimate. |
KirillLykov
left a comment
There was a problem hiding this comment.
To me seems to be valuable for the analysis of the cluster behavior to have additional block information such as when block production has started and other additional data which might be added later as follow up of this proposal.
| - `block_producer_time_nanos: u64` is a nanosecond UNIX timestamp representing | ||
| the time when the block producer started constructing the block. | ||
|
|
||
| - `block_user_agent: [u8; 256]` is a string that provides identifying |
There was a problem hiding this comment.
Maybe a naive question, but don't we know the leader for each block from schedule?
There was a problem hiding this comment.
Yeah, the leader schedule is generated two epochs in advance from information in staked on-chain vote accounts. We don't know much more than the leader pubkey and info related to active stake.
A use case for block_user_agent these fields is to get validator implementation specific information, which is more detailed than what's just on the leader schedule or even whats described by any part of the solana protocol. A use case for block_producer_time_nanos is to estimate when blocks were added to the chain and their duration (protocol limits for these quantities, but are fairly loose, and identifiying exact timestamps / durations is useful).
| - `header_length: u16` is the length of the rest of the header in bytes (i.e. | ||
| not including the `block_header_flag`, `version`, and `header_length` fields). | ||
|
|
||
| - `block_producer_time_nanos: u64` is a nanosecond UNIX timestamp representing |
There was a problem hiding this comment.
don't we need to have also the information about how long it took to execute this block? The timestamp in the block is in seconds so hard to use it.
There was a problem hiding this comment.
We can estimate block duration (including network latency) by taking the difference in timestamps between adjacent blocks.
I figured this would be sufficient for an MVP of the block header. There's an endless list of metrics we could additionally include in the block header (e.g. how long it took to execute the block, how long it took to fetch / resolve account data, timing for votes / non-votes, timing for validation checks like deduplication or expiration ). Since its not obvious which ones are worth persisting in the block header I think they should be a future SIMD.
There was a problem hiding this comment.
Agree, I think even this timestamp would be sweet to have. The current block timestamp was quite useless for me due to lack of precision.
There was a problem hiding this comment.
In the pre-Alpenglow era, does started constructing effectively mean when I started generating the PoH ticks? Or when replay/maybe_start_leader decided it was my turn to start packing transactions?
Agree w/ your point that there are a million metrics we could debate, and I don't have a strong preference on what we include for v1, but I do have a preference to get really specific on what the metrics are supposed to mean
There was a problem hiding this comment.
I'm leaning towards replay/maybe_start_leader since POH will get removed with alpenglow anyways. I'll add something a bit more specific
b026020 to
154d60a
Compare
| - `header_length: u16` is the length of the rest of the header in bytes (i.e. | ||
| not including the `block_header_flag`, `version`, and `header_length` fields). | ||
|
|
||
| - `block_producer_time_nanos: u64` is a nanosecond UNIX timestamp representing |
There was a problem hiding this comment.
In the pre-Alpenglow era, does started constructing effectively mean when I started generating the PoH ticks? Or when replay/maybe_start_leader decided it was my turn to start packing transactions?
Agree w/ your point that there are a million metrics we could debate, and I don't have a strong preference on what we include for v1, but I do have a preference to get really specific on what the metrics are supposed to mean
| } | ||
| ``` | ||
| <!-- markdownlint-restore --> | ||
|
|
There was a problem hiding this comment.
You mention this below, but I think it would be good to include a section here to mention that we need to mark blocks dead if they don't contain a valid header at the beginning of the payload section
|
|
||
| ## Security Considerations | ||
|
|
||
| - The header fields are untrusted and purely informational. Tools that expose |
There was a problem hiding this comment.
I'm okay starting with this.
But I think if we truly want to replay timestamps in vote (which will definitely be going away as part of Alpenglow), we might want to wrap some policy around the timestamp piece.
But again, this could be a follow-up
| This SIMD add the following header at the beginning of the raw block data. This | ||
| puts it on the same abstraction layer as serialized entry batch data. Put | ||
| differently, the serialized header will be prepended to the first serialized | ||
| entry batch in the block. |
There was a problem hiding this comment.
I have mixed feelings about this... On the one hand having this special header come first makes replay easier because we can treat the first "entry" differently and then assume we are only deserializing entries thereafter.
On the other hand, it means we need to fill out all the block header data up front before constructing/sending out anything else. I don't think this matters for any of the fields you mention below. However, for something like SIMD-0298, this means we need the bank hash for N-1 before we can build N. Doesn't seem great for pipelining w/ async.
Would a Block footer give us more flexibility? Might be a pain in the ass for deserializing if we don't know exactly where the end is...
Apologies for the gripe w/o a clear alternative suggestion. Just want to make sure we don't overlook this constraint we're introducing.
There was a problem hiding this comment.
You make a good point, I think a block "footer" makes more sense with chained merkle shreds. A footer also makes sense if we eventually include more timing metrics.
I think we should just be able to add the footer as a suffix to the block payload.
From the shred spec document (which is hopefully not outdated)
The serialization of a batch is the concatenation of all serialized entries, prefixed by the entry count as a u64 integer (8 bytes).
Since the first byte of a typical entry batch starts with a positive, non-zero integer, but this footer starts with 0 (block_header_flag), replay can know to treat this "entry batch" differently.
There was a problem hiding this comment.
Yeah I like footer better. But should note 0298 is more like training wheels eventually we plan to remove that check once we are comfortable with the VMs playing nicely with each other.
ac72270 to
66a0d14
Compare
|
This latest version looks good to me. @apfitzge what do you think? |
| ``` | ||
| Block Footer Layout | ||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| | block_footer_flag (64 bits) | |
There was a problem hiding this comment.
Now that we have changed to Block Footer instead of Header should `block_footer_flag' be placed at the end? Otherwise you need to know the offset to read the block_footer_flag which is supposed to tell you the offset?
There was a problem hiding this comment.
I'm assuming block_footer_flag at beginning is still correct. It would be nice to know the offset of the footer, but this would mean buffering and stalling the pipeline.
I believe the implication here is a batch of 0 entries is still invalid and will be used to indicate this must be the start of the footer
There was a problem hiding this comment.
Is the idea that after this is committed to block store the fields would be available so the only time we need to reference the flag for the offsets is during replay in which case we see the flag as we are reading the block top to bottom?
There was a problem hiding this comment.
yes, that's my assumption
apfitzge
left a comment
There was a problem hiding this comment.
Overall looks good to me. Few comments on the some specific wording.
It's implied that this footer is the last thing in the block, since we call it a "footer". But we never explicitly state what to do if there's stuff after it.
Let's add a short description of how to handle blocks that have this footer serialized somewhere in the middle of a block. i.e. [entries, footer, entries].
| highest layer a block consists of some number (~100+) FEC sets. A single FEC | ||
| set contains a handful of shreds (~32). Once sufficient shreds are available | ||
| the raw block data is reconstructed and reinterpreted as an array of entry | ||
| batches. Entry batches do not cross shred boundaries. |
There was a problem hiding this comment.
Entry batches do not cross shred boundaries.
This isn't true, right? shreds are MTU sized and batches of transactions are (often) larger than that, so they'd be split between threads.
Maybe I misunderstand what is meant by "shred boundary" here.
There was a problem hiding this comment.
Bad wording on my part, I mean that entry batches are aligned with shred boundaries (i.e. they will start / stop at a shred boundary).
|
|
||
| While it is possible to make the block footer optional thanks to the | ||
| `block_footer_flag` field, this proposal makes it mandatory. Blocks that don't | ||
| include a valid footer in the block payload will be flagged as dead blocks and |
There was a problem hiding this comment.
Request a slight change of wording:
will be flagged as dead blocks
to
must be flagged as dead blocks
fc68b06 to
6428df0
Compare
|
Nice chatting earlier! Placing some thoughts down here. The layout below enables:
The key here is that while block footers occur at the end of a block, other variants may occur elsewhere; indeed, Would be awesome if we could fold this layout into this SIMD! |
|
@ksn6 thanks for your suggestion! I like the idea of using variants to separate out different types of metadata Two things I'll suggest
|
cbd36a6 to
022907a
Compare
022907a to
531dbac
Compare
|
Agree. We should hash out other variants in separate SIMDs. |
|
Sure - introducing other variants in other SIMDs sounds reasonable. Agreed regarding "Special Batch"; what are your thoughts on BlockMarker rather than BlockMeta? |
Sure, BlockMarker fits better. |
AshwinSekar
left a comment
There was a problem hiding this comment.
LGTM, just a question about follow up work
| - `footer_version: u16` is a positive integer which changes anytime a change is | ||
| made to the footer. The initial version will be 1. | ||
|
|
||
| - `block_producer_time_nanos: u64` is a nanosecond UNIX timestamp representing |
There was a problem hiding this comment.
In a follow up SIMD for alpenglow specifics we plan to leverage this timestamp to fill in the clock sysvar - essentially a replacement for the current timestamp oracle derived from TowerBFT votes.
In order to accomplish this we would require that all valid blocks contain a footer (as specified in this SIMD) with the additional requirement that the producer timestamp is sensible (at a minimum, this value must be greater than the parent, and some requirement of how much this can increase from block to block. Full details TBD).
If these requirements are not met we would reject the block. Would FD be amenable to enforcing such a requirement down the line? Any concerns about using the Block Footer in such a manner?
An alternative would be to use a different BlockMarker variant or some other shred/transaction based approach.
There was a problem hiding this comment.
Would FD be amenable to enforcing such a requirement down the line?
Yea afaik we're on board. This is briefly mentioned in the Alternatives Considered section
derive timestamp footer field from consensus and enforce user agent format
- This can and probably should be implemented as a future SIMD.
There was a problem hiding this comment.
Yes, that clock SIMD is almost ready BTW. I would probably remove descriptions from here, and put the clock stuff in the clock SIMD. Let's make this one a bit more vanilla and just describe the general framework without applications.
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| | variant=1 (8 bits) | | ||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| | length (16 bits) | |
There was a problem hiding this comment.
Could we remove length here and use version and variant for block parsing purposes?
EDIT: apologies, caught the below two issues after approving 🙈
There was a problem hiding this comment.
I'm partial to keeping it in, since parsers (e.g. hardware parsers, non-client software parsers) that only care about transaction data would otherwise have to keep up with a constantly changing spec. Is there a reason you want to take it out?
There was a problem hiding this comment.
Cool - makes sense. In order for this to work, short of a major upgrade, this seems to require us to commit to the first few fields always being:
Block Marker Variant -- Footer
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| block_marker_flag (64 bits of 0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version=1 (16 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| variant=1 (8 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| length (16 bits) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
which seems fine.
This is probably good design for all BlockMarkers, come to think of it.
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| | version=1 (16 bits) | | ||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||
| | variant=1 (8 bits) | |
There was a problem hiding this comment.
Could we have variants start at 0 rather than 1?
f194119 to
814b50d
Compare
bw-solana
left a comment
There was a problem hiding this comment.
Approving from Anza side
|
@jacobcreech could we merge this? |
| scheduler used, mods, configuration settings, etc). | ||
| - Vote timestamps have a granularity of 1-second, so they cannot be used to | ||
| estimate block duration. | ||
| - Vote timestamps will be removed with Alpenglow. |
There was a problem hiding this comment.
Is this a Tower SIMD, an Alpenglow SIMD, or somewhat both? If it was an Alpenglow SIMD, then I'd have a lot of comments...
In this block:
- Gossip will be gradually removed with Alpenglow. We should not rely on it anymore.
- We do have a proposal how to have a clock (to replace vote timestamps) in Alpenglow, and the two proposals should be aligned.
There was a problem hiding this comment.
It is supposed to be a small SIMD to address an immediate need of the chain, which is how to put block producer information in the block, to support publicly verifiable analysis, competition and differentiation among validator clients, validators, and mod builders. Since we have an area to self report the validator user agent, we also allow self reporting timestamp, which is also useful for analysis; and would continue to be useful separate of any consensus level clock.
It's useful to lightly consider the SIMD in the context of Alpenglow, and if there would be some kind of breaking incompatibility, but I would really protest trying to tie them closely together. It would be an unnecessary coupling of a simple and somewhat urgent need to a longer term re-architecture.
There was a problem hiding this comment.
Okay, thanks for the quick clarification. My other comments take the Alpenglow POV, nothing is breaking.
| - Entry Batch - An array of entries. | ||
| - Entry - An array of transactions. | ||
| - Block Marker - A chunk of structured non-transaction data that can be placed | ||
| before, after, or in-between entry batches in a block. |
There was a problem hiding this comment.
Terminology:
- Client is such a generic term that I am often asked what I mean. What about software client?
- With Alpenglow we try to say "slice" instead of "FEC Set".
| @@ -0,0 +1,381 @@ | |||
| --- | |||
| simd: '0307' | |||
| title: Add Block Footer | |||
There was a problem hiding this comment.
I like the idea of a generic "block marker" (as introduced below). So why call this SIMD "block footer"?
There was a problem hiding this comment.
In particular, we already know that we have a purpose for a marker somewhere in a middle slice for optimistic handover.
Benhawkins18
left a comment
There was a problem hiding this comment.
I see all the right approvals. Merging at request of Author
#### Problem and Summary of Changes Construct and disseminate the block footer. In a follow-up PR, we'll parse the footer in replay. This PR addresses solana-foundation/solana-improvement-documents#307.
No description provided.