From 2ac522344fc73966df9b7bbe9ca2af34cfa201d9 Mon Sep 17 00:00:00 2001 From: ksn6 <2163784+ksn6@users.noreply.github.com> Date: Sun, 21 Sep 2025 19:05:04 -0400 Subject: [PATCH 1/5] SIMD-0364: Enforce DATA_COMPLETE_SHRED Placement only on Last Data Shreds in FEC Sets --- ...4-enforce-data-complete-shred-placement.md | 151 ++++++++++++++++++ 1 file changed, 151 insertions(+) create mode 100644 proposals/0364-enforce-data-complete-shred-placement.md diff --git a/proposals/0364-enforce-data-complete-shred-placement.md b/proposals/0364-enforce-data-complete-shred-placement.md new file mode 100644 index 000000000..42b02bee3 --- /dev/null +++ b/proposals/0364-enforce-data-complete-shred-placement.md @@ -0,0 +1,151 @@ +--- +simd: '0364' +title: Enforce DATA_COMPLETE_SHRED Placement only on Last Data Shreds in FEC Sets +authors: + - ksn6 (Anza) +category: Standard +type: Core +status: Review +created: 2025-09-21 +feature: https://github.com/anza-xyz/agave/pull/8099 +--- + +## Summary + +This SIMD enforces that the `DATA_COMPLETE_SHRED` flag can only be set on the final data shred within an FEC (Forward Error Correction) set. One key use-case for this SIMD: this restriction enables efficient detection of soon-to-be-introduced `BlockComponent`s at shred ingestion time, providing performance improvements for critical consensus operations. The alternative to this SIMD would be to detect `BlockComponent`s during replay, which would be substantially slower. One key example is `UpdateParent` detection in Alpenglow's fast leader handover; detection during replay would not be ideal, as malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive search operations, creating DoS attack vectors and delaying block repairs. + +## Motivation + +Currently, the `DATA_COMPLETE_SHRED` flag can theoretically be set on any data shred within an FEC set, though in practice it marks FEC set boundaries. 
This ambiguity creates significant performance challenges: + +1. **Expensive BlockComponent Detection**: Without guaranteed placement, detecting `BlockComponent`s requires expensive searches across multiple shreds and FEC sets. + +2. **Performance Impact**: alternative `BlockComponent` methods require either complex inference algorithms or expensive replay operations. + +3. **Security Vulnerabilities**: Malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive search operations, creating DoS attack vectors; for certain proposed `BlockComponent`s (specifically, `UpdateParent`), malicious leaders could delay block repairs for following leaders. + +By enforcing `DATA_COMPLETE_SHRED` placement only on the last data shred in each FEC set, validators can efficiently detect `BlockComponent` boundaries during online shred ingestion. + +## Forward Dependencies + +This proposal is required for: + +- **[SIMD-0337]: UpdateParent Marker for Alpenglow Fast Leader Handover** + + Alpenglow's fast leader handover feature depends on efficient UpdateParent detection, which this SIMD enables through guaranteed FEC set boundary markers. + +[SIMD-0337]: https://github.com/solana-foundation/solana-improvement-documents/pull/337 + +## New Terminology + +- **FEC Set**: A fixed-size group of exactly 32 data shreds plus associated coding shreds used for erasure coding and error correction. + +- **FEC Set Boundary**: The transition point between consecutive FEC sets, marked by the `DATA_COMPLETE_SHRED` flag on the final data shred of each set. + +- **BlockComponent**: A control element within a block that provides metadata or instructions for block processing. `BlockComponent`s are aligned with FEC set boundaries when the previous shred has `DATA_COMPLETE_SHRED` set. E.g., see https://github.com/anza-xyz/agave/pull/8100 for an implementation of `BlockComponent` within Agave. 
+ +## Detailed Design + +### FEC Set Structure + +FEC sets in Solana have a fixed structure: +- Each FEC set contains exactly 32 data shreds +- Each FEC set contains exactly 32 coding shreds for error correction +- FEC sets are never partial - if insufficient data exists to fill a complete set, the remaining positions are zero-padded + +### `DATA_COMPLETE_SHRED` Placement Rules + +The following rules MUST be enforced by all validator implementations: + +1. **Placement Restriction**: The `DATA_COMPLETE_SHRED` flag MUST only be set to `true` on the final data shred within an FEC set (index position 31 within the set, or more generally at positions where `(shred_index + 1) % 32 == 0` for the data shred's position within the slot). + +2. **Validation Requirement**: Validators MUST check `DATA_COMPLETE_SHRED` placement during shred ingestion (either at turbine receipt or blockstore insertion). + +3. **Invalid Placement Handling**: If a validator detects `DATA_COMPLETE_SHRED` set to `true` on any data shred other than the last one in an FEC set: + - The validator MUST mark the entire slot as dead + +4. **Zero-Padding Behavior**: For the final FEC set in a block that contains fewer than 32 data shreds: + - The `DATA_COMPLETE_SHRED` flag MUST be set on the last actual data shred before zero-padding + - The remaining positions in the FEC set MUST be zero-padded to maintain the fixed 32-shred structure + +### `BlockComponent` Alignment + +With enforced `DATA_COMPLETE_SHRED` placement: + +1. **Boundary Detection**: `BlockComponent`s start at the beginning of an FEC set when the previous FEC set's last data shred has `DATA_COMPLETE_SHRED` set to `true`. + +2. **Efficient Lookup**: `BlockComponent`s can be detected by checking only specific, predictable shred positions rather than searching across multiple shreds. + +3. **Online Processing**: `BlockComponent` detection can occur during shred ingestion without requiring batch processing or replay. 
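
The placement rules above can be sketched in a few lines. This is a minimal illustration rather than the Agave implementation: `DataShred`, `is_last_in_fec_set`, `placement_is_valid`, and `slot_is_dead` are hypothetical names, the shred is reduced to its slot-relative index and its flag, and only rules 1 and 3 are modeled. The zero-padded final FEC set from rule 4 is deliberately left out, since how padding is represented is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable

# Fixed FEC set size stated in the proposal.
DATA_SHREDS_PER_FEC_SET = 32


@dataclass(frozen=True)
class DataShred:
    """Hypothetical, simplified stand-in for a data shred header."""
    index: int           # position of the data shred within its slot
    data_complete: bool  # whether DATA_COMPLETE_SHRED is set


def is_last_in_fec_set(shred_index: int) -> bool:
    """True when the shred sits at position 31 of its FEC set."""
    return (shred_index + 1) % DATA_SHREDS_PER_FEC_SET == 0


def placement_is_valid(shred: DataShred) -> bool:
    """Rule 1: the flag may only be set on the last data shred of a set."""
    return (not shred.data_complete) or is_last_in_fec_set(shred.index)


def slot_is_dead(shreds: Iterable[DataShred]) -> bool:
    """Rule 3: one misplaced flag is enough to mark the whole slot dead."""
    return any(not placement_is_valid(s) for s in shreds)
```

Because the check depends only on a single shred's index and flag, it can run per shred at ingestion time without waiting for the rest of the FEC set.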
+ +## Use Case: Alpenglow Fast Leader Handover + +Alpenglow's fast leader handover feature relies on an `UpdateParent` `BlockComponent` to signal parent slot changes. With this SIMD: + +1. **Current Challenge**: Without guaranteed `DATA_COMPLETE_SHRED` placement, detecting `UpdateParent` requires: + - Searching across multiple shreds in potentially multiple FEC sets + - Complex inference algorithms to guess `BlockComponent` locations + - Or expensive replay operations to determine `UpdateParent` positions + +2. **With This SIMD**: `UpdateParent` detection becomes deterministic: + - Check the 0th shred of the next FEC set when `DATA_COMPLETE_SHRED` is true + - Check the current shred when transitioning from a `DATA_COMPLETE_SHRED` boundary + - No searching or inference required + - See https://github.com/anza-xyz/alpenglow/pull/459 for an example implementation. + +3. **Performance Impact**: This optimization will provide performance improvements to alternative implementations in `UpdateParent` detection latency, and more broadly, `BlockMarker` detection latency, enabling Alpenglow's fast leader handover to operate efficiently at scale. + +## Alternatives + +### Complex Inference Algorithms +Implement sophisticated algorithms to infer `BlockComponent` locations through partial data analysis. + +**Rejected because**: Substantially more complex to maintain and test, slower than the proposed approach, and prone to edge cases. + +### Replay-Based Detection +Determine `UpdateParent` locations through transaction replay after full block receipt. + +**Rejected because**: Substantially slower than online detection, increases block confirmation latency, requires significant computational resources, and prone to DoS attacks from malicious actors. + +### Status Quo +Continue allowing `DATA_COMPLETE_SHRED` placement anywhere within FEC sets. + +**Rejected because**: Maintains current performance limitations and security vulnerabilities. 
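
The deterministic lookup described in the Alpenglow use case above can be sketched as follows. The helper names `fec_set_position` and `next_component_start` are assumptions made for illustration and do not come from the linked pull requests. The sketch also assumes slot-relative data shred indices and full 32-shred sets.

```python
from typing import Optional, Tuple

# Fixed FEC set size stated in the proposal.
DATA_SHREDS_PER_FEC_SET = 32


def fec_set_position(shred_index: int) -> Tuple[int, int]:
    """Return (FEC set number, offset within the set) for a data shred."""
    return divmod(shred_index, DATA_SHREDS_PER_FEC_SET)


def next_component_start(shred_index: int, data_complete: bool) -> Optional[int]:
    """Index of the shred where the next BlockComponent may begin.

    Returns the 0th shred of the following FEC set when the given shred is
    the last one in its set and carries DATA_COMPLETE_SHRED, else None.
    """
    set_number, offset = fec_set_position(shred_index)
    if data_complete and offset == DATA_SHREDS_PER_FEC_SET - 1:
        return (set_number + 1) * DATA_SHREDS_PER_FEC_SET
    return None
```

With placement enforced, a validator only needs to inspect the returned index for an `UpdateParent` marker. Every other position can be skipped without searching.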
+ +## Impact + +### Validator Implementations +All validator clients (Agave, Firedancer, Jito, etc.) MUST implement the validation logic to check `DATA_COMPLETE_SHRED` placement and mark violating slots as dead. + +### Block Production +Block producers MUST ensure they only set `DATA_COMPLETE_SHRED` on the final data shred in each FEC set. Most implementations already follow this pattern. + +### Existing Infrastructure +- No impact on RPC methods or APIs +- No changes required for block explorers +- No impact on existing ledger data (validation only applies to new blocks) + +## Security Considerations + +### DoS Attack Prevention +By enforcing predictable `DATA_COMPLETE_SHRED` placement, this SIMD prevents malicious leaders from: +- Forcing validators into expensive `BlockComponent` searches +- Specifically, with `UpdateParent` in Alpenglow fast leader handover, causing delayed block repairs + +### Deterministic Validation +All validators will deterministically agree on whether a block has valid `DATA_COMPLETE_SHRED` placement, preventing consensus splits due to implementation differences. + +### Consequences of Invalid `DATA_COMPLETE_SHRED` Placement +Blocks with invalid `DATA_COMPLETE_SHRED` placement are marked as dead if they contain valid transactions. + +## Backwards Compatibility + +This change maintains full backwards compatibility: + +1. **Existing Ledgers**: Validation only applies to newly produced blocks. Historical ledger data is not re-validated. + +2. **Rollout Strategy**: The enforcement will be controlled by a feature gate (`discard_unexpected_data_complete_shreds`). + +3. **Version Compatibility**: Validators running older versions will continue to accept blocks with misplaced `DATA_COMPLETE_SHRED` flags until they upgrade. + +4. **No Replay Required**: Existing snapshots and ledger data remain valid without any migration or replay necessary. 
From d86594abf2d77a4850c85cf2619818580094ccaa Mon Sep 17 00:00:00 2001 From: ksn6 <2163784+ksn6@users.noreply.github.com> Date: Sun, 21 Sep 2025 19:12:48 -0400 Subject: [PATCH 2/5] lint + SIMD number change --- ...-enforce-data-complete-shred-placement.md} | 170 +++++++++++++----- 1 file changed, 127 insertions(+), 43 deletions(-) rename proposals/{0364-enforce-data-complete-shred-placement.md => 0366-enforce-data-complete-shred-placement.md} (50%) diff --git a/proposals/0364-enforce-data-complete-shred-placement.md b/proposals/0366-enforce-data-complete-shred-placement.md similarity index 50% rename from proposals/0364-enforce-data-complete-shred-placement.md rename to proposals/0366-enforce-data-complete-shred-placement.md index 42b02bee3..16d8b8297 100644 --- a/proposals/0364-enforce-data-complete-shred-placement.md +++ b/proposals/0366-enforce-data-complete-shred-placement.md @@ -1,6 +1,6 @@ --- -simd: '0364' -title: Enforce DATA_COMPLETE_SHRED Placement only on Last Data Shreds in FEC Sets +simd: '0366' +title: Enforce DATA_COMPLETE_SHRED Placement authors: - ksn6 (Anza) category: Standard @@ -12,19 +12,40 @@ feature: https://github.com/anza-xyz/agave/pull/8099 ## Summary -This SIMD enforces that the `DATA_COMPLETE_SHRED` flag can only be set on the final data shred within an FEC (Forward Error Correction) set. One key use-case for this SIMD: this restriction enables efficient detection of soon-to-be-introduced `BlockComponent`s at shred ingestion time, providing performance improvements for critical consensus operations. The alternative to this SIMD would be to detect `BlockComponent`s during replay, which would be substantially slower. One key example is `UpdateParent` detection in Alpenglow's fast leader handover; detection during replay would not be ideal, as malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive search operations, creating DoS attack vectors and delaying block repairs. 
+This SIMD enforces that the `DATA_COMPLETE_SHRED` flag can only be set on the +final data shred within an FEC (Forward Error Correction) set. One key +use-case for this SIMD: this restriction enables efficient detection of +soon-to-be-introduced `BlockComponent`s at shred ingestion time, providing +performance improvements for critical consensus operations. The alternative to +this SIMD would be to detect `BlockComponent`s during replay, which would be +substantially slower. One key example is `UpdateParent` detection in +Alpenglow's fast leader handover; detection during replay would not be ideal, +as malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags +to force validators into expensive search operations, creating DoS attack +vectors and delaying block repairs. ## Motivation -Currently, the `DATA_COMPLETE_SHRED` flag can theoretically be set on any data shred within an FEC set, though in practice it marks FEC set boundaries. This ambiguity creates significant performance challenges: +Currently, the `DATA_COMPLETE_SHRED` flag can theoretically be set on any data +shred within an FEC set, though in practice it marks FEC set boundaries. This +ambiguity creates significant performance challenges: -1. **Expensive BlockComponent Detection**: Without guaranteed placement, detecting `BlockComponent`s requires expensive searches across multiple shreds and FEC sets. +1. **Expensive BlockComponent Detection**: Without guaranteed placement, + detecting `BlockComponent`s requires expensive searches across multiple + shreds and FEC sets. -2. **Performance Impact**: alternative `BlockComponent` methods require either complex inference algorithms or expensive replay operations. +2. **Performance Impact**: alternative `BlockComponent` methods require either + complex inference algorithms or expensive replay operations. -3. 
**Security Vulnerabilities**: Malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive search operations, creating DoS attack vectors; for certain proposed `BlockComponent`s (specifically, `UpdateParent`), malicious leaders could delay block repairs for following leaders. +3. **Security Vulnerabilities**: Malicious leaders could intentionally + misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive + search operations, creating DoS attack vectors; for certain proposed + `BlockComponent`s (specifically, `UpdateParent`), malicious leaders could + delay block repairs for following leaders. -By enforcing `DATA_COMPLETE_SHRED` placement only on the last data shred in each FEC set, validators can efficiently detect `BlockComponent` boundaries during online shred ingestion. +By enforcing `DATA_COMPLETE_SHRED` placement only on the last data shred in +each FEC set, validators can efficiently detect `BlockComponent` boundaries +during online shred ingestion. ## Forward Dependencies @@ -32,95 +53,144 @@ This proposal is required for: - **[SIMD-0337]: UpdateParent Marker for Alpenglow Fast Leader Handover** - Alpenglow's fast leader handover feature depends on efficient UpdateParent detection, which this SIMD enables through guaranteed FEC set boundary markers. + Alpenglow's fast leader handover feature depends on efficient + UpdateParent detection, which this SIMD enables through guaranteed FEC set + boundary markers. [SIMD-0337]: https://github.com/solana-foundation/solana-improvement-documents/pull/337 ## New Terminology -- **FEC Set**: A fixed-size group of exactly 32 data shreds plus associated coding shreds used for erasure coding and error correction. +- **FEC Set**: A fixed-size group of exactly 32 data shreds plus associated + coding shreds used for erasure coding and error correction. 
-- **FEC Set Boundary**: The transition point between consecutive FEC sets, marked by the `DATA_COMPLETE_SHRED` flag on the final data shred of each set. +- **FEC Set Boundary**: The transition point between consecutive FEC sets, + marked by the `DATA_COMPLETE_SHRED` flag on the final data shred of each + set. -- **BlockComponent**: A control element within a block that provides metadata or instructions for block processing. `BlockComponent`s are aligned with FEC set boundaries when the previous shred has `DATA_COMPLETE_SHRED` set. E.g., see https://github.com/anza-xyz/agave/pull/8100 for an implementation of `BlockComponent` within Agave. +- **BlockComponent**: A control element within a block that provides metadata + or instructions for block processing. `BlockComponent`s are aligned with FEC + set boundaries when the previous shred has `DATA_COMPLETE_SHRED` set. E.g., + see https://github.com/anza-xyz/agave/pull/8100 for an implementation of + `BlockComponent` within Agave. ## Detailed Design ### FEC Set Structure FEC sets in Solana have a fixed structure: + - Each FEC set contains exactly 32 data shreds - Each FEC set contains exactly 32 coding shreds for error correction -- FEC sets are never partial - if insufficient data exists to fill a complete set, the remaining positions are zero-padded +- FEC sets are never partial - if insufficient data exists to fill a complete + set, the remaining positions are zero-padded ### `DATA_COMPLETE_SHRED` Placement Rules The following rules MUST be enforced by all validator implementations: -1. **Placement Restriction**: The `DATA_COMPLETE_SHRED` flag MUST only be set to `true` on the final data shred within an FEC set (index position 31 within the set, or more generally at positions where `(shred_index + 1) % 32 == 0` for the data shred's position within the slot). +1. 
**Placement Restriction**: The `DATA_COMPLETE_SHRED` flag MUST only be set + to `true` on the final data shred within an FEC set (index position 31 + within the set, or more generally at positions where + `(shred_index + 1) % 32 == 0` for the data shred's position within the + slot). -2. **Validation Requirement**: Validators MUST check `DATA_COMPLETE_SHRED` placement during shred ingestion (either at turbine receipt or blockstore insertion). +2. **Validation Requirement**: Validators MUST check `DATA_COMPLETE_SHRED` + placement during shred ingestion (either at turbine receipt or blockstore + insertion). -3. **Invalid Placement Handling**: If a validator detects `DATA_COMPLETE_SHRED` set to `true` on any data shred other than the last one in an FEC set: +3. **Invalid Placement Handling**: If a validator detects + `DATA_COMPLETE_SHRED` set to `true` on any data shred other than the last + one in an FEC set: - The validator MUST mark the entire slot as dead -4. **Zero-Padding Behavior**: For the final FEC set in a block that contains fewer than 32 data shreds: - - The `DATA_COMPLETE_SHRED` flag MUST be set on the last actual data shred before zero-padding - - The remaining positions in the FEC set MUST be zero-padded to maintain the fixed 32-shred structure +4. **Zero-Padding Behavior**: For the final FEC set in a block that contains + fewer than 32 data shreds: + - The `DATA_COMPLETE_SHRED` flag MUST be set on the last actual data shred + before zero-padding + - The remaining positions in the FEC set MUST be zero-padded to maintain + the fixed 32-shred structure ### `BlockComponent` Alignment With enforced `DATA_COMPLETE_SHRED` placement: -1. **Boundary Detection**: `BlockComponent`s start at the beginning of an FEC set when the previous FEC set's last data shred has `DATA_COMPLETE_SHRED` set to `true`. +1. 
**Boundary Detection**: `BlockComponent`s start at the beginning of an FEC + set when the previous FEC set's last data shred has `DATA_COMPLETE_SHRED` + set to `true`. -2. **Efficient Lookup**: `BlockComponent`s can be detected by checking only specific, predictable shred positions rather than searching across multiple shreds. +2. **Efficient Lookup**: `BlockComponent`s can be detected by checking only + specific, predictable shred positions rather than searching across multiple + shreds. -3. **Online Processing**: `BlockComponent` detection can occur during shred ingestion without requiring batch processing or replay. +3. **Online Processing**: `BlockComponent` detection can occur during shred + ingestion without requiring batch processing or replay. ## Use Case: Alpenglow Fast Leader Handover -Alpenglow's fast leader handover feature relies on an `UpdateParent` `BlockComponent` to signal parent slot changes. With this SIMD: +Alpenglow's fast leader handover feature relies on an `UpdateParent` +`BlockComponent` to signal parent slot changes. With this SIMD: -1. **Current Challenge**: Without guaranteed `DATA_COMPLETE_SHRED` placement, detecting `UpdateParent` requires: +1. **Current Challenge**: Without guaranteed `DATA_COMPLETE_SHRED` placement, + detecting `UpdateParent` requires: - Searching across multiple shreds in potentially multiple FEC sets - Complex inference algorithms to guess `BlockComponent` locations - Or expensive replay operations to determine `UpdateParent` positions 2. 
**With This SIMD**: `UpdateParent` detection becomes deterministic: - - Check the 0th shred of the next FEC set when `DATA_COMPLETE_SHRED` is true - - Check the current shred when transitioning from a `DATA_COMPLETE_SHRED` boundary + - Check the 0th shred of the next FEC set when `DATA_COMPLETE_SHRED` is + true + - Check the current shred when transitioning from a `DATA_COMPLETE_SHRED` + boundary - No searching or inference required - See https://github.com/anza-xyz/alpenglow/pull/459 for an example implementation. -3. **Performance Impact**: This optimization will provide performance improvements to alternative implementations in `UpdateParent` detection latency, and more broadly, `BlockMarker` detection latency, enabling Alpenglow's fast leader handover to operate efficiently at scale. +3. **Performance Impact**: This optimization will provide performance + improvements to alternative implementations in `UpdateParent` detection + latency, and more broadly, `BlockMarker` detection latency, enabling + Alpenglow's fast leader handover to operate efficiently at scale. -## Alternatives +## Alternatives Considered ### Complex Inference Algorithms -Implement sophisticated algorithms to infer `BlockComponent` locations through partial data analysis. -**Rejected because**: Substantially more complex to maintain and test, slower than the proposed approach, and prone to edge cases. +Implement sophisticated algorithms to infer `BlockComponent` locations through +partial data analysis. + +**Rejected because**: Substantially more complex to maintain and test, slower +than the proposed approach, and prone to edge cases. ### Replay-Based Detection -Determine `UpdateParent` locations through transaction replay after full block receipt. -**Rejected because**: Substantially slower than online detection, increases block confirmation latency, requires significant computational resources, and prone to DoS attacks from malicious actors. 
+Determine `UpdateParent` locations through transaction replay after full block +receipt. + +**Rejected because**: Substantially slower than online detection, increases +block confirmation latency, requires significant computational resources, and +prone to DoS attacks from malicious actors. ### Status Quo + Continue allowing `DATA_COMPLETE_SHRED` placement anywhere within FEC sets. -**Rejected because**: Maintains current performance limitations and security vulnerabilities. +**Rejected because**: Maintains current performance limitations and security +vulnerabilities. ## Impact ### Validator Implementations -All validator clients (Agave, Firedancer, Jito, etc.) MUST implement the validation logic to check `DATA_COMPLETE_SHRED` placement and mark violating slots as dead. + +All validator clients (Agave, Firedancer, Jito, etc.) MUST implement the +validation logic to check `DATA_COMPLETE_SHRED` placement and mark violating +slots as dead. ### Block Production -Block producers MUST ensure they only set `DATA_COMPLETE_SHRED` on the final data shred in each FEC set. Most implementations already follow this pattern. + +Block producers MUST ensure they only set `DATA_COMPLETE_SHRED` on the final +data shred in each FEC set. Most implementations already follow this pattern. 
### Existing Infrastructure + - No impact on RPC methods or APIs - No changes required for block explorers - No impact on existing ledger data (validation only applies to new blocks) @@ -128,24 +198,38 @@ Block producers MUST ensure they only set `DATA_COMPLETE_SHRED` on the final dat ## Security Considerations ### DoS Attack Prevention -By enforcing predictable `DATA_COMPLETE_SHRED` placement, this SIMD prevents malicious leaders from: + +By enforcing predictable `DATA_COMPLETE_SHRED` placement, this SIMD prevents +malicious leaders from: + - Forcing validators into expensive `BlockComponent` searches -- Specifically, with `UpdateParent` in Alpenglow fast leader handover, causing delayed block repairs +- Specifically, with `UpdateParent` in Alpenglow fast leader handover, causing + delayed block repairs ### Deterministic Validation -All validators will deterministically agree on whether a block has valid `DATA_COMPLETE_SHRED` placement, preventing consensus splits due to implementation differences. + +All validators will deterministically agree on whether a block has valid +`DATA_COMPLETE_SHRED` placement, preventing consensus splits due to +implementation differences. ### Consequences of Invalid `DATA_COMPLETE_SHRED` Placement -Blocks with invalid `DATA_COMPLETE_SHRED` placement are marked as dead if they contain valid transactions. + +Blocks with invalid `DATA_COMPLETE_SHRED` placement are marked as dead if they +contain valid transactions. ## Backwards Compatibility This change maintains full backwards compatibility: -1. **Existing Ledgers**: Validation only applies to newly produced blocks. Historical ledger data is not re-validated. +1. **Existing Ledgers**: Validation only applies to newly produced blocks. + Historical ledger data is not re-validated. -2. **Rollout Strategy**: The enforcement will be controlled by a feature gate (`discard_unexpected_data_complete_shreds`). +2. 
**Rollout Strategy**: The enforcement will be controlled by a feature gate + (`discard_unexpected_data_complete_shreds`). -3. **Version Compatibility**: Validators running older versions will continue to accept blocks with misplaced `DATA_COMPLETE_SHRED` flags until they upgrade. +3. **Version Compatibility**: Validators running older versions will continue + to accept blocks with misplaced `DATA_COMPLETE_SHRED` flags until they + upgrade. -4. **No Replay Required**: Existing snapshots and ledger data remain valid without any migration or replay necessary. +4. **No Replay Required**: Existing snapshots and ledger data remain valid + without any migration or replay necessary. From bb610cd1d1fc3a07a14598ba6d5c0e55ee92c0e3 Mon Sep 17 00:00:00 2001 From: ksn6 <2163784+ksn6@users.noreply.github.com> Date: Sun, 21 Sep 2025 19:23:00 -0400 Subject: [PATCH 3/5] improve writing style --- ...6-enforce-data-complete-shred-placement.md | 103 +++++++++--------- 1 file changed, 52 insertions(+), 51 deletions(-) diff --git a/proposals/0366-enforce-data-complete-shred-placement.md b/proposals/0366-enforce-data-complete-shred-placement.md index 16d8b8297..3fddc0a9d 100644 --- a/proposals/0366-enforce-data-complete-shred-placement.md +++ b/proposals/0366-enforce-data-complete-shred-placement.md @@ -12,26 +12,26 @@ feature: https://github.com/anza-xyz/agave/pull/8099 ## Summary -This SIMD enforces that the `DATA_COMPLETE_SHRED` flag can only be set on the +This SIMD enforces that the `DATA_COMPLETE_SHRED` flag can only appear on the final data shred within an FEC (Forward Error Correction) set. One key use-case for this SIMD: this restriction enables efficient detection of soon-to-be-introduced `BlockComponent`s at shred ingestion time, providing performance improvements for critical consensus operations. The alternative to -this SIMD would be to detect `BlockComponent`s during replay, which would be -substantially slower. 
One key example is `UpdateParent` detection in -Alpenglow's fast leader handover; detection during replay would not be ideal, +this SIMD would detect `BlockComponent`s during replay, which would run +substantially slower. One key example involves `UpdateParent` detection in +Alpenglow's fast leader handover; detection during replay would not work well, as malicious leaders could intentionally misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive search operations, creating DoS attack vectors and delaying block repairs. ## Motivation -Currently, the `DATA_COMPLETE_SHRED` flag can theoretically be set on any data +Currently, the `DATA_COMPLETE_SHRED` flag can theoretically appear on any data shred within an FEC set, though in practice it marks FEC set boundaries. This ambiguity creates significant performance challenges: 1. **Expensive BlockComponent Detection**: Without guaranteed placement, - detecting `BlockComponent`s requires expensive searches across multiple + `BlockComponent` detection requires expensive searches across multiple shreds and FEC sets. 2. **Performance Impact**: alternative `BlockComponent` methods require either @@ -43,13 +43,13 @@ ambiguity creates significant performance challenges: `BlockComponent`s (specifically, `UpdateParent`), malicious leaders could delay block repairs for following leaders. -By enforcing `DATA_COMPLETE_SHRED` placement only on the last data shred in -each FEC set, validators can efficiently detect `BlockComponent` boundaries -during online shred ingestion. +Enforcing `DATA_COMPLETE_SHRED` placement only on the last data shred in +each FEC set allows validators to efficiently detect `BlockComponent` +boundaries during online shred ingestion. 
## Forward Dependencies -This proposal is required for: +This proposal supports: - **[SIMD-0337]: UpdateParent Marker for Alpenglow Fast Leader Handover** @@ -62,14 +62,14 @@ This proposal is required for: ## New Terminology - **FEC Set**: A fixed-size group of exactly 32 data shreds plus associated - coding shreds used for erasure coding and error correction. + coding shreds for erasure coding and error correction. - **FEC Set Boundary**: The transition point between consecutive FEC sets, - marked by the `DATA_COMPLETE_SHRED` flag on the final data shred of each + which the `DATA_COMPLETE_SHRED` flag marks on the final data shred of each set. - **BlockComponent**: A control element within a block that provides metadata - or instructions for block processing. `BlockComponent`s are aligned with FEC + or instructions for block processing. `BlockComponent`s align with FEC set boundaries when the previous shred has `DATA_COMPLETE_SHRED` set. E.g., see https://github.com/anza-xyz/agave/pull/8100 for an implementation of `BlockComponent` within Agave. @@ -78,19 +78,19 @@ This proposal is required for: ### FEC Set Structure -FEC sets in Solana have a fixed structure: +FEC sets in Solana follow a fixed structure: - Each FEC set contains exactly 32 data shreds - Each FEC set contains exactly 32 coding shreds for error correction -- FEC sets are never partial - if insufficient data exists to fill a complete - set, the remaining positions are zero-padded +- FEC sets never allow partial fills - if insufficient data exists to fill a + complete set, zero-padding fills the remaining positions ### `DATA_COMPLETE_SHRED` Placement Rules -The following rules MUST be enforced by all validator implementations: +All validator implementations MUST enforce the following rules: -1. **Placement Restriction**: The `DATA_COMPLETE_SHRED` flag MUST only be set - to `true` on the final data shred within an FEC set (index position 31 +1. 
**Placement Restriction**: The `DATA_COMPLETE_SHRED` flag MUST only appear + as `true` on the final data shred within an FEC set (index position 31 within the set, or more generally at positions where `(shred_index + 1) % 32 == 0` for the data shred's position within the slot). @@ -100,26 +100,26 @@ The following rules MUST be enforced by all validator implementations: insertion). 3. **Invalid Placement Handling**: If a validator detects - `DATA_COMPLETE_SHRED` set to `true` on any data shred other than the last + `DATA_COMPLETE_SHRED` as `true` on any data shred other than the last one in an FEC set: - The validator MUST mark the entire slot as dead 4. **Zero-Padding Behavior**: For the final FEC set in a block that contains fewer than 32 data shreds: - - The `DATA_COMPLETE_SHRED` flag MUST be set on the last actual data shred + - The `DATA_COMPLETE_SHRED` flag MUST appear on the last actual data shred before zero-padding - - The remaining positions in the FEC set MUST be zero-padded to maintain + - Zero-padding MUST fill the remaining positions in the FEC set to maintain the fixed 32-shred structure ### `BlockComponent` Alignment -With enforced `DATA_COMPLETE_SHRED` placement: +Enforcing `DATA_COMPLETE_SHRED` placement enables: 1. **Boundary Detection**: `BlockComponent`s start at the beginning of an FEC - set when the previous FEC set's last data shred has `DATA_COMPLETE_SHRED` - set to `true`. + set when the previous FEC set's last data shred shows `DATA_COMPLETE_SHRED` + as `true`. -2. **Efficient Lookup**: `BlockComponent`s can be detected by checking only +2. **Efficient Lookup**: `BlockComponent` detection requires checking only specific, predictable shred positions rather than searching across multiple shreds. 
@@ -128,21 +128,21 @@ With enforced `DATA_COMPLETE_SHRED` placement:
 
 ## Use Case: Alpenglow Fast Leader Handover
 
-Alpenglow's fast leader handover feature relies on an `UpdateParent`
+Alpenglow's fast leader handover feature uses an `UpdateParent`
 `BlockComponent` to signal parent slot changes. With this SIMD:
 
 1. **Current Challenge**: Without guaranteed `DATA_COMPLETE_SHRED` placement,
-   detecting `UpdateParent` requires:
+   `UpdateParent` detection requires:
    - Searching across multiple shreds in potentially multiple FEC sets
    - Complex inference algorithms to guess `BlockComponent` locations
    - Or expensive replay operations to determine `UpdateParent` positions
 
-2. **With This SIMD**: `UpdateParent` detection becomes deterministic:
-   - Check the 0th shred of the next FEC set when `DATA_COMPLETE_SHRED` is
+2. **With This SIMD**: `UpdateParent` detection works deterministically:
+   - Check the 0th shred of the next FEC set when `DATA_COMPLETE_SHRED` shows
      true
    - Check the current shred when transitioning from a `DATA_COMPLETE_SHRED`
      boundary
-   - No searching or inference required
+   - No searching or inference needed
    - See https://github.com/anza-xyz/alpenglow/pull/459 for an example implementation.
 
 3. **Performance Impact**: This optimization will provide performance
@@ -157,37 +157,38 @@ Alpenglow's fast leader handover feature relies on an `UpdateParent`
 Implement sophisticated algorithms to infer `BlockComponent` locations through
 partial data analysis.
 
-**Rejected because**: Substantially more complex to maintain and test, slower
-than the proposed approach, and prone to edge cases.
+**Rejection rationale**: Substantially more complex to maintain and test,
+slower than the proposed approach, and prone to edge cases.
 
 ### Replay-Based Detection
 
 Determine `UpdateParent` locations through transaction replay after full block
 receipt.
 
-**Rejected because**: Substantially slower than online detection, increases
+**Rejection rationale**: Substantially slower than online detection, increases
 block confirmation latency, requires significant computational resources, and
-prone to DoS attacks from malicious actors.
+vulnerable to DoS attacks from malicious actors.
 
 ### Status Quo
 
 Continue allowing `DATA_COMPLETE_SHRED` placement anywhere within FEC sets.
 
-**Rejected because**: Maintains current performance limitations and security
-vulnerabilities.
+**Rejection rationale**: Maintains current performance limitations and
+security vulnerabilities.
 
 ## Impact
 
 ### Validator Implementations
 
-All validator clients (Agave, Firedancer, Jito, etc.) MUST implement the
-validation logic to check `DATA_COMPLETE_SHRED` placement and mark violating
-slots as dead.
+All validator clients (Agave, Firedancer, Jito, etc.) MUST implement
+validation logic that checks `DATA_COMPLETE_SHRED` placement and marks
+violating slots as dead.
 
 ### Block Production
 
-Block producers MUST ensure they only set `DATA_COMPLETE_SHRED` on the final
-data shred in each FEC set. Most implementations already follow this pattern.
+Block producers MUST ensure they only place `DATA_COMPLETE_SHRED` on the
+final data shred in each FEC set. Most implementations already follow this
+pattern.
 
 ### Existing Infrastructure
 
@@ -199,8 +200,8 @@ data shred in each FEC set. Most implementations already follow this pattern.
 
 ### DoS Attack Prevention
 
-By enforcing predictable `DATA_COMPLETE_SHRED` placement, this SIMD prevents
-malicious leaders from:
+Enforcing predictable `DATA_COMPLETE_SHRED` placement prevents malicious
+leaders from:
 
 - Forcing validators into expensive `BlockComponent` searches
 - Specifically, with `UpdateParent` in Alpenglow fast leader handover, causing
@@ -208,23 +209,23 @@ malicious leaders from:
 
 ### Deterministic Validation
 
-All validators will deterministically agree on whether a block has valid
-`DATA_COMPLETE_SHRED` placement, preventing consensus splits due to
+All validators will deterministically agree on whether a block shows valid
+`DATA_COMPLETE_SHRED` placement, preventing consensus splits from
 implementation differences.
 
 ### Consequences of Invalid `DATA_COMPLETE_SHRED` Placement
 
-Blocks with invalid `DATA_COMPLETE_SHRED` placement are marked as dead if they
-contain valid transactions.
+Validators mark blocks with invalid `DATA_COMPLETE_SHRED` placement as dead
+if they contain valid transactions.
 
 ## Backwards Compatibility
 
 This change maintains full backwards compatibility:
 
 1. **Existing Ledgers**: Validation only applies to newly produced blocks.
-   Historical ledger data is not re-validated.
+   Historical ledger data requires no re-validation.
 
-2. **Rollout Strategy**: The enforcement will be controlled by a feature gate
+2. **Rollout Strategy**: A feature gate controls the enforcement
    (`discard_unexpected_data_complete_shreds`).
 
 3. **Version Compatibility**: Validators running older versions will continue
@@ -232,4 +233,4 @@ This change maintains full backwards compatibility:
    upgrade.
 
 4. **No Replay Required**: Existing snapshots and ledger data remain valid
-   without any migration or replay necessary.
+   without requiring any migration or replay.
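
As an illustration only (not part of the patch series), the Placement Restriction rule quoted in the proposal text, `(shred_index + 1) % 32 == 0`, can be sketched in a few lines of Rust. The names `DATA_SHREDS_PER_FEC_SET`, `is_last_data_shred_in_fec_set`, and `data_complete_placement_is_valid` are hypothetical and are not taken from Agave or any other client. The sketch assumes data shred indices count from 0 within the slot and that every FEC set holds exactly 32 data shreds. It covers only the placement rule, not the zero-padding rule or the handling of dead slots.

```rust
/// Assumed number of data shreds per FEC set, as stated in the proposal text.
const DATA_SHREDS_PER_FEC_SET: u32 = 32;

/// Returns true when `shred_index` (0-based within the slot) is the last
/// data shred of its FEC set.
///
/// Equivalent to `(shred_index + 1) % 32 == 0`, written without the `+ 1`
/// so that `u32::MAX` cannot overflow.
fn is_last_data_shred_in_fec_set(shred_index: u32) -> bool {
    shred_index % DATA_SHREDS_PER_FEC_SET == DATA_SHREDS_PER_FEC_SET - 1
}

/// Hypothetical placement check: a shred with the flag cleared is always
/// acceptable; a shred with the flag set is acceptable only in the last
/// data position of an FEC set.
fn data_complete_placement_is_valid(shred_index: u32, data_complete: bool) -> bool {
    !data_complete || is_last_data_shred_in_fec_set(shred_index)
}
```

Under these assumptions, indices 31 and 63 may carry the flag, while a flag on index 0 or 30 would be the kind of placement for which the proposal requires the slot to be marked dead.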
From 8a4043ee5104f25bd21756bcdcc55fcb0c606c78 Mon Sep 17 00:00:00 2001
From: ksn6 <2163784+ksn6@users.noreply.github.com>
Date: Sun, 21 Sep 2025 23:28:35 -0400
Subject: [PATCH 4/5] update

---
 proposals/0366-enforce-data-complete-shred-placement.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/proposals/0366-enforce-data-complete-shred-placement.md b/proposals/0366-enforce-data-complete-shred-placement.md
index 3fddc0a9d..b15b0864d 100644
--- a/proposals/0366-enforce-data-complete-shred-placement.md
+++ b/proposals/0366-enforce-data-complete-shred-placement.md
@@ -3,6 +3,7 @@ simd: '0366'
 title: Enforce DATA_COMPLETE_SHRED Placement
 authors:
   - ksn6 (Anza)
+  - Ashwin Sekar (Anza)
 category: Standard
 type: Core
 status: Review

From 2f98a7d32c0dc15b7306e32240289cf142d23ae7 Mon Sep 17 00:00:00 2001
From: ksn6 <2163784+ksn6@users.noreply.github.com>
Date: Mon, 22 Sep 2025 23:01:27 -0400
Subject: [PATCH 5/5] address comments

---
 proposals/0366-enforce-data-complete-shred-placement.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/proposals/0366-enforce-data-complete-shred-placement.md b/proposals/0366-enforce-data-complete-shred-placement.md
index b15b0864d..825f469e5 100644
--- a/proposals/0366-enforce-data-complete-shred-placement.md
+++ b/proposals/0366-enforce-data-complete-shred-placement.md
@@ -38,7 +38,12 @@ ambiguity creates significant performance challenges:
 2. **Performance Impact**: alternative `BlockComponent` methods require either
    complex inference algorithms or expensive replay operations.
 
-3. **Security Vulnerabilities**: Malicious leaders could intentionally
+3. **Complexity**: there is extra code complexity in storing the data complete
+   ranges for deserialization in replay. With this change and SIMD-0317 (fixed
+   32:32 FEC sets), it becomes really easy to figure out the ranges
+   to deserialize for replay.
+
+4. **Security Vulnerabilities**: Malicious leaders could intentionally
    misplace `DATA_COMPLETE_SHRED` flags to force validators into expensive
    search operations, creating DoS attack vectors; for certain proposed
    `BlockComponent`s (specifically, `UpdateParent`), malicious leaders could