RFC-154: Multi Slot AURA for system parachains #154
- fixed minor formatting issues
- added another assumption on backers
- added discussion on priority transactions vs. tipping mechanism
> | **Authors** | bhargavbh, burdges, AlistairStewart |
>
> ## Summary
>
> This RFC proposes a modification to the AURA round-robin block production mechanism for system parachains (e.g. Polkadot Hub). The proposed change increases the number of consecutive block production slots assigned to each collator from the current single-slot allocation to a configurable value, initially set at four. This modification aims to enhance censorship resistance by mitigating data-withholding attacks.
I don't see that we need this RFC at all. We only need to increase the slot duration and then authors will build multiple blocks, aka exactly what you are saying here in this document.
By "increasing slot duration", if you mean increasing the duration in which the collator has authorship rights while still being expected to produce blocks every 6s, then we are suggesting the same thing.
Why is this an RFC? Because the above change impacts several stakeholders like collators and system parachain users. Essentially, this change affects the trade-off between throughput and censorship resistance for system parachains. Rather than making such changes in a silo, it's better to discuss this publicly and address concerns, if any. Isn't this exactly what RFCs are for?
Increasing the slot duration is not enough, we would need subtle changes to the existing logic that would also require an RFC.
We did consider designs where collators would normally rotate every 6 seconds but, if one does not show up, we would wait say up to 4 blocks and not accept other collations, which I assume is what a longer slot duration means. There are two reasons, one performance and one security, why assigning multiple slots to collators and allowing them to produce multiple blocks is better.
For performance, suppose we have 10 collators, a slot is 4 relay chain slots, and one of those collators is offline but none are malicious. Then we get 9 blocks every 13 relay chain slots, because the offline collator results in 4 empty relay chain slots. With 4 slots per collator in the multi-slot approach, we get 36 blocks built every 40 relay chain slots. The worst-case latency of 5 relay chain slots to get a transaction in is the same, but this happens less often and the bandwidth is higher.
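The throughput arithmetic above can be checked with a small sketch (illustrative only; the function names and the offline/wait framing are my own, not project code):

```rust
/// Blocks produced vs. relay chain slots consumed per full rotation,
/// with `n` collators of which `offline` never show up.
///
/// "Wait" design: rotate every slot, but hold the schedule open for
/// `wait` slots when the assigned collator is absent.
fn wait_design(n: u64, offline: u64, wait: u64) -> (u64, u64) {
    let honest = n - offline;
    // Each honest collator fills one slot; each absent one wastes `wait` slots.
    (honest, honest + offline * wait)
}

/// Multi-slot design: every collator, present or not, owns `x` slots.
fn multi_slot(n: u64, offline: u64, x: u64) -> (u64, u64) {
    let honest = n - offline;
    (honest * x, n * x)
}

fn main() {
    // 10 collators, 1 offline, 4-slot window: 9 blocks per 13 slots
    // vs. 36 blocks per 40 slots, matching the numbers in the comment.
    println!("{:?}", wait_design(10, 1, 4)); // (9, 13)
    println!("{:?}", multi_slot(10, 1, 4)); // (36, 40)
}
```

9/13 ≈ 0.69 blocks per relay slot against 36/40 = 0.9, which is the bandwidth gap the comment describes.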
For security, there is an advantage to having a fixed schedule assigning collators to relay chain slots in the case that malicious collators are colluding with a number of malicious validators. The validators are randomly assigned to backing groups and, with <1/3 bad validators, most backing groups will be honest. But because these groups are small, there can be a significant chance that an individual group is bad. With a flexible schedule, bad collators can deliberately not show up in order to make the honest collator's slot line up with a bad backing group. With a fixed schedule, the bad guys have no choice about when the honest collator's turn is, and whether the backing group at that time is bad should be random.
Additionally, it would help improve performance if collators can fetch blocks directly from backers and initiate reconstruction from the availability layer in parallel. If the backer responds with the block, the reconstruction can be terminated. The RFC also recommends the above networking-level change.
There is a subtlety here, of course: we do not want the backers to be overwhelmed by spam requests aimed at denying the victim collator's request. This requires some access control mechanism, where the collator can prove their authorship right, along the direction of pre-PVF. However, such access control is more of a discussion for future work and not an immediate requirement. We can always rely on reconstruction from availability, but this would mean a slightly higher $x$ (i.e. the number of consecutive slots).
If "increase slot duration" means what Alistair interpreted in the above comment, then such a change may affect the AnV protocol on the Relay chain side, and one may have to make further subtle changes to ensure other timeouts are not triggered, etc.
One advantage of the multi-slot scheme proposed in this RFC is that from the Relay chain validators' viewpoint it is the exact same protocol, just with the collators assigned differently. The changes can be modularly implemented in AURA without touching AnV.
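From the validators' viewpoint the only thing that changes is the slot-to-author map. A minimal sketch of that map (hypothetical helper names; the real check lives in the Aura/aura-ext logic):

```rust
/// Classic AURA round-robin: a new author every slot.
fn author_single_slot(slot: u64, n_authors: u64) -> u64 {
    slot % n_authors
}

/// Multi-slot AURA as proposed here: each author keeps authorship for
/// `x` consecutive slots before the round-robin advances.
fn author_multi_slot(slot: u64, n_authors: u64, x: u64) -> u64 {
    (slot / x) % n_authors
}

fn main() {
    // With 10 authors and x = 4, slots 0..=3 all belong to author 0,
    // slots 4..=7 to author 1, and so on, wrapping after 40 slots.
    assert_eq!(author_single_slot(5, 10), 5);
    assert_eq!(author_multi_slot(3, 10, 4), 0);
    assert_eq!(author_multi_slot(4, 10, 4), 1);
    assert_eq!(author_multi_slot(40, 10, 4), 0);
}
```

The schedule stays fixed and publicly computable, which is exactly the security property argued for above.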
I think you are probably not aware of what the implementation already supports, so let's talk about this. We already decided some time ago that the minimum slot time we have is equal to the relay chain slot time. It is important not to mix them up: there is a slot time on the relay chain and one at each parachain.
Block production is also not bound to slots anymore. Right now block production depends on the number of cores assigned to a parachain in each relay chain slot. With my future changes, block production will be completely independent of the number of cores and the slot duration (on the parachain).
So, right now we already support what this RFC is proposing. AH on Kusama is already running with this, but only with 12s long slots and not 24s long slots as proposed by this RFC. In these 12s each author is currently producing 2 blocks, because we have 1 core per relay chain slot. If you would assign more cores, it would produce more blocks. As said, this will change in the future.
None of the changes require any changes to the relay chain, because the relay chain doesn't care how the parachains are selecting their collators.
Ok, I hope @bkchr's comment cleared up most of the confusion. I think I got one argument for multi-slot, in contrast to long slots, that is still worth discussing:
If we go multi-slot and, say, per slot you are only allowed to produce X blocks, then you cannot hold back all the blocks (n*X) until the very end, but are limited to X blocks, because each time one of your slots passed, you either published your blocks or you did not ... we are forcing some continuity.
But I don't think this is required on this level (as of now), because your blocks need to make it onto the relay chain - if you wait, you miss out on backing opportunities and you get the exact same effect; also we have configurations like `max_unincluded_segment` .... But it is fragile. E.g. actual elastic scaling would need to allow a more dynamic way of adding cores, which works against our continuity desires here.
> the minimum slot time we have is equal to the relay chain slot time

Why? I'd think elastic scaling could break this assumption somewhat, although the relay chain only acknowledges them every 6s obviously. You mean you want their final time stamp to come from the relay chain?
@bkchr just to clarify, is this the PR you are referring to when you say multiple blocks can be produced within each slot? paritytech/polkadot-sdk#7569.
Concretely, you are suggesting to map the parameter $x$ (defined in terms of number of slots) in our solution to the `slot_duration` (basically $6x$ seconds for parachains utilising async backing). Is that correct?
> Why? I'd think elastic scaling could break this assumption somewhat, although the relay chain only acknowledges them every 6s obviously. You mean you want their final time stamp to come from the relay chain?
In the current implementation the slots are at least 6s. One author with elastic scaling can produce multiple blocks in this slot, each one for a different core. By increasing the slot duration on the parachain to 24s, each author would produce 4 blocks, one every 6s (assuming async backing).
> You mean you want their final time stamp to come from the relay chain?
I don't understand fully what you are asking here. The runtime currently checks that the parachain slot is not in the future. Based on the relay chain slot and parachain slot duration, we calculate that the parachain slot of the current block is not in the future.
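The check described here can be sketched as follows (a hedged sketch; the names and the millisecond bookkeeping are illustrative, not the actual cumulus runtime code):

```rust
/// Reject parachain blocks whose slot lies in the future, judged
/// against the time implied by the current relay chain slot.
fn para_slot_not_in_future(
    relay_slot: u64,
    relay_slot_duration_ms: u64,
    para_slot: u64,
    para_slot_duration_ms: u64,
) -> bool {
    // Wall-clock time implied by the relay chain slot.
    let now_ms = relay_slot * relay_slot_duration_ms;
    // Latest parachain slot that may have started by now.
    let current_para_slot = now_ms / para_slot_duration_ms;
    para_slot <= current_para_slot
}

fn main() {
    // Relay slots of 6s, parachain slots of 24s: at relay slot 100
    // (t = 600s) the current parachain slot is 25, so slot 26 is rejected.
    assert!(para_slot_not_in_future(100, 6_000, 25, 24_000));
    assert!(!para_slot_not_in_future(100, 6_000, 26, 24_000));
}
```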
> We discussed this protocol change here: https://www.youtube.com/watch?v=Rp6usu2wN-A
> - **Collator Honesty:** The model assumes the presence of at least one honest collator. We intentionally chose the most relaxed security assumption as collators are not slashable (unlike validators). Note that all system parachains use AURA via the [Aura-Ext](https://github.com/paritytech/polkadot-sdk/tree/master/cumulus/pallets/aura-ext) pallet.
>
> - **Backer Honesty:** The backer assigned to a block candidate is assumed to be honest. This is a reasonable assumption given 2/3rd honesty on the relay chain and that backers are assigned randomly by [ELVES](https://eprint.iacr.org/2024/961.pdf). Additionally, we assume that backers are responsible for disbursing the withheld block to the victim collators. Pre-PVFs can definitely help in improving the resilience of backers against DoS attacks. Essentially, the pre-PVF lets backers check the slot ownership, and hence backers can filter out spamming collators at this stage. However, pre-PVFs have not yet been implemented. The stronger assumption on the backer disbursing the block is only needed for efficiency and is not essential for censorship resistance itself (i.e. the collator can always reconstruct from the availability layer).
Isn't there already some term for pre-PVF in JAM land? I've called this various different things over the years.
Authorizer.
> The number of consecutive slots to be assigned to ensure AURA's censorship resistance depends on async backing parameters like `unincluded_segment_length`. We now describe our approach for deriving $x$ based on the parameters of async backing and other variables like block production time and latency in the availability layer. The relevant values can then be plugged in to obtain $x$ for any system parachain.
>
> Clearly, the number of consecutive slots ($x$) in the round-robin is lower bounded by the time required to reconstruct the previous block from the availability layer ($b$) in addition to the block building time ($a$), both measured in slots. Hence, we need to set $x$ such that $x\geq a+b$. But with async backing, a malicious collator can sequentially withhold the block and just-in-time front-run the honest collator for each of the unincluded-segment blocks. Hence, $x\geq (a+b)\cdot m$ is sufficient, where $m$ is the max allowed candidate depth (the allowed unincluded segment).
Correct grammar is "number
> ### Number of consecutive slots for Polkadot Hub
> Assuming the previous block data can be fetched from backers, we comfortably have $a+b \leq 6s$, i.e. block building plus reconstruction time is under 6s. Using the current `asyncdelay` of 18s, it suffices to set $x$ to 4. If the `max_candidate_depth` ($m$) for Polkadot Hub is set to $m\leq3$, then this reduces (improves) $x$ from 4 to $m$. Note that a channel would have to be provided for collators to fetch blocks from backers as the preferred option and only recover from the availability layer as the fail-safe option.
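The quoted rule can be turned into a small calculation (a sketch under stated assumptions: the function name is illustrative, and the depth of 4 implied by the 18s `asyncdelay` is read off the quoted numbers, not taken from any runtime config):

```rust
/// x >= (a + b) * m, with the build-plus-reconstruction time (a + b)
/// rounded up to whole parachain slots.
fn consecutive_slots(a_plus_b_ms: u64, slot_ms: u64, max_candidate_depth: u64) -> u64 {
    a_plus_b_ms.div_ceil(slot_ms) * max_candidate_depth
}

fn main() {
    // a + b <= 6s fits in one 6s slot, so x equals the candidate depth:
    // x = 4 for the depth implied by the 18s asyncdelay, and dropping
    // the depth to m <= 3 lowers x to m, as the quoted text states.
    assert_eq!(consecutive_slots(6_000, 6_000, 4), 4);
    assert_eq!(consecutive_slots(6_000, 6_000, 3), 3);
}
```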
Okay, at least this reaches some reasonable number here. I'm still kinda dubious that some of the above constraints are excessive, but we can discuss it at the office.
I'm not convinced by the rules for setting it. I do think wall clock time sounds tempting here, but we've too many other delay sources in the system. It'll get ugly.
> ## Motivation
>
> The Polkadot Relay Chain guarantees the safety of parachain blocks, but it does not provide explicit guarantees for liveness or censorship resistance. With the planned migration of core Relay Chain functionalities—such as Balances, Staking, and Governance—to the Polkadot Hub system parachain in early November 2025, it becomes critical to establish a mechanism for achieving censorship resistance for these parachains without compromising throughput. For example, if governance functionality is migrated to Polkadot Hub, malicious collators could systematically censor `aye` votes for a Relay Chain runtime upgrade, potentially altering the referendum's outcome. This demonstrates that censorship attacks on a system parachain can have a direct and undesirable impact on the security of the Relay Chain. This proposal addresses such censorship vulnerabilities by modifying the AURA block production mechanism utilized by system parachain collators, with minimal honesty assumptions on the collators.
Polkadot Hub is not a chain.

```suggestion
The Polkadot Relay Chain guarantees the safety of parachain blocks, but it does not provide explicit guarantees for liveness or censorship resistance. With the planned migration of core Relay Chain functionalities—such as Balances, Staking, and Governance—to the Asset Hub system parachain in early November 2025, it becomes critical to establish a mechanism for achieving censorship resistance for these parachains without compromising throughput. For example, if governance functionality is migrated to a parachain, malicious collators could systematically censor `aye` votes for a Relay Chain runtime upgrade, potentially altering the referendum's outcome. This demonstrates that censorship attacks on a system parachain can have a direct and undesirable impact on the security of the Relay Chain. This proposal addresses such censorship vulnerabilities by modifying the AURA block production mechanism utilized by system parachain collators, with minimal honesty assumptions on the collators.
```
> ### Proposed Solution
>
> This proposal modifies the AURA round-robin mechanism to assign $x$ consecutive slots to each collator. The specific value of $x$ is contingent upon the asynchronous backing parameters of the system parachain and will be derived using a generic formula provided in this document. The collator selected by AURA will be responsible for producing $x$ consecutive blocks. This modification will require corresponding adjustments to the AURA authorship checks within the PVF (Parachain Validation Function). For the current configuration of Polkadot Hub, $x=4$.
```suggestion
This proposal modifies the AURA round-robin mechanism to assign $x$ consecutive slots to each collator. The specific value of $x$ is contingent upon the asynchronous backing parameters of the system parachain and will be derived using a generic formula provided in this document. The collator selected by AURA will be responsible for producing $x$ consecutive blocks. This modification will require corresponding adjustments to the AURA authorship checks within the PVF (Parachain Validation Function). For the current configuration of Asset Hub, $x=4$.
```
> where $m$ is the `max_candidate_depth` (or the unincluded segment as seen from the collator's perspective).
> ### Number of consecutive slots for Polkadot Hub
```suggestion
### Number of Consecutive Slots for System Parachains
```
> ### Number of consecutive slots for Polkadot Hub
>
> Assuming the previous block data can be fetched from backers, we comfortably have $a+b \leq 6s$, i.e. block building plus reconstruction time is under 6s. Using the current `asyncdelay` of 18s, it suffices to set $x$ to 4. If the `max_candidate_depth` ($m$) for Polkadot Hub is set to $m\leq3$, then this reduces (improves) $x$ from 4 to $m$. Note that a channel would have to be provided for collators to fetch blocks from backers as the preferred option and only recover from the availability layer as the fail-safe option.
```suggestion
Assuming the previous block data can be fetched from backers, we comfortably have $a+b \leq 6s$, i.e. block building plus reconstruction time is under 6s. Using the current `asyncdelay` of 18s, it suffices to set $x$ to 4. If the `max_candidate_depth` ($m$) for system parachains is set to $m\leq3$, then this reduces (improves) $x$ from 4 to $m$. Note that a channel would have to be provided for collators to fetch blocks from backers as the preferred option and only recover from the availability layer as the fail-safe option.
```
> ## Performance, Ergonomics, and Compatibility
>
> The proposed changes are security critical and mitigate censorship attacks on core functionality like balances, staking and governance on Polkadot Hub.
```suggestion
The proposed changes are security critical and mitigate censorship attacks on core functionality like balances, staking and governance on Asset Hub.
```
> | **Authors** | bhargavbh, burdges, AlistairStewart |
>
> ## Summary
>
> This RFC proposes a modification to the AURA round-robin block production mechanism for system parachains (e.g. Polkadot Hub). The proposed change increases the number of consecutive block production slots assigned to each collator from the current single-slot allocation to a configurable value, initially set at four. This modification aims to enhance censorship resistance by mitigating data-withholding attacks.
```suggestion
This RFC proposes a modification to the AURA round-robin block production mechanism for system parachains. The proposed change increases the number of consecutive block production slots assigned to each collator from the current single-slot allocation to a configurable value, initially set at four. This modification aims to enhance censorship resistance by mitigating data-withholding attacks.
```
> The number of consecutive slots to be assigned to ensure AURA's censorship resistance depends on async backing parameters like `unincluded_segment_length`. We now describe our approach for deriving $x$ based on the parameters of async backing and other variables like block production time and latency in the availability layer. The relevant values can then be plugged in to obtain $x$ for any system parachain.
>
> Clearly, the number of consecutive slots ($x$) in the round-robin is lower bounded by the time required to reconstruct the previous block from the availability layer ($b$) in addition to the block building time ($a$), both measured in slots. Hence, we need to set $x$ such that $x\geq a+b$. But with async backing, a malicious collator can sequentially withhold the block and just-in-time front-run the honest collator for each of the unincluded-segment blocks. Hence, $x\geq (a+b)\cdot m$ is sufficient, where $m$ is the max allowed candidate depth (the allowed unincluded segment).
> and just-in-time front-run the honest collator for all the unincluded_segment blocks

It is important to note that front running is not that easy, because we are not picking the first block that arrives; we use the fork choice rule from https://github.com/paritytech/polkadot-sdk/blob/63958c454643ddafdde8be17af5334aa95954550/polkadot/node/core/prospective-parachains/src/fragment_chain/mod.rs#L66, based on the hash of the candidate.
That is a valid point. We might be able to get away with a shorter slot duration as a result of the fork-choice rule enacted between backing and inclusion.
As @bhargavbh is leaving, I could open another PR from https://github.com/burdges/Polkadot-RFCs/tree/multi-slot-aura or someone else could do so. It'll anyways be me handling this from the research side.
I still don't see the need for this RFC, as we are only talking about increasing the slot duration, which is a simple configuration change in the runtime.
I do not mind leaving this RFC aside, but we do have the question of when a parablock is too late or too early, and block production should obey this, even when being delayed. In particular, if our multi-block producer gets squeeze attacked, then we need them to make the correct block number that fits the current on-chain situation, not the first block they were assigned.