Increase power table lookback to detect/mitigate power spikes #876

lucaniz · 2023-12-04T11:01:46Z

lucaniz
Dec 4, 2023
Collaborator

We propose to set the power table lookback to 7 days. We invite comments and feedback from the community as early as possible, before proposing a formal FIP with a stabilized value.

Motivation

Currently there is no concerning security issue in the Filecoin network. Nevertheless, in case of a hypothetical severe security issue, we want to be prepared to preserve Filecoin.

In particular, we want to have time to react to an attempted network takeover via adversarial spikes of power, without compromising consensus security.

Assuming an adversary can fake power and gets noticed, how can we make it possible to put concrete countermeasures in place to mitigate the issue? The first thing to ensure is that there is enough time to put any countermeasure in place.

Today consensus power is granted at least ChainFinality (= 900) epochs after a sector is onboarded onchain via ProveCommit.

Technically, a sector is activated right after the first WindowPoSt, which happens within 24h after ProveCommit is finalized. Nevertheless, power is not granted right after the first WindowPoSt.

Indeed, at each epoch t, the leader election protocol selects SPs proportionally to their quality-adjusted power at epoch t - ChainFinality. This means that the minimum delay in power activation is 900 epochs after ProveCommit. (This minimum is reached only if the first WindowPoSt happens right after ProveCommit; any time elapsed between ProveCommit and the first WindowPoSt defers sector power acquisition accordingly.)

This translates into a small (~7.5h) window of time to react to any adversarial power spike without compromising consensus security.
Thus, if we want to be sure that the time we have to react to a major security issue is long enough, we need to decouple the power table lookback from ChainFinality and set it independently.
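The size of the reaction window follows from simple arithmetic. The sketch below assumes a 30-second epoch duration, which is not stated in the post, and all names in it are illustrative rather than taken from any codebase.

```python
# Assumption: one Filecoin epoch lasts 30 seconds (not stated in the post).
EPOCH_DURATION_SECONDS = 30
CHAIN_FINALITY_EPOCHS = 900      # today's power table lookback
PROPOSED_LOOKBACK_DAYS = 7       # value proposed in this discussion


def epochs_to_hours(epochs: int) -> float:
    """Convert a number of epochs into wall-clock hours."""
    return epochs * EPOCH_DURATION_SECONDS / 3600


current_window_hours = epochs_to_hours(CHAIN_FINALITY_EPOCHS)
proposed_window_hours = PROPOSED_LOOKBACK_DAYS * 24
window_ratio = proposed_window_hours / current_window_hours

print(f"current: {current_window_hours} h, proposed: {proposed_window_hours} h, "
      f"ratio: {window_ratio:.1f}x")
```

Under that assumption the current window is 7.5 hours and the proposed one is 168 hours, roughly 22 times longer.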

We identify power table lookback = 7 days to be the ideal window of time for sector power activation.

Protocol Specification

Change the function GetWinningPoStSectorSetLookback so that it returns EpochsInOneWeek
Adjust the corresponding parameter in the (*Miner).mineOne() log line
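As a rough illustration of the first step, here is a Python model of the lookback function. It is not the actual implementation, whose signature is not shown in this discussion. The value of the constant assumes 30-second epochs, and `power_table_epoch` is a hypothetical helper showing which epoch's power table the election would read.

```python
# Python model only; the real GetWinningPoStSectorSetLookback is not
# reproduced here and may take different parameters.
EPOCH_DURATION_SECONDS = 30  # assumption: 30-second epochs
EPOCHS_IN_ONE_WEEK = 7 * 24 * 60 * 60 // EPOCH_DURATION_SECONDS
CHAIN_FINALITY = 900


def get_winning_post_sector_set_lookback(proposed: bool = True) -> int:
    """Return the power table lookback in epochs.

    proposed=False models today's behaviour (ChainFinality);
    proposed=True models the change discussed in this thread.
    """
    return EPOCHS_IN_ONE_WEEK if proposed else CHAIN_FINALITY


def power_table_epoch(current_epoch: int, lookback: int) -> int:
    """Hypothetical helper: epoch whose power table is used for election.

    Clamped at 0 so that very early epochs do not yield a negative epoch.
    """
    return max(0, current_epoch - lookback)
```

For example, `power_table_epoch(100000, get_winning_post_sector_set_lookback())` reads the table 20160 epochs back, instead of 900 epochs back today.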

Reference to the code here

Impact on Filecoin

Sector power is deferred by 7 days compared with today (where we have a minimum delay of 900 epochs between a sector being onboarded and acquiring power).

This power will only be shifted by 7 days (not lost). Indeed, after termination a sector will retain power for 7 days as far as the leader election protocol is concerned.

Sectors need to be proven via WindowPoSt for the initial 7 days even if they won't have power. Similarly, they won't need to be proven for the 7 days after termination, while keeping their power. This means that WindowPoSt is required for the lifetime of the sector (without extra proving overhead).
That said, it is possible that expired sectors still counted in the power allocation are challenged at WinningPoSt. This means that, to be sure of being able to answer WinningPoSt challenges in the first 7 days after expiration, expired sectors should still be stored.

We think that this extra storage effort is not a dealbreaker, considering the entire sector lifetime (assuming 3.5y of sector lifetime, we are talking about an additional storage cost of about 0.5% overall). On the other hand, such a change would make the network far more secure than it would be without it.

jsoares · 2023-12-04T11:30:17Z

jsoares
Dec 4, 2023
Maintainer

I plan to read this in detail when I'm not on my phone, but, in addition to the F3 discussion to which you pointed (where we've gone back and forth on dropping the 900 parameter entirely), there's also related work by @guy-goren on computing EC finality (doc). No specific conflict here, I think, but just making sure the two teams are aware of ongoing work.

1 reply

jsoares Dec 4, 2023
Maintainer

@anorth already mentioned it below, but current stance on keeping 900 epochs is here.

anorth · 2023-12-04T21:24:46Z

anorth
Dec 4, 2023
Maintainer

Finality is currently set to 900 epochs, but willing to be reduced significantly, as per the new proposal regarding Filecoin consensus

Current fast finality (F3) proposals will not reduce the power table lookback number. That number should not be named finality because it is not a true deterministic finality either now or after F3. The fast finality proposals aim to make minimal changes to EC for many reasons (the potential risk here being but one). So nothing will change except that honest nodes will not extend sibling chains of truly finalised blocks.

There still might be reasonable motivations to extend 900 epochs to 7 days, but they should be stated without reference to finality.

8 replies

jsoares Dec 5, 2023
Maintainer

Sorry, that was possibly the stupidest possible phrasing for my question. Let me try again.

I understand that's what you're proposing, but it was also pinned on the assumption that ChainFinality would be reduced with F3. In the meantime, we also agree that ChainFinality shouldn't be called ChainFinality.

So the real question is: is this the only thing that should be decoupled from ChainFinality and extended, or are there other current uses of that parameter that should change accordingly?

This is probably out of scope for this discussion, but just worth considering before we introduce another single-use constant.

lucaniz Dec 5, 2023
Collaborator Author

We propose to decouple GetWinningPoStSectorSetLookback from ChainFinality in any case, and to set GetWinningPoStSectorSetLookback to 7 days (i.e. even if what is called ChainFinality stays the same [i.e. 900 epochs], this would not change the proposal).

My understanding was that F3, by introducing fast finality, would have also reduced what is called ChainFinality, adding one reason to the decoupling of the two parameters.

If this is not the case, as @anorth pointed out, the proposal is still there, in the same way, for the same reasons, just not having the F3 dependency.

So the real question is: is this the only thing that should be decoupled from ChainFinality and extended, or are there other current uses of that parameter that should change accordingly?

good question, but i think this is out of scope for this discussion :)

anorth Dec 5, 2023
Maintainer

is this the only thing that should be decoupled from ChainFinality and extended, or are there other current uses of that parameter that should change accordingly?

Everything that currently refers to chain finality is suspect and should be revisited as we develop F3. I suspect that most of them actually want the power table lookback, rather than the point beyond which no forks are possible. This is something for the F3 FIP though, which should probably enumerate all references to finality in actors and chain validation and make a decision.

lucaniz Dec 13, 2023
Collaborator Author

@anorth I removed the reference to F3. Thanks for spotting the inconsistency.

anorth Feb 29, 2024
Maintainer

Thanks. I have taken the liberty of changing the title of this discussion to remove references to finality, and instead just talk about increasing the power table lookback.

dd45e640b42e6da7da96faee3996ef7c · 2023-12-07T06:16:54Z

dd45e640b42e6da7da96faee3996ef7c
Dec 7, 2023

We identify power table lookback = 7 days to be the ideal window of time for sector power activation.

how? and why is 7 days "ideal"? what does "ideal" mean here?

That said, it is possible that expired sectors still counted in the power allocation are challenged at WinningPoSt. This means that, to be sure of being able to answer WinningPoSt challenges in the first 7 days after expiration, expired sectors should still be stored.

7 days is almost 4% of a sector's minimal lifetime. the need to store (and potentially prove) that looks costly from the miner side.

1 reply

lucaniz Dec 13, 2023
Collaborator Author

how? and why is 7 days "ideal"? what does "ideal" mean here?

The main point of this is to have time to react with an emergency plan before the attack takes place (i.e. before power spikes are "usable" in consensus) in case of a black swan event (which is NOT on the horizon today).
Currently this delay is ~7.5 hours, which is too little to:

realize an attack is taking place
alert the network
ship an emergency plan (which could consist of an ad hoc network upgrade with a full set of potential countermeasures, depending on the severity of the issue)

After several conversations, we think that 1 week is long enough to carry out all the steps mentioned above smoothly.

7 days is almost 4% of a sector's minimal lifetime. the need to store (and potentially prove) that looks costly from the miner side.

Note that with the latest extension to 3.5 years, we are talking about 1 week out of ~180 weeks, which is roughly 0.5%.
Moreover, note that we are talking about storage overhead and not proving overhead, as proving (i.e. WindowPoSt) will be required until termination and not in the additional week where sectors are already terminated but still eligible due to the power table lookback.
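The two overhead figures quoted in this exchange can be reproduced with a quick calculation. The sketch below takes 180 days as the minimum sector lifetime, which is an assumption inferred from the "almost 4%" figure rather than something stated in the thread, and uses a 365-day year.

```python
DAYS_PER_YEAR = 365          # assumption: calendar year, ignoring leap days
LOOKBACK_DAYS = 7            # proposed power table lookback
MIN_LIFETIME_DAYS = 180      # assumption inferred from the "almost 4%" figure
MAX_LIFETIME_DAYS = 3.5 * DAYS_PER_YEAR


def overhead_fraction(lifetime_days: float,
                      extra_days: float = LOOKBACK_DAYS) -> float:
    """Extra storage time as a fraction of the sector lifetime."""
    return extra_days / lifetime_days


max_lifetime_overhead = overhead_fraction(MAX_LIFETIME_DAYS)
min_lifetime_overhead = overhead_fraction(MIN_LIFETIME_DAYS)

print(f"3.5-year sector: {max_lifetime_overhead:.2%}")
print(f"180-day sector:  {min_lifetime_overhead:.2%}")
```

Both numbers in the thread are therefore consistent: they describe the same 7 days measured against the longest and the shortest sector lifetimes.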

While we understand it would be better to have 0 storage overhead (which, btw, is not what we have even today as we have 900 epochs), we think this is an acceptable compromise to be covered in case of a major security issue.

dd45e640b42e6da7da96faee3996ef7c · 2023-12-07T06:24:02Z

dd45e640b42e6da7da96faee3996ef7c
Dec 7, 2023

i am not deep enough into the details of the code - but does this have any implications for snapping? especially for sectors that change QAP while snapping data?

asking because i see this as the most critical, instant way to facilitate a hostile power takeover on the network

another thing that comes to mind in that direction are termination fees.

2 replies

nicola Dec 8, 2023
Collaborator

This is taken into account by this FIP. When you snap, the power update delay will still be 7 days.

lucaniz Dec 13, 2023
Collaborator Author

We are not aiming to change anything regarding the mechanism of power table lookback. We are just proposing a different parameter. The only thing that would change would be the following:

today: power table lookback is 900 epochs. This means that at epoch t the power taken into account is the power at epoch t - 900 epochs. As a result, every time you snap, your "updated" power takes effect with 900 epochs delay, and is maintained 900 epochs after sector terminates.
After this change proposal: power table lookback is 7 days. This means that at epoch t the power taken into account is the power at epoch t - 7 days. As a result, every time you snap, your "updated" power takes effect with 7 days delay, and is maintained 7 days after sector terminates.
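The two cases above can be modelled with a small power history lookup. This is a simplified sketch rather than the actual power table logic: the miner, the epochs, and the power values are made up for illustration.

```python
from bisect import bisect_right

CURRENT_LOOKBACK = 900       # epochs, as described above
PROPOSED_LOOKBACK = 20160    # 7 days, assuming 30-second epochs


def power_at(history: list[tuple[int, int]], epoch: int) -> int:
    """Power recorded in the table at `epoch`.

    `history` is a list of (epoch, power) change points sorted by epoch.
    """
    change_epochs = [e for e, _ in history]
    idx = bisect_right(change_epochs, epoch) - 1
    return history[idx][1] if idx >= 0 else 0


def election_power(history: list[tuple[int, int]], epoch: int,
                   lookback: int) -> int:
    """Power that counts for leader election at `epoch`."""
    return power_at(history, epoch - lookback)


# Hypothetical sector: onboarded at epoch 1000 with power 32, snapped at
# epoch 50000 (power becomes 320), terminated at epoch 100000.
history = [(1000, 32), (50000, 320), (100000, 0)]
```

With this history, the snapped power counts from epoch 50900 under the current lookback and from epoch 70160 under the proposed one. Likewise, the sector keeps counting until epoch 100899 today and until epoch 120159 after the change.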

Termination fees are not touched by this proposal.

anorth · 2024-02-29T22:58:51Z

anorth
Feb 29, 2024
Maintainer

I have a contrary take on this issue. I do not think that there is a problem worth solving and, in case we can be convinced that there is, I think there are better ways to solve it.

The discussion is framed around an "adversarial" spike in power. I don't believe that the protocol can define such a thing, nor could participants necessarily judge it either. The security of the chain is based in the resources committed to it. It must be secure from those resources – if rapidly adding more resources could be a problem, we should address that more directly in protocol design. The discussion focuses on rate of addition of resources, and provides for some advance warning style of alerting mechanism that could (in theory) trigger human intervention. I acknowledge that appealing to "social consensus" is sometimes the only recourse for some disastrous events, but designing toward making it easier and more likely for external intervention to interfere with the protocol rules runs counter to the decentralised and autonomous goals of a protocol like this.

No problem worth solving

If I understand correctly, the supposed problem might be framed as "what if someone commits lots of resources to the network with the aim of damaging it". This has to be irrational at least to the first order, or the fundamental security model is flawed. Adding more resources must increase a participant's incentive alignment with the network. If a malicious party really does want to pay the high cost to attack it, there's not really anything any network can do. Security is rooted in this incentive alignment, and there is some cost at which any network can be attacked. We can, of course, design the protocol to make that cost of attack high and this is a problem worth solving.

Filecoin is secured by both economic stake and physical storage commitments. But consider pure proof-of-stake blockchains, such as Ethereum. In these chains, the only cost of consensus power is stake. To my knowledge, no such chains have a mechanism affording advance notification of future increases in stake, with the purpose of enabling manual operator intervention in the protocol. Why not? Either their teams are also all confident in their incentive security model, or we have discovered something important that none of them have.

Filecoin is naturally in an even better position, because not only does an attacker need to acquire 1/3 of all stake, they also need to provide a significant amount of hardware. The balance of sealing vs storage hardware depends on the rate at which they intend to build power (faster -> more sealing throughput).

Note that in Filecoin and other networks, the "easiest" attack is to acquire power from participants who already have it, say by bribing them. Building it from scratch requires larger total commitments because an attacker needs ⅓ of the resulting stake/hardware, which means they need to add ½ of the starting amount. Note also that such a bribing attack would go completely undetected by any warning system based on growth in committed resources anyway.
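The "⅓ of the result means adding ½ of the start" step can be checked directly: if an attacker adds x times the starting power, their share of the resulting total is x / (1 + x). A minimal sketch, with a hypothetical function name:

```python
from fractions import Fraction


def required_addition(target_share: Fraction) -> Fraction:
    """Power to add, as a fraction of the starting total, so that the added
    power makes up `target_share` of the resulting total.

    Solves x / (1 + x) = target_share for x.
    """
    if not 0 <= target_share < 1:
        raise ValueError("target_share must be in [0, 1)")
    return target_share / (1 - target_share)


one_third_case = required_addition(Fraction(1, 3))
print(one_third_case)
```

Using `Fraction` keeps the result exact, so the ⅓ case comes out as exactly ½ rather than a rounded float.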

Solve it differently

The proposal is based around observation of the rate of commitment of resources. It is reasonable to believe that the cost of an attack is related to the duration for which resources are committed. So even given a basically sound model, perhaps permitting a fast rate of power growth makes an attack uncomfortably cheap.

If this is the case, we should simply and directly address this in the protocol by limiting the maximum rate of growth. It's not effective to limit individual participants (generally anonymous), but we can limit the network as a whole to a rate which would give other participants ample time to observe and respond to rapid growth – much longer than the 7 days proposed.

Such a mechanism would resemble the stake churn limits of POS protocols. E.g. read about Ethereum's entry queue limits in EIP-7514 and discussion here and here.

Filecoin power growth could similarly be limited to, say, support doubling in size at most every 60 days. This would give a huge 60-day "advance warning" of a rapidly growing stake, while still exceeding the network baseline function by 6x (so not effectively limiting network growth). With a long window, we could more reasonably hope that participants would respond with their own onboarding, rather than appealing to a centralised network halt.
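A network-wide cap of this kind could be sketched as follows. This only illustrates the idea in the comment above and is not a proposed specification. The clamping rule and the function names are hypothetical, and the comparison with the baseline assumes the baseline function doubles once per year.

```python
DOUBLING_PERIOD_DAYS = 60    # example figure from the comment above
DAYS_PER_YEAR = 365


def max_power_after(current_power: float, days: float,
                    doubling_days: float = DOUBLING_PERIOD_DAYS) -> float:
    """Largest total network power allowed `days` from now under the cap."""
    return current_power * 2 ** (days / doubling_days)


def admitted_power(current_power: float, requested_total: float,
                   days: float) -> float:
    """Hypothetical rule: total power is clamped to the growth cap."""
    return min(requested_total, max_power_after(current_power, days))


# Doublings per year permitted by the cap, to compare with a baseline that
# is assumed to double once per year.
doublings_per_year = DAYS_PER_YEAR / DOUBLING_PERIOD_DAYS
print(f"{doublings_per_year:.2f} doublings per year")
```

Under these assumptions the cap permits about six doublings per year, which matches the "6x" figure, while a request to grow from 100 to 500 units within 60 days would be clamped to 200.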

A built-in protocol rule like this could directly prevent exploitation of any perceived rate-related weakness in the incentive security model. An independent mechanism decoupled from proof-of-storage or consensus rules would be simpler. It would avoid introducing new problems associated with the lookback (e.g. that end-of-life sectors enjoy power with nothing at stake).

As an aside, an economic mechanism like dynamic onboarding fees (#587) could increase the cost of rapid accumulation of power high enough to make attack irrational again without need for any limit.

In summary:

there's no problem with the basic incentive-alignment security model
if there was, the most rational attack of bribing existing participants is undetectable anyway
if a fast rate-of-growth reduces the attack cost too much, we should just directly limit the growth rate
designing for manual intervention isn't a good decentralised protocol direction

1 reply

irenegia Mar 1, 2024
Collaborator

I agree with 99% of what @anorth says here!

But let me comment on some (minor) points:

If I understand correctly, the supposed problem might be framed as "what if someone commits lots of resources to the network with the aim of damaging it".

In my opinion, with the higher PT lookback we want to be more general and have a mechanism in place to react when someone commits lots of resources, even if we do not know whether this is adversarial or not.
As you said, someone can pay a high price to get some control over the network (even with a good purpose), and we would still like to have the time to argue about this behaviour.

This has to be irrational at least to the first order, or the fundamental security model is flawed.

Yes, but remember that we can argue that some attack is irrational only starting from some assumptions (eg, a minimum for the token price, etc). If some assumptions are not met by a future scenario, the irrational attacks become cheaper (and maybe rational).
The longer lookback does not have this problem.

In summary:
there's no problem with the basic incentive-alignment security model
if there was, the most rational attack of bribing existing participants is undetectable anyway
if a fast rate-of-growth reduces the attack cost too much, we should just directly limit the growth rate
designing for manual intervention isn't a good decentralised protocol direction

Yes to everything, but note that to have a 100% green light on the first point ("there's no problem with the basic incentive-alignment security model"), we should first solve the pledge bug (see #847).

So my conclusion is: a longer power table lookback is a nice-to-have feature, and we should do it if the implementation effort required is small (not worth it otherwise). But be aware that, as anorth pointed out, it does not solve any problems about "spikes of power" attacks being possible (these kinds of attacks should be possible/rational in most cases regardless of the lookback).
