Skip to content
This repository was archived by the owner on Nov 15, 2023. It is now read-only.

Implementer's guide: Assignments counting procedure#1930

Closed
burdges wants to merge 8 commits intoparitytech:masterfrom
w3f:jeff-approval-voting-1
Closed

Implementer's guide: Assignments counting procedure#1930
burdges wants to merge 8 commits intoparitytech:masterfrom
w3f:jeff-approval-voting-1

Conversation

@burdges
Copy link
Contributor

@burdges burdges commented Nov 8, 2020

Alters the assignments counting procedure from #1691 to take at least one tranche per no-show. This changes a few things:

  • We're now forced to keep expected assignees per tranche small, like 1-2, except for tranche zero. We planned this anyways because if prevents an adversary from bruit forcing their bad assignees into having lower tranches than many honest assignees.
  • We consume more assignees resources but only slightly more and only under niche scenarios.
  • We're much less vulnerable to an adversary with as nearly unlimited capability to DoS honest assignees: Imagine they wait until they've many bad assignees into low tranches and then just DoS all the honest ones, making them no-shows. We previously counted these bad assignees towards replacing the honest DoSed no-shows, but now we add an expected 1-2 honest checkers per tranche aka per no-show.

I needed to write check_approval and tranches_to_approve as basically one method advance_assignee_status in the pseduo-code #1558 so they'll turn out similar here I think, but right now this suffices I guess.


@rphmeier There is another "iterating VRFs for no-shows" scheme that's stronger under attack by a powerful DoS adversary: We could cover no-shows, not by using additional tranches, but by running fresh VRFs whose story is the no-show, meaning this second (third) order VRF input is the VRF output of the no-show, and it returns a tranche.

In this, adversary knows literally nothing about the no-show checks until they happen, which makes DoS attacks almost worthless, excepting in conjunction with relay chain equivocation of course. Yet, there are several serious down sides for this iterating VRFs scheme:

  1. We'd replace honest no-shows very slowly because we've no analog of the fat zeroth tranche here, so like maybe a single no-show delays finality for minutes!
  2. We'd complicate the code with yet more gossip messages, etc. I know tranches_to_approve looks complex, but it's a single loop for which we can write extensive unit tests, while this iterating VRFs requires more complex tests that cover the gossip messages, etc.
  3. We might address 1 with a VRF bidding scheme where anyone with low results sends a message, but this makes 2 much much worse. And it worsens our relay chain equivocation situation somewhat.

I'm therefore proposing the "simpler" scheme in this PR instead of iterating VRFs for no-shows.

We'v e two purposes for this, so best to mention them for clarity.
Actually check_approval should be merged into tranches_to_approve I think.
* return `n_tranches`
1. First, set `t := session_info.needed_approvals`. Set the base tranche `l=0`.
2. Take assignments from tranches `l..` until we have at least `t` assignments. Let `k` denote the highest tranche taken. Count the number `assigned` of assignments taken in tranches `0..k`. If `assigned < t` then return a special value `ALL` which indicates we wait for more assignments.
3. If `assigned > 128` then return the candidate as unfinalizable and advise block production to build another fork from the inclusion relay parent. This condition can be reverted if some no-shows turnning up eventually.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could discuss omitting this check for now if anyone is worried about it adding complexity. We're approaching breaks in underlying security assumptions if assigned gets so large anyways.

#### `import_checked_assignment`
* Load the candidate in question and access the `approval_entry` for the block hash the cert references.
* Ensure the validator index is not part of the backing group for the candidate.
* Ensure the validator index is not part of the backing group for the candidate. We count late backing votes via the backing system.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, at the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove so this represents the current goal, or use future/conditional tense to represent later plans?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removal

* return `n_tranches`
1. First, set `t := session_info.needed_approvals`. Set the base tranche `l=0`.
2. Take assignments from tranches `l..` until we have at least `t` assignments. Let `k` denote the highest tranche taken. Count the number `assigned` of assignments taken in tranches `0..k`. If `assigned < t` then return a special value `ALL` which indicates we wait for more assignments.
3. If `assigned > 128` then return the candidate as unfinalizable and advise block production to build another fork from the inclusion relay parent. This condition can be reverted if some no-shows turnning up eventually.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If assigned > 128 then return the candidate as unfinalizable

This should say to return ALL.

advise block production to build another fork from the inclusion relay parent

Omit this - it's underspecified at the moment. Maybe add a TODO and a GitHub issue for blacklisting those forks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this entirely I think and just create an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1. First, set `t := session_info.needed_approvals`. Set the base tranche `l=0`.
2. Take assignments from tranches `l..` until we have at least `t` assignments. Let `k` denote the highest tranche taken. Count the number `assigned` of assignments taken in tranches `0..k`. If `assigned < t` then return a special value `ALL` which indicates we wait for more assignments.
3. If `assigned > 128` then return the candidate as unfinalizable and advise block production to build another fork from the inclusion relay parent. This condition can be reverted if some no-shows turnning up eventually.
4. Count the number `noshows` of no-shows in tranches `l..k`. If `noshows` is zero then return success with `n_tranches := k`. Of course this happens early and does not indicate final termination, as we may later return `ALL` after more no-shows, but if all these assigned checkers vote valid then we are done.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course this happens early and does not indicate final termination, as we may later return ALL after more no-shows, but if all these assigned checkers vote valid then we are done.

This sounds like logic beyond the scope of the function - check_approval is the part that checks approval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a remark that this function is non-monotonic. It can increase, and then decrease (and return approved once it gets merged with check_approval again).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the language a bit? Right now it sounds sort of like the function can return more than once, whereas what we actually want to get across is that it is non-monotonic.

2. Take assignments from tranches `l..` until we have at least `t` assignments. Let `k` denote the highest tranche taken. Count the number `assigned` of assignments taken in tranches `0..k`. If `assigned < t` then return a special value `ALL` which indicates we wait for more assignments.
3. If `assigned > 128` then return the candidate as unfinalizable and advise block production to build another fork from the inclusion relay parent. This condition can be reverted if some no-shows turnning up eventually.
4. Count the number `noshows` of no-shows in tranches `l..k`. If `noshows` is zero then return success with `n_tranches := k`. Of course this happens early and does not indicate final termination, as we may later return `ALL` after more no-shows, but if all these assigned checkers vote valid then we are done.
5. For each no-show in `noshows`, we require both another checker and another tranche, which ever means more tranches. Take assignments from at least `no_shows` subsequent tranches and then if we have not yet covered all noshows then continue taking tranches until we do cover all no-shows. e.g. if there are 2 no-shows, we might only need to take 1 additional tranche with >= 2 assignments. Or we might need to take 3 tranches, where one is empty and the other two have 1 assignment each.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should address the "we run out of tranches to take, having not received any assignments past a certain point" case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was 3 which I just removed. If we run out of tranches without 3 then every validator is assigned to check the block. Yes I suppose we should say so. If we run out of tranches with 3 then we've decided this block sucks and we're going to fork the chain or something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to know how many tranches to take, but remember that this is from the perspective of a single node executing these algorithms, and this node may simply have not received those assignments yet.

#### `check_approval(block_entry, approval_entry, n_tranches) -> bool`
* If `n_tranches` is ALL, return false
* Otherwise, if all validators in `n_tranches` have approved, return `true`. If any validator in these tranches has not yet approved but is not yet considered a no-show, return `false`.
* Otherwise, if all validators in `n_tranches` have either approved or been replaced as a no-show, then return `true`. If any validator in these tranches has not yet approved but is not yet considered a no-show, return `false`.
Copy link
Contributor

@rphmeier rphmeier Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't have information to understand replacement. This function is invoked after the replacement procedure has selected extra tranches. I could see it gaining an extra parameter for tolerated_missing_approvals. We would also need to alter tranches_to_approve to return a (n_tranches, n_replaced) tuple, and supply tolerated_missing_approvals = n_replaced in the call site of check_approval. Then we would not approve if there are any missing approval votes in those tranches beyond the no-shows we've already accounted for.

Copy link
Contributor Author

@burdges burdges Nov 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned elsewhere, I needed tranches_to_approve and check_approval to be the same function to have everything in one place. You cannot count tranches for no-shows without knowing the current approved votes either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They aren't the same function in this PR, though. So they would need to be unified

We're under a DoS attack in this case, or maybe a bad parachain, so
we shoiuld never finalize the block until no-shows actually respond.
It's hard to explain telling block production to take another path
right here though.
#### ` assignees_status(approval_entry) -> AssigneeStatus`
* Summarise our view of this approval entry's run by iterating over assignment and approval vote records
1. First, set `needed := session_info.needed_approvals`. Set the base tranche `l=0`.
2. Take assignments from tranches `l..` until we have at least `needed` assignments or hit our timeouts. Let `tranches` denote the highest tranche taken (plus one).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "hit our timeouts" mean?

1. First, set `needed := session_info.needed_approvals`. Set the base tranche `l=0`.
2. Take assignments from tranches `l..` until we have at least `needed` assignments or hit our timeouts. Let `tranches` denote the highest tranche taken (plus one).
3. Count the number `assigned` of assignments taken in tranches `0..tranches`. If `assigned < needed` then return a special value `PENDING` which indicates we wait for more assignments.
4. Count the number `approvals` in tranches `0..tranches`. Also, count the number of `noshows` of no-shows in tranches `l..tranches`. If `noshows` is zero then return `DONE(approvals,assigned,needed,tranches)`. Of course, this indicates potential approval only if `approvals == assigned` and we hide the assignment timeout for `tranches`. In fact, this is an over simplification since we care about arrival times, so counting tranches does not suffice here. If `approvals < assigned` then more no-shows could occur on future invokations, returning us to `PEMNDING`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't merge over-simplification into the implementer's guide!

#### `check_approval(block_entry, approval_entry, n_tranches) -> bool`
* If `n_tranches` is ALL, return false
* Otherwise, if all validators in `n_tranches` have either approved or been replaced as a no-show, then return `true`. If any validator in these tranches has not yet approved but is not yet considered a no-show, return `false`.
* Invoke `assignees_status` and then check if `approvals == assigned == needed` or if `needed >= num_validators` then we ask that `3 approvals > 2 num_validators`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 3 * approvals > 2 * num_validators thing is new? Is num_validators the total number of validators?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's just an integer-safe expression of the inequality approvals/num_validators > 2/3.

Copy link
Contributor

@rphmeier rphmeier Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just not something that was described in the previous writeup. But Jeff & I spoke about this yesterday in more detail. It's an escape hatch for if we seem to be requesting a very large amount of assignments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll do something better here later, but some 2/3 condition is simple and suffices for an MVP I think. I also like this "worse" condition for initial testnets because if we've some bug that causes a checker explosion then this make it show up more in testing.

@burdges
Copy link
Contributor Author

burdges commented Nov 16, 2020

I've updated the description to far more closely resemble tracker.rs in https://github.com/paritytech/polkadot/pull/1558/files which I also updated for the new no show scheme.

I'm worried this new description is a "bit too efficient" and while the text I removed was vague it maybe communicated important aspects of the goal. The gap is discussed in db69ad9#diff-588e17953264507ee591a633edc92d8dd229f9da717eef8a65245396ab1e12d4R249

@burdges
Copy link
Contributor Author

burdges commented Nov 28, 2020

#1972 is better

@burdges burdges closed this Nov 28, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants