Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify long-lived attestation subnet calculation to allow pre-determined calculation #3545

Closed
wants to merge 7 commits into from

Conversation

AgeManning
Copy link
Contributor

@AgeManning AgeManning commented Nov 7, 2023

We introduced a new form on long-lived attestation subnets for nodes (attnets revamp) in: #3312.

In the process we added a shuffling mechanism that shifted nodes to new long-lived subnets uniformly throughout the long-lived epoch duration to avoid all nodes transitioning to new subnets at the duration boundary.
Specifically we added:

node_offset = node_id % SUBNET_DURATION_IN_EPOCHS
permutation_seed = hash(uint_to_bytes((epoch + node_offset))// SUBNET_DURATION_IN_EPOCHS))

Which gives the current subscription calculation as:

def compute_subscribed_subnet(node_id: NodeID, epoch: Epoch, index: int) -> SubnetID:
    node_id_prefix = node_id >> (NODE_ID_BITS - ATTESTATION_SUBNET_PREFIX_BITS)
    node_offset = node_id % EPOCHS_PER_SUBNET_SUBSCRIPTION
    permutation_seed = hash(uint_to_bytes(uint64((epoch + node_offset) // EPOCHS_PER_SUBNET_SUBSCRIPTION)))
    permutated_prefix = compute_shuffled_index(
        node_id_prefix,
        1 << ATTESTATION_SUBNET_PREFIX_BITS,
        permutation_seed,
    )
    return SubnetID((permutated_prefix + index) % ATTESTATION_SUBNET_COUNT)

This does shuffle the nodes uniformly throughout the duration, but it prevents one of the main goals of this effort, which is to pre-determine the node-id prefix for which nodes should be subscribed to a subnet, which will help with discovery searches (ie we can just search for the prefix to find all nodes on a subnet).

Because we mix the entire node_id into the permutation seed, its no longer possible (feasible) to build a mapping of node_id_prefix -> subnet.

I was thinking one way to do this is just to increase the prefix-space by a few bits (3) and use this for the shuffling. This would allow 8 transitions per EPOCHS_PER_SUBNET_SUBSCRIPTION and when building a prefix mapping to subnet we'd have to store at most 512 elements (64 subnets * 8 transitions).

I'm weary of adding too much complexity here and so am hoping for ways to reduce the complexity in this function.

Ideally looking for others opinions or insights on this.

@AgeManning
Copy link
Contributor Author

Tagging people who were interested in the original modification:
@djrtwo @nisdas @mcdee @ppopth @Nashatyrev

specs/phase0/p2p-interface.md Outdated Show resolved Hide resolved
specs/phase0/p2p-interface.md Show resolved Hide resolved
@ppopth
Copy link
Member

ppopth commented Nov 23, 2023

This makes sense to me. I think there is no reason to use suffixes for anything, so using the prefixes probably always make more sense.

Comment on lines 1017 to 1019
shuffling_bits = (node_id >> (NODE_ID_BITS - ATTESTATION_SUBNET_PREFIX_BITS - ATTESTATION_SUBNET_SHUFFLING_PREFIX_BITS)) % (1 << ATTESTATION_SUBNET_SHUFFLING_PREFIX_BITS)
epoch_transition = (subnet_prefix + (shuffling_bits * (EPOCHS_PER_SUBNET_SUBSCRIPTION >> ATTESTATION_SUBNET_SHUFFLING_PREFIX_BITS))) % EPOCHS_PER_SUBNET_SUBSCRIPTION
permutation_seed = hash(uint_to_bytes(uint64((epoch + epoch_transition) // EPOCHS_PER_SUBNET_SUBSCRIPTION)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

linter error: line too long (184 > 120 characters)

Could you add more temporary variables to refactor it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do

@djrtwo
Copy link
Contributor

djrtwo commented Jan 11, 2024

I'm not sure this is a show-stopper but does this make it much easier to grind node-id's to take over a particular subnet(s)?

@arnetheduck
Copy link
Contributor

Note from consensus call: if clients start optimising kad lookups by prefix, they will no longer connect to subscribe-all-subnets nodes and others that aren't part of the "must subscribe" set.

@ppopth
Copy link
Member

ppopth commented Jan 11, 2024

Note from the consensus call as well: we have to think about the upgrade process and the fork boundary as well.

@AgeManning
Copy link
Contributor Author

I'm not sure this is a show-stopper but does this make it much easier to grind node-id's to take over a particular subnet(s)?

Might need to clarify a bit here. Now we only use 3 bits for the shuffling inbetween the EPOCHS_PER_SHUFFLING period. You can search those 3 bits to pick when you rotate in an epoch, but you still have to search one of the 64 prefixes in order to get a specific subnet for a specific time. I think this is the same case as it is currently, its easy to do.

Note from consensus call: if clients start optimizing kad lookups by prefix, they will no longer connect to subscribe-all-subnets nodes and others that aren't part of the "must subscribe" set.

Not sure about other clients, but in Lighthouse we do general searches to get peers then we retain peers that are useful to us. Long-lived subnet peers have greater value so Lighthouse still has bias towards these peers. Even once optimised for kad searches, we would randomly stumble across these nodes (as their node-id will be linked to 2 subnets) and we will try to hold on to them still. It's true we wont actively search them out, but they will be randomly discovered as we currently do.

I dont see this being a big issue.

Note from the consensus call as well: we have to think about the upgrade process and the fork boundary as well.

Yep. In Lighthouse, we already upgraded to using the first version of this (the attnets_revamp). I think most clients have also. We are waiting for a fork boundary before we implement any kind of peer scoring for this, so at this time its still optional.

I currently don't see an issue in partial upgrades as long as client teams dont downscore nodes yet.

@hwwhww hwwhww mentioned this pull request Jan 15, 2024
8 tasks
@djrtwo
Copy link
Contributor

djrtwo commented Jan 16, 2024

I currently don't see an issue in partial upgrades as long as client teams dont downscore nodes yet.

@AgeManning I'm less worried about peer-scoring and more worried about making this method even less effective for some time horizon. we'd have v0, v1, and v2 discovery. Those searcing for nodes using v1 and v2 discovery might as well just be searching for nodes using v0 (attnets enr) given that hits on v1 or v2 are likely to be split on actually working (worse than today).

Or am I missing sometihng

@AgeManning
Copy link
Contributor Author

I currently don't see an issue in partial upgrades as long as client teams dont downscore nodes yet.

@AgeManning I'm less worried about peer-scoring and more worried about making this method even less effective for some time horizon. we'd have v0, v1, and v2 discovery. Those searcing for nodes using v1 and v2 discovery might as well just be searching for nodes using v0 (attnets enr) given that hits on v1 or v2 are likely to be split on actually working (worse than today).

Or am I missing sometihng

Oh yeah. So to clarify, v0 is original ENR attnets. v1 is what is currently in the spec and v2 is this PR.

We have v1 implemented as we speak, but in terms of discovery we use v0 (randomly searching through the DHT). The intent was to use to v0 and v1 in discovery searches simulatenously because most nodes haven't shifted to the new structure. i.e a v1/v2 discovery query would be kind of optimistic, we could try, if we get enough peers quickly then all good, but other wise fallback to random searching.

I think even searching with a specific prefix for a subnet should still yield random peers as they should be uniformly distributed. As we search for all the subnets we end up searching all the prefixes.

Anyway, in my mind, the idea was to iron out this logic to enable this kind of searching, maybe use it optimistically, but ultimately wait till the next hard fork before switching over. The idea was to get all the client teams to implement this. We can still use our old discovery mechanisms, but as nodes slowly upgrade we can see a gradual change to the network structure (i.e number of nodes per long-lived subnet) and if things start to break we can fix them. Rather than everyone simultaneously switching at once on the hard fork boundary.

I think the peer-scoring and doing just the v1/v2 discovery searches should be left until after a hard fork. Implementing this PR however could (and imo should) be done before the hard fork.

@ppopth
Copy link
Member

ppopth commented Jul 29, 2024

Should this be closed? since it's already superseded.

@AgeManning AgeManning closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants