-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
discv5: topic radius consensus problem #112
Comments
Here is my latest insight about discv5 radius control which is based on intuitions mostly but it might suggest some design decisions which could make our lives easier in the future. The issue is that there is no safe and unattackable way of detecting the radius of advertisements for a certain topic. Nodes inside the "real" radius will know that they receive a significant amount of registration requests so they are probably inside but there is no way to prove this to others. My intuition tells me that there might be a natural incentive for anyone interested in advertising a topic to be inside the radius of that topic. It certainly offers a slight advantage because at least they can always advertise themselves. So at some point advertisers might start mining addresses close to the topic hash. If advertising multiple topics, they might also use multiple addresses and Kademlia tables. I think this is fine, maybe even good. DHT is an anarchistic thing where there is no way to enforce proper behavior on the global scale but local "communities" with shared interests (like nodes advertising a topic) can still function well. This is a bit like territorial behavior. If you want recent and relevant information, if you want to protect your interests then just be there. It is not the best possible group dynamic but maybe the best we can achieve at this level, without global consensus. And it is probably good enough for our purpose. Still, if address mining is cheap then this can lead to a very distorted address space with extremely high address density around popular topic hashes. So maybe this is where we should optionally use PoW to make address mining harder. Kademlia table can also function like a simple reputation system: nodes that have been in our table and functioned well for a long time can have priority to stay there. We could incorporate PoW into this priority system: nodes with a long and good history or a high enough PoW can get into the limited-size buckets of our table. In addition to the address, PoW can also reference an absolute time value before which it is invalid and after which its value starts to decrease. This means that you don't mine an address for forever, you get a chance to get into the DHT where you can then stay by keeping stable connections. This mechanism is not something we should necessarily implement in the beginning. However we should design the topic density reporting mechanism to allow implementing it later if necessary. So instead of just adding a request-reply to the protocol (asking for a topic and replying with a single bit), I suggest adding this info to the ENR and update it when it has significantly changed. Maybe just reference it as a hash because this topic info can have a significant size. It should be a limited length list of (topic, density) pairs containing the topics with the highest density. Also, unlike the original suggestion, topic density (which can be measured as the average number of registration attempts per minute) should be reported as a scalar value instead of a single bit switching at a pre-defined threshold. This is important in order to make similarity comparison possible. Reporting all important topics in the ENR is useful because it makes it harder to report different values to different peers (just like publicly displayed flags). ENRs have serial numbers, we expect to never see two different signed ENRs with the same number from the same node. We also expect SNs to not increase at an unusually quick rate, especially with many versions being inaccessible to us. Here I am not trying to specify a fraud detection heuristic but I believe this is something ENRs are absolutely good for. First we can just release discv5 implementations that follow the rules and build a crawler that displays the reported info and makes it easy for us to see how nodes normally behave and whether there is an attack and how that can be detected from the reported numbers. Then we can design better reputation and fraud detection heuristics. |
Both searching and advertising rely on the 'radius estimation' to figure out which nodes
are supposed to keep registrations for a certain topic. To estimate the radius from
received tickets, the node checks how closely the time in received ticket matches the
target-ad-lifetime
value, a protocol constant. A big issue with this approach is that all nodes must agree on a certain constant, otherwise the algorithm doesn't converge properly. Getting the radius right is important, and small implementation differences can mess up the system.I think it'd be better to estimate the radius based on the rate of successful registrations within
a region of address space. That is, instead of trying to converge on the optimum waiting time,
maintain a table of address buckets with a running average of the time it took to actually get
registered in that region.
The text was updated successfully, but these errors were encountered: