Spread relay load across all validator pigeons #183
Instead of picking a single pigeon to relay messages every 10 blocks, all pigeons now relay some of the messages. Each pigeon in the current valset gets a subset of the messages to relay. This should speed up relaying.
To reflect: We removed the fancy collision prevention, and now every pigeon gets to relay messages using the current snapshot. Where do we decide which pigeons get to relay, though? We don't make this decision by power, correct?
Converting this to draft because I discovered some odd functionality that this makes worse. That functionality needs to be addressed in this PR as well. When a pigeon fails to relay a message, it attests to the message as a last-ditch effort to mark it as touched. Then a filter on the message query prevents the pigeon from receiving it again. This means 2 things...
The right way to fix this is to mark a message as failed and let all the other pigeons attest that yes, it failed. There is no need for all of them to try to relay it. I'll add that to this PR.
To reflect: Are we saying that 1 attempt to relay by 1 pigeon is enough work to be done? Did I read the above correctly? If I did, here's my buidl: if pigeons do not have homogeneous RPC endpoint service providers, how do we justify treating a single attempt by a pigeon with a poorly performing RPC as the best effort, rather than allowing at least some other pigeons to attempt the relay? I agree that requiring ALL 150 pigeons to attempt to relay is not efficient, but allowing only 1 pigeon to attempt the relay depends too heavily on one datacenter performing well. Of course, if I reflected incorrectly, let me know.
@taariq Your reflection is accurate. This switches to 1 attempt by 1 pigeon. It's a switch from "at least once" execution to "at most once" execution. If a pigeon fails to relay a message, that message is not retried, and we would instead want a new message. The majority (actually, probably almost 100%) of our current errors are returned from the smart contract and won't succeed on retry. They have nothing to do with the pigeon. This solves that problem in a fairly simple and understandable way. We can make it more complex and implement retries by different pigeons if you'd like, but I think that should be a follow-up feature after we take these speed improvements.
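The difference between the two execution models can be sketched as follows. This is an illustrative sketch only, not Pigeon's code: `relay_at_most_once`, `relay_at_least_once`, `reverting_send`, and the `"execution reverted"` error text are hypothetical names chosen for the example. It assumes a contract error that is deterministic, so every attempt fails the same way.

```python
def relay_at_most_once(message, send):
    """Try delivery exactly once. Returns (attempts, error_text_or_None)."""
    try:
        send(message)
    except Exception as exc:
        return 1, str(exc)
    return 1, None


def relay_at_least_once(message, send, max_attempts=3):
    """Retry delivery until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            send(message)
            return attempt, None
        except Exception as exc:
            last_error = str(exc)
    return max_attempts, last_error


# Hypothetical stand-in for a smart contract call that always reverts.
calls = []


def reverting_send(message):
    calls.append(message)
    raise RuntimeError("execution reverted")


once = relay_at_most_once("msg-1", reverting_send)
calls_after_once = len(calls)
retried = relay_at_least_once("msg-1", reverting_send)
calls_after_retry = len(calls)
```

Under that assumption, both strategies end with the same error, but the retrying one spends three calls to get there. That is the cost the single-attempt approach avoids when the failure comes from the contract rather than from the pigeon.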
We round-robin select from the current valset. All pigeons in the valset will get an equal number of messages to relay over time. This is an intentional decision to prevent feature creep of the speed epic. We can circle back later and implement a smarter strategy for selecting pigeons once that feature has been fully vetted.
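A minimal sketch of round-robin assignment over a valset might look like the following. The function name `assign_round_robin`, the `start` parameter, and the validator names are assumptions made for illustration; the actual selection logic in Paloma may differ.

```python
from collections import defaultdict


def assign_round_robin(message_ids, valset, start=0):
    """Assign each message to one validator, cycling through valset in order.

    `start` is a hypothetical offset so that a later batch could continue
    where the previous one stopped.
    """
    if not valset:
        raise ValueError("valset must not be empty")
    assignments = defaultdict(list)
    for offset, msg_id in enumerate(message_ids):
        validator = valset[(start + offset) % len(valset)]
        assignments[validator].append(msg_id)
    return dict(assignments)


valset = ["val-a", "val-b", "val-c"]
assignments = assign_round_robin(range(1, 8), valset)
# val-a gets messages 1, 4, 7; val-b gets 2, 5; val-c gets 3, 6.
```

Note that this ignores validator power and fees entirely, which matches the intent stated above: every pigeon receives roughly the same number of messages over time.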
Ideally, we really want pigeons to select based on fee and then round-robin. So yes, this will be an optimization to the structure of the messaging marketplace. However, we must always assume that pigeons are neither good nor rational actors in the system. Some pigeons will fail message delivery just so they can watch the world burn. We do not slash or jail for message delivery failure. However, we could separate failures caused by the pigeon from failures caused by the smart contract and treat them differently. Again, that is another optimization we'll need to revisit, but we don't want to block a functional system. Okay. I'm good with smart contract failure as one attempt, at most.
Previously, if there was an error during relay, the current pigeon would attest the message and then never touch it again. This means that for the message to get enough attestations, many pigeons had to attempt the relay. This is slow, inefficient, and no longer works with the new way of dividing up messages.
Now, if a relayer hits an error message, it will send that error data back to Paloma and other pigeons can simply attest to the error message. This removes the need to attempt the relay by lots of pigeons to clear out a bad message.
Other changes in this commit:
* Fixed some logging calls that were missing contextual fields
* Split up the queue get requests to behave differently for attesting vs relaying
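The new flow can be sketched roughly as below. Everything here is hypothetical and simplified for illustration: the `Message` fields, the `messages_for_relaying` and `messages_for_attesting` helpers, and the `"execution reverted"` error text are assumptions, not Paloma's or Pigeon's actual types or queries. The sketch assumes one assigned relayer per message and that attestation needs no relay attempt.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    msg_id: int
    assignee: str                      # the one pigeon allowed to relay
    outcome: Optional[str] = None      # None until relayed, then "ok" or "error"
    error_data: Optional[str] = None   # error details reported by the relayer
    attested_by: set = field(default_factory=set)


def messages_for_relaying(queue, pigeon):
    """Relay query: only messages assigned to this pigeon and not yet tried."""
    return [m for m in queue if m.assignee == pigeon and m.outcome is None]


def messages_for_attesting(queue, pigeon):
    """Attest query: messages with a recorded outcome this pigeon hasn't attested."""
    return [m for m in queue
            if m.outcome is not None and pigeon not in m.attested_by]


def relay(message, send):
    """Single relay attempt; on failure, record the error instead of retrying."""
    try:
        send(message.msg_id)
        message.outcome = "ok"
    except Exception as exc:
        message.outcome = "error"
        message.error_data = str(exc)


send_calls = []


def reverting_send(msg_id):
    # Hypothetical stand-in for a contract call that fails.
    send_calls.append(msg_id)
    raise RuntimeError("execution reverted")


queue = [Message(msg_id=1, assignee="pigeon-a")]
pigeons = ["pigeon-a", "pigeon-b", "pigeon-c"]

for m in messages_for_relaying(queue, "pigeon-a"):
    relay(m, reverting_send)

# Every pigeon attests to the recorded error without attempting the relay.
for p in pigeons:
    for m in messages_for_attesting(queue, p):
        m.attested_by.add(p)
```

In this sketch the failing contract call is made exactly once, yet the message still collects an attestation from every pigeon, which is the behavior the commit message describes.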
@taariq This has had another commit, and therefore could use another review before I merge.
To reflect: We no longer have all pigeons attempting failed messages. We have at most one attempt, and then we send the error details back to Paloma with the proof of the error. Querying messages in the queue now returns all messages that are not yet attested. We have also improved some error logging here. Did I miss anything?
Didn't miss anything. You're spot on!
Related Github tickets
Background
Instead of picking a single pigeon to relay messages every 10 blocks, all pigeons now relay some of the messages. Each pigeon in the current valset gets a subset of the messages to relay.
This should speed up relaying.
Additionally, there is a fix for undesired behavior that was discovered.
Previously, if there was an error during relay, the current pigeon would
attest the message and then never touch it again. This means that for
the message to get enough attestations, many pigeons had to attempt the
relay. This is slow, inefficient, and no longer works with the new way
of dividing up messages.
Now, if a relayer hits an error message, it will send that error data
back to Paloma and other pigeons can simply attest to the error message.
This removes the need to attempt the relay by lots of pigeons to clear
out a bad message.
Testing completed