-
Notifications
You must be signed in to change notification settings - Fork 190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Published messages can be dropped silently #217
Comments
Note this is already available in #184 |
woa I noticed this yesterday in a completely unrelated thing. What I saw is that hammering gossipsub results in messages seemingly not sent around (or dropped on receive). |
TLDR: This the proposal does not fully solve this problem, but it's not clear it has to. If we want to more fully solve this problem there are more things we can talk through. @hsanjuan, yep there are some things we can do here but there are some protocol level problems to tackle if we actually wanted to "solve" this problem (see: #197 for more about the problem). For example, even if
As was mentioned in libp2p/go-libp2p#721 the current answer to dropped messages is to say "pubsub does not guarantee message delivery, if you want that layer something on top of pubsub", but that if we have enough users asking for guaranteed message delivery we should probably come up with something for them. I don't think this response is super unreasonable and we already have some solutions (e.g. go-libp2p-pubsub-router) that handle the case of guaranteed message delivery. However, there are some protocol extensions we could look into like sending an ACK to the peer that sent us a message when once we've sent the message out or we failed/are out of peers. |
Pubsub indeed does not guarantee message deliverability in terms of distributed behaviour, but in terms of local behaviour, it should absolutely guarantee that a message you intended to publish has actually left your box. |
A lot of this problem can be solved by not hardcoding some channels' length to things like 32 (setting way higher, configurable defaults to give some space for bursts), and by complaining loudly when a message is dropped because channels are full. Also, more documentation about the implementation. |
@raulk @aschmahmann I completely understand what we want to accomplish here. Please can I start working on the |
@aarshkshah1992 fine by me, not sure if @raulk has any thoughts/concerns here. |
Adding a thought about the In the current PR, the However, this is very tricky to achieve if we don't want the writer/event loop to block or spin up a go-routine to do a blocking write because we could end up a dropping a "failure event" & the publish would wrongly conclude that everything is okay. So, we might have to trade off this strong behaviour for something lighter if we want to avoid all forms of blocking. @aschmahmann might have some opinions on this. |
@raulk I'm running into some problems with this in go-libp2p-pubsub-router which builds a persistent value store on top of pubsub. I'd like to solve the "messages can be dropped silently" by having pubsub emit an event that we dropped a message.
Sounds good, although it'd be really nice if the merge function was configurable instead of just concatenating the messages. @vyzo this seem like a reasonable plan? |
Currently if the outbound message queues for all peers in a mesh/fanout are full, gossipsub drops the message silently, which has already bitten us various times (in stress testing, and during actual usage).
This happens here: https://github.com/libp2p/go-libp2p-pubsub/blob/master/gossipsub.go#L359
Proposal
Extend the signature of
Publish()
so it can take functional options to configure the reliability characteristics we want.WaitUntilQueued(One | All | float64)
=> blocks the publish until the message has been queued to as many peers as the arg indicates. Can be implemented easily.WaitUntilSent(One | All | float64)
=> blocks the publish until the message has actually been dispatched to as many peers as the arg indicates. Requires deeper changes, but reliability is higher.Aggregate RPCs waiting in the queue. Messages are associative and can be merged, in theory.
Make queue size configurable. It's currently hardcoded to 32, and cannot be changed.
Affected users
Filecoin, ETH2 gossipsub testing.
The text was updated successfully, but these errors were encountered: