DeferredConfirmation.Wait() can hang indefinitely #182
Comments
Extremely unlikely!
Much more likely! Thanks for the code and detailed information. We'll take a look.
I've found that, if I add a

I'm wondering whether the confirmation message is somehow getting dispatched to the wrong channel, so that the intended channel never receives its confirmation?

Here's what I changed:

    func (d *deferredConfirmations) Confirm(confirmation Confirmation) {
        d.m.Lock()
        defer d.m.Unlock()
        dc, found := d.confirmations[confirmation.DeliveryTag]
        if !found {
            log.Printf("Confirmed unpublished tag: %d\n", confirmation.DeliveryTag)
            // We should never receive a confirmation for a tag that hasn't
            // been published, but a test causes this to happen.
            return
        }
        dc.setAck(confirmation.Ack)
        delete(d.confirmations, confirmation.DeliveryTag)
    }

Or perhaps more helpful, when I panic instead of merely logging the output, I get the following:
After some more determined testing, my hypothesis about messages being dispatched to incorrect channels turned out to be false. It appears to be a timing issue, in which the goroutine doing the publishing gets interrupted between sending the basicPublish (

My fix creates the

The only alternative I can think of would be to block the reading side (message dispatch to the channel) until the publish and the creation of the confirmation are done.

In retrospect, I must admit that the code which reproduces the error above was always tested against a local instance (although on different machines). This issue is probably even less likely to come up with production instances, I suppose, due to network latency.
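The timing issue described in this comment can be modelled without a broker. The sketch below is only an illustration using the standard library: `pending`, `register`, `confirm` and `publish` are hypothetical names, not the library's code, and the "broker" is assumed to confirm synchronously, which is the extreme case of a confirmation arriving before the waiter has been registered.

```go
package main

import (
	"fmt"
	"sync"
)

// pending is a hypothetical stand-in for a per-channel registry of
// deferred confirmations, keyed by delivery tag.
type pending struct {
	mu      sync.Mutex
	waiters map[uint64]chan bool
}

func newPending() *pending {
	return &pending{waiters: make(map[uint64]chan bool)}
}

// register records a waiter for tag and returns the channel on which the
// confirmation will be delivered.
func (p *pending) register(tag uint64) <-chan bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	done := make(chan bool, 1)
	p.waiters[tag] = done
	return done
}

// confirm resolves the waiter for tag. It reports false when no waiter is
// registered yet, in which case the confirmation is silently dropped.
func (p *pending) confirm(tag uint64, ack bool) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	done, found := p.waiters[tag]
	if !found {
		return false
	}
	done <- ack
	delete(p.waiters, tag)
	return true
}

// publish models a single publish with delivery tag 1 and reports whether
// the waiter was resolved. send stands in for writing the publish frame;
// the simulated broker confirms immediately.
func publish(p *pending, registerFirst bool) bool {
	const tag = 1
	send := func() { p.confirm(tag, true) }

	var done <-chan bool
	if registerFirst {
		done = p.register(tag) // waiter exists before the frame is sent
		send()
	} else {
		send() // confirmation arrives before the waiter exists
		done = p.register(tag)
	}

	select {
	case <-done:
		return true
	default:
		return false // a blocking wait would hang here
	}
}

func main() {
	fmt.Println("register after send, resolved: ", publish(newPending(), false))
	fmt.Println("register before send, resolved:", publish(newPending(), true))
}
```

In this model, registering the waiter before sending means an early confirmation always finds it; registering afterwards leaves a waiter that nothing will ever resolve.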
Thank you @calloway-jacob! I've been working on customer issues, and will review this soon. Have a great weekend ☘️
@calloway-jacob that's interesting. So the publish/confirmation happens so fast it "beats" the call to
@lukebakken That's what appears to be happening. I determined this by putting log statements in a few places and passing the channel id into the confirmations code to test whether the channel-mismatch hypothesis was true (thankfully, it wasn't). With the logging (erased, of course, in the commit) I was able to see that sometimes the confirm would be called before the
@calloway-jacob I wish every OSS user took the time like you did to dig into an issue. Thanks a lot.
I've noticed that DeferredConfirmation.Wait() can hang indefinitely under some circumstances. Some experimentation has revealed that RabbitMQ is receiving the published message (and delivering it to any queues/consumers), but the publishing channel does not always get a confirmation.
I have not yet been able to isolate whether RabbitMQ is indeed not confirming the message, or whether I might have found an issue in this library that is only revealed under particular circumstances. I have a packet capture which would likely reveal this, but I am not knowledgeable enough to analyze it fruitfully.
This behaviour can be reproduced by creating multiple channels on a single connection, each in confirm mode, and proceeding to publish rapidly on each of them. I'm publishing using Channel.PublishWithDeferredConfirmWithContext(), followed by Wait() on the returned DeferredConfirmation.
I'm aware that I can pass a context with a timeout to work around this, and indeed that's the only thing that reliably works. However, I then have to deal with the possibility of a message getting published multiple times (when I retry).
Things that have not worked:
Channel.NotifyPublish in addition to DeferredConfirmation.Wait()
Things that do work:
DeferredConfirmation.Wait() and only use Channel.NotifyPublish()
I have a program I've used to demonstrate the issue. The default settings usually cause at least one indefinite hang on DeferredConfirmation.Wait(), or at least they have done so on the 2 servers I've tried (one Docker container, one installed via package manager).
Code: