Originally posted by mgdotson September 28, 2022
Using the documentation example running against a local Docker container:
docker run -p5672:5672 -p 15672:15672 -p5671:5671 -p15692:15692 rabbitmq:3.9.17-management
putting the server into a blocking situation causes channel.Close() to hang and never finish.
Steps to reproduce:
1. Start the container with the docker run command above.
2. Start the server code and watch a couple of message confirmations.
3. In the container, run: rabbitmqctl set_vm_memory_high_watermark 0.00000001
4. Watch the server code display "Push didn't confirm. Retrying..." (let 4 or so pass).
5. In the container, run: rabbitmqctl set_vm_memory_high_watermark 0.4
6. Notice that push confirmations happen again.
7. The context will time out and run queue.Close().
At this point, the code will close the done channel but will hang on the queue.channel.Close() call.
Even if we wrap this in a context timeout to allow the calling code to finish, in a long-running process this could cause leaks over time, especially if multiple channels end up in this situation due to a blocked server.
This is also not the only scenario that can cause a channel.Close() call to hang.
Best practices? TCP settings?
Analysis
During a memory alarm, RabbitMQ won't read from the publisher channel; therefore, it does not send a confirmation before the client example "gives up" on the confirmation:
The problem is that the client sends a new message without waiting for the confirmation it previously "gave up" on. This is not correct. The documentation of Channel.NotifyPublish(), which works very similarly to Channel.NotifyConfirm(), states:
It's advisable to wait for all Confirmations to arrive before calling Channel.Close() or Connection.Close().
It is also advisable for the caller to consume from the channel returned till it is closed to avoid possible deadlocks
The current implementation of the example client does, in fact, deadlock in the situation described in the repro steps. One goroutine, trying to deliver a confirmation, grabs a lock on the confirms struct and sends a notification on a chan amqp.Confirm that nobody is receiving from. Then, during the close sequence, Channel.Close() calls confirms.Close(), which blocks trying to acquire the same lock on the confirms struct. Because nobody is receiving on the chan amqp.Confirm, the lock is never released, and this is a deadlock.
Discussed in #121
Code referenced in the analysis (where the client "gives up" on the confirmation): amqp091-go/example_client_test.go, lines 295 to 302 in 048b5b2
Goroutine dump (only the relevant two):