azservicebus Receiver can improperly hold onto message indefinitely #23893
Labels: Client, customer-reported, needs-team-attention, question, Service Attention, Service Bus
Bug Report
github.com/Azure/azure-sdk-for-go/sdk/messaging/azservicebus
63cbd1af5450e56dbdd824fe82bd4af4835d8e40
go version go1.23.1 darwin/arm64
What happened?
I observed that, when receiving messages, a small percentage of messages would occasionally end up in "limbo": the SDK had not returned them to the caller, but they could not be received again because Service Bus reported them as already received. After the message delivery lock expired, the server would redeliver them and they would be successfully received. I suspected that the SDK was receiving them but not correctly returning them to the client.
What did you expect or want to happen
I expect that all messages received by the SDK are returned to the client in a timely fashion.
How can we reproduce it?
My reproduction has two components:
Producer Code
Consumer Code
It's worth noting that in my testing, the behaviour appears inconsistently, maybe 20-30% of the time.
Running the consumer and running the producer separately (once the consumer had finished processing the previous batch) produced the following output in one test:
You can see that this usually works as expected: the full batch of 3000 is received and acknowledged. However, on the third batch, only 2998 messages were received at first; the remaining two were not received until a minute later, when the message lock expired and the server redelivered them.
Why does this happen?
I created a local copy of the SDK and started investigating. I believe this is caused by a bug in newReleaserFunc:
azure-sdk-for-go/sdk/messaging/azservicebus/receiver.go
Lines 610 to 630 in bcb396b
The message is pulled from the internal Receiver's queue successfully, but the context is cancelled by the next call to ReceiveMessages() after the message is received, so the ReleaseMessage() call fails with a context canceled error, and the message is never released.
I confirmed this by adding the following after that call:
I then observed the following output:
I'm not sure what the fix for this is - perhaps the cancelReleaser function should return an additional (optional) value: a message that was received but not successfully released? I tried the simple fix of replacing the ReleaseMessage() context with context.Background() (using a different error variable so the error doesn't bleed into the err that controls exiting the function), but unsurprisingly that made the receive loop quite slow, as the release couldn't be cancelled.