Description
Describe the bug
During periods when a large number of server-side errors occur, we sometimes see messages getting "stuck" in queues: they sit in the queue for the configured message lock timeout before being redelivered.
After some attempts to reproduce this with a smaller example, I found that, in certain scenarios, calling MessageReceiver.CloseAsync() does not actually close the inner ReceivingAmqpLink. The link then sits in the background, continually picking up messages.
The only way I was able to reproduce this is by closing the receiver and opening a new one when an error arrives in the ExceptionHandler. My best guess as to why this occurs: when the inner link faults, it auto-recovers in OnReceiveAsync(), so there may be a race condition between auto-recovery and closing the receiver at about the same time.
Of course, perhaps there is something completely off with the usage of the SDK here as well.
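For reference, the close-and-reopen pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code from the reproduction repo: the `RecyclingReceiver` class, its field names, and the lock around recycling are assumptions made for the example, and only the `Microsoft.Azure.ServiceBus` calls (`MessageReceiver`, `RegisterMessageHandler`, `MessageHandlerOptions`, `CompleteAsync`, `CloseAsync`) come from the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;

// Hypothetical helper for illustration; not taken from the reproduction repo.
public sealed class RecyclingReceiver
{
    private readonly string _connectionString;
    private readonly string _queueName;
    // Serializes recycling so concurrent errors do not close/open twice.
    private readonly SemaphoreSlim _recycleLock = new SemaphoreSlim(1, 1);
    private MessageReceiver _receiver;

    public RecyclingReceiver(string connectionString, string queueName)
    {
        _connectionString = connectionString;
        _queueName = queueName;
    }

    public void Start()
    {
        var receiver = new MessageReceiver(_connectionString, _queueName, ReceiveMode.PeekLock);
        // Publish the instance before the pump starts so the error handler can compare against it.
        _receiver = receiver;

        var options = new MessageHandlerOptions(args => OnExceptionAsync(receiver, args))
        {
            AutoComplete = false,
            MaxConcurrentCalls = 1
        };

        // Complete every message as soon as it is received.
        receiver.RegisterMessageHandler(
            (message, token) => receiver.CompleteAsync(message.SystemProperties.LockToken),
            options);
    }

    private async Task OnExceptionAsync(MessageReceiver faulted, ExceptionReceivedEventArgs args)
    {
        Console.WriteLine($"Receiver error: {args.Exception.Message}");

        await _recycleLock.WaitAsync();
        try
        {
            // Another error may already have recycled this receiver.
            if (!ReferenceEquals(faulted, _receiver))
            {
                return;
            }

            // Expectation: this also closes the inner ReceivingAmqpLink.
            await faulted.CloseAsync();
            Start();
        }
        finally
        {
            _recycleLock.Release();
        }
    }
}
```

With this shape, the close request is issued from inside the exception handler, which is the point at which the SDK may also be recovering the faulted link on its own.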
Expected behavior
In general, I would expect that CloseAsync() would always close the inner ReceivingAmqpLink.
To Reproduce
Reproduction Repo = https://github.com/paulsavides/ServiceBusTesting
ReproProject is the project that reproduces this issue. If the code is doing something extremely incorrect, please let me know. We are actually using the MassTransit library to interact with AzureServiceBus so I had to recreate a bit of what it was doing that reproduces the error.
- Open the solution from the reproduction repo
- Set ReproProject as Startup project
- Fill in Endpoint & Shared Access Key Signature in Program.cs
- Run the project
- While the project is running, open the queue in the Azure UI and continually update the `Auto-delete after idle` setting.
  - This is an attempt to cause errors that require the links to be recreated
  - You can view this video to see exactly what I mean if the instructions are unclear: https://www.youtube.com/watch?v=sv0bRozEevs
- Eventually, in the console output, you will see errors coming through the exception handler & the receiver will 'recycle' some number of times
- After recycle, you should start seeing message sends & receives being mismatched
  - If not, go back to step 5 (updating the queue setting in the Azure UI)
- Press `d` to print out diagnostics on all of the links from "closed" receivers that are still open and the number of unsettled messages from those links
Environment:
- Microsoft.Azure.ServiceBus 5.0.0
- .net sdk 3.1.102, Microsoft.NETCore.App 3.1.9
- Visual Studio 16.8.1
- I have verified that the issue occurs on the Azure Service Bus standard tier; I believe I have seen it on the premium tier as well.
Please let me know if you require any clarification from me.
Thank you for taking the time to look into this,
Paul Savides