Intermittent flakiness of v7.0 RC #1676
Comments
We have a hunch the problem is related to confirms tracking, so we are restoring the homegrown approach we had before to verify.
Can you point out the exact test that is failing? In your source code? The new confirmation tracking code was added by @stebet (I think) so I'm pinging him here. When you call [...]
Ah OK, that's probably the missing link. Given that the API returns a task and the channel knows that confirms are enabled, why does one need to call two methods? The failing tests are visible in the CI runs. Sure, I can link them here later.
Was this design chosen because of the [...]
Ok, I think I understand now. The work followed an existing design and made that async. Because we had our own confirmation tracking in place, we never needed to use [...]
Yep, I think that's the main idea, plus I think it makes it a bit "easier" for an application to handle the case of a timed-out confirmation, rather than a random exception while publishing or at some other point. Let me know if there's anything I can do to assist with your testing. Thanks a lot for giving the latest RC a spin!
@bording and I discussed this further today, and we believe this is a legacy that should be removed from the API surface of the client. In an async world on a channel that has confirms enabled, the publish operation could simply wire up the task completion source and then await that instead of the [...]. Doing multiple publishes and waiting for those to be confirmed is then simply a [...]. We will look into opening a PR.
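To make the proposal above concrete, here is a minimal sketch of the idea in Python rather than C#. It is a simulation only and does not use the RabbitMQ.Client API: `SimulatedChannel`, `publish`, `publish_batch`, and `run_confirm_demo` are hypothetical names, the "broker ack" is faked with a timer, and an asyncio future stands in for the task completion source.

```python
import asyncio


class SimulatedChannel:
    """Toy stand-in for a channel with publisher confirms enabled.

    Hypothetical illustration only; this is not the RabbitMQ.Client API.
    """

    def __init__(self):
        self._next_seq = 1
        self._pending = {}  # sequence number -> future awaiting the confirm

    async def publish(self, body):
        loop = asyncio.get_running_loop()
        # Register a future (the asyncio analogue of a task completion
        # source) under the next sequence number before "sending".
        seq = self._next_seq
        self._next_seq += 1
        confirmed = loop.create_future()
        self._pending[seq] = confirmed
        # Assumption: the broker acks a little later; faked with a timer.
        loop.call_later(0.01, self._on_ack, seq)
        # The publish call only returns once its confirm has arrived.
        await confirmed
        return seq

    def _on_ack(self, seq):
        # Complete the future that belongs to the acked sequence number.
        future = self._pending.pop(seq, None)
        if future is not None and not future.done():
            future.set_result(None)


async def publish_batch(channel, bodies):
    # Waiting for several publishes is just awaiting all of their tasks.
    return await asyncio.gather(*(channel.publish(b) for b in bodies))


def run_confirm_demo():
    channel = SimulatedChannel()
    return asyncio.run(publish_batch(channel, [b"a", b"b", b"c"]))
```

With this shape the caller needs only one call per message, and a timed-out or rejected confirm could surface as an exception on the awaited publish itself.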
Great, that makes a lot of sense.
Be careful that multiple publishes via [...] A System.Threading.Channel might solve that, and multiple confirms could then simply be part of an inner (synchronous) [...]
Sure, we can take this into account in the PR, @Tornhoof.
@Tornhoof FYI we don't care that much about the order of publishes for our use cases, but I can see how that might be something the users of this library care about, which I guess right now is fulfilled by the channel-wide semaphore ensuring consistent ordering. So I started wondering how much of that is a concern of the RabbitMQ client vs. the user of the API surface.
I'd say the user, but if publish automatically confirms (and a single confirm is slow), then you'd need something like PublishMany(list) with multi-confirm. From my own benchmarks while migrating to 7.0: without any confirms the code published around 16k msg/s, with single confirms it was down to 800, and an S.T.Channel doing a multi-confirm after the synchronous TryRead loop lifted it back to ~10k.
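The batching approach described in that benchmark could look roughly like the sketch below. It is a Python simulation, not System.Threading.Channels or RabbitMQ.Client: `SimulatedBroker`, `writer_loop`, and `run_batch_demo` are hypothetical names, an `asyncio.Queue` stands in for the channel, and `get_nowait` plays the role of the synchronous TryRead loop. A single writer drains whatever is queued, sends it in order, and then pays for one confirm round trip per batch.

```python
import asyncio


class SimulatedBroker:
    """Toy broker that counts confirm round trips (hypothetical)."""

    def __init__(self):
        self.sent = []
        self.confirm_round_trips = 0

    def send(self, body):
        # Fire-and-forget write; nothing is awaited per message.
        self.sent.append(body)

    async def wait_for_confirms(self):
        # One simulated round trip confirms everything sent so far.
        self.confirm_round_trips += 1
        await asyncio.sleep(0.001)


async def writer_loop(queue, broker):
    # A single writer keeps publishes in queue order.
    while True:
        item = await queue.get()  # wait for at least one message
        if item is None:          # None is the shutdown sentinel
            return
        batch = [item]
        stop = False
        # Synchronous drain: take everything that is already queued.
        while True:
            try:
                nxt = queue.get_nowait()
            except asyncio.QueueEmpty:
                break
            if nxt is None:
                stop = True
                break
            batch.append(nxt)
        for body in batch:
            broker.send(body)
        # One multi-confirm for the whole batch instead of one per message.
        await broker.wait_for_confirms()
        if stop:
            return


def run_batch_demo(n=100):
    async def main():
        queue = asyncio.Queue()
        broker = SimulatedBroker()
        for i in range(n):
            queue.put_nowait(i)
        queue.put_nowait(None)
        await writer_loop(queue, broker)
        return broker

    return asyncio.run(main())
```

In this toy run all messages are queued before the writer starts, so they are confirmed in a single batch; under real load the batch size would vary with how fast producers enqueue.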
Now that things are split out into dedicated issues and PRs, I will close this one. Brandon will update #1682 with some of the discussions and challenges we are having.
Describe the bug
It is still a bit early to say what actually causes these things to happen, and it might very well be that the problem is somewhere in our implementation. But we (@bording and I) figured we would give a heads-up in case it ends up being something in the client.
We have migrated to the latest RC of the v7 version and see some intermittent test failures:
https://github.com/Particular/NServiceBus.RabbitMQ/actions/runs/10852801383?pr=1446
For some reason, the consume isn't happening, which messes up all the other tests.
The thing we might be seeing so far here would be if the BasicPublishAsync task is somehow completing before the message has actually been fully sent and confirmed. This would then let the BasicGetAsync start before the message is in the queue. At first sight, we couldn't spot anything in the transport or test code that could account for that, but like I said we may have missed something.
It is failing on Linux, so it's not just a Windows thing: https://github.com/Particular/NServiceBus.RabbitMQ/actions/runs/10854796632?pr=1446
Reproduction steps
Still investigating
Expected behavior
Publish working ;)
Additional context
No response