OffsetForLeaderEpoch loop of failed requests with multiple leader changes #4425
Comments
We've also hit this issue. Would it be feasible to have a 2.2.1 release as soon as the fix is ready?
@scanterog Sure, we're planning to include it in the next release.
Hi @emasab, we are also struggling with this one. Once one of our instances is affected, it consumes its entire CPU limit and carries a lot of RX (1 GB/min) and TX (100 MB/min) traffic. The broker is also flooded with 400K+/min We started to observe this recently once we decided to bump from
I agree with @scanterog. Would it not be wise to cherry-pick this and release HF of edit: I'll just add that we use the .NET wrapper confluent-kafka-dotnet, so this HF would need to be released there as well.
This continues to be an issue for us as well (using the same C# wrapper as @plachor). The Is there an estimated schedule for the next release that includes #4433? The only resolution is to restart processes that get into this state, and we haven't yet found a way to automate that: the issue is not in the .NET code, and the monitoring we have on the broker side only says that a lot of these calls are being made, not which client(s) they are coming from.
This fix is included in 2.3.0 according to the release notes.
Description
Loop of OffsetForLeaderEpoch calls when multiple leader changes happen and one of them is being retried because of an error.
How to reproduce
To reproduce it, there must be an initial partition leader change that triggers an OffsetForLeaderEpoch request. That request must fail while a second leader change happens. The corresponding current leader epoch is then not updated, and the client enters a loop of failing requests.
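
The sequence above can be illustrated with a toy model. This is only a sketch under stated assumptions, not librdkafka's code and not the actual fix. The function `simulate`, the flag `refresh_epoch_on_retry`, and the `TRANSIENT_ERROR` label are hypothetical names made up for illustration. It also assumes that the broker answers a request carrying a stale current leader epoch with `FENCED_LEADER_EPOCH`, and that the client retries on every error.

```python
def simulate(refresh_epoch_on_retry: bool, max_attempts: int = 5) -> int:
    """Toy model: return how many requests were sent before success,
    or max_attempts if the client never got a successful response."""
    broker_epoch = 1  # leader epoch after the first leader change
    client_epoch = 1  # epoch the (hypothetical) client caches for its requests
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if attempts == 1:
            # The first request fails for an unrelated reason, and a second
            # leader change happens while it is in flight.
            broker_epoch = 2
            error = "TRANSIENT_ERROR"
        elif client_epoch < broker_epoch:
            # Assumption: a stale current leader epoch is rejected.
            error = "FENCED_LEADER_EPOCH"
        else:
            error = None
        if error is None:
            return attempts
        if refresh_epoch_on_retry:
            # Stands in for picking up the new epoch before retrying.
            client_epoch = broker_epoch
    return attempts


if __name__ == "__main__":
    print("stale epoch kept:", simulate(refresh_epoch_on_retry=False))
    print("epoch refreshed: ", simulate(refresh_epoch_on_retry=True))
```

With the stale epoch kept, every retry in this model is rejected and only the `max_attempts` cap stops the loop. When the epoch is refreshed before the retry, the second request succeeds.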
Checklist
debug=..
as necessary) from librdkafka