Kafka NullPointerException on fetch #535
Comments
Does this come when you're closing a client?
Also what version of Kafka were you upgrading from? And what's your inter-broker-protocol set to?
This might be another instance of #295
We upgraded from 2.7.0, and the inter-broker protocol is set to 3.5. No clients were being closed at the time, but there's a decent chance that new topics/partitions were created recently (not deleted, however). As for kgo logs, there's a bunch of things like
and
Definitely not 295 -- the logic I was worried about is actually complete. Can you enable debug logs?
Yep, I should be able to do that.
I'm still working on getting the relevant debug logs, but this just happened again, and our entire kafka cluster crashed. Here's a sample of the errors we got from kafka in case they're interesting:
and
and
The entire cluster crashing is definitely a Kafka problem...
Of course I haven't been able to reproduce this after I added the debug logging. It's possible it happens on partition creation, so I'll try that next week.
Ok, it looks like this error pops up just after we signal a broker to shut down (e.g., during a rolling restart). Now that I know how to reproduce I should be able to get debug logs later this week.
Any luck reproducing?
Yes and no. I've managed to reproduce it a few times now (indeed it pops up when we restart a broker), but I lost the build I had running that had debug logs enabled.
I am fairly certain I see the bug, wip to fix.
When using a fetch session, if we stop fetching a topic or partition, we send that information in the fetch request. If we forget an entire topic, we no longer add any cursor for the topic internally, because we are outright no longer fetching it. As a result, we previously had no topic ID to put in the fetch request for the forgotten topic. Sending the forgotten topic without its ID would cause an NPE in Kafka.

Now, when we add a topic to the session, we also save the topic ID. We use this for two purposes:

* Now we correctly send the forgotten topic ID
* We also can pin fetch requests to non-topic-ID versions if any topic is missing an ID at any point in the session (i.e. if a forgotten topic has no ID)

Lastly, we add a guard in metadata updating to ignore updates that miss topic IDs if we previously had a topic ID.

Closes #535.
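For readers following along, here is a minimal sketch of the idea in the commit message above. It is not franz-go's actual code: every name in it (`session`, `forgotten`, `maxVersion`, `mergeMetadataID`) is hypothetical, and it assumes that Fetch v13 is the first request version keyed by topic ID, so v12 is used as the pinned "non-topic-ID" version.

```go
package main

import "fmt"

// topicID stands in for Kafka's 16-byte topic UUID; the zero value means "unknown".
type topicID [16]byte

// Assumption: Fetch v13 is the first version that identifies topics by ID.
const lastVersionWithoutTopicIDs = 12

// forgotten is one entry of a fetch request's forgotten-topics list.
type forgotten struct {
	topic      string
	id         topicID
	partitions []int32
}

// session is a hypothetical, heavily simplified fetch session.
type session struct {
	ids  map[string]topicID // topic ID saved when the topic was added
	used map[string][]int32 // partitions currently tracked by the session
	noID bool               // set once any topic is seen without an ID
}

func newSession() *session {
	return &session{ids: map[string]topicID{}, used: map[string][]int32{}}
}

// add tracks a partition and saves the topic's ID alongside it.
func (s *session) add(topic string, id topicID, partition int32) {
	if id == (topicID{}) {
		s.noID = true
	}
	s.ids[topic] = id
	s.used[topic] = append(s.used[topic], partition)
}

// forget drops an entire topic and returns the entry to send in the next
// request, filled in with the ID that was saved when the topic was added.
func (s *session) forget(topic string) (forgotten, bool) {
	parts, ok := s.used[topic]
	if !ok {
		return forgotten{}, false
	}
	delete(s.used, topic)
	return forgotten{topic: topic, id: s.ids[topic], partitions: parts}, true
}

// maxVersion pins the request to a pre-topic-ID version if any topic in the
// session was ever missing an ID.
func (s *session) maxVersion(clientMax int16) int16 {
	if s.noID && clientMax > lastVersionWithoutTopicIDs {
		return lastVersionWithoutTopicIDs
	}
	return clientMax
}

// mergeMetadataID is the metadata guard: an update that lacks a topic ID is
// ignored if an ID was already known.
func mergeMetadataID(prev, next topicID) topicID {
	if next == (topicID{}) && prev != (topicID{}) {
		return prev
	}
	return next
}

func main() {
	s := newSession()
	s.add("orders", topicID{1}, 0)
	s.add("orders", topicID{1}, 1)
	f, _ := s.forget("orders")
	fmt.Println(f.topic, f.id != (topicID{}), f.partitions, s.maxVersion(16))
}
```

The point of keeping `ids` separate from `used` is that the ID survives after the topic's partitions are dropped, which is exactly the moment the forgotten-topics entry needs it.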
If you're able, can you try this branch?
Howdy!
Since we upgraded our Kafka cluster to 3.5.1 (from 2.7.0) we've frequently been seeing errors like this:
I'm not at all sure this is franz-go's fault, but given that it's complaining about handling a request I thought that just maybe the client is doing something wrong.