Current situation
When a batch of events is submitted to Nakadi and one of them fails due to a validation error, Nakadi rejects the whole batch. In the response, the event that failed validation is marked as failed, and all the others as aborted.
Nakadi-Producer will then retry all of them in the next run, running into the same error again.
So it is not only the failing events that are blocked from being submitted, but also any other events that end up in the same batch. In the extreme case, this can block all event sending of a service.
Nakadi behaves this way to guarantee the order of events submitted together. But as Nakadi-Producer doesn't guarantee that order anyway, this gives us no benefit.
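To make the situation concrete, here is a minimal Python sketch of how such a batch response could be split by status. The response content is made up for illustration, and the field names (eid, publishing_status, detail) are assumptions modelled on Nakadi's per-item batch response; they should be checked against the Nakadi API definition before relying on them.

```python
from collections import defaultdict

# Hypothetical response for a batch of three events in which only "e2"
# violates the schema. Field names are assumptions, not verified API.
batch_response = [
    {"eid": "e1", "publishing_status": "aborted", "detail": ""},
    {"eid": "e2", "publishing_status": "failed", "detail": "schema violation"},
    {"eid": "e3", "publishing_status": "aborted", "detail": ""},
]


def group_by_status(items):
    """Map each publishing status to the list of event ids that carry it."""
    groups = defaultdict(list)
    for item in items:
        groups[item["publishing_status"]].append(item["eid"])
    return dict(groups)


groups = group_by_status(batch_response)
print(groups)
```

In this example only e2 is actually invalid, yet e1 and e3 are held back as well, and all three would be resubmitted together in the next run.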
Possible improvement
If some events fail validation and others are aborted, the aborted ones should be retried before the failing ones.
We could also reduce the retry frequency for events with validation failures, as those won't become valid on their own, only through a change to the event type's schema on the Nakadi side.
We shouldn't skip these events completely, though, so that they still show up in monitoring and the problem can be fixed.
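The ideas above could be combined roughly as in the following Python sketch. It is not Nakadi-Producer's actual implementation: the names PendingEvent, record_result, select_batch and MAX_DELAY_RUNS are hypothetical, and the exponential backoff counted in scheduler runs is only one possible way to reduce the retry frequency. It also assumes that the caller reports "failed" only for validation failures.

```python
from dataclasses import dataclass

MAX_DELAY_RUNS = 64  # hypothetical cap on the backoff, in scheduler runs


@dataclass
class PendingEvent:
    eid: str
    validation_failures: int = 0  # how often Nakadi rejected this event
    next_attempt_run: int = 0     # earliest run in which to try again


def record_result(event, status, current_run):
    """Update retry state from the per-event status of a rejected batch."""
    if status == "failed":
        # Back off exponentially, but never drop the event: it stays
        # in the pending list and therefore visible to monitoring.
        event.validation_failures += 1
        delay = min(2 ** event.validation_failures, MAX_DELAY_RUNS)
        event.next_attempt_run = current_run + delay
    # "aborted" events keep their state and are due again in the next run.


def select_batch(pending, current_run, batch_size):
    """Pick due events, putting those that never failed validation first."""
    due = [e for e in pending if e.next_attempt_run <= current_run]
    due.sort(key=lambda e: e.validation_failures)  # stable sort keeps order
    return due[:batch_size]


# Run 0: all three events are sent together; e2 fails, the rest are aborted.
pending = [PendingEvent("e1"), PendingEvent("e2"), PendingEvent("e3")]
for event, status in zip(pending, ["aborted", "failed", "aborted"]):
    record_result(event, status, current_run=0)

# Run 1: only the previously aborted events are due.
print([e.eid for e in select_batch(pending, current_run=1, batch_size=10)])
```

With this ordering, the previously aborted events get a batch of their own in the next run, while the event that failed validation is retried less and less often but never disappears from the pending list.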