Better handling of validation errors #172

ePaul · 2023-07-06T17:05:30Z

Current situation

When a batch of events is submitted to Nakadi and one of them fails validation, Nakadi rejects the whole batch. In the response, the failing event is marked as failed and the others as aborted.

Nakadi-Producer will then retry all of them in the next run and hit the same error again.
So not only the failed events are blocked from submission, but also any other events that end up in the same batch. In the extreme case, this can block all event sending of a service.
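
To make the failed/aborted distinction concrete, here is a minimal sketch that groups the per-event entries of a batch response by outcome. It assumes the entries carry the fields `eid`, `publishing_status` and `step`, as I understand Nakadi's batch item response; the sample response and the helper name `group_by_outcome` are made up for illustration and are not taken from Nakadi-Producer.

```python
from collections import defaultdict


def group_by_outcome(batch_item_responses):
    """Group event IDs by (publishing_status, step).

    Assumes each item is a dict with 'eid' and 'publishing_status',
    and optionally 'step' (treated as 'none' when missing).
    """
    groups = defaultdict(list)
    for item in batch_item_responses:
        key = (item["publishing_status"], item.get("step", "none"))
        groups[key].append(item["eid"])
    return dict(groups)


# Illustrative response for a rejected batch: one event failed
# validation, the others were aborted along with it.
response = [
    {"eid": "e1", "publishing_status": "aborted", "step": "none"},
    {"eid": "e2", "publishing_status": "failed", "step": "validating"},
    {"eid": "e3", "publishing_status": "aborted", "step": "none"},
]

groups = group_by_outcome(response)
failed_validation = groups.get(("failed", "validating"), [])
aborted = groups.get(("aborted", "none"), [])
```

Only the events in `failed_validation` have a problem of their own; the ones in `aborted` were rejected solely because they shared a batch with them.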

Nakadi behaves this way to guarantee the order of events submitted together. But as Nakadi-Producer doesn't guarantee that order anyway, this gains us nothing in our case.

Possible improvement

If some events fail validation and others are aborted, the aborted ones should be retried before the failing ones.
We could maybe also reduce the retry frequency of events with validation failures, as those won't become valid by themselves, only through a change of the event type's schema on the Nakadi side.
We shouldn't just skip these events completely though, so that they still show up in monitoring and the problem can be fixed.
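
One possible shape of this, as a rough sketch rather than a proposal for the actual implementation: events that failed validation get an exponentially growing delay before their next attempt, and when a batch is assembled, events without validation failures come first. All names here (`PendingEvent`, `record_batch_result`, `next_batch`) and the delay values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PendingEvent:
    eid: str
    validation_failures: int = 0
    next_attempt: datetime = datetime.min  # due immediately by default


def record_batch_result(events, failed_eids, now,
                        base_delay=timedelta(minutes=1),
                        max_delay=timedelta(hours=1)):
    """Back off events that failed validation; aborted ones stay due."""
    for event in events:
        if event.eid in failed_eids:
            event.validation_failures += 1
            # Cap the exponent so the multiplication cannot overflow.
            exponent = min(event.validation_failures - 1, 10)
            delay = min(base_delay * 2 ** exponent, max_delay)
            event.next_attempt = now + delay


def next_batch(events, now, batch_size):
    """Pick due events, those without validation failures first."""
    due = [e for e in events if e.next_attempt <= now]
    due.sort(key=lambda e: e.validation_failures)  # stable sort
    return due[:batch_size]
```

The failing events are never dropped from the pending list, so they remain visible to monitoring; they are only retried less often and placed behind the events that were merely aborted.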

ePaul added enhancement nakadi-submission labels Jul 6, 2023

ePaul mentioned this issue Sep 12, 2023

Nakadi clients resilience to partial outage and partial success #181

Open

ePaul added the help wanted label Jun 21, 2024
