Dead Letter Queue for stuck events #371
Can you clarify what you expect from a client-side DLQ over a distributed ordered log? DLQs typically imply storage offload, and a common solution would mean imposing common storage on client users. For example, this feature is built into SQS because its competing-consumer message-popping protocol allows it to make sense, whereas it's not available in Nakadi alternatives like Kinesis or Kafka, which have stronger implications around ordering. The existing client has enough affordances to let a user skip past the event and checkpoint.
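The "skip past the event and checkpoint" affordance mentioned above can be sketched roughly as follows. This is a minimal illustration, not the real nakadi-java API: `CursorCommitter` and the handler wiring are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of "skip and checkpoint". The names below
// (CursorCommitter, SkippingHandler) are illustrative placeholders,
// not types from the actual nakadi-java client.
public class SkippingHandler {
  interface CursorCommitter { void commit(String cursor); }

  private final Consumer<String> handler;
  private final CursorCommitter committer;
  final List<String> skipped = new ArrayList<>(); // kept for later inspection

  SkippingHandler(Consumer<String> handler, CursorCommitter committer) {
    this.handler = handler;
    this.committer = committer;
  }

  /** Try to process the event; if the handler fails, record the event as
   *  skipped and commit the cursor anyway so the stream keeps moving. */
  void onEvent(String event, String cursor) {
    try {
      handler.accept(event);
    } catch (RuntimeException e) {
      skipped.add(event);
    }
    committer.commit(cursor);
  }
}
```

The key point is that the cursor is committed even for a failed event, so a single poison message never blocks the partition.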
You need to back up and qualify statements like "every team", "very costly" and "differ vastly". For the latter, I suspect that teams doing this differently is partly related to the processing semantics of events and how they relate to each other. Bear in mind that a structurally invalid event mostly won't get into Nakadi, because the service insists on a schema. As an exercise, it would help to consider how Nakadi itself would offer a DLQ (again akin to SQS, perhaps by exposing an API for it rather than a service configuration: every event type is in a unique resource space, as is the consumer subscription, so it seems possible to define an API for it). That might indicate what can be generalised and what is specific to event data and their consumers.
A simple idea would be something like: if processing an event (from a subscription) fails with an exception in the callback, it is published to some other event type, and the subscription cursor is then committed anyway (if the submission was successful). The client configuration would just take the second event type when setting up a consumer. Of course, this means that these events will be processed out of order (if at all), but that is often preferable to not being able to process any other events on the partition. Builders should be empowered to make this decision. This can be set up on top of Nakadi-Java (and I guess this is what teams are doing), but having it integrated in the client makes it easier.
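The flow described above could be sketched like this. Again a hypothetical illustration built on placeholder interfaces (`EventPublisher`, `CursorCommitter`), not the real nakadi-java API: on handler failure the event is diverted to a configured DLQ event type, and the cursor is committed either way.

```java
import java.util.function.Consumer;

// Hypothetical sketch of a client-integrated DLQ. EventPublisher and
// CursorCommitter are placeholder interfaces, not the nakadi-java API.
public class DlqConsumer {
  interface EventPublisher { void publish(String eventType, String event); }
  interface CursorCommitter { void commit(String cursor); }

  private final Consumer<String> handler;
  private final EventPublisher publisher;
  private final CursorCommitter committer;
  private final String dlqEventType; // the "second event type" in the config

  DlqConsumer(Consumer<String> handler, EventPublisher publisher,
              CursorCommitter committer, String dlqEventType) {
    this.handler = handler;
    this.publisher = publisher;
    this.committer = committer;
    this.dlqEventType = dlqEventType;
  }

  /** Process one event; on failure, divert it to the DLQ event type,
   *  then commit the cursor so the partition is not blocked. */
  void onEvent(String event, String cursor) {
    try {
      handler.accept(event);
    } catch (RuntimeException e) {
      // If this publish itself throws, the cursor is not committed and
      // the event will be redelivered, matching "if the submission was
      // successful" above.
      publisher.publish(dlqEventType, event);
    }
    committer.commit(cursor);
  }
}
```

Diverted events lose ordering relative to the main stream, which is exactly the trade-off the comment describes.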
Events which the consumer is not able to process block the whole pipeline. Those events have to be skipped manually in order to unblock the pipeline and continue processing the following events. It would be helpful to have a configurable feature that skips unprocessable events, continues processing, and publishes the "broken events" to another event type for later investigation.
A dead letter queue is functionality that many teams are missing, and every team has implemented its own local version of a DLQ. This is very costly, and in the absence of best practices the implementations often differ vastly from each other. A central implementation of a DLQ for Nakadi could greatly reduce software/technology complexity and cost for many individual teams.