Allow searching in all partitions of a given event type #44
Comments
We discussed this idea. We should be careful with the implementation, because it can kill Nakadi performance if done wrong.
As for the technical solution, I would say that Nakadi (Kafka) is optimized for sequential reading, not for random access or indexing. "Picking" a single event is a different access pattern, and each storage excels at serving a different access pattern. If we are to provide this feature, we would benefit from a different storage. The data lake already allows running arbitrary queries and scanning all the data. It is very powerful, but it takes up to 5 minutes for data to become available there. If the requirement is to have it faster, we would have to keep some sort of "key/value cache". Loading all data from Nakadi (Kafka) on a per-request basis would not scale (for very busy event types, that means loading gigabytes of data to find a single event). So maybe Nakadi UI search could be integrated with some JDBC connector in the data lake? Adding a lookup storage would be quite costly, and we should do it only if the data lake does not already support this use case.
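To make the access-pattern argument concrete, here is a minimal sketch that contrasts a sequential scan with a key/value lookup. Everything in it is hypothetical: the in-memory `partitions` dict stands in for an event type's partitions, and `scan_for_event` and `build_index` are illustrative names, not part of Nakadi or Nakadi UI.

```python
from typing import Dict, List, Optional, Tuple

Event = Dict[str, str]


def scan_for_event(
    partitions: Dict[str, List[Event]], eid: str
) -> Tuple[Optional[Event], int]:
    """Read partitions sequentially until the event is found.

    Returns the event (or None) and the number of events that had to be read.
    """
    events_read = 0
    for events in partitions.values():
        for event in events:
            events_read += 1
            if event["eid"] == eid:
                return event, events_read
    return None, events_read


def build_index(partitions: Dict[str, List[Event]]) -> Dict[str, Event]:
    """Build a key/value lookup by event id (the 'cache' idea, kept in memory)."""
    return {
        event["eid"]: event
        for events in partitions.values()
        for event in events
    }


# Hypothetical data: 4 partitions with 1000 events each.
partitions = {
    str(p): [{"eid": f"e-{p}-{i}", "partition": str(p)} for i in range(1000)]
    for p in range(4)
}

found, events_read = scan_for_event(partitions, "e-3-999")
index = build_index(partitions)
looked_up = index["e-3-999"]  # single lookup, no scan
```

In this toy setup the scan has to read every event before it reaches the last one, while the index answers with one lookup. The trade-off is that the index must be built and kept up to date, which is the cost of the extra lookup storage mentioned above.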
I guess the idea is not to find the event for its own sake but rather to debug problems with Nakadi, i.e. to confirm that the event can be found in Nakadi. So the data lake can be used, but it gives no guarantee that the event is actually in Nakadi (because of an error in the pipeline, or compaction).
I really like the search feature; however, it becomes tedious very quickly when looking for events in event types with a larger number of partitions.
I don't know how the search is currently implemented (low-level API?), but it should be possible to search across partitions.
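As a rough illustration of what searching across partitions could look like, here is a minimal sketch that fans the same search out to every partition and caps the number of events read per partition. It is not how Nakadi UI implements search. `read_partition` is a hypothetical callable standing in for whatever reads events from one partition, and `max_events_per_partition` is an assumed safety limit to avoid scanning an entire busy partition.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]


def search_all_partitions(
    partition_ids: List[str],
    read_partition: Callable[[str], Iterable[Event]],  # hypothetical reader
    matches: Callable[[Event], bool],
    max_events_per_partition: int = 1000,  # assumed cap per partition
) -> List[Event]:
    """Run the same bounded search on every partition and merge the hits."""

    def search_one(partition_id: str) -> List[Event]:
        hits = []
        for n, event in enumerate(read_partition(partition_id)):
            if n >= max_events_per_partition:
                break  # stop early instead of reading the whole partition
            if matches(event):
                hits.append(event)
        return hits

    workers = max(1, min(8, len(partition_ids)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as partition_ids
        per_partition = pool.map(search_one, partition_ids)
        return [event for hits in per_partition for event in hits]


# Hypothetical in-memory stand-in for three partitions of one event type.
fake_log = {
    "0": [{"eid": "a", "order": "1"}, {"eid": "b", "order": "2"}],
    "1": [{"eid": "c", "order": "2"}],
    "2": [],
}

hits = search_all_partitions(
    list(fake_log),
    lambda partition_id: iter(fake_log[partition_id]),
    lambda event: event["order"] == "2",
)
```

The per-partition cap bounds the load on Nakadi at the price of possibly missing older events, so a real implementation would need to make that limit visible to the user.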