Configurable eid generation strategy. #191

Open
Chuburashka opened this issue May 24, 2024 · 2 comments
Labels
auto-configuration everything about the auto-configuration features nakadi-submission persistence everything around DB access

Comments


Chuburashka commented May 24, 2024

At the moment, the logic for generating a new eid value is hardcoded - link

This solution has the following advantages:

  1. We have a guarantee that every event sent from this application (or rather, using this same eventlog table/eid sequence) has only one unique eid. If we republish this event it will be republished with the same eid.
  2. Sequential eid generation – the eid can be used as an ordering key to restore the event creation order.

But it also has some disadvantages:

  1. The ID generated here is formatted as a UUID (as this is how the field is declared at Nakadi), but actually doesn't provide the uniqueness guarantees expected from UUIDs.
  2. If you have multiple producers (with separate databases) submitting to the same event type, each has their own sequence running in parallel, which will produce duplicates.
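
To make the trade-off concrete, here is a minimal sketch of a counter-to-UUID conversion of the kind described above. It assumes the current logic is equivalent to left-padding the hexadecimal counter with zeros (as the SQL further down in this thread suggests); the class and method names are hypothetical and not taken from the library.

```java
import java.util.UUID;

// Hypothetical helper for illustration only; not the library's actual code.
final class SequentialEid {
    private SequentialEid() {}

    // Puts the database counter into the low 64 bits and leaves the high
    // 64 bits at zero, i.e. the hex counter left-padded with zeros.
    static UUID fromCounter(long counter) {
        return new UUID(0L, counter);
    }
}
```

The same counter always maps to the same eid, which is what makes republishing idempotent. It also means that two producers with separate sequences both emit the same eid for counter 42, and the result carries no UUID version bits, so it is a UUID in format only.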

In some cases, topics with multiple producers are needed (or at least very nice to have), and for them we should have a uniqueness guarantee of eids at least per event type.

I see these possible solutions:

  1. Based on configuration, use a different strategy for the convertToUUID method: the current one (by default) or UUID.randomUUID().
    The main changes are needed in mapToNakadiEvent and tryToPublishBatch.
    But with randomUUID generation we lose all the advantages of the current solution.

  2. Add an additional eid column, holding the generated eid, to the event_log table.
    Before persisting the event, generate a random UUID in createEventLog and persist the EventLog entity with the already generated eid.
    When we send the event to Nakadi, we can choose the strategy based on the eid field: if it contains data, use the eid field; if not, use id with the default strategy. (Again, don't forget about resolving failed events in tryToPublishBatch.)
    In this case we have a guarantee for republishing, but lose the sequential eid generation logic (possibly we can use spanCtx and add the id field as a key here).
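
The fallback in solution 2 could look roughly like the sketch below. `EidResolver` and its parameters are hypothetical names for illustration, and the default branch assumes the current conversion amounts to zero-padding the counter.

```java
import java.util.UUID;

// Hypothetical helper for solution 2; names are illustrative only.
final class EidResolver {
    private EidResolver() {}

    // Uses the eid stored with the event if there is one; otherwise falls
    // back to the default conversion of the database id.
    static UUID resolve(UUID storedEid, long id) {
        return storedEid != null ? storedEid : new UUID(0L, id);
    }
}
```

Rows written before the column existed have no stored eid and keep their old eids, while new rows are republished with whatever was stored when they were created.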

Member

ePaul commented May 24, 2024

Thanks for your suggestion. (I slightly reworded it, feel free to revert or edit again if I managed to misrepresent your intention.)

My approach for more flexibility would be to inject an EidConverterStrategy interface which can convert the database counter (as int, or rather long, to account for the changes needed for #160) into a UUID.

  • The default implementation would just do the "prefix with 0" we are doing now.
  • Another option would be "ignore the input, generate random number".
  • A third option could be "prefix (or suffix) with some configured constant", so in your multiple producers case, you could have a different constant for each producer. This would at least keep some ordering between events from the same producer (but not between different producers).
  • Some time-based UUID generation (with some additional input per producer) might also work.
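
As a rough sketch of the first three options: only the interface name comes from this comment. The method name, the class names and the choice to put the per-producer constant into the high 64 bits are assumptions.

```java
import java.util.UUID;

// Hypothetical shape of the proposed interface.
interface EidConverterStrategy {
    UUID toEid(long databaseId);
}

// Default: counter in the low bits, zeros in front.
final class ZeroPrefixStrategy implements EidConverterStrategy {
    @Override
    public UUID toEid(long databaseId) {
        return new UUID(0L, databaseId);
    }
}

// Ignores the input. Every call gives a new value, so resending is only
// idempotent if the result is stored somewhere.
final class RandomStrategy implements EidConverterStrategy {
    @Override
    public UUID toEid(long databaseId) {
        return UUID.randomUUID();
    }
}

// Prefixes the counter with a constant configured per producer.
final class ProducerPrefixStrategy implements EidConverterStrategy {
    private final long producerPrefix;

    ProducerPrefixStrategy(long producerPrefix) {
        this.producerPrefix = producerPrefix;
    }

    @Override
    public UUID toEid(long databaseId) {
        return new UUID(producerPrefix, databaseId);
    }
}
```

With distinct prefixes, two producers no longer collide on the same counter value, and eids from one producer still sort in creation order as long as the counter stays positive.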

With the spring-boot starter, just providing such an object as a bean could be enough for the auto configuration to pick it up and inject into either the transmission service or the eventlog writer (and for the implementations we deliver with the library, we can also provide config file options to generate the bean).

A small complication here is that the way we are currently storing the events, the ID is only known after storing it into the DB, so doing this generation beforehand (which would be needed if we want randomness + idempotency on resending) is difficult. (Maybe some option could be to have a two-step process, inject one before storing and one after reading from the DB.)

@ePaul ePaul added auto-configuration everything about the auto-configuration features nakadi-submission persistence everything around DB access labels May 24, 2024
Author

Chuburashka commented May 27, 2024

If I understood correctly, you want to generate the eid when mapToNakadiEvent is called in the sendEvents method (i.e. after persisting)?

> A small complication here is that the way we are currently storing the events, the ID is only known after storing it into the DB, so doing this generation beforehand (which would be needed if we want randomness + idempotency on resending) is difficult. (Maybe some option could be to have a two-step process, inject one before storing and one after reading from the DB.)

I think idempotency on resending is one of the most important features, and it should be supported.
To me it seems better to split the responsibilities of the id field in the table. In that case we would have two columns, id and eid (both can contain the same value).

For the eid field we can use one of the following definitions:

  1. ALTER TABLE nakadi_events.event_log ADD COLUMN eid text DEFAULT currval('nakadi_events.event_log_id_seq')::text
    The default value is a copy of id.
  2. ALTER TABLE nakadi_events.event_log ADD COLUMN eid uuid DEFAULT CAST(LPAD(TO_HEX(currval('nakadi_events.event_log_id_seq')), 32, '0') AS UUID)
    The default value is a UUID based on the id field (the same as what we do now).

We can use a different EidGeneratorStrategy before persisting, and the default implementation will do nothing, because the same logic is already described in the migration.
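
A minimal sketch of this pre-persist hook, assuming that an empty result means "leave the eid column to the database default". Only the name EidGeneratorStrategy comes from this comment; the method and class names are hypothetical.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical shape of the pre-persist hook.
interface EidGeneratorStrategy {
    // An empty result means: do not set eid, let the column default apply.
    Optional<UUID> generateBeforePersist();
}

// Default: do nothing, the migration's DEFAULT derives the eid from id.
final class DatabaseDefaultEidGenerator implements EidGeneratorStrategy {
    @Override
    public Optional<UUID> generateBeforePersist() {
        return Optional.empty();
    }
}

// Alternative: a random UUID, stored with the row so that resending
// reuses the same eid.
final class RandomEidGenerator implements EidGeneratorStrategy {
    @Override
    public Optional<UUID> generateBeforePersist() {
        return Optional.of(UUID.randomUUID());
    }
}
```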
