@nsivabalan
Contributor

@nsivabalan nsivabalan commented Jan 8, 2023

Change Logs

Hudi requires users to set a record key (primary key). But in some use cases, such as ingesting log events, users may not have any column that can act as a primary key. It would be good for Hudi to auto-generate record keys internally for such immutable use cases.
This patch therefore adds keyless support to Hudi, in other words, auto-generation of record keys.
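
To make the idea concrete, here is a minimal sketch of what auto-generating record keys for an immutable batch could look like. It is not the scheme this patch implements: the function name `auto_generate_record_keys`, the `_generated_key` field, and the `<batch_id>_<sequence>` key layout are all assumptions made for illustration.

```python
import uuid


def auto_generate_record_keys(records, batch_id=None):
    """Attach a generated key to every record in a batch.

    Hypothetical scheme (an assumption, not Hudi's actual one):
    key = "<batch_id>_<position of the record within the batch>".
    """
    # A fresh random batch id keeps keys from different batches distinct.
    batch_id = batch_id or uuid.uuid4().hex
    keyed = []
    for seq, record in enumerate(records):
        # Copy the record so the caller's input is left untouched.
        keyed.append({**record, "_generated_key": f"{batch_id}_{seq}"})
    return keyed


events = [{"msg": "disk full"}, {"msg": "disk full"}]
keyed_events = auto_generate_record_keys(events, batch_id="b1")
```

Because the key does not depend on the payload, two identical log events still receive different keys, which is why updates and de-duplication have nothing meaningful to match on in this mode.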

To discuss:

  • Should we make the new config a table config, so that users cannot switch from auto-generated keys to something else later in the lifecycle of a table?

Impact

Users no longer need to set primary key/record key fields for immutable use cases or similar workloads and can let Hudi generate keys internally. Understandably, a few configs and operations, such as updates and de-duplication, are not supported in this flow. Even if users set the operation type to "upsert", Hudi will ingest the incoming records as inserts; it will just incur additional index lookup latency without much benefit.

To leverage this, users need to enable a new config hoodie.auto.generate.record.keys
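
As a rough sketch of how a writer might opt in, the snippet below builds a set of write options without any record key field. The config name `hoodie.auto.generate.record.keys` is taken from this description and may change before merge; the helper `build_write_options` and the table name are hypothetical, and the other two option keys are the usual Hudi datasource options.

```python
def build_write_options(table_name, auto_generate_keys=True):
    """Build Hudi write options for a keyless, insert-only table (sketch)."""
    options = {
        "hoodie.table.name": table_name,
        # Use "insert": per this PR, "upsert" would still ingest records as
        # inserts but pay for an index lookup that serves no purpose.
        "hoodie.datasource.write.operation": "insert",
    }
    if auto_generate_keys:
        # Config name as proposed in this PR (assumption: it may be renamed).
        # Note that no record key field option is set at all.
        options["hoodie.auto.generate.record.keys"] = "true"
    return options


opts = build_write_options("log_events")
```

With PySpark, options like these would typically be passed along the lines of `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are placeholders.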

Risk level (write none, low, medium or high below)

Medium

Documentation Update

We may need to add an FAQ entry or a website update on keyless support.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

Labels

area:sql SQL interfaces engine:spark Spark integration priority:blocker Production down; release blocker

4 participants