[HUDI-5514] Adding auto generation of record keys/Keyless support to Hudi #7622
Change Logs
Hudi requires users to set record keys (primary keys). In some use cases, such as ingesting log events, users may not have any column that can act as a primary key field. It would therefore be good to have Hudi generate record keys internally for such immutable use cases.
This patch adds keyless support to Hudi, in other words, auto generation of record keys.
To discuss:
Impact
Users no longer need to set primary key/record key fields for immutable use cases or similar workloads, and can let Hudi generate keys internally. Understandably, a few configs/operations, such as updates and de-dup, are not supported in this flow. Even if users set the operation type to "upsert", Hudi will ingest the incoming records as inserts; the only difference is additional index lookup latency with little benefit.
To leverage this, users need to enable a new config:
hoodie.auto.generate.record.keys
Risk level (write none, low medium or high below)
Medium
Documentation Update
Might have to add a FAQ or website update on the keyless support.
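A FAQ entry could include a short snippet along these lines. This is only a sketch of how the writer options for a keyless, append-only table might be assembled: `hoodie.auto.generate.record.keys` is the config added by this patch, while the value `"true"`, the helper name `keyless_write_options`, and the table name `log_events` are assumptions made for illustration.

```python
def keyless_write_options(table_name: str) -> dict[str, str]:
    """Build Hudi writer options for a table that has no record key field.

    Hypothetical helper for illustration; not part of Hudi.
    """
    return {
        "hoodie.table.name": table_name,
        # Config introduced by this patch. The value "true" is an assumption
        # about how the flag is enabled.
        "hoodie.auto.generate.record.keys": "true",
        # Use plain inserts: with "upsert" the records are still ingested as
        # inserts, but the write pays for an extra index lookup.
        "hoodie.datasource.write.operation": "insert",
    }


# Note that no record key field is configured anywhere in these options.
options = keyless_write_options("log_events")
```

With PySpark, options like these would typically be passed along the lines of `df.write.format("hudi").options(**options).mode("append").save(path)`, where `df` and `path` stand in for the user's own DataFrame and table location.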
Contributor's checklist