[HUDI-4071][Stacked on 5771] Support insert without record key #6090
val hoodieAllIncomingRecords = genericRecords.map(gr => {
  val processedRecord = getProcessedRecord(partitionColumns, gr, dropPartitionColumns)
  val csn = HoodieRecord.generateSequenceId(instantTime, partitionId, recordIndex.getAndIncrement())
I thought the plan was to reuse the commit sequence number that we generate within the writer (i.e., by the executors). Or is my understanding wrong?
Let's discuss this tomorrow when we meet up.
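For readers following this thread, here is a minimal sketch of what the `csn` line in the diff above appears to do: assign each incoming record a sequence ID built from the instant time, the partition ID, and a running counter. The object name `SequenceIdSketch`, the helper `assignSequenceIds`, and the underscore-joined format are assumptions made for illustration; they stand in for `HoodieRecord.generateSequenceId`, whose actual output format is not shown in this PR.

```scala
import java.util.concurrent.atomic.AtomicLong

object SequenceIdSketch {
  // Hypothetical stand-in for HoodieRecord.generateSequenceId.
  // Assumption: the three parts are simply joined with underscores.
  def generateSequenceId(instantTime: String, partitionId: Int, recordIndex: Long): String =
    s"${instantTime}_${partitionId}_$recordIndex"

  // Pairs every record with a sequence ID, mirroring the map over genericRecords in the diff.
  // The counter is local to one call, so IDs are unique only per (instantTime, partitionId).
  def assignSequenceIds[A](records: Seq[A], instantTime: String, partitionId: Int): Seq[(String, A)] = {
    val recordIndex = new AtomicLong(0L)
    records.map { r =>
      (generateSequenceId(instantTime, partitionId, recordIndex.getAndIncrement()), r)
    }
  }

  def main(args: Array[String]): Unit = {
    // Illustrative values only; the instant time and partition ID are made up.
    assignSequenceIds(Seq("a", "b", "c"), "20220712101500", 3).foreach(println)
  }
}
```

The review question is whether IDs like these should be generated up front, as in the diff, or whether the commit sequence number already produced inside the writer should be reused instead.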
nsivabalan
left a comment
Left some clarifying comments.
Ensure that bulk_insert, insert, and upsert follow the same logic. Should we also decouple the record key and partition path in the keygen, i.e. have a separate keygen for the record key and for the partition path?
@nsivabalan I have lowered the priority considering the impact of this change and some design considerations we discussed. From a usability point of view, we can still take #5771 in the upcoming release, where we can support ingestion without a record key just for bulk insert. Further things that need to be discussed in this PR:
@codope should we close this and start a new one?
Yeah, let's close it. The implementation has some holes and requires a more rigorous design.
What is the purpose of the pull request
While #5771 works for immutable tables (bulk insert), this PR is to support upsert without a record key.
Brief change log
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.