[HUDI-3678] Fix record rewrite of create handle when 'preserveMetadata' is true #5088
Conversation
// do not preserve FILENAME_METADATA_FIELD
recordWithMetadataInSchema.put(FILENAME_METADATA_FIELD_POS, newFilePath.getName());
fileWriter.writeAvro(hoodieRecord.getRecordKey(), recordWithMetadataInSchema);
if (preserveMetadata && useWriterSchema) { // useWriterSchema will be true only in case of compaction.
The reason I had to call rewriteRecord is that the IndexedRecord coming from the caller on the update path did not have the meta fields. Hence I passed in the oldRecord, from which we can deduce the meta fields.
For example, the return value from combineAndGetUpdateValue(...) does not have the meta fields.
Not sure if we can remove it.
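To make the meta-field deduction concrete, here is a minimal sketch in plain Java that models Avro records as maps. The field list and helper name are illustrative only, not the actual HoodieMergeHandle code: the idea is that the combined record lacks the Hudi meta columns, so they are copied over from the old on-disk record.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: copy Hudi meta columns from the old (on-disk) record into the
// combined record returned by the payload merge, which lacks them.
// Records are modeled as plain maps; this is illustrative, not Hudi code.
public class MetaFieldCopySketch {

  // The five metadata columns that prefix every Hudi table schema.
  static final List<String> META_FIELDS = Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  // Return a copy of combinedRecord with the meta fields taken from oldRecord.
  static Map<String, Object> withMetaFields(Map<String, Object> combinedRecord,
                                            Map<String, Object> oldRecord) {
    Map<String, Object> out = new HashMap<>(combinedRecord);
    for (String f : META_FIELDS) {
      if (oldRecord.containsKey(f)) {
        out.put(f, oldRecord.get(f));
      }
    }
    return out;
  }
}
```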
Specifically, I am talking about this code snippet in HoodieMergeHandle. combinedAvroRecord below does not contain any meta fields. So, if you remove the rewriting with meta columns, I am not sure how that would pan out:
public void write(GenericRecord oldRecord) {
String key = KeyGenUtils.getRecordKeyFromGenericRecord(oldRecord, keyGeneratorOpt);
boolean copyOldRecord = true;
if (keyToNewRecords.containsKey(key)) {
// If we have duplicate records that we are updating, then the hoodie record will be deflated after
// writing the first record. So make a copy of the record to be merged
HoodieRecord<T> hoodieRecord = keyToNewRecords.get(key).newInstance();
try {
Option<IndexedRecord> combinedAvroRecord =
hoodieRecord.getData().combineAndGetUpdateValue(oldRecord,
useWriterSchema ? tableSchemaWithMetaFields : tableSchema,
config.getPayloadConfig().getProps());
...
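For context, the surrounding merge flow (greatly simplified) can be sketched in plain Java, with maps standing in for Avro records and a generic combine operator standing in for payload.combineAndGetUpdateValue. Names here are illustrative only, not Hudi code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Sketch of the merge loop quoted above: for each old base-file record,
// either combine it with a queued incoming record (update path) or copy
// it through unchanged. Plain maps stand in for Avro records and the
// combine operator for payload.combineAndGetUpdateValue; not Hudi code.
public class MergeLoopSketch {
  final Map<String, Map<String, Object>> keyToNewRecords;
  final BinaryOperator<Map<String, Object>> combine;
  final List<Map<String, Object>> written = new ArrayList<>();

  MergeLoopSketch(Map<String, Map<String, Object>> keyToNewRecords,
                  BinaryOperator<Map<String, Object>> combine) {
    this.keyToNewRecords = keyToNewRecords;
    this.combine = combine;
  }

  void write(Map<String, Object> oldRecord) {
    String key = (String) oldRecord.get("_hoodie_record_key");
    Map<String, Object> incoming = keyToNewRecords.get(key);
    if (incoming != null) {
      // Update path: the combined result may lack meta fields, which is
      // why the real handle rewrites it against the metadata schema.
      written.add(combine.apply(oldRecord, incoming));
    } else {
      // No incoming update for this key: copy the old record through.
      written.add(oldRecord);
    }
  }
}
```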
Currently we read the old records from the base file with the metadata fields in the schema, see:
Line 76 in 52f0498

readSchema = mergeHandle.getWriterSchemaWithMetaFields();

And for the new records in the incoming dataset, the schema also includes the metadata fields. The combineAndGetUpdateValue method also handles records with the write schema (the schema with metadata fields).
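As a rough illustration of what "schema with metadata fields" means here, prepending the Hudi meta columns to the data columns can be sketched as follows. Only field names are modeled; the real code builds an Avro Schema (via HoodieAvroUtils), so this is a sketch, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: a "writer schema with meta fields" is the table's data columns
// prefixed by the Hudi metadata columns. Field names only; illustrative.
public class SchemaWithMetaFieldsSketch {
  static final List<String> META_FIELDS = Arrays.asList(
      "_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key",
      "_hoodie_partition_path", "_hoodie_file_name");

  // Prepend the meta columns to the table's data columns.
  static List<String> withMetaFields(List<String> dataFields) {
    List<String> all = new ArrayList<>(META_FIELDS);
    all.addAll(dataFields);
    return all;
  }
}
```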
nsivabalan left a comment:

Left one clarifying question.
The failure is not caused by the patch, so I would just merge it.