-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-3213] Making commit preserve metadata to true for compaction #4811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-3213] Making commit preserve metadata to true for compaction #4811
Conversation
9ae9222 to
7ed0a4a
Compare
480bf89 to
082ec6b
Compare
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java
Show resolved
Hide resolved
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
Outdated
Show resolved
Hide resolved
| if (preserveMetadata) { | ||
| // do not preserve FILENAME_METADATA_FIELD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it always the case where we never preserve FILENAME ? since when compact we always create new files, right? then it's some extra rule we have to remember for this config, which sounds like we preserve all meta cols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't have a better idea. just felt the confusion here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. so by preserve commit metadata, implicitly we mean to preserve every commit metadata(record key, partition path, commit time, commit seq no) except filename. Every other meta field should be carry forwarded from old record, but filename has to represent the file where the record actually exists.
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
Outdated
Show resolved
Hide resolved
f80680b to
b8f3071
Compare
|
@xushiyan : this might need some work to be done. I will sync up w/ you directly. Fix is not trivial as I anticipated it to be. |
6ed7704 to
ba8eca5
Compare
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
Outdated
Show resolved
Hide resolved
|
@xushiyan : this is good to review. |
|
@xushiyan : if you are occupied, I can ask sagar or ethan to review this patch. let me know. |
ba8eca5 to
38cf843
Compare
codope
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if my below comment holds for this patch.
hudi-common/src/main/java/org/apache/hudi/avro/HoodieAvroUtils.java
Outdated
Show resolved
Hide resolved
1b14058 to
8ccef8c
Compare
xushiyan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsivabalan hey sorry i failed in catching up with notifications. Changes look okay. The comments below are optional or follow-up.
| IndexedRecord recordWithMetadataInSchema = rewriteRecord((GenericRecord) indexedRecord.get()); | ||
| if (preserveMetadata) { | ||
| IndexedRecord recordWithMetadataInSchema = rewriteRecord((GenericRecord) indexedRecord.get(), preserveMetadata, oldRecord); | ||
| if (preserveMetadata && useWriterSchema) { // useWriteSchema will be true only incase of compaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the case, better name could be useWriterSchemaForCompaction
| return newRecord; | ||
| } | ||
|
|
||
| public static GenericRecord rewriteRecord(GenericRecord genericRecord, Schema newSchema, boolean copyOverMetaFields, GenericRecord fallbackRecord) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be good to have at least 1 UT covering this
| public static final String OPERATION_METADATA_FIELD = "_hoodie_operation"; | ||
| public static final String HOODIE_IS_DELETED = "_hoodie_is_deleted"; | ||
|
|
||
| public static int FILENAME_METADATA_FIELD_POS = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could be a separate cleanup task: make constants for all meta fields and adopt them across codebase
…pache#4811) * Making commit preserve metadata to true * Fixing integ tests * Fixing preserve commit metadata for metadata table * fixed bootstrap tests * temp diff * Fixing merge handle * renaming fallback record * fixing build issue * Fixing test failures
…pache#4811) * Making commit preserve metadata to true * Fixing integ tests * Fixing preserve commit metadata for metadata table * fixed bootstrap tests * temp diff * Fixing merge handle * renaming fallback record * fixing build issue * Fixing test failures
What is the purpose of the pull request
Flipping the value of preserver commit metadata for compaction to true. Disabled for metadata table since meta fields are not populated.
To discuss:
when populate meta fields are disabled, should be check that preserve metadata can't be set to true. this might be an issue w/ kafka connect where we directly create log files.
Brief change log
(for example:)
Verify this pull request
(Please pick either of the following options)
This pull request is a trivial rework / code cleanup without any test coverage.
(or)
This pull request is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.