[HUDI-3782] Fixing table config when any of the index is disabled #5222
nsivabalan
left a comment
mostly minor comments
...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java
}

if (partitionTypes.contains(MetadataPartitionType.FILES)) {
  // Record which saves the list of all partitions
May I know why we need this if condition? Can you help clarify?
This just saves some duplicate effort. There was no correctness issue in the absence of this if condition. For example, when colstats is re-enabled, we reuse this method, and at that point we don't need to redo the files partition, hence this if condition. Without it, HoodieMetadataPayload.createPartitionListRecord(partitions) would be called every time.
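The effect of the guard can be sketched as follows. This is a minimal, self-contained illustration and not the actual HoodieBackedTableMetadataWriter code: the `PartitionType` enum, the `InitSketch.initialRecords` method, and the string records are hypothetical stand-ins for the real types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for MetadataPartitionType; not the real Hudi enum.
enum PartitionType { FILES, COLUMN_STATS, BLOOM_FILTERS }

class InitSketch {
    // Builds placeholder records for only the metadata partitions being initialized.
    static List<String> initialRecords(Set<PartitionType> partitionTypes, List<String> partitions) {
        List<String> records = new ArrayList<>();
        if (partitionTypes.contains(PartitionType.FILES)) {
            // Create the partition-list record only when FILES itself is being initialized.
            records.add("partition-list:" + String.join(",", partitions));
        }
        for (PartitionType type : partitionTypes) {
            if (type != PartitionType.FILES) {
                // Placeholder for the per-index initialization work.
                records.add("init:" + type);
            }
        }
        return records;
    }
}
```

When only one index is re-initialized (for example, colstats being re-enabled), the set passed in does not contain FILES, so the partition-list record is skipped.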
What is the purpose of the pull request
Fix HUDI-3782
Now that we have table configs for inflight and completed metadata partitions, and given that both reader and writer are going to rely on this config, we need to ensure that this config is updated properly even when any one of the metadata indexes is enabled/disabled.
For instance, let's say a user started with metadata enabled and colstats enabled. The colstats partition was fully built and the table config got updated. Now they disable colstats, and we fail to update the table config. Writers stop updating the colstats partition, so it goes out of sync. Readers still see the partition listed in the table config and assume it is in sync, which is incorrect. This patch fixes this issue.
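The scenario above can be illustrated with a small sketch of the reconciliation idea. This is an assumption about the general approach, not the code in this patch: `TableConfigSketch` and `reconcileCompleted` are hypothetical names, and the partition names are used only as illustrative strings.

```java
import java.util.Set;
import java.util.TreeSet;

class TableConfigSketch {
    // Returns the completed metadata partitions that should stay in the table config:
    // those previously recorded as completed AND still enabled by the writer.
    static Set<String> reconcileCompleted(Set<String> completedInConfig, Set<String> enabledByWriter) {
        Set<String> retained = new TreeSet<>(completedInConfig);
        retained.retainAll(enabledByWriter); // drop partitions whose index was disabled
        return retained;
    }
}
```

With `files` and `column_stats` recorded as completed and only `files` still enabled, the result contains just `files`, so readers no longer treat the stale colstats partition as in sync.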
Brief change log
Verify this pull request
Added a unit test which first writes with colstats enabled, then disabled (but with metadata still enabled), and then re-enabled. Validated the table config and metadata at each step.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.