Skip to content

Conversation

@codope
Copy link
Member

@codope codope commented Apr 4, 2022

What is the purpose of the pull request

Fix HUDI-3782

Now that we have table configs for inflight and completed metadata partitions, and given that both reader and writer are going to rely on this config, we need to ensure that this config is updated properly even when any one of the metadata indexes is enabled/disabled.

For instance, let's say user started with metadata enabled and colstats enabed. Colstats partition was fully built and the table config got updated. Now, they disable it and we miss to update table config. So the writers won't update the colstats partition and it would be out of sync. Readers think that the table config has it so it's already in sync but that's incorrect. This patch fixes this issue.

Brief change log

  • Determine whether there is a need to delete MDT partition if any index config is disabled.
  • Determine whether there is a need to re-index any MDT partition.

Verify this pull request

Added a unti test which first writes with colstats enabled, then disabled (btu metadata enabled), and then re-enabled. Validated table config and metadata.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mostly minor comments

}

if (partitionTypes.contains(MetadataPartitionType.FILES)) {
// Record which saves the list of all partitions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may I know why do we need this if condition. can you help clarify.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just saves some duplicate effort. There was no correctness issue in absence of this if condition. For e.g., when colstats is re-enabled, we are reusing this method and then we don't really need to redo the files partition, hence this if condition. In absence of this condition, HoodieMetadataPayload.createPartitionListRecord(partitions) would have been called everytime.

@codope codope added the priority:blocker Production down; release blocker label Apr 5, 2022
@hudi-bot
Copy link
Collaborator

hudi-bot commented Apr 5, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants