Contributor

@yihua yihua commented Apr 9, 2022

What is the purpose of the pull request

This PR adds a new config that controls whether HoodieBloomIndex reads the metadata index. It is separate from the existing write configs, e.g., hoodie.metadata.index.column.stats.enable, which control whether the index is written to the metadata table.

Brief change log

  • Adds a new config, hoodie.bloom.index.use.metadata, to control the use of the index from the metadata table in HoodieBloomIndex. When true, the index lookup uses bloom filters and column stats from the metadata table, when available, to speed up the lookup.
  • Replaces the existing control knobs with the new config for reading the metadata index on the write path.
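
A minimal sketch of how the read-side flag might sit next to the write-side config, using plain java.util.Properties rather than Hudi's own config classes. The class name and the useMetadataIndex helper are hypothetical, and the fallback default of false is an assumption, not something stated in this PR.

```java
import java.util.Properties;

public class BloomIndexConfigSketch {
  // Config keys named in the PR description.
  static final String USE_METADATA_KEY = "hoodie.bloom.index.use.metadata";
  static final String COLUMN_STATS_WRITE_KEY = "hoodie.metadata.index.column.stats.enable";

  // Hypothetical helper: reads the read-side flag.
  // The default of "false" is an assumption made for this sketch.
  static boolean useMetadataIndex(Properties props) {
    return Boolean.parseBoolean(props.getProperty(USE_METADATA_KEY, "false"));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // Write side: controls whether the index is written to the metadata table.
    props.setProperty(COLUMN_STATS_WRITE_KEY, "true");
    // Read side: controls whether the bloom index lookup reads from it.
    props.setProperty(USE_METADATA_KEY, "true");
    System.out.println("use metadata index: " + useMetadataIndex(props));
  }
}
```

Keeping the two keys independent means the index can be written to the metadata table without the bloom index lookup being forced to read from it.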

Verify this pull request

This pull request is already covered by existing tests.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@yihua yihua self-assigned this Apr 9, 2022
@yihua yihua added the priority:blocker (Production down; release blocker) label Apr 9, 2022
  JavaRDD<List<HoodieKeyLookupResult>> keyLookupResultRDD;
- if (config.isMetadataBloomFilterIndexEnabled()) {
+ if (config.getBloomIndexUseMetadata()
+     && getCompletedMetadataPartitions(hoodieTable.getMetaClient().getTableConfig())
Contributor

Can we add an API on tableConfig itself, getCompletedMetadataPartitions()? We can compute the result once and return the same value rather than parsing the table config entry again and again.

Contributor Author

HoodieTableMetadataUtil::getCompletedMetadataPartitions is an existing API. We can tackle this code cleanup in a separate PR: HUDI-3836.
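
The compute-once idea raised in this thread could look roughly like the sketch below. It is not the HUDI-3836 change: the class CompletedPartitionsCache is hypothetical, the comma-separated storage format is an assumption, and the partition names in main are illustrative values only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class CompletedPartitionsCache {
  private final String rawValue;   // raw config entry, assumed to be comma-separated
  private Set<String> parsed;      // parsed lazily on first access, then reused
  private int parseCount = 0;      // counts parses, for illustration only

  public CompletedPartitionsCache(String rawValue) {
    this.rawValue = rawValue == null ? "" : rawValue;
  }

  // Parses the raw entry on the first call and returns the cached set afterwards.
  public synchronized Set<String> getCompletedMetadataPartitions() {
    if (parsed == null) {
      parseCount++;
      parsed = Collections.unmodifiableSet(
          Arrays.stream(rawValue.split(","))
              .map(String::trim)
              .filter(s -> !s.isEmpty())
              .collect(Collectors.toCollection(LinkedHashSet::new)));
    }
    return parsed;
  }

  public synchronized int getParseCount() {
    return parseCount;
  }

  public static void main(String[] args) {
    CompletedPartitionsCache cache = new CompletedPartitionsCache("files,bloom_filters");
    boolean useMetadata = true; // stands in for the new read-side flag
    // Same shape as the condition in the diff: flag set AND partition available.
    boolean lookupViaMetadata =
        useMetadata && cache.getCompletedMetadataPartitions().contains("bloom_filters");
    cache.getCompletedMetadataPartitions(); // second call reuses the cached set
    System.out.println(lookupViaMetadata + ", parses: " + cache.getParseCount());
  }
}
```

A real implementation would also need to invalidate the cached set whenever the underlying table config entry changes, which this sketch does not attempt.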

Contributor

@nsivabalan nsivabalan left a comment

just 1 nit

@yihua yihua force-pushed the HUDI-3807-config-read-multi-modal-index branch from 201bac7 to 230df29 Compare April 9, 2022 06:19
@hudi-bot
Collaborator

hudi-bot commented Apr 9, 2022

CI report:


@nsivabalan nsivabalan merged commit 3e97c88 into apache:master Apr 9, 2022
xushiyan pushed a commit that referenced this pull request Apr 14, 2022