[HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups #4352

manojpec · 2021-12-17T09:05:28Z

What is the purpose of the pull request

Today, base files store their bloom filters in the file footer, so an index
lookup has to open each base file to perform a bloom lookup. Even with
interval-tree-based file pruning, the final index lookup for the keys still
reads a significant number of base files just to get at their bloom filters.
This lookup can be made faster by keeping all the bloom filters in a new
metadata table partition and doing pointed lookups based on keys.
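The idea of a pointed lookup can be illustrated with a minimal sketch. Everything here is a stand-in and not Hudi code: `ToyBloomFilter` is a simplified filter, `BloomFilterIndexSketch` plays the role of the metadata partition as an in-memory map, and the `partition/file` key layout is an assumption made only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy bloom filter; a stand-in for the filter kept per base file. */
final class ToyBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int numHashes;

  ToyBloomFilter(int size, int numHashes) {
    this.bits = new BitSet(size);
    this.size = size;
    this.numHashes = numHashes;
  }

  // Derive one bit position per seed from a simple polynomial hash.
  private int position(String key, int seed) {
    int h = seed + 1;
    for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
      h = 31 * h + (b ^ seed);
    }
    return Math.floorMod(h, size);
  }

  void add(String key) {
    for (int i = 0; i < numHashes; i++) {
      bits.set(position(key, i));
    }
  }

  // May return a false positive, never a false negative.
  boolean mightContain(String key) {
    for (int i = 0; i < numHashes; i++) {
      if (!bits.get(position(key, i))) {
        return false;
      }
    }
    return true;
  }
}

/** Hypothetical in-memory stand-in for a metadata partition of bloom filters. */
final class BloomFilterIndexSketch {
  private final Map<String, ToyBloomFilter> filtersByFile = new HashMap<>();

  // Assumed key layout for this sketch only.
  private static String indexKey(String partitionPath, String fileName) {
    return partitionPath + "/" + fileName;
  }

  void put(String partitionPath, String fileName, ToyBloomFilter filter) {
    filtersByFile.put(indexKey(partitionPath, fileName), filter);
  }

  /** Pointed lookup by key; no base file footer is read. */
  boolean mightContain(String partitionPath, String fileName, String recordKey) {
    ToyBloomFilter filter = filtersByFile.get(indexKey(partitionPath, fileName));
    // Without an index entry the file cannot be ruled out.
    return filter == null || filter.mightContain(recordKey);
  }
}
```

A file with no index entry is treated as a possible match, so the caller would still have to check that file directly.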

Brief change log

RFC-37 #3989 Implementation

The write path now additionally persists the bloom filters of all base files
newly added by the inflight commit to the metadata table bloom index, and their
column range metadata to the metadata table column stats index.
The read path, during tagLocation() / lookupIndex(), now looks at the metadata
table indices instead of the base files. Final verification of the incoming keys
continues to happen against the respective base files.
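The read path described above can be sketched as three steps: prune by key range, check the bloom filter, then verify against the file's actual keys. This is a simplified illustration, not the PR's implementation; `LookupPipelineSketch`, `KeyRange`, and the maps standing in for the indices and the base files are all hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

final class LookupPipelineSketch {

  /** Min/max record key of one base file (assumed shape of the column range info). */
  static final class KeyRange {
    final String fileName;
    final String minKey;
    final String maxKey;

    KeyRange(String fileName, String minKey, String maxKey) {
      this.fileName = fileName;
      this.minKey = minKey;
      this.maxKey = maxKey;
    }

    boolean covers(String key) {
      return minKey.compareTo(key) <= 0 && key.compareTo(maxKey) <= 0;
    }
  }

  /**
   * Returns the file that really holds the key, if any.
   * bloomByFile stands in for the bloom index, keysByFile for reading the base file.
   */
  static Optional<String> locate(List<KeyRange> ranges,
                                 Map<String, Predicate<String>> bloomByFile,
                                 Map<String, Set<String>> keysByFile,
                                 String key) {
    for (KeyRange range : ranges) {
      // Step 1: column stats rule out files whose key range cannot match.
      if (!range.covers(key)) {
        continue;
      }
      // Step 2: the bloom filter rules out more files, but may give false positives.
      Predicate<String> bloom = bloomByFile.get(range.fileName);
      if (bloom != null && !bloom.test(key)) {
        continue;
      }
      // Step 3: final verification against the keys actually stored in the file.
      Set<String> keys = keysByFile.getOrDefault(range.fileName, Collections.emptySet());
      if (keys.contains(key)) {
        return Optional.of(range.fileName);
      }
    }
    return Optional.empty();
  }
}
```

Only step 3 touches the base file in this sketch, which is why a false positive from the bloom filter costs a file read but never produces a wrong location.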

Tests

Total Index lookup time taken:
Table: COW
Operation: Upsert
Spark executors, cores: 25, 4

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end.
Added HoodieClientWriteTest to verify the change.
Manually verified the change by running a job locally.

Committer checklist

Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java

nsivabalan

Done with 1 pass over source code.

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java

vinothchandar

Need to review the full stats/bloom filter write/read path still

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

vinothchandar · 2022-01-11T03:43:29Z

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

+      final String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
+      return context.flatMap(partitions, new SerializableFunction<String, Stream<Pair<String, BloomIndexFileInfo>>>() {
+        @Override
+        public Stream<Pair<String, BloomIndexFileInfo>> apply(String partitionName) throws Exception {


We discussed reading all of this from the driver, correct? Like fetching the entire list of stats for the key column alone?

With millions of files, loading everything from the driver might not be a good idea. Will explore this more as part of the new DAG PR.

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

...link-client/src/main/java/org/apache/hudi/metadata/FlinkHoodieBackedTableMetadataWriter.java

hudi-common/src/main/avro/HoodieMetadata.avsc

vinothchandar · 2022-01-11T04:01:11Z

hudi-common/src/main/java/org/apache/hudi/common/config/HoodieMetadataConfig.java

      .sinceVersion("0.10.0")
      .withDocumentation("Enable full scanning of log files while reading log records. If disabled, hudi does look up of only interested entries.");

+  public static final ConfigProperty<Boolean> ENABLE_META_INDEX_BLOOM_FILTER = ConfigProperty


You are assuming it's always the key field that the bloom filter points to. We also need another config where the user can specify the list of columns/fields to track bloom filters for.

Taking this up in https://issues.apache.org/jira/browse/HUDI-3327
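One possible shape for the config suggested in this thread is sketched below. The property name `example.metadata.index.bloom.filter.columns` and the class `BloomColumnsConfigSketch` are hypothetical, made up for illustration; they are not Hudi config keys or classes. The sketch assumes a comma-separated list that falls back to the record key field when unset.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;

final class BloomColumnsConfigSketch {
  // Hypothetical property name, used only in this example.
  static final String BLOOM_COLUMNS_KEY = "example.metadata.index.bloom.filter.columns";

  /** Parses the column list, defaulting to the record key field when none is given. */
  static List<String> bloomFilterColumns(Properties props, String recordKeyField) {
    String raw = props.getProperty(BLOOM_COLUMNS_KEY, "");
    List<String> columns = Arrays.stream(raw.split(","))
        .map(String::trim)
        .filter(column -> !column.isEmpty())
        .distinct()
        .collect(Collectors.toList());
    return columns.isEmpty() ? Collections.singletonList(recordKeyField) : columns;
  }
}
```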

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java

hudi-common/src/main/java/org/apache/hudi/metadata/HoodieMetadataPayload.java

hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/HoodieKeyLookupHandle.java

codope · 2022-01-18T10:03:59Z

...client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomMetaIndexBatchCheckFunction.java

+      final String partitionPath = entry._2.getPartitionPath();
+      final String fileId = entry._1;
+      if (!fileIDBaseFileMap.containsKey(fileId)) {
+        Option<HoodieBaseFile> baseFile = hoodieTable.getBaseFileOnlyView().getLatestBaseFile(partitionPath, fileId);


Is it possible at any point that there are no base files in the table? What happens then? For example, a MOR table written by kafka-connect creates only log files.

Is it possible for iterator to have a fileId multiple times in the same task?

The caller passes in a list of tuples of file id/name to key. So when many keys fall in the same file, we can see the same file id repeating in the input list. Here we are constructing the base file for the file id only once.

Is it possible at any point that there are no base files in the table? What happens then?

The bloom filter and column range info are built from the base file footer details. When there are no base files, we don't have an index for them.

Is it possible at any point that there are no base files in the table? What happens then?

Generally, even a MOR user table starts off with a base file. With no indexes, or on an index lookup miss, upserts choose the insert code path, thereby forcing base file creation. But I need to explore the kafka-connect case of creating only log files further. This is an open item.
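The "construct the base file only once per file id" behavior from this thread can be sketched with a small cache. This is an illustration under assumptions, not the PR's code: `BaseFileCacheSketch` and its resolver function are hypothetical, and base files are represented by plain file-name strings. An empty `Optional` models a file group that has no base file, such as the log-only case raised above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class BaseFileCacheSketch {
  private final Map<String, Optional<String>> baseFileByFileId = new HashMap<>();
  private final Function<String, Optional<String>> resolver;
  private int resolveCalls = 0;

  /** resolver stands in for a file system view lookup; it is a made-up hook. */
  BaseFileCacheSketch(Function<String, Optional<String>> resolver) {
    this.resolver = resolver;
  }

  /** Resolves each file id at most once, even if it repeats in the input. */
  Optional<String> baseFileFor(String fileId) {
    return baseFileByFileId.computeIfAbsent(fileId, id -> {
      resolveCalls++;
      return resolver.apply(id);
    });
  }

  int resolveCalls() {
    return resolveCalls;
  }
}
```

Because the empty result is cached as well, a file id without a base file is also resolved only once, and the caller can skip it instead of failing.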

...-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomMetaIndexLazyCheckFunction.java

hudi-common/src/main/java/org/apache/hudi/common/table/log/AbstractHoodieLogRecordReader.java

hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReader.java

hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java

vinothchandar

Made one pass. But let's address the perf issues and simplify all these different code paths.

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

vinothchandar · 2022-01-18T23:54:07Z

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

+
+        Collections.sort(columnStatKeys);
+        Map<Pair<String, String>, HoodieColumnStats> fileToColumnStatMap = hoodieTable
+            .getMetadataTable().getColumnStats(columnStatKeys, keyField);


nit: Pair<String, String> is a less than ideal API for a partition path and file name. Let's go with getColumnStats(Option<String> partitionPath, String fileName)

getColumnStats() and getBloomFilters() are built to work on a list of partition-file paired keys. Multiple files in the same partition can be requested in one call, so I need a combination of partition and file names here. I can revisit this after the performance/DAG work.
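One way to reconcile the two positions in this thread is a small named key type that keeps the batch lookup but avoids a bare Pair<String, String>. The sketch below is an assumption for illustration only: `PartitionFileKey`, `ColumnStatsLookupSketch`, and `Range` are hypothetical names, and the in-memory map stands in for the metadata table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Named replacement for Pair<String, String>; fields say what they hold. */
final class PartitionFileKey implements Comparable<PartitionFileKey> {
  final String partitionPath;
  final String fileName;

  PartitionFileKey(String partitionPath, String fileName) {
    this.partitionPath = partitionPath;
    this.fileName = fileName;
  }

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof PartitionFileKey)) {
      return false;
    }
    PartitionFileKey that = (PartitionFileKey) other;
    return partitionPath.equals(that.partitionPath) && fileName.equals(that.fileName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(partitionPath, fileName);
  }

  @Override
  public int compareTo(PartitionFileKey that) {
    int byPartition = partitionPath.compareTo(that.partitionPath);
    return byPartition != 0 ? byPartition : fileName.compareTo(that.fileName);
  }
}

final class ColumnStatsLookupSketch {
  /** Min/max value of the looked-up column in one file. */
  static final class Range {
    final String min;
    final String max;

    Range(String min, String max) {
      this.min = min;
      this.max = max;
    }
  }

  private final Map<PartitionFileKey, Range> store = new HashMap<>();

  void put(PartitionFileKey key, String min, String max) {
    store.put(key, new Range(min, max));
  }

  /** Batch lookup: sorts the keys, then returns stats for the files that have an entry. */
  Map<PartitionFileKey, Range> getColumnStats(List<PartitionFileKey> keys) {
    List<PartitionFileKey> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    Map<PartitionFileKey, Range> result = new LinkedHashMap<>();
    for (PartitionFileKey key : sorted) {
      Range range = store.get(key);
      if (range != null) {
        result.put(key, range);
      }
    }
    return result;
  }
}
```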

...park-client/src/main/java/org/apache/hudi/metadata/SparkHoodieBackedTableMetadataWriter.java

...-client/src/main/java/org/apache/hudi/index/bloom/HoodieBloomMetaIndexLazyCheckFunction.java

hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/HoodieIndexUtils.java

hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/bloom/HoodieBloomIndex.java

hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieFileReader.java

hudi-common/src/main/java/org/apache/hudi/io/storage/HoodieHFileReader.java

hudi-common/src/main/java/org/apache/hudi/metadata/BaseTableMetadata.java

manojpec · 2022-02-02T02:58:29Z

@nsivabalan @codope CI test failure in TestHoodieDeltaStreamerWithMultiWriter is fixed by #4704

nsivabalan

LGTM. just one nit.

...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java

...-client/src/main/java/org/apache/hudi/index/bloom/HoodieMetadataBloomIndexCheckFunction.java

hudi-common/src/main/java/org/apache/hudi/metadata/MetadataPartitionType.java

manojpec · 2022-02-02T19:27:41Z

@codope
CI passed in the re-run https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=5679&view=results

@hudi-bot run azure

…peed up index lookups - Today, base files have bloom filter at their footers and index lookups have to load the base file to perform any bloom lookups. Though we have interval tree based file purging, we still end up in significant amount of base file read for the bloom filter for the end index lookups for the keys. This index lookup operation can be made more performant by having all the bloom filters in a new metadata partition and doing pointed lookups based on keys.

…peed up index lookups - Adding indexing support for clean, restore and rollback operations. Each of these operations will now be converted to index records for bloom filter and column stats additionally.

…peed up index lookups - Making hoodie key consistent for both column stats and bloom index by including fileId instead of fileName, in both read and write paths. - Performance optimization for looking up records in the metadata table. - Avoiding multi column sorting needed for HoodieBloomMetaIndexBatchCheckFunction

…peed up index lookups - HoodieBloomMetaIndexBatchCheckFunction cleanup to remove unused classes - Base file checking before reading the file footer for bloom or column stats

…peed up index lookups - Updating the bloom index and column stats index to have full file name included in the key instead of just file id. - Minor test fixes.

…peed up index lookups - Fixed flink commit method to handle metadata table all partition update records - TestBloomIndex fixes

…peed up index lookups - SparkHoodieBloomIndexHelper code simplification for various config modes - Signature change for getBloomFilters() and getColumnStats(). Callers can just pass in interested partition and file names, the index key is then constructed internally based on the passed in parameters. - KeyLookupHandle and KeyLookupResults code refactoring - Metadata schema changes - removed the reserved field

…peed up index lookups - Removing HoodieColumnStatsMetadata and using HoodieColumnRangeMetadata instead. Fixed the users of the removed class.

…peed up index lookups - Extending meta index test to cover deletes, compactions, clean and restore table operations. Also, fixed the getBloomFilters() and getColumnStats() to account for deleted entries.

…peed up index lookups - Addressing review comments - java doc for new classes, keys sorting for lookup, index methods renaming.

…peed up index lookups - Consolidated the bloom filter checking for keys into one HoodieMetadataBloomIndexCheckFunction instead of a separate batch and lazy mode. Removed all the configs around it. - Made the metadata table partition file group count configurable. - Fixed the HoodieKeyLookupHandle to have auto closable file reader when checking bloom filter and range keys. - Config property renames. Test fixes.

…peed up index lookups - Enabling column stats indexing for all columns by default - Handling column stat generation errors and test update

…peed up index lookups - Metadata table partition file group count taken from the slices when the table is bootstrapped. - Prep records for the commit refactored to the base class - HoodieFileReader interface changes for filtering keys - Multi column and data types support for column stats index

…peed up index lookups - rebase to latest master and merge fixes for the build and test failures

…peed up index lookups - Extending the metadata column stats type payload schema to include more statistics about the column ranges to help query integration.

…peed up index lookups - Addressing review comments

codope

Looks good. Please resolve the conflicts and this should be ready to land.

hudi-bot · 2022-02-03T09:15:11Z

CI report:

235981a UNKNOWN
e489045 Azure: SUCCESS

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure re-run the last Azure build

manojpec changed the title ~~[HUDI-1295] Metadata Index - Bloom filter and Column stats index to peed up index lookups~~ [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 17, 2021

manojpec mentioned this pull request Dec 17, 2021

[WIP][HUDI-1295] Metadata Index - Bloom filter and Column stats metadata to speed up index lookups #3904

Closed

5 tasks

manojpec changed the title ~~[HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups~~ [WIP][HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 17, 2021

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch 2 times, most recently from 235981a to 9ee3e62 Compare December 23, 2021 08:32

manojpec changed the title ~~[WIP][HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups~~ [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 23, 2021

xiarixiaoyao reviewed Dec 23, 2021

hudi-common/src/main/java/org/apache/hudi/common/util/ParquetUtils.java (outdated)

vinothchandar self-assigned this Dec 25, 2021

vinothchandar added the big-needle-movers label Dec 25, 2021

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 9ee3e62 to 83f8b8c Compare January 4, 2022 18:57

nsivabalan requested changes Jan 5, 2022

manojpec mentioned this pull request Jan 10, 2022

[HUDI-1295][HUDI-3181] Enabling metadata table based index by default for tests #4516

Closed

5 tasks

vinothchandar reviewed Jan 11, 2022

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch 2 times, most recently from 4aea835 to 46d4587 Compare January 14, 2022 09:12

manojpec requested review from nsivabalan, vinothchandar and xiarixiaoyao January 14, 2022 09:15

nsivabalan requested changes Jan 17, 2022

manojpec requested a review from nsivabalan January 18, 2022 03:32

codope reviewed Jan 18, 2022

vinothchandar requested changes Jan 19, 2022

manojpec requested a review from codope January 20, 2022 20:00

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 5d5925e to b3d8632 Compare January 20, 2022 20:15

nsivabalan mentioned this pull request Jan 24, 2022

[HUDI-1822][RFC-27][WIP] range index support with metadata table #3475

Closed

5 tasks

prashantwason requested changes Jan 26, 2022

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from b3d8632 to 69f3357 Compare January 26, 2022 09:30

manojpec requested review from prashantwason and vinothchandar January 27, 2022 09:45

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 83e8c78 to 176c05c Compare January 27, 2022 21:06

manojpec requested a review from nsivabalan February 2, 2022 02:58

nsivabalan approved these changes Feb 2, 2022

...di-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java

manojpec mentioned this pull request Feb 2, 2022

[HUDI-2488][HUDI-3175] Implement async metadata indexing #4693

Merged

5 tasks

codope reviewed Feb 2, 2022

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 2541391 to d78e61c Compare February 2, 2022 17:04

manojpec added 16 commits February 2, 2022 22:39

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

eaa2c85

…peed up index lookups - Adding indexing support for clean, restore and rollback operations. Each of these operations will now be converted to index records for bloom filter and column stats additionally.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

f112f90

…peed up index lookups - HoodieBloomMetaIndexBatchCheckFunction cleanup to remove unused classes - Base file checking before reading the file footer for bloom or column stats

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

7a44b1c

…peed up index lookups - Updating the bloom index and column stats index to have full file name included in the key instead of just file id. - Minor test fixes.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

ab7a5ee

…peed up index lookups - Fixed flink commit method to handle metadata table all partition update records - TestBloomIndex fixes

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

e5c8f82

…peed up index lookups - Removing HoodieColumnStatsMetadata and using HoodieColumnRangeMetadata instead. Fixed the users of the removed class.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

3045b76

…peed up index lookups - Extending meta index test to cover deletes, compactions, clean and restore table operations. Also, fixed the getBloomFilters() and getColumnStats() to account for deleted entries.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

2c61c59

…peed up index lookups - Addressing review comments - java doc for new classes, keys sorting for lookup, index methods renaming.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

da4cf69

…peed up index lookups - Enabling column stats indexing for all columns by default - Handling column stat generation errors and test update

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

5f5ba30

…peed up index lookups - rebase to latest master and merge fixes for the build and test failures

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

bbe0f56

…peed up index lookups - Extending the metadata column stats type payload schema to include more statistics about the column ranges to help query integration.

[HUDI-1295] Metadata Index - Bloom filter and Column stats index to s…

e489045

…peed up index lookups - Addressing review comments

manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from d78e61c to e489045 Compare February 3, 2022 07:18

codope approved these changes Feb 3, 2022

manojpec mentioned this pull request Feb 3, 2022

[HUDI-3356][HUDI-3203] HoodieData for metadata index records; BloomFilter construction from index based on the type param #4740

Closed

5 tasks

codope merged commit 5927bdd into apache:master Feb 3, 2022

vinothchandar Jan 11, 2022

manojpec Jan 14, 2022

vinothchandar Jan 11, 2022

manojpec Jan 26, 2022

codope Jan 18, 2022

manojpec Jan 20, 2022 • edited
