@manojpec
Contributor

@manojpec manojpec commented Dec 17, 2021

What is the purpose of the pull request

  • Today, base files store their bloom filters in the file footer, so index
    lookups have to open each base file to check its bloom filter. Even with
    interval-tree based file pruning, index lookups still read a significant
    number of base files just to load the bloom filters for the incoming
    keys. This lookup can be made faster by storing all the bloom filters in
    a new metadata table partition and doing point lookups by key.

Brief change log

RFC-37 #3989 Implementation

  • Write path will now additionally persist bloom filters from all the newly added
    base files from the inflight commit to the metadata table bloom index, and
    column range metadata to the metadata table column stats index.

  • The read path, during tagLocation() / lookupIndex(), will consult the metadata table
    indices instead of the base files. Final verification of the incoming keys will continue
    to happen against the respective base files.
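
The read path above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ToyBloom`, `FileIndexEntry`, and this standalone `tagLocation` are hypothetical stand-ins, and an in-memory map plays the role of the metadata table. It shows the order of checks only: key-range pruning from column stats, then the bloom filter, then final verification against the keys actually in the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: none of these classes are Hudi APIs.
public class MetaIndexLookupSketch {

  /** Toy bloom filter standing in for the per-file filter kept in the metadata table. */
  static final class ToyBloom {
    private static final int BITS = 1 << 12;
    private final BitSet bits = new BitSet(BITS);

    void add(String key) {
      for (int seed = 1; seed <= 3; seed++) {
        bits.set(slot(key, seed));
      }
    }

    boolean mightContain(String key) {
      for (int seed = 1; seed <= 3; seed++) {
        if (!bits.get(slot(key, seed))) {
          return false; // definitely absent
        }
      }
      return true; // possibly present (false positives are allowed)
    }

    private static int slot(String key, int seed) {
      int h = seed * 0x9E3779B9;
      for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
        h = 31 * h + b;
      }
      return Math.floorMod(h, BITS);
    }
  }

  /** Hypothetical index entry for one base file: key range plus bloom filter. */
  static final class FileIndexEntry {
    final String minKey;
    final String maxKey;
    final ToyBloom bloom = new ToyBloom();
    final Set<String> keysInFile; // stands in for reading the base file itself

    FileIndexEntry(List<String> keys) {
      String min = keys.get(0);
      String max = keys.get(0);
      for (String key : keys) {
        if (key.compareTo(min) < 0) min = key;
        if (key.compareTo(max) > 0) max = key;
        bloom.add(key);
      }
      this.minKey = min;
      this.maxKey = max;
      this.keysInFile = new HashSet<>(keys);
    }
  }

  /** Returns the file that holds the key, or null when the key is new (an insert). */
  static String tagLocation(Map<String, FileIndexEntry> index, String key) {
    for (Map.Entry<String, FileIndexEntry> e : index.entrySet()) {
      FileIndexEntry f = e.getValue();
      // 1) range pruning, 2) bloom check, 3) final verification against the file.
      boolean inRange = f.minKey.compareTo(key) <= 0 && key.compareTo(f.maxKey) <= 0;
      if (inRange && f.bloom.mightContain(key) && f.keysInFile.contains(key)) {
        return e.getKey();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Map<String, FileIndexEntry> index = new LinkedHashMap<>();
    index.put("file-1", new FileIndexEntry(Arrays.asList("k01", "k05")));
    index.put("file-2", new FileIndexEntry(Arrays.asList("k10", "k20")));
    System.out.println(tagLocation(index, "k10")); // file-2
    System.out.println(tagLocation(index, "k03")); // null
  }
}
```

The final `keysInFile` check is what keeps a bloom false positive from tagging a key to the wrong file, which is why verification against the base file stays in the flow.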

Tests

Total Index lookup time taken:
Table: COW
Operation: Upsert
Spark executors, cores: 25, 4

image

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@manojpec manojpec changed the title [HUDI-1295] Metadata Index - Bloom filter and Column stats index to peed up index lookups [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 17, 2021
@manojpec manojpec changed the title [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups [WIP][HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 17, 2021
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch 2 times, most recently from 235981a to 9ee3e62 Compare December 23, 2021 08:32
@manojpec manojpec changed the title [WIP][HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups [HUDI-1295] Metadata Index - Bloom filter and Column stats index to speed up index lookups Dec 23, 2021
@vinothchandar vinothchandar self-assigned this Dec 25, 2021
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 9ee3e62 to 83f8b8c Compare January 4, 2022 18:57
Contributor

@nsivabalan nsivabalan left a comment

Done with 1 pass over source code.

Member

@vinothchandar vinothchandar left a comment

Need to review the full stats/bloom filter write/read path still

final String keyField = hoodieTable.getMetaClient().getTableConfig().getRecordKeyFieldProp();
return context.flatMap(partitions, new SerializableFunction<String, Stream<Pair<String, BloomIndexFileInfo>>>() {
@Override
public Stream<Pair<String, BloomIndexFileInfo>> apply(String partitionName) throws Exception {
Member

We discussed reading all of this from the driver, correct? That is, fetching the entire list of stats for the key column alone?

Contributor Author

With millions of files, loading all from the driver might not be a good idea. Will explore this more as part of the new DAG PR.

.sinceVersion("0.10.0")
.withDocumentation("Enable full scanning of log files while reading log records. If disabled, hudi does look up of only interested entries.");

public static final ConfigProperty<Boolean> ENABLE_META_INDEX_BLOOM_FILTER = ConfigProperty
Member

You are assuming it's always the key field that the bloom filter points to. We also need another config where the user can specify the list of columns/fields to track bloom filters for.

Contributor Author

@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch 2 times, most recently from 4aea835 to 46d4587 Compare January 14, 2022 09:12
@manojpec manojpec requested a review from nsivabalan January 18, 2022 03:32
final String partitionPath = entry._2.getPartitionPath();
final String fileId = entry._1;
if (!fileIDBaseFileMap.containsKey(fileId)) {
Option<HoodieBaseFile> baseFile = hoodieTable.getBaseFileOnlyView().getLatestBaseFile(partitionPath, fileId);
Member

Is it possible at any point that there are no base files in the table? What happens then? For example, a MOR table written via kafka-connect creates only log files.

Contributor Author

@manojpec manojpec Jan 20, 2022

Is it possible for iterator to have a fileId multiple times in the same task?

The caller passes in list of tuples of file id/name to key. So, when many keys fall in the same file, we can see the same fileid repeating in the input list. Here we are constructing the base file for the file id only once.

Is it possible at any point that there are no base files in the table? What happens then?

The bloom filter and column range info are built from the base file footers. When there are no base files, there is no index for them.

Is it possible at any point that there are no base files in the table? What happens then?

Generally, even a MOR user table starts off with a base file. With no indexes, or on an index lookup miss, upserts take the insert code path, thereby forcing base file creation. But I need to explore the kafka-connect case of creating only log files further. This is an open item.
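
The "base file for the file id only once" point in the reply above can be sketched as a per-task cache. This is a hedged illustration under assumptions, not the PR's code: `resolveBaseFiles` and the `latestBaseFileLookup` function are hypothetical names, and base files are represented by plain file-name strings. An empty `Optional` models a file group that has no base file yet.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative only: names here are hypothetical, not Hudi APIs.
public class BaseFileCacheSketch {

  /**
   * Resolves the latest base file once per distinct fileId, even when the
   * same fileId appears many times in the (fileId, recordKey) input list.
   */
  static Map<String, Optional<String>> resolveBaseFiles(
      List<Map.Entry<String, String>> fileIdAndKeys,
      Function<String, Optional<String>> latestBaseFileLookup) {
    Map<String, Optional<String>> fileIdBaseFileMap = new HashMap<>();
    for (Map.Entry<String, String> entry : fileIdAndKeys) {
      // The lookup runs only on the first occurrence of each fileId.
      fileIdBaseFileMap.computeIfAbsent(entry.getKey(), latestBaseFileLookup);
    }
    return fileIdBaseFileMap;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, String>> input = Arrays.<Map.Entry<String, String>>asList(
        new AbstractMap.SimpleEntry<String, String>("f1", "k1"),
        new AbstractMap.SimpleEntry<String, String>("f1", "k2"),
        new AbstractMap.SimpleEntry<String, String>("f2", "k3"));
    // Assumption for the demo: file group "f2" has only log files, no base file.
    Map<String, Optional<String>> resolved = resolveBaseFiles(input,
        fileId -> "f2".equals(fileId) ? Optional.<String>empty() : Optional.of(fileId + ".parquet"));
    System.out.println(resolved);
  }
}
```

Storing the empty `Optional` in the map also caches the "no base file" answer, so a log-only file group is not looked up again for every key that maps to it.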

Member

@vinothchandar vinothchandar left a comment

Made one pass. But let's address the perf issues and simplify all these different code paths.


Collections.sort(columnStatKeys);
Map<Pair<String, String>, HoodieColumnStats> fileToColumnStatMap = hoodieTable
.getMetadataTable().getColumnStats(columnStatKeys, keyField);
Member

nit: Pair<String, String> is a less-than-ideal API for partitionPath and file name. Let's go with getColumnStats(Option<String> partitionPath, String fileName).

Contributor Author

getColumnStats() and getBloomFilters() are built to work on a list of partition-file paired keys. Multiple files from the same partition can be requested, so I need a combination of partition and file names here. I can revisit this after the performance/DAG work.
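
One way to reconcile the two positions above is a small named key type in place of Pair<String, String>, while still supporting batched lookups. The sketch below is only an assumption about how that could look: `PartitionFileKey` and this `getColumnStats` signature are hypothetical, stats are represented as plain strings, and a `TreeMap` stands in for the metadata table's sorted storage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Illustrative only: a hypothetical alternative to Pair<String, String>.
public class BatchedStatsLookupSketch {

  /** Named (partition, file) key, ordered by partition and then file name. */
  static final class PartitionFileKey implements Comparable<PartitionFileKey> {
    final String partitionPath;
    final String fileName;

    PartitionFileKey(String partitionPath, String fileName) {
      this.partitionPath = partitionPath;
      this.fileName = fileName;
    }

    @Override
    public int compareTo(PartitionFileKey other) {
      int c = partitionPath.compareTo(other.partitionPath);
      return c != 0 ? c : fileName.compareTo(other.fileName);
    }

    @Override
    public boolean equals(Object o) {
      return o instanceof PartitionFileKey && compareTo((PartitionFileKey) o) == 0;
    }

    @Override
    public int hashCode() {
      return Objects.hash(partitionPath, fileName);
    }
  }

  /** Looks up many keys in one sorted pass; keys with no stats are omitted. */
  static Map<PartitionFileKey, String> getColumnStats(
      Map<PartitionFileKey, String> statsIndex, List<PartitionFileKey> requested) {
    List<PartitionFileKey> sorted = new ArrayList<>(requested);
    Collections.sort(sorted); // sorted keys keep lookups sequential in a sorted store
    Map<PartitionFileKey, String> found = new TreeMap<>();
    for (PartitionFileKey key : sorted) {
      String stats = statsIndex.get(key);
      if (stats != null) {
        found.put(key, stats);
      }
    }
    return found;
  }

  public static void main(String[] args) {
    Map<PartitionFileKey, String> statsIndex = new TreeMap<>();
    statsIndex.put(new PartitionFileKey("2022/01", "a.parquet"), "min=k01,max=k05");
    statsIndex.put(new PartitionFileKey("2022/01", "b.parquet"), "min=k10,max=k20");
    List<PartitionFileKey> requested = new ArrayList<>();
    requested.add(new PartitionFileKey("2022/01", "b.parquet"));
    requested.add(new PartitionFileKey("2022/02", "c.parquet")); // not indexed
    System.out.println(getColumnStats(statsIndex, requested).size());
  }
}
```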

@manojpec manojpec requested a review from codope January 20, 2022 20:00
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 5d5925e to b3d8632 Compare January 20, 2022 20:15
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from b3d8632 to 69f3357 Compare January 26, 2022 09:30
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 83e8c78 to 176c05c Compare January 27, 2022 21:06
@manojpec
Contributor Author

manojpec commented Feb 2, 2022

@nsivabalan @codope CI test failure in TestHoodieDeltaStreamerWithMultiWriter is fixed by #4704

@manojpec manojpec requested a review from nsivabalan February 2, 2022 02:58
Contributor

@nsivabalan nsivabalan left a comment

LGTM. just one nit.

@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from 2541391 to d78e61c Compare February 2, 2022 17:04
@manojpec
Contributor Author

manojpec commented Feb 2, 2022

…peed up index lookups

- Today, base files store their bloom filters in the file footer, so index
  lookups have to open each base file to check its bloom filter. Even with
  interval-tree based file pruning, index lookups still read a significant
  number of base files just to load the bloom filters for the incoming
  keys. This lookup can be made faster by storing all the bloom filters in
  a new metadata table partition and doing point lookups by key.
…peed up index lookups

 - Adding indexing support for clean, restore and rollback operations.
   Each of these operations will now be converted to index records for
   bloom filter and column stats additionally.
…peed up index lookups

 - Making hoodie key consistent for both column stats and bloom index by
   including fileId instead of fileName, in both read and write paths.

 - Performance optimization for looking up records in the metadata table.

 - Avoiding multi column sorting needed for HoodieBloomMetaIndexBatchCheckFunction
…peed up index lookups

 - HoodieBloomMetaIndexBatchCheckFunction cleanup to remove unused classes

 - Base file checking before reading the file footer for bloom or column stats
…peed up index lookups

 - Updating the bloom index and column stats index to have full file name
   included in the key instead of just file id.

 - Minor test fixes.
…peed up index lookups

 - Fixed flink commit method to handle metadata table all partition update records

 - TestBloomIndex fixes
…peed up index lookups

 - SparkHoodieBloomIndexHelper code simplification for various config modes

 - Signature change for getBloomFilters() and getColumnStats(). Callers can
   just pass in interested partition and file names, the index key is then
   constructed internally based on the passed in parameters.

 - KeyLookupHandle and KeyLookupResults code refactoring

 - Metadata schema changes - removed the reserved field
…peed up index lookups

 - Removing HoodieColumnStatsMetadata and using HoodieColumnRangeMetadata instead.
   Fixed the users of the removed class.
…peed up index lookups

 - Extending meta index test to cover deletes, compactions, clean
   and restore table operations. Also, fixed the getBloomFilters()
   and getColumnStats() to account for deleted entries.
…peed up index lookups

 - Addressing review comments - java doc for new classes, keys sorting for
   lookup, index methods renaming.
…peed up index lookups

 - Consolidated the bloom filter checking for keys into one
   HoodieMetadataBloomIndexCheckFunction instead of separate batch
   and lazy modes. Removed all the configs around it.

 - Made the metadata table partition file group count configurable.

 - Fixed the HoodieKeyLookupHandle to have auto closable file reader
   when checking bloom filter and range keys.

 - Config property renames. Test fixes.
…peed up index lookups

 - Enabling column stats indexing for all columns by default

 - Handling column stat generation errors and test update
…peed up index lookups

 - Metadata table partition file group count taken from the slices when
   the table is bootstrapped.

 - Prep records for the commit refactored to the base class

 - HoodieFileReader interface changes for filtering keys

 - Multi column and data types support for column stats index
…peed up index lookups

 - rebase to latest master and merge fixes for the build and test failures
…peed up index lookups

 - Extending the metadata column stats type payload schema to include
   more statistics about the column ranges to help query integration.
…peed up index lookups

 - Addressing review comments
@manojpec manojpec force-pushed the feature/HUDI-1295-meta-index-bloom-filter-partition-2 branch from d78e61c to e489045 Compare February 3, 2022 07:18
Member

@codope codope left a comment

Looks good. Please resolve the conflicts and this should be ready to land.

@hudi-bot
Collaborator

hudi-bot commented Feb 3, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope codope merged commit 5927bdd into apache:master Feb 3, 2022

7 participants