Support DataSkipping for hudi connector by xiarixiaoyao · Pull Request #18606 · prestodb/presto

xiarixiaoyao · 2022-11-02T12:33:39Z

What's the change?

Support data skipping for the Hudi connector.
Support partition pruning via the Hudi metadata table (MDT) to reduce RPC calls to the Hive metastore.
Support filter pushdown for Hudi copy-on-write (COW) tables.
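To illustrate the partition-pruning item, here is a minimal sketch. It assumes the partition paths have already been listed from the Hudi metadata table (MDT) instead of being fetched partition by partition from the Hive metastore, and that the paths use Hive-style key=value segments. The class, its methods, and the Set<String> predicate representation are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: prune Hive-style partition paths (as a metadata table
// listing might return them) against a per-column set of allowed values.
class PartitionPruneSketch
{
    // Parses "dt=2022-11-01/region=us" into {dt=2022-11-01, region=us}.
    static Map<String, String> parsePartitionPath(String path)
    {
        Map<String, String> values = new HashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                values.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return values;
    }

    // Keeps a partition unless some predicate column has a value outside its allowed set.
    static List<String> prune(List<String> partitionPaths, Map<String, Set<String>> allowedValues)
    {
        List<String> kept = new ArrayList<>();
        for (String path : partitionPaths) {
            Map<String, String> values = parsePartitionPath(path);
            boolean matches = true;
            for (Map.Entry<String, Set<String>> predicate : allowedValues.entrySet()) {
                String value = values.get(predicate.getKey());
                // A partition column missing from the path cannot be evaluated, so keep the partition.
                if (value != null && !predicate.getValue().contains(value)) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                kept.add(path);
            }
        }
        return kept;
    }
}
```

Because the filtering happens on a single listing, no per-partition metastore round trip is needed. That is the intent behind "reduce RPC calls", although the PR's actual mechanism may differ.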

design

Test result:
SSB benchmark
Data size: 1.5 TB, 12 billion rows
Environment: 1 coordinator + 3 workers; containers with 170 GB memory, 136 GB JVM heap, 95 GB max query memory, 40 vCores

Test plan: unit tests

== NO RELEASE NOTE ==
General Changes
* Support data skipping for the Hudi connector.
* Support partition pruning via the Hudi MDT to reduce RPC calls to Hive.
* Support filter pushdown for Hudi COW tables.

linux-foundation-easycla · 2022-11-02T12:33:44Z

The committers listed above are authorized under a signed CLA.

✅ login: xiarixiaoyao (d0e6d469b47f3de0bc464fc2cc6228fb6a6ab7fd)

pratyakshsharma · 2022-11-03T13:30:28Z

Will have a look at it over the weekend.

pratyakshsharma

Thank you for raising this detailed PR. My other PR, based on RFC-58, was trying to do data skipping using the metadata table in a more generic way. Basically, rather than introducing the changes in specific query engines like Presto and Trino, the idea was to introduce them as part of Hudi itself and simply call them from Presto/Trino.
Anyway, I can make changes accordingly later. I am still going through the changes and have left a few comments asking for changes or clarification. Please have a look.

pratyakshsharma · 2022-11-11T07:34:34Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

Just trying to understand why we cannot do schema evolution if only log files are present?

We have handled MoR tables in the Hudi kernel; see
apache/hudi#6989

Thank you for pointing me to this PR, will have a look and ask doubts, if any

pratyakshsharma · 2022-11-11T07:39:31Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

Can you please explain why there is no need to do schema evolution for MoR tables? Are we taking care of schema evolution with the getSplits call for MoR in Hudi kernel?

We have handled MoR tables in the Hudi kernel; see
apache/hudi#6989

pratyakshsharma · 2022-11-11T07:42:01Z

presto-hudi/src/main/java/com/facebook/presto/hudi/SchemaEvolutionContext.java

Can we add a one-line comment for these two variables here?

Also if I understand properly, this variable corresponds to latest internal schema, probably we can update the variable name too?

Yes, thank you for the good suggestion.

pratyakshsharma · 2022-11-11T07:44:54Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

I guess better to update the variable name to validCommitFiles and also update this variable in SchemaEvolutionContext class.

I guess it will be good to have some test cases covering the scenarios of different types of schema evolutions. That should clear most of my doubts as well.

Agreed. Now that apache/hudi#6989 has been merged into the Hudi kernel, we should add test cases covering schema evolution.

pratyakshsharma · 2022-11-11T17:38:55Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSplitManager.java

nit: non-partitioned

pratyakshsharma · 2022-11-11T18:22:47Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java

nit: rename to evaluatePartitionPredicate?


Yes. Once RFC-58 is completed, we only need to convert the Presto filter into a Hudi filter and then call the interface directly, just like Iceberg. RFC-64 is also abstracting the interfaces; however, this may take a long time.

Once RFC-58/RFC-64 are completed, we can remove this logic directly.

Exactly, I am aligned on this.


pratyakshsharma · 2022-11-12T08:42:32Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

nit: you probably wanted to add some comment here?

Sorry, I forgot to add comments.

pratyakshsharma · 2022-11-12T08:51:22Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

Just thinking out loud; please correct me if I am wrong. The oldColumnHandle list comes from the metastore and will have the actual columns present in the metastore. There can be a case where the latest commit deleted some column and the user did not run hiveSync, so the latest schema was not synced with HMS. Now if you call pruneInternalSchema, prunedSchema can end up with fewer columns than oldColumnHandle.

Good question.
At present, Hudi cannot guarantee that the metadata in Hive is consistent with the metadata of the current table; users need to ensure that themselves, which is a big problem.

In this case, it would be better to throw an exception directly and tell the user that the metadata of the current Hive table is inconsistent with the data of the Hudi table.

WDYT?
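One way the fail-fast check could look, as a hypothetical sketch: compare the metastore column names with the column names of the latest Hudi schema, and throw if any metastore column is missing. Several simplifications apply. Plain string lists stand in for the column handles and internal schema used in the PR. The comparison is case-sensitive, and because Hive lower-cases column names, a real check may need to normalize them. IllegalStateException stands in for whatever Presto error type the connector would actually raise.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: fail fast when the metastore lists columns that the latest
// Hudi table schema no longer has (for example, hive sync was not re-run).
class SchemaConsistencySketch
{
    static void checkMetastoreInSync(List<String> metastoreColumns, List<String> hudiColumns)
    {
        // Preserve metastore order so the error message is stable and readable.
        Set<String> missing = new LinkedHashSet<>(metastoreColumns);
        missing.removeAll(hudiColumns);
        if (!missing.isEmpty()) {
            throw new IllegalStateException(
                    "Hive metastore schema is out of sync with the Hudi table; columns missing from the latest Hudi schema: "
                            + missing + ". Please re-run hive sync.");
        }
    }
}
```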

Yeah this seems to be a good approach. let us do this. Also would like to hear @codope's thoughts on this.

We should throw an error as metastore is behind hudi table and needs to be synced again.

pratyakshsharma · 2022-11-12T08:54:14Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

Please refer to the comment on line 87 above. Now mergedSchema has the same number of columns as prunedSchema, and this can be smaller than oldColumnHandle's size. This can create problems with the above logic. WDYT? @xiarixiaoyao

pratyakshsharma · 2022-11-12T08:56:31Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

nit: Maybe change the name of oldSchema to querySchema?

pratyakshsharma · 2022-11-12T09:02:25Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPageSource.java

columnCoercions has the new HiveType of columns after schema evolution. Should we swap the last two arguments in this call? The method signature is static HiveCoercer createCoercer(TypeManager typeManager, HiveType fromHiveType, HiveType toHiveType). That is why I think column.getHiveType() should perhaps be the second parameter in this call. Please correct me if I am wrong.

Good catch! What @pratyakshsharma is suggesting seems right.

static HiveCoercer createCoercer(TypeManager typeManager, HiveType fromHiveType, HiveType toHiveType)

column.getHiveType() is the column type from the Hive metastore; in theory, it reflects the latest schema.
columnCoercions.get(column.getName()) returns the old Hive type (not the new type) from before the DDL.
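The following hypothetical sketch makes the argument order concrete. It uses a simplified type enum and coercer, not Presto's HiveCoercer API. Following the clarification above, fromType is the old type from before the DDL (what the file contains), and toType is the current metastore type (what the query expects).

```java
import java.util.function.Function;

// Hypothetical sketch of coercion direction: convert values written with the
// file's older type into the metastore's latest type.
class CoercionDirectionSketch
{
    enum SimpleType { INT, BIGINT, DOUBLE }

    // fromType: type recorded for the column before the DDL.
    // toType: type currently declared in the metastore.
    static Function<Object, Object> createCoercer(SimpleType fromType, SimpleType toType)
    {
        if (fromType == toType) {
            return Function.identity();
        }
        if (fromType == SimpleType.INT && toType == SimpleType.BIGINT) {
            // Widen int values read from old files to bigint.
            return value -> value == null ? null : Long.valueOf(((Integer) value).longValue());
        }
        if (fromType == SimpleType.INT && toType == SimpleType.DOUBLE) {
            return value -> value == null ? null : Double.valueOf(((Integer) value).doubleValue());
        }
        throw new IllegalArgumentException("Unsupported coercion " + fromType + " -> " + toType);
    }
}
```

Swapping the arguments would ask for a BIGINT-to-INT narrowing instead, which is the mismatch the review question was probing.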

I see. Thanks for clarifying that.

xiarixiaoyao · 2022-11-13T00:59:30Z

@pratyakshsharma
Thank you for your valuable comments; I will address all of them.

codope

Super useful feature for the connector. Thanks @xiarixiaoyao for taking it up. It would be great if you could also add a few tests.

codope · 2022-11-14T16:16:05Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSessionProperties.java

Why can't this session property be part of HudiConfig as well?

codope · 2022-11-14T16:27:35Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

This may not necessarily be a HoodieMetadataFileSystemView. Should we use one of the FileSystemViewManager APIs to build the view based on the metadata config?

codope · 2022-11-14T16:32:10Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

Suggested change

int candidateFileSize = candidateFileSlices.entrySet().stream().map(entry -> entry.getValue().size()).reduce(0, (n1, n2) -> n1 + n2);

int candidateFileSize = candidateFileSlices.values().stream().map(List::size).reduce(0, Integer::sum);
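The suggested change is a readability refactor. The following sketch checks that both expressions give the same count. It simplifies the map's value type to List<String>, whereas the PR's map holds file slices. It also shows mapToInt(...).sum() as a further option that avoids boxing each size.

```java
import java.util.List;
import java.util.Map;

// Sketch comparing ways of counting candidate file slices across partitions.
class CandidateCountSketch
{
    // Original form: iterate entries and reduce boxed sizes with a lambda.
    static int originalCount(Map<String, List<String>> candidateFileSlices)
    {
        return candidateFileSlices.entrySet().stream().map(entry -> entry.getValue().size()).reduce(0, (n1, n2) -> n1 + n2);
    }

    // Suggested form: iterate values and use method references.
    static int suggestedCount(Map<String, List<String>> candidateFileSlices)
    {
        return candidateFileSlices.values().stream().map(List::size).reduce(0, Integer::sum);
    }

    // Alternative: an IntStream avoids boxing each size into an Integer.
    static int primitiveCount(Map<String, List<String>> candidateFileSlices)
    {
        return candidateFileSlices.values().stream().mapToInt(List::size).sum();
    }
}
```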


codope · 2022-11-14T16:36:35Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

Seems like these variables candidateFileSize and totalFiles are just for logging purpose? We can avoid churning of maps if it isn't strictly necessary.


codope · 2022-11-14T17:06:43Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

So this method gets called just once throughout the lifecycle of query right?
Maybe as a followup we can cache it by instant and make it visible for all queries to reduce the i/o load.
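Here is a minimal sketch of that follow-up idea. It assumes a hypothetical process-wide cache keyed by table base path and commit instant, and the names are illustrative only. An unbounded ConcurrentHashMap never evicts entries, so a real version would more likely use a bounded cache such as Caffeine, which the PR already pulls into the pom.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: share a value loaded for (table base path, commit instant)
// across queries so the same instant is only read once.
class InstantCacheSketch<V>
{
    static final class Key
    {
        final String basePath;
        final String instantTime;

        Key(String basePath, String instantTime)
        {
            this.basePath = basePath;
            this.instantTime = instantTime;
        }

        @Override
        public boolean equals(Object o)
        {
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return basePath.equals(other.basePath) && instantTime.equals(other.instantTime);
        }

        @Override
        public int hashCode()
        {
            return Objects.hash(basePath, instantTime);
        }
    }

    private final Map<Key, V> cache = new ConcurrentHashMap<>();

    // computeIfAbsent on ConcurrentHashMap runs the loader at most once per key.
    V get(String basePath, String instantTime, Function<Key, V> loader)
    {
        return cache.computeIfAbsent(new Key(basePath, instantTime), loader);
    }
}
```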


codope · 2022-11-14T17:10:43Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java

partitionPredicate.getDomains() can be an empty optional.

But that's what L202 to L204 handle, right?

xiarixiaoyao · 2022-11-15T09:24:28Z

@pratyakshsharma @codope
Thank you very much for reviewing.
Working on adding UTs.

xiarixiaoyao · 2022-11-17T09:17:32Z

@pratyakshsharma @codope
Added UTs and addressed all comments.
Could you please help me review again? Thanks.


xiarixiaoyao · 2022-11-17T09:26:18Z

presto-hudi/src/test/java/com/facebook/presto/hudi/TestHudiSkippingAndEvolution.java

This is only used to pass CI, as we directly introduced lz4 and caffeine into the pom file.

nsivabalan

Good job on the patch. results look amazing!


nsivabalan · 2022-11-20T06:56:31Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java

But that's what L202 to L204 handle, right?


nsivabalan · 2022-11-20T07:08:46Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

Isn't it handled in L204?

nsivabalan · 2022-11-20T07:11:50Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

Maybe in a follow-up PR, we should also wire in the pruned list of partitions here, so that we do prefix lookups only in the pruned partitions rather than in all partitions. For example, suppose there are 1000 partitions and 5 columns with predicates, and only 10 partitions match after pruning:

the existing call fetches 5 columns * 1000 partitions = 5k entries from the col_stats partition in the MDT to do file skipping,
whereas if we wire in the pruned list of partitions, we only need 5 * 10 = 50 entries.

I guess we missed this even in the Spark implementation in Hudi. Will file a JIRA for this.

https://issues.apache.org/jira/browse/HUDI-5245
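The arithmetic above can be expressed as a hypothetical sketch. It assumes one col_stats prefix lookup per (predicate column, partition) pair. The "column|partition" key is a placeholder, not the MDT's actual key encoding.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one col_stats prefix lookup per (predicate column, partition),
// so pruning partitions before the lookup shrinks the work proportionally.
class ColStatsLookupSketch
{
    static List<String> lookupKeys(List<String> predicateColumns, List<String> partitions)
    {
        List<String> keys = new ArrayList<>();
        for (String column : predicateColumns) {
            for (String partition : partitions) {
                // Placeholder key; the real index encodes column and partition differently.
                keys.add(column + "|" + partition);
            }
        }
        return keys;
    }
}
```

With 5 predicate columns, passing all 1000 partitions yields 5000 lookups, while passing only the 10 pruned partitions yields 50.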

nsivabalan · 2022-11-20T07:19:31Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

This should not happen, right? Can we throw here?

I don't think so.
The index may be expired; in that case we must return true directly instead of throwing an exception.
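Here is a minimal sketch of the conservative rule described here. It assumes simple long-valued min/max statistics per column, and the class and method names are hypothetical. A file is read unless its statistics prove that no row can satisfy the range predicate. A missing or expired index entry proves nothing, so the method returns true.

```java
import java.util.Map;

// Hypothetical sketch of conservative file skipping with min/max column statistics.
class MinMaxSkippingSketch
{
    static final class ColumnStats
    {
        final long min;
        final long max;

        ColumnStats(long min, long max)
        {
            this.min = min;
            this.max = max;
        }
    }

    // Returns true if the file must be read for the predicate lower <= column <= upper.
    static boolean mayMatch(Map<String, ColumnStats> fileStats, String column, long lower, long upper)
    {
        ColumnStats stats = fileStats.get(column);
        if (stats == null) {
            // Missing or expired index entry: we cannot prove the file is irrelevant, so keep it.
            return true;
        }
        // The file's [min, max] range overlaps the predicate range.
        return stats.max >= lower && stats.min <= upper;
    }
}
```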


xiarixiaoyao · 2022-11-22T03:38:43Z

@pratyakshsharma @nsivabalan @codope
Addressed all comments; could you please help me review again? Thanks.


codope · 2022-11-28T10:08:58Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPageSource.java

I see. Thanks for clarifying that.

codope · 2022-11-28T10:12:36Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java

We should throw an error as metastore is behind hudi table and needs to be synced again.

codope · 2022-11-28T10:15:15Z

presto-hudi/src/test/java/com/facebook/presto/hudi/AbstractHudiDistributedQueryTestBase.java

Wondering if it's time to introduce the Java write client for testing purposes, instead of simulating commits this way. We already do it that way in Trino. I am ok with this change. We can take it up as a followup. But what do you think?

Yes, I will try.
Thanks.


7c00

Could we introduce data skipping and schema evolution in two separate PRs?


7c00 · 2022-11-23T03:37:59Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

In presto, we tend to use XXXStats to track method performance. For example, com.facebook.presto.hive.metastore.thrift.HiveMetastoreApiStats.

Thank you for your suggestion.
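As a rough illustration of the XXXStats idea, the following hypothetical sketch counts calls and failures of a method and accumulates its wall time. Presto's own stats classes, such as the HiveMetastoreApiStats mentioned above, use richer stats types and export them for monitoring. This sketch uses plain atomic counters instead, and its name and methods are illustrative only.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical XXXStats-style helper: counts calls and failures and sums wall time.
class FileSkippingStatsSketch
{
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong failures = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    // Runs the action, recording the call, any RuntimeException, and the elapsed time.
    <T> T time(Supplier<T> action)
    {
        long start = System.nanoTime();
        calls.incrementAndGet();
        try {
            return action.get();
        }
        catch (RuntimeException e) {
            failures.incrementAndGet();
            throw e;
        }
        finally {
            totalNanos.addAndGet(System.nanoTime() - start);
        }
    }

    long getCalls() { return calls.get(); }

    long getFailures() { return failures.get(); }

    long getTotalMillis() { return TimeUnit.NANOSECONDS.toMillis(totalNanos.get()); }
}
```

A call site would wrap the file-skipping step, for example stats.time(() -> computeCandidates()), where computeCandidates stands for a hypothetical method.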


xiarixiaoyao · 2022-12-07T02:25:44Z

@7c00
Thank you for your review.
Addressed all comments.

pratyakshsharma

Thank you for patiently addressing all comments throughout. Few minor comments. Also wanted to know if you raised another PR for schema evolution? If so, please mention this PR there, so both the PRs are linked.

pratyakshsharma · 2022-12-21T15:15:43Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java

can simplify it to return !columnPredicate.intersect(domain).isNone();
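To illustrate the simplified check, the following sketch uses a hypothetical value-set stand-in for Presto's Domain. The real code would use Presto's Domain class, whose intersect and isNone calls appear in the suggestion above. A partition is kept exactly when the predicate's domain and the partition value's domain overlap, so the boolean can be returned directly instead of going through an if/else.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical discrete-value stand-in for a predicate domain.
final class ValueSetDomain
{
    private final Set<String> values;

    ValueSetDomain(Set<String> values)
    {
        this.values = new HashSet<>(values);
    }

    ValueSetDomain intersect(ValueSetDomain other)
    {
        Set<String> common = new HashSet<>(values);
        common.retainAll(other.values);
        return new ValueSetDomain(common);
    }

    // "None" means no value can satisfy the domain.
    boolean isNone()
    {
        return values.isEmpty();
    }

    // A partition survives pruning iff its value overlaps the predicate's domain.
    static boolean partitionMatches(ValueSetDomain columnPredicate, ValueSetDomain domain)
    {
        return !columnPredicate.intersect(domain).isNone();
    }
}
```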

pratyakshsharma · 2022-12-21T15:37:18Z

presto-hudi/src/test/java/com/facebook/presto/hudi/TestHudiSkipping.java

nit: This class is not intended for delta tables.

pratyakshsharma · 2022-12-21T15:40:31Z

presto-hudi/src/test/java/com/facebook/presto/hudi/AbstractHudiDistributedQueryTestBase.java

nit: remove extra line

pratyakshsharma · 2022-12-21T15:43:31Z

presto-hudi/src/test/java/com/facebook/presto/hudi/TestHudiSkipping.java

can we reuse this method from HudiSplitManager class?

pratyakshsharma · 2022-12-21T15:46:48Z

presto-hudi/src/test/java/com/facebook/presto/hudi/AbstractHudiDistributedQueryTestBase.java

I believe the tests only cover CoW table type. Let us add for MoR table type as well?

@xiarixiaoyao I guess test case for MoR type is still not added.

codope · 2022-12-23T15:38:33Z

@xiarixiaoyao @7c00 @pratyakshsharma This PR looks in pretty good shape and near landing now (except for last minor comments). It has also been well-tested both by @xiarixiaoyao and @nsivabalan on separate datasets. Would really appreciate if we can land this sooner.

nsivabalan · 2022-12-27T23:25:03Z

Hey folks, can we try to land this in 2022 :) would be good to close it out before end of this year.

xiarixiaoyao · 2023-01-04T02:46:23Z

@pratyakshsharma
Sorry for the late reply;
I was busy at work last month.
The schema evolution PR will be raised tomorrow. Thanks.

pratyakshsharma · 2023-02-01T10:50:03Z

@xiarixiaoyao Is it good for another pass now?

xiarixiaoyao · 2023-02-03T09:06:37Z

@pratyakshsharma
Yes, it is ready for review. Thanks very much.
If there are any new problems, I will fix them as soon as possible.

pratyakshsharma · 2023-02-23T09:14:28Z

@xiarixiaoyao I guess MoR test case is still missing. Can you please confirm?

vinothchandar

Few cursory comments. Happy to do a deeper pass, once we rebase this again on top of the async splits pr.

vinothchandar · 2023-03-01T23:22:27Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPredicates.java

+import java.util.Map;
+import java.util.Optional;
+
+public class HudiPredicates


can we unit test these classes? and the other new ones?

vinothchandar · 2023-03-01T23:23:59Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSplitManager.java

+        boolean hudiMetadataTableEnabled = isHudiMetadataTableEnabled(session);
+        HoodieMetadataConfig metadataConfig = HoodieMetadataConfig.newBuilder().enable(hudiMetadataTableEnabled).build();
+        Configuration conf = fs.getConf();
+        HoodieTableMetaClient metaClient = HoodieTableMetaClient


Wouldn't we create a metaClient here anyway? Could we reuse it instead of creating a new one just for data skipping?

vinothchandar · 2023-03-08T22:19:21Z

@codope One question I had was - does the hudi connector now leverage the metadata/alluxio caching that's in the hive connector? I had a deep dive with someone at Uber, and if that works, it could end up being a faster path (local, in-memory cache, maintained in parallel at the workers)

vinothchandar · 2023-03-21T01:35:49Z

@xiarixiaoyao any updates on this?

xiarixiaoyao · 2023-03-21T12:30:36Z


I'm glad you're following this PR; I will update it in the next few days. Thanks.

pratyakshsharma · 2023-03-22T19:08:46Z

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java

+        requireNonNull(partitions, "partitions is null");
+        requireNonNull(spillableDir, "spillableDir is null");
+        requireNonNull(engineContext, "engineContext is null");
+        this.queryType = requireNonNull(queryType, "queryType is null");


nit: Can we remove this variable since it is not getting used anywhere.

vinothchandar · 2023-05-02T13:03:47Z

@xiarixiaoyao Ping again :)

yihua · 2024-02-01T20:46:22Z

Hey @xiarixiaoyao Hope you're doing well. If you're busy, we can help rebase the PR on the latest master and drive it to completion.

xiarixiaoyao · 2024-02-05T09:08:48Z


@yihua I'm sorry, I am quite busy currently. I'm glad you're interested in this PR. I hope you can continue this PR, thank you very much.

steveburnett · 2024-02-05T14:54:26Z

Consider revising the release note entry in the Description following the release note guidelines.

== RELEASE NOTES ==
Hudi Connector Changes
* Add dataSkipping for Hudi connector.
* Add partition prune by Hudi MDT to reduce RPC calls for Hive.
* Add filter push down for Hudi COW table.

xiarixiaoyao requested review from a team, 7c00 and vinothchandar as code owners November 2, 2022 12:33

xiarixiaoyao requested a review from presto-oss November 2, 2022 12:33

pratyakshsharma added the backlog label Nov 3, 2022

pratyakshsharma requested changes Nov 11, 2022

pratyakshsharma requested changes Nov 12, 2022

codope reviewed Nov 14, 2022

pratyakshsharma added waiting-for-author and removed backlog labels Nov 15, 2022

xiarixiaoyao force-pushed the sp branch 2 times, most recently from 4efa4b5 to 45168cc on November 17, 2022 09:16

xiarixiaoyao commented Nov 17, 2022

presto-hudi/pom.xml (outdated, resolved)

xiarixiaoyao commented Nov 17, 2022

xiarixiaoyao force-pushed the sp branch 5 times, most recently from 21043e6 to a1dab79 on November 17, 2022 13:06

nsivabalan reviewed Nov 20, 2022

xiarixiaoyao force-pushed the sp branch from f488d50 to 6992e10 on November 21, 2022 13:19

codope reviewed Nov 28, 2022

codope reviewed Dec 3, 2022

presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java (outdated, resolved)

7c00 requested changes Dec 4, 2022

7c00 requested changes Dec 6, 2022

xiarixiaoyao force-pushed the sp branch 2 times, most recently from bd62943 to 16d7cfd on December 7, 2022 07:30

pratyakshsharma requested changes Dec 21, 2022

xiarixiaoyao added 6 commits January 11, 2023 16:02

Support DataSkipping and schema evolution for hudi connector

ba4d36f

address comments and add UT

36b10ed

fix comments

03cde97

remove scheame evolution,and address comments

f53b9bd

fix code style

d716114

rebase code, address comments

8ed791f

xiarixiaoyao force-pushed the sp branch from 16d7cfd to 8ed791f on January 11, 2023 08:53

vinothchandar requested changes Mar 1, 2023

pratyakshsharma requested changes Mar 22, 2023

codope mentioned this pull request Jun 15, 2023

Data skipping for Hudi connector trinodb/trino#17899

Closed

tdcmeehan self-assigned this Feb 2, 2024

codope mentioned this pull request Mar 23, 2025

Support data skipping for Hudi connector #24784

Open

6 tasks

