Support DataSkipping for hudi connector#18606
Support DataSkipping for hudi connector#18606xiarixiaoyao wants to merge 6 commits intoprestodb:masterfrom
Conversation
|
Will have a look at it over the weekend. |
pratyakshsharma
left a comment
There was a problem hiding this comment.
Thank you for raising this detailed PR. My other PR based on RFC-58 was trying to do data skipping using metadata table in a more generic way. Basically rather than introducing the changes in specific query engines like presto and trino, the idea was to introduce the changes as part of hudi itself and simply call them from presto/trino.
Anyways I can make changes accordingly later. I am still going through the changes and have given few comments for changes/clarification. Please have a look.
There was a problem hiding this comment.
Just trying to understand why we cannot do schema evolution if only log files are present?
There was a problem hiding this comment.
We have processed the mor table in the hudi kernel
apache/hudi#6989
There was a problem hiding this comment.
Thank you for pointing me to this PR, will have a look and ask doubts, if any
There was a problem hiding this comment.
Can you please explain why there is no need to do schema evolution for MoR tables? Are we taking care of schema evolution with the getSplits call for MoR in Hudi kernel?
There was a problem hiding this comment.
We have processed the mor table in the hudi kernel
apache/hudi#6989
There was a problem hiding this comment.
can we add one line comment for these 2 variables here?
There was a problem hiding this comment.
Also if I understand properly, this variable corresponds to latest internal schema, probably we can update the variable name too?
There was a problem hiding this comment.
yes,Thank you for your good suggest
There was a problem hiding this comment.
I guess better to update the variable name to validCommitFiles and also update this variable in SchemaEvolutionContext class.
There was a problem hiding this comment.
I guess it will be good to have some test cases covering the scenarios of different types of schema evolutions. That should clear most of my doubts as well.
agree , as apache/hudi#6989 has merged in hudi kernel, we should add test cases to convering schema evolution.
There was a problem hiding this comment.
nit: non-partitioned
There was a problem hiding this comment.
nit: rename to evaluatePartitionPredicate?
There was a problem hiding this comment.
Thank you for raising this detailed PR. My other PR based on RFC-58 was trying to do data skipping using metadata table in a more generic way. Basically rather than introducing the changes in specific query engines like presto and trino, the idea was to introduce the changes as part of hudi itself and simply call them from presto/trino. Anyways I can make changes accordingly later. I am still going through the changes and have given few comments for changes/clarification. Please have a look.
yes If rfc-58 is completed, we only need to convert the presto filter into a hudi filter, and then call the interface directly, just like iceberg. and also RFC-64 is abstracting interfaces, however this may take a long time.
once rfc-58/rfc-64 completed, we can remove those logical directly.
There was a problem hiding this comment.
Exactly, I am aligned on this.
pratyakshsharma
left a comment
There was a problem hiding this comment.
I guess it will be good to have some test cases covering the scenarios of different types of schema evolutions. That should clear most of my doubts as well.
There was a problem hiding this comment.
nit: you probably wanted to add some comment here?
There was a problem hiding this comment.
sorry, forget add comments
There was a problem hiding this comment.
Just thinking out loud, please correct me if I am wrong. oldColumnHandle list comes from metastore and it will have actual columns present in the metastore. There can be a case where latest commit resulted in some column deletion and user did not run hiveSync, so the latest schema was not synced with HMS. Now if you call pruneInternalSchema, it can result in prunedSchema having less number of columns than oldColumnHandle.
There was a problem hiding this comment.
good question.
At present, hudi cannot guarantee that the metadata in hive is consistent with the metadata of the current table, and users need to ensure that. this is a big problem.
In this case, we it will be better to throw an exception directly and prompt the user that the metadata information of the current hive table is inconsistent with the data information of the hudi table
WDYT?
There was a problem hiding this comment.
Yeah this seems to be a good approach. let us do this. Also would like to hear @codope's thoughts on this.
There was a problem hiding this comment.
We should throw an error as metastore is behind hudi table and needs to be synced again.
There was a problem hiding this comment.
Please refer the comment on line 87 above. Now mergedSchema has the same number of columns as prunedSchema and this can be smaller than oldColumnHandle's size. This can create problems with the above logic. WDYT? @xiarixiaoyao
There was a problem hiding this comment.
nit: Maybe change the name of oldSchema to querySchema?
There was a problem hiding this comment.
columnCoercions has the new HiveType of columns after doing schema evolution. Should we reverse the last 2 variables in this call? The method signature goes like this - static HiveCoercer createCoercer(TypeManager typeManager, HiveType fromHiveType, HiveType toHiveType). That is why thinking maybe column.getHiveType() should be the second parameter in this call? Please correct me if I am wrong.
There was a problem hiding this comment.
Good catch! What @pratyakshsharma is suggesting seems right.
There was a problem hiding this comment.
static HiveCoercer createCoercer(TypeManager typeManager, HiveType fromHiveType, HiveType toHiveType)
column.getHiveType() is column type from hive metastore, In theory, it is the latest schema
columnCoercions.get(column.getName()) return a old hive type(not the new type) before DDL
There was a problem hiding this comment.
I see. Thanks for clarifying that.
|
@pratyakshsharma |
codope
left a comment
There was a problem hiding this comment.
Super useful feature for the connector. Thanks @xiarixiaoyao for taking it up. It would be great if you could also add a few tests.
There was a problem hiding this comment.
Why can't this session property be part of HudiConfig as well?
There was a problem hiding this comment.
This may not necessarily be HoodieMetadataFileSystemView. Should we use one of the FileSystemViewManager APIs to build the view based in metadata config?
There was a problem hiding this comment.
| int candidateFileSize = candidateFileSlices.entrySet().stream().map(entry -> entry.getValue().size()).reduce(0, (n1, n2) -> n1 + n2); | |
| int candidateFileSize = candidateFileSlices.values().stream().map(List::size).reduce(0, Integer::sum); |
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Seems like these variables candidateFileSize and totalFiles are just for logging purpose? We can avoid churning of maps if it isn't strictly necessary.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPredicates.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
So this method gets called just once throughout the lifecycle of query right?
Maybe as a followup we can cache it by instant and make it visible for all queries to reduce the i/o load.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
partitionPredicate.getDomains() can be an empty optional.
There was a problem hiding this comment.
but thats what L202 to 204 handles right?
|
@pratyakshsharma @codope |
4efa4b5 to
45168cc
Compare
|
@pratyakshsharma @codope |
There was a problem hiding this comment.
only used to pass ci, as we directly introduced lz4 and caffeine into the pom file
21043e6 to
a1dab79
Compare
nsivabalan
left a comment
There was a problem hiding this comment.
Good job on the patch. results look amazing!
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
but thats what L202 to 204 handles right?
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
may be in a follow up PR. we should also wire in pruned list of partitions here, so that we prefix look up only in pruned partitions rather than all partitions. For eg, if there are 1000 partitions and 5 cols w/ predicate, and only 10 partitions are matched after pruning,
exiting call will fetch 5 cols * 1000 partitions = 5k entries from col_stats partition in MDT to do file skipping.
where as if we wire in pruned list of partitions, then we only need to do file skipping from 50 entries.
There was a problem hiding this comment.
guess we missed this even for spark impl in Hudi. will file a jira on this.
There was a problem hiding this comment.
There was a problem hiding this comment.
this should not happen right. can we throw here.
There was a problem hiding this comment.
i donnot think so.
The index may be expired, at this time we must return true directly instead of throwing an exception
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
|
@pratyakshsharma @nsivabalan @codope |
There was a problem hiding this comment.
I see. Thanks for clarifying that.
There was a problem hiding this comment.
We should throw an error as metastore is behind hudi table and needs to be synced again.
There was a problem hiding this comment.
Wondering if it's time to introduce the Java write client for testing purposes, instead of simulating commits this way. We already do it that way in Trino. I am ok with this change. We can take it up as a followup. But what do you think?
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
7c00
left a comment
There was a problem hiding this comment.
Could we introduce data skipping and schema evolution in two separate PRs?
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
In presto, we tend to use XXXStats to track method performance. For example, com.facebook.presto.hive.metastore.thrift.HiveMetastoreApiStats.
There was a problem hiding this comment.
Thank you for your suggestion,
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiSchemaEvolutionUtils.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFileSkippingManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPartitionManager.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPredicates.java
Outdated
Show resolved
Hide resolved
|
@7c00 |
bd62943 to
16d7cfd
Compare
pratyakshsharma
left a comment
There was a problem hiding this comment.
Thank you for patiently addressing all comments throughout. Few minor comments. Also wanted to know if you raised another PR for schema evolution? If so, please mention this PR there, so both the PRs are linked.
There was a problem hiding this comment.
can simplify it to return !columnPredicate.intersect(domain).isNone();
There was a problem hiding this comment.
nit: This class is not intended for delta tables.
There was a problem hiding this comment.
nit: remove extra line
There was a problem hiding this comment.
can we reuse this method from HudiSplitManager class?
There was a problem hiding this comment.
I believe the tests only cover CoW table type. Let us add for MoR table type as well?
There was a problem hiding this comment.
@xiarixiaoyao I guess test case for MoR type is still not added.
|
@xiarixiaoyao @7c00 @pratyakshsharma This PR looks in pretty good shape and near landing now (except for last minor comments). It has also been well-tested both by @xiarixiaoyao and @nsivabalan on separate datasets. Would really appreciate if we can land this sooner. |
|
Hey folks, can we try to land this in 2022 :) would be good to close it out before end of this year. |
|
@pratyakshsharma |
|
@xiarixiaoyao Is it good for another pass now? |
|
@pratyakshsharma |
|
@xiarixiaoyao I guess MoR test case is still missing. Can you please confirm? |
vinothchandar
left a comment
There was a problem hiding this comment.
Few cursory comments. Happy to do a deeper pass, once we rebase this again on top of the async splits pr.
| import java.util.Map; | ||
| import java.util.Optional; | ||
|
|
||
| public class HudiPredicates |
There was a problem hiding this comment.
can we unit test these classes? and the other new ones?
| boolean hudiMetadataTableEnabled = isHudiMetadataTableEnabled(session); | ||
| HoodieMetadataConfig metadataConfig = HoodieMetadataConfig.newBuilder().enable(hudiMetadataTableEnabled).build(); | ||
| Configuration conf = fs.getConf(); | ||
| HoodieTableMetaClient metaClient = HoodieTableMetaClient |
There was a problem hiding this comment.
would n't we otherwise create a metaClient here anyways? could we reuse instead of. creating a new one for data skipping alone?
|
@codope One question I had was - does the hudi connector now leverage the metadata/alluxio caching that's in the hive connector? I had a deep dive with someone at Uber, and if that works, it could end up being a faster path (local, in-memory cache, maintained in parallel at the workers) |
|
@xiarixiaoyao any updates on this? |
I'm glad you're following this PR, will update this pr next few days. thanks |
| requireNonNull(partitions, "partitions is null"); | ||
| requireNonNull(spillableDir, "spillableDir is null"); | ||
| requireNonNull(engineContext, "engineContext is null"); | ||
| this.queryType = requireNonNull(queryType, "queryType is null"); |
There was a problem hiding this comment.
nit: Can we remove this variable since it is not getting used anywhere.
|
@xiarixiaoyao Ping again :) |
|
Hey @xiarixiaoyao Hope you're doing well. If you're busy, we can help rebase the PR on the latest master and drive it to completion. |
@yihua I'm sorry, I am quite busy currently. I'm glad you're interested in this PR. I hope you can continue this PR, thank you very much. |
|
Consider revising the release note entry in the Description following the the release note guidelines. |
What's the change?
design

test result:

ssb benchmark,
datasize: 1.5TB, 12billion
env: 1CN+3WN Container 170GB,136GB JVM heap, 95GB Max Query Memory,40vcore
Test plan - (Please fill in how you tested your changes)
Test plan - unit test