Improve listing performance of Hudi tables #17244
arunthirupathi merged 1 commit into prestodb:master
@arunthirupathi this is a rework of #17084 with a thinned Presto bundle. Could you please review?
Can you post the mvn dependency difference before and after this change? What is the size of the shaded jar? I made a pass on this PR and it looks good. I need to sync up to this PR and do an internal build to ensure it does not regress anything.
I will update by this Wednesday on whether this PR works correctly with our internal dependencies.
pom.xml
Why are there still excludes if the hudi-presto-bundle is shaded?
Removed unnecessary exclusions. We still need to exclude some Hudi modules, and also protobuf, which caused the build to fail on conflicting resource files when it was not excluded.
1334c99 to 9f2225a
The gist contains the full dependency tree for master and this branch. Below is the diff:
The size of the shaded jar is 16MB.
@arunthirupathi Could you please take a look again? I have removed the unnecessary excludes. Also, did you find any issues running this internally?
A minor change was required because PathFilter became Optional<PathFilter>, but the build passed. I am going to kick off integration tests to see if it causes any issues.
The integration tests succeeded, so this is good. I will take a final pass on Tuesday and merge this in.
Glad to hear this. Thanks for the update.
Can you please rebase and push? There is a breaking change in our internal repo, and I have submitted a PR for it. Once that is approved, I will merge this change and the other change together.
presto-hive/src/main/java/com/facebook/presto/hive/HudiDirectoryLister.java
presto-hive/src/main/java/com/facebook/presto/hive/StoragePartitionLoader.java
On second thought, it will be easier if you undo the change to this file.
Where you need the filter to always pass, pass in a path filter that always returns true: path -> true
This will make it easier for me to merge this PR.
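The suggestion above can be illustrated with a minimal, dependency-free sketch. It assumes a hypothetical `PathFilter` stand-in that mirrors the single-method shape of Hadoop's `PathFilter` (one `accept` method taking a path) but uses a `String` instead of a Hadoop `Path`; the `list` helper and the class name are made up for illustration and are not Presto code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AcceptAllFilterSketch {
    // Hypothetical stand-in for a path filter: one method that takes a path
    // and says whether to keep it. A String is used to avoid dependencies.
    @FunctionalInterface
    interface PathFilter {
        boolean accept(String path);
    }

    // A filter that never rejects anything. Because accept takes one
    // argument, the lambda needs one parameter: path -> true.
    static final PathFilter ACCEPT_ALL = path -> true;

    // Hypothetical lister: returns the paths the filter accepts, in order.
    static List<String> list(List<String> paths, PathFilter filter) {
        List<String> accepted = new ArrayList<>();
        for (String path : paths) {
            if (filter.accept(path)) {
                accepted.add(path);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("a.parquet", ".hoodie", "b.parquet");
        // With the accept-all filter, every path is returned.
        System.out.println(list(paths, ACCEPT_ALL));
        // With a real filter, hidden entries are dropped.
        System.out.println(list(paths, path -> !path.startsWith(".")));
    }
}
```

Callers that do not care about filtering pass `ACCEPT_ALL`, so the listing signature stays unchanged for everyone else.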
PathFilter was added to DirectoryLister because of Hudi. After this patch, we don't really need it. I was planning to remove it as a follow-up to this PR, if that's OK with you. Let me know if you think it's a blocker; then I can make that change in this PR itself.
Let us do it with this PR itself.
Got it. I will update this PR in a day or two.
9f2225a to cd36a24
- Integrate metadata-based listing for Hudi tables. This is enabled by a session property.
- Implement a new DirectoryLister for Hudi that uses HoodieTableFileSystemView to fetch data files.
- Bump Hudi version to 0.10.1 to use the above features.
- Add unit tests for HudiDirectoryLister.
- Replace hudi-common and hudi-hadoop-mr with hudi-presto-bundle.
- Remove PathFilter from the DirectoryLister interface.
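As a rough illustration of the idea behind the Hudi-specific lister in the commit message above, here is a simplified sketch. It is not the code in this PR and does not use the Hudi API: `DataFile`, `DirectoryLister`, `PlainLister`, `LatestFileLister`, and `choose` are all hypothetical names. It assumes each data file belongs to a file group and carries a commit time string that sorts lexicographically, and that a Hudi-aware listing should return only the newest file per file group, instead of filtering a raw directory listing with a PathFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HudiListingSketch {
    // Hypothetical model of a data file: file group id, commit time, path.
    static final class DataFile {
        final String fileGroupId;
        final String commitTime;
        final String path;

        DataFile(String fileGroupId, String commitTime, String path) {
            this.fileGroupId = fileGroupId;
            this.commitTime = commitTime;
            this.path = path;
        }
    }

    // Hypothetical lister interface; note there is no PathFilter parameter.
    interface DirectoryLister {
        List<String> list(List<DataFile> files);
    }

    // Plain behaviour: return every file, in the order given.
    static final class PlainLister implements DirectoryLister {
        @Override
        public List<String> list(List<DataFile> files) {
            List<String> paths = new ArrayList<>();
            for (DataFile file : files) {
                paths.add(file.path);
            }
            return paths;
        }
    }

    // Hudi-style behaviour (assumed): keep only the newest file per file
    // group, comparing commit times as strings.
    static final class LatestFileLister implements DirectoryLister {
        @Override
        public List<String> list(List<DataFile> files) {
            // TreeMap keeps the result ordered by file group id.
            Map<String, DataFile> latest = new TreeMap<>();
            for (DataFile file : files) {
                DataFile current = latest.get(file.fileGroupId);
                if (current == null || file.commitTime.compareTo(current.commitTime) > 0) {
                    latest.put(file.fileGroupId, file);
                }
            }
            List<String> paths = new ArrayList<>();
            for (DataFile file : latest.values()) {
                paths.add(file.path);
            }
            return paths;
        }
    }

    // Hypothetical selection: Hudi tables get the Hudi-style lister.
    static DirectoryLister choose(boolean isHudiTable) {
        return isHudiTable ? new LatestFileLister() : new PlainLister();
    }

    public static void main(String[] args) {
        List<DataFile> files = Arrays.asList(
                new DataFile("fg1", "20220101", "fg1_v1.parquet"),
                new DataFile("fg1", "20220102", "fg1_v2.parquet"),
                new DataFile("fg2", "20220101", "fg2_v1.parquet"));
        System.out.println(choose(false).list(files));
        System.out.println(choose(true).list(files));
    }
}
```

Putting the selection logic inside the lister is what makes a separate filter argument unnecessary in this sketch: the caller asks for files and gets only the ones that should be read.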
cd36a24 to d4f9f0b
@arunthirupathi This PR is ready now. I have removed the path filter from the DirectoryLister interface and tested locally.
This change looks good. Since it breaks the API, I need to get a PR for our internal changes approved and then merge this in at the same time.
Thanks @arunthirupathi for shepherding and landing this!
Test plan - (Please fill in how you tested your changes)