Skip to content

Support PathFilter when fetching splits for Hudi tables#13818

Merged
arhimondr merged 1 commit intoprestodb:masterfrom
bhasudha:presto-hudi-connector
Feb 11, 2020
Merged

Support PathFilter when fetching splits for Hudi tables#13818
arhimondr merged 1 commit intoprestodb:masterfrom
bhasudha:presto-hudi-connector

Conversation

@bhasudha
Copy link

@bhasudha bhasudha commented Dec 7, 2019

This PR is corresponds to the issue - #13511
Summary:
- Introduce PathFilter support in DirectoryLister interface and HiveFileIterator
- Unit test HiveFileIterator using PathFilter
- Plug specific PathFilter implementation for HoodieParquetInputFormat

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

General Changes
* ...
* ...

Hive Changes
* ...
* ...

If release note is NOT required, use:

== NO RELEASE NOTE ==

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Dec 7, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Bhavani Sudha Saktheeswaran (cb2631d6754f0f8304c1bf607e25cae584c7b7b3)

@bhasudha
Copy link
Author

bhasudha commented Dec 7, 2019

@arhimondr please take a look when you can. Thanks much!

@rschlussel rschlussel requested a review from arhimondr December 11, 2019 15:43
Copy link
Contributor

@shixuan-fan shixuan-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly glanced through. Flush some comments. Looks good overall.

@bhasudha bhasudha force-pushed the presto-hudi-connector branch from 89d58da to cb2631d Compare January 7, 2020 01:07
@wenleix
Copy link
Contributor

wenleix commented Jan 8, 2020

@bhasudha : Assume this is ready for review, I will remove the [WIP] in the title.

@wenleix wenleix changed the title [WIP] Presto Hudi wrapper connector implementation Presto Hudi wrapper connector implementation Jan 8, 2020
@bhasudha
Copy link
Author

bhasudha commented Jan 8, 2020

@bhasudha : Assume this is ready for review, I will remove the [WIP] in the title.

@wenleix I kept it as WIP since I am still testing it. I have some related questions. When I bring up local presto server, I believe presto-hudi tries to register two connectors - hudi and hive (since its extending the HiveHadoop2Plugin). I see error saying hive connector is already registered. I am wondering if this is got to do with the way I implemented the presto-hudi connector. Should I change HudiPlugin to extend from HivePlugin like how HiveHadoop2Plugin does instead of extending HiveHadoop2Plugin? Do you have any recommendations around this?

@shixuan-fan
Copy link
Contributor

@bhasudha In our internal extension of hive connector, we didn't extend HivePlugin. We just implemented Plugin, and return a HiveConnector in ConnectorFactory#create.

@bhasudha bhasudha force-pushed the presto-hudi-connector branch from cb2631d to eded1a3 Compare January 14, 2020 00:56
@bhasudha
Copy link
Author

[UPDATE] After further discussions, we decided we will move this implementation inside presto-hive instead of a seperate connector for now. Presto-hive can support HoodieParquetInputFormat as a first class citizen eventually. Depending on future requirements, moving hudi to a seperate connector can be considered. I will update this PR and send out the changes soon. @arhimondr @wenleix @highker

@bhasudha bhasudha force-pushed the presto-hudi-connector branch from eded1a3 to 6799609 Compare February 7, 2020 00:03
@bhasudha bhasudha changed the title Presto Hudi wrapper connector implementation Support PathFilter when fetching splits for Hudi tables Feb 7, 2020
@bhasudha
Copy link
Author

bhasudha commented Feb 7, 2020

@arhimondr I updated the PR based on our last discussion. Please review when possible. I noticed that the Travis build failed in presto-cassandra module. Looks like it might be a transient connection issue. Would you be able to rekick the build ? I cant see an option for restarting the build in Travis. Thanks much!

Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % bunch of nits

@bhasudha bhasudha force-pushed the presto-hudi-connector branch from 6799609 to 090b5f4 Compare February 10, 2020 22:50
    - Introduce PathFilter support in DirectoryLister interface and HiveFileIterator
    - Unit test HiveFileIterator using PathFilter
    - Plug specific PathFilter implementation for HoodieParquetInputFormat
@bhasudha bhasudha force-pushed the presto-hudi-connector branch from 090b5f4 to f2e5fa8 Compare February 11, 2020 00:04
Copy link
Member

@arhimondr arhimondr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CC: @wenleix @highker

@arhimondr arhimondr merged commit 9fd2459 into prestodb:master Feb 11, 2020
sahana-bhat pushed a commit to sahana-bhat/presto that referenced this pull request Nov 24, 2020
Summary:
    - Introduce PathFilter support in DirectoryLister interface and HiveFileIterator
    - Unit test HiveFileIterator using PathFilter
    - Plug specific PathFilter implementation for HoodieParquetInputFormat

Combines following PRs into one:
prestodb#13818
prestodb#14085
prestodb#14088

Reviewers: bhasudha

Differential Revision: https://code.uberinternal.com/D4168579
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants