Conversation
dc1acb2 to
75b1762
Compare
fbb24fd to
df82b6d
Compare
|
Thanks @codope @rohanpednekar! |
...-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/file/PartitionMetadata.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiMetadata.java
Outdated
Show resolved
Hide resolved
d3c130c to
06ab601
Compare
codope
left a comment
There was a problem hiding this comment.
@7c00 Thanks for the contribution. This would be pretty useful for the Hudi community.
I took one pass and left some comments. I have a high-level question. Can you please add to the description, what all Hudi table types and query types are currently supported?
presto-hive-common/src/main/java/org/apache/hadoop/fs/HadoopExtendedFileSystem.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hive/HudiRecordCursors.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/ParquetPageSourceUtil.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/ParquetPageSourceUtil.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/RecordCursorUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Instead of reflection-based check, we can use annotations provided by Hudi. Check these two annotations in Hudi:
UseRecordReaderFromInputFormat
UseFileSplitsFromInputFormat
There was a problem hiding this comment.
The annotations UseRecordReaderFromInputFormat and UseFileSplitsFromInputFormat indicate that we could delegate the splitting to InputFormat. As we discussed in the design proposal, there are some performance issues in the splitting implementation from InputFormat. So we will ignore these annotations and provide a new splitting implementation optimized for Presto.
There was a problem hiding this comment.
Not sure if you're considering different partition extractors in hudi-hive sync. Depending on partition extractor, partition values can change.
There was a problem hiding this comment.
We are now retrieving partition metadata through HiveMetaStore, where the partition values stored have been extracted by some partition extractors from their partition locations and thus we dont need to take consisder of the partition extractors.
83edce7 to
23c5fd9
Compare
|
@pratyakshsharma @codope Thanks for your comments. I have updated the code as the comments. Could you please take a second review? Thanks in advance. |
|
@codope Can you please take another pass on this? |
Will review by Wednesday. |
|
ping @codope |
|
@7c00 I have taken one more pass. The code look fine from Hudi perspective. |
|
This contains 2 set of changes, changes to hive/metastore and hudi connector. Can you please do a PR for hive/metastore changes alone and hudi connector. I will review the hive/metastore changes closely. |
|
@arunthirupathi I have extracted the metastore related changes to a new PR at #17368. Could you take a review on it? Thank you in advance ❤️ . |
aa78a34 to
db07aa6
Compare
|
I have refactored the tests. Instead of dynamically generating tables during testing, the new tests are using the pre-generated data from Hudi Docker demo. The reasons for that are
@codope @pratyakshsharma Could you take a second review? Nearly all the changes comes from the test part. Then I'd like to invite Presto fellows to review the PR. |
|
I am actually travelling this weekend. Will take a look early next week. |
|
What is the size of the jar that is getting built ?few 10's of MB is fine, but if it is producing really large then please look into it in reducing the size. When it is ready for review, Please ping me again. For specific connectors, we require it to be thoroughly reviewed by someone else familiar with the connector. Once that review is done, I will make a pass to ensure it meets the coding guidelines. |
|
@arunthirupathi Sorry for late to reply.
As we are reusing dependencies from presto-hive module and have added no new dependency. The total size of this hudi connector will be less than that of hive connector.
Now this PR is blocked the Avro dependency issue (found in #17463) to be fixed in next Hudi release 0.11. When Hudi 0.11 is released, we are going to continue to work on this PR and invite you to review. Thanks you in advance. cc @codope |
@7c00 Hudi v0.11 is released https://mvnrepository.com/artifact/org.apache.hudi/hudi-presto-bundle/0.11.0 |
|
The code has been rebased to sync with master branch after #17463 . It's ready to review now. @codope @pratyakshsharma @arunthirupathi @kewang1024 @highker Could you take a look at this PR? Thanks in advance! ❤️ |
|
@pettyjamesm Please take a look when you get a chance. |
arunthirupathi
left a comment
There was a problem hiding this comment.
Mostly looks good, left some nits.
Sorry my notification in Presto is messed up, I will fix it. so that I will have a quicker turn around time next time.
There was a problem hiding this comment.
Can you please StringHelper to convert to String ?
There was a problem hiding this comment.
The toString method here is implemented followed the convention of HiveColumnHandle#toString. In this way, we could have a more compacted text (refer to a Hudi column) when output logs.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiConnectorFactory.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFile.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFile.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
The same reason as HudiColumnHandle#toString
There was a problem hiding this comment.
This code though is right, is inefficient. This is one of the hot paths. If there is no prefilled blocks, short circuit and return the delegate page.
There was a problem hiding this comment.
Good catch, thank you. Add short circuit when no actual prefilled block.
There was a problem hiding this comment.
Add the value here on what was illegal, so that it makes debugging easier.
There was a problem hiding this comment.
The IllegalArgumentException is catched later in current method, then a PrestoException with more context infomation is thrown.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPageSource.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Include the decimalPrecision and decimalType precision in the error message, along with the valueString.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPlugin.java
Outdated
Show resolved
Hide resolved
|
Thanks for reviewing @arunthirupathi |
|
Looks like there were some changes and this was broken, can you please fix them as well ? |
|
@arunthirupathi Thanks for your reviewing. The comments have been addressed. Let me ping you again when all CI checks passed. |
|
Hey @arunthirupathi All CI passed. Let's merge this PR? |
|
Thanks @arunthirupathi @codope @pratyakshsharma @rohanpednekar for all your time and help! ❤️ |
Add hudi connector
Ref: #17006
Test plan - Unit tests.